
Page 1

國立雲林科技大學 National Yunlin University of Science and Technology
Intelligent Database Systems Lab

Unsupervised Feature Selection Using Feature Similarity

Advisor : Dr. Hsu
Graduate : Ching-Lung Chen
Author : Pabitra Mitra, Student Member

Page 2

Outline

Motivation
Objective
Introduction
Feature Similarity Measure
Feature Selection Method
Feature Evaluation Indices
Experimental Results and Comparisons
Conclusions
Personal Opinion
Review

Page 3

Motivation

Conventional feature selection methods have high computational complexity for data sets that are large in both dimension and size.

Page 4

Objective

Propose an unsupervised feature selection algorithm suitable for data sets that are large in both dimension and size.

Page 5

Introduction 1/3

The sequential floating searches provide better results, though at the cost of higher computational complexity.

Existing methods can be broadly classified into two categories:
1. Maximization of clustering performance: sequential unsupervised feature selection, maximum entropy, neuro-fuzzy approaches, etc.
2. Methods based on feature dependency and relevance: correlation coefficients, measures of statistical redundancy, linear dependence.

Page 6

Introduction 2/3

We propose an unsupervised algorithm which uses feature dependency/similarity for redundancy reduction, but requires no search.

A new similarity measure, called the maximal information compression index, is used in clustering the features. It is compared with the correlation coefficient and the least-square regression error.

Page 7

Introduction 3/3

The proposed algorithm is geared toward two goals:
1. Minimizing the information loss.
2. Minimizing the redundancy present in the reduced feature subset.

Unlike most conventional algorithms, the feature selection algorithm does not search for the best subset, so it can be computed in much less time than many indices used in other supervised and unsupervised feature selection methods.

Page 8

Feature Similarity Measure

There are two approaches for measuring similarity between two random variables:
1. To nonparametrically test the closeness of the probability distributions of the variables.
2. To measure the amount of functional dependency between the variables.

We discuss below two existing linear dependency measures:
1. Correlation coefficient
2. Least square regression error (e)

Page 9

Feature Similarity Measure

Correlation Coefficient (ρ)

$\rho(x,y) = \mathrm{cov}(x,y) / \sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}$

var(·) is the variance of a variable and cov(·,·) is the covariance between two variables.

Properties:
1. $0 \le 1 - |\rho(x,y)| \le 1$.
2. $1 - |\rho(x,y)| = 0$ if x and y are linearly related.
3. $1 - |\rho(x,y)| = 1 - |\rho(y,x)|$ (symmetric).
4. If $u = (x - a)/c$ and $v = (y - b)/d$ for some constants a, b, c, d, then $1 - |\rho(x,y)| = 1 - |\rho(u,v)|$; the measure is invariant to scaling and translation of the variables.
5. The measure is sensitive to rotation of the scatter diagram in the (x, y) plane.

Page 10

Feature Similarity Measure

Least Square Regression Error (e)

e(x, y) is the error in predicting y from the linear model y = a + bx, where a and b are the regression coefficients obtained by minimizing the mean square error.

The coefficients are given by $b = \mathrm{cov}(x,y)/\mathrm{var}(x)$ and $a = \bar{y} - b\,\bar{x}$, and the mean square error is

$e(x,y) = \mathrm{var}(y)\,(1 - \rho(x,y)^2)$

Page 11

Feature Similarity Measure

Least Square Regression Error (e)

Properties:
1. $0 \le e(x,y) \le \mathrm{var}(y)$.
2. $e(x,y) = 0$ if x and y are linearly related.
3. $e(x,y) \ne e(y,x)$ (unsymmetric).
4. If $u = x/c$ and $v = y/d$ for some constants c, d, then $e(x,y) = d^2\,e(u,v)$; the measure e is sensitive to scaling of the variables.
5. The measure e is sensitive to rotation of the scatter diagram in the (x, y) plane.

Page 12

Feature Similarity Measure

Maximal Information Compression Index (λ₂)

Let Σ be the covariance matrix of random variables x and y. Define the maximal information compression index λ₂(x, y) as the smallest eigenvalue of Σ:

$2\lambda_2(x,y) = \mathrm{var}(x) + \mathrm{var}(y) - \sqrt{(\mathrm{var}(x) + \mathrm{var}(y))^2 - 4\,\mathrm{var}(x)\,\mathrm{var}(y)\,(1 - \rho(x,y)^2)}$

λ₂ = 0 when the features are linearly dependent, and it increases as the amount of dependency decreases.
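The closed form above follows because λ₂ is the smaller root of the characteristic polynomial $\lambda^2 - (\mathrm{var}(x) + \mathrm{var}(y))\lambda + \mathrm{var}(x)\,\mathrm{var}(y)\,(1 - \rho(x,y)^2) = 0$. The following is a minimal sketch (assuming NumPy; the function and variable names are illustrative, not taken from the paper) of how the three dependency measures of Pages 9-12 can be computed for a pair of feature vectors.

```python
import numpy as np

def similarity_measures(x, y):
    """Correlation coefficient, least-square regression error, and maximal
    information compression index for two feature vectors x and y."""
    vx, vy = np.var(x), np.var(y)
    cxy = np.cov(x, y, bias=True)[0, 1]

    rho = cxy / np.sqrt(vx * vy)              # correlation coefficient rho(x, y)
    e = vy * (1.0 - rho ** 2)                 # regression error e(x, y)

    # lambda_2: smallest eigenvalue of the 2x2 covariance matrix of (x, y)
    s = vx + vy
    lam2 = 0.5 * (s - np.sqrt(s ** 2 - 4.0 * vx * vy * (1.0 - rho ** 2)))
    return rho, e, lam2
```

Note that np.var and the bias=True covariance both use the population (1/n) normalization, so the three quantities are mutually consistent.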

Page 13

Feature Similarity Measure

The corresponding loss of information in reconstructing the pattern is equal to the eigenvalue along the direction normal to the principal component.

Hence, λ₂ is the amount of reconstruction error committed if the data are projected onto a reduced dimension in the best possible way.

Therefore, it is a measure of the minimum amount of information loss, or the maximum amount of information compression.

Page 14

Feature Similarity Measure

The significance of λ₂ can also be explained geometrically in terms of linear regression.

The value of λ₂ is equal to the sum of the squares of the perpendicular distances of the points (x, y) to the best-fit line $y = \hat{a} + \hat{b}x$.

The coefficients of such a best-fit line are given by $\hat{a} = \bar{y} - \bar{x}\cot\theta$ and $\hat{b} = \cot\theta$, where θ parameterizes the orientation of the line.

Page 15

Feature Similarity Measure

λ₂ has the following properties:
1. $0 \le \lambda_2(x,y) \le 0.5\,(\mathrm{var}(x) + \mathrm{var}(y))$.
2. $\lambda_2(x,y) = 0$ if and only if x and y are linearly related.
3. $\lambda_2(x,y) = \lambda_2(y,x)$ (symmetric).
4. If $u = x/c$ and $v = y/d$ for some constants c, d, then in general $\lambda_2(x,y) \ne \lambda_2(u,v)$; the measure is sensitive to scaling of the variables (though invariant to translation).
5. λ₂ is invariant to rotation of the variables.

Page 16

Feature Similarity Measure

(Slide shows a figure/table only; not captured in the transcript.)

Page 17

Feature Selection Method

The task of feature selection involves two steps:
1. Partition the original feature set into a number of homogeneous subsets (clusters).
2. Select a representative feature from each such cluster.

The partitioning of the features is based on the k-NN principle:
1. Compute the k nearest features of each feature.
2. Among them, the feature having the most compact subset is selected, and its k neighboring features are discarded.
3. The process is repeated for the remaining features until all of them are either selected or discarded.

Page 18

Feature Selection Method

While determining the k nearest neighbors of the features, we assign a constant error threshold ε, set equal to the distance to the kth nearest neighbor of the feature selected in the first iteration.

In later iterations, if this distance is greater than ε, we decrease the value of k.

Page 19

Feature Selection Method

Notation:

D : the original number of features; the original feature set is O = {F_i, i = 1, ..., D}.

S(F_i, F_j) : the dissimilarity between features F_i and F_j.

$r_i^k$ : the dissimilarity between feature F_i and its kth nearest-neighbor feature in the reduced set R.

Page 20

Feature Selection Method

(Slide shows a figure only; not captured in the transcript.)
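The following is a minimal Python sketch (assuming NumPy) of the clustering procedure described on Pages 17-19. The function names, the inline λ₂ helper, and the exact handling of the threshold ε and the shrinking of k are illustrative simplifications, not the paper's literal algorithm.

```python
import numpy as np

def lambda2(x, y):
    """Maximal information compression index: smallest eigenvalue of the
    2x2 covariance matrix of two feature vectors."""
    return np.linalg.eigvalsh(np.cov(x, y, bias=True))[0]

def select_features(X, k):
    """X: (n_samples, D) data matrix; k: initial neighborhood size.
    Returns the indices of the retained (representative) features."""
    D = X.shape[1]
    dissim = np.zeros((D, D))                   # pairwise feature dissimilarity
    for i in range(D):
        for j in range(i + 1, D):
            dissim[i, j] = dissim[j, i] = lambda2(X[:, i], X[:, j])

    remaining, selected, eps = set(range(D)), [], None
    while remaining:
        k = min(k, len(remaining) - 1)
        if k <= 0:                              # too few features left to cluster
            selected.extend(remaining)
            break
        best_i, best_r, best_nbrs = None, np.inf, None
        for i in remaining:                     # kth-NN distance of each feature
            nbrs = sorted(remaining - {i}, key=lambda j: dissim[i, j])[:k]
            r_k = dissim[i, nbrs[-1]]
            if r_k < best_r:
                best_i, best_r, best_nbrs = i, r_k, nbrs
        if eps is None:
            eps = best_r                        # error threshold, first iteration
        if best_r > eps:
            k -= 1                              # shrink the neighborhood and retry
            continue
        selected.append(best_i)                 # most compact feature is retained
        remaining -= {best_i, *best_nbrs}       # its k neighbors are discarded
    return sorted(selected)
```

For example, `select_features(X, k=10)` returns the column indices of X that survive the clustering; k controls how aggressively similar features are merged.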

Page 21

Feature Selection Method

With respect to the dimension D, the method has complexity O(D²).

Evaluation of the similarity measure for a feature pair has complexity O(l), where l is the number of samples; thus, the feature selection scheme has overall complexity O(D²l).

k acts as a scale parameter which controls the degree of detail in a direct manner.

The algorithm does not require the similarity measure to be a metric.

Page 22

Feature Evaluation Indices

Now we describe some indices below.

Indices that need class information:
1. Class separability
2. k-NN classification accuracy
3. Naïve Bayes classification accuracy

Indices that do not need class information:
1. Entropy
2. Fuzzy feature evaluation index
3. Representation entropy

Page 23

Feature Evaluation Indices

Class Separability

$S = \mathrm{trace}(S_w^{-1} S_b)$

S_w is the within-class scatter matrix and S_b is the between-class scatter matrix.

$\pi_j$ is the a priori probability that a pattern belongs to class $w_j$, and $\mu_j$ is the sample mean vector of class $w_j$.
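A small sketch (assuming NumPy) of how this separability index can be computed. The prior-weighted scatter definitions used here are one common convention and may differ in detail from the paper; the function name is illustrative.

```python
import numpy as np

def class_separability(X, y):
    """S = trace(Sw^{-1} Sb) for data X (n_samples, d) with labels y.
    Assumes Sw is nonsingular."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)                       # pi_j
    mu = X.mean(axis=0)                            # overall mean
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c, p in zip(classes, priors):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                     # class mean mu_j
        Sw += p * np.cov(Xc, rowvar=False, bias=True)   # within-class scatter
        diff = (mu_c - mu)[:, None]
        Sb += p * (diff @ diff.T)                  # between-class scatter
    return float(np.trace(np.linalg.inv(Sw) @ Sb))
```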

Page 24

Feature Evaluation Indices

k-NN Classification Accuracy

The k-NN rule is used for evaluating the effectiveness of the reduced feature set for classification.

We randomly select 10% of the data as the training set and classify the remaining 90% of the points.

Ten such independent runs are performed, and the average accuracy on the test set is reported.
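A brief sketch of this evaluation protocol, assuming scikit-learn and NumPy; the number of neighbors is an assumption, since it is not given on the slide.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(X, y, n_neighbors=1, n_runs=10):
    """10% of the data for training, 90% for testing, averaged over
    ten independent random splits (Page 24 protocol)."""
    scores = []
    for run in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.1, random_state=run)
        clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return float(np.mean(scores))
```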

Page 25

Feature Evaluation Indices

Naïve Bayes Classification Accuracy

The Bayes maximum likelihood classifier, assuming a normal distribution for each class, is used to evaluate classification performance.

The mean and covariance of the classes are estimated from a randomly selected 10% training sample, and the remaining 90% is used as the test set.

Page 26

Feature Evaluation Indices

Entropy

$x_{p,j}$ denotes the feature value of pattern p along the jth direction.

The similarity between patterns p and q is given by $\mathrm{sim}(p,q) = e^{-\alpha D_{pq}}$, where $D_{pq}$ is the distance between the two patterns and α is a positive constant. A possible value of α is $-\ln 0.5 / \bar{D}$, where $\bar{D}$ is the average distance between data points computed over the entire data set.

The entropy of the data set is computed from these pairwise similarities; if the data are uniformly distributed in the feature space, the entropy is maximum.
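A sketch of this entropy index, assuming SciPy and NumPy. The exponential similarity and the value of α follow the slide; the summed form of the entropy and the sum over unordered pairs are assumptions based on a commonly used formulation, not details given here.

```python
import numpy as np
from scipy.spatial.distance import pdist

def entropy_index(X):
    """Entropy of data set X (n_samples, d) from pairwise pattern
    similarities sim(p, q) = exp(-alpha * D_pq)."""
    d = pdist(X)                               # pairwise distances D_pq, p < q
    alpha = -np.log(0.5) / d.mean()            # alpha = -ln(0.5) / mean distance
    sim = np.exp(-alpha * d)
    sim = np.clip(sim, 1e-12, 1 - 1e-12)       # avoid log(0)
    return float(-np.sum(sim * np.log2(sim) + (1 - sim) * np.log2(1 - sim)))
```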

Page 27

Feature Evaluation Indices

Fuzzy Feature Evaluation Index (FFEI)

The membership values measure the degree to which patterns p and q belong to the same cluster in the original and the reduced feature spaces, respectively.

The membership function may be defined in terms of the distance between the two patterns, decreasing as that distance increases.

The value of FFEI decreases as the intercluster distances increase.

Page 28

Feature Evaluation Indices

Representation Entropy

Let the eigenvalues of the d×d covariance matrix of a feature set of size d be $\lambda_j,\ j = 1, \ldots, d$, and let

$\tilde{\lambda}_j = \lambda_j / \sum_{j=1}^{d} \lambda_j$.

$\tilde{\lambda}_j$ has properties similar to a probability: $0 \le \tilde{\lambda}_j \le 1$ and $\sum_{j=1}^{d} \tilde{\lambda}_j = 1$. The representation entropy is $H_R = -\sum_{j=1}^{d} \tilde{\lambda}_j \log \tilde{\lambda}_j$.

This serves as a measure of the amount of redundancy present in that particular representation of the data set.
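A minimal sketch (assuming NumPy; the natural logarithm is an assumption for the base) of computing the representation entropy of a selected feature subset:

```python
import numpy as np

def representation_entropy(X_subset):
    """H_R of a selected feature subset.
    X_subset: (n_samples, d) matrix restricted to the selected features."""
    cov = np.atleast_2d(np.cov(X_subset, rowvar=False))
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negatives
    lam = eigvals / eigvals.sum()                          # normalized eigenvalues
    lam = lam[lam > 0]                                     # treat 0*log(0) as 0
    return float(-np.sum(lam * np.log(lam)))
```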

Page 29

Experimental Results and Comparisons

Three categories of real-life, public-domain data sets are used: low-dimensional (D ≤ 10), medium-dimensional (10 < D ≤ 100), and high-dimensional (D > 100).

Nine UCI data sets are used:
1. Isolet
2. Multiple Features
3. Arrhythmia
4. Spambase
5. Waveform
6. Ionosphere
7. Forest Cover Type
8. Wisconsin Cancer
9. Iris

Page 30

Experimental Results and Comparisons

The performance of the proposed method is compared with four feature selection algorithms:
1. Branch and Bound (BB)
2. Sequential Forward Search (SFS)
3. Sequential Floating Forward Search (SFFS)
4. Stepwise Clustering (SWC), using the correlation coefficient

In our experiments, entropy is mainly used as the feature selection criterion with the first three search algorithms.

Pages 31-38

Experimental Results and Comparisons

(These slides show the result tables and plots; not captured in the transcript.)

Page 39

Conclusions

An algorithm for unsupervised feature selection using feature similarity measures is described.

The algorithm is based on pairwise feature similarity measures, which are fast to compute; unlike other approaches, it does not explicitly optimize either classification or clustering performance.

A feature similarity measure called the maximal information compression index is defined.

It is also demonstrated through extensive experiments that representation entropy can be used as an index for quantifying both redundancy reduction and information loss in a feature selection method.

Page 40

Personal Opinion

We can learn from this method to help our own feature selection experiments.

This similarity measure is valid only for numeric features; we can think about how to extend it to categorical features.

Page 41

Review

1. Compute the k nearest features of each feature.
2. Among them, the feature having the most compact subset is selected, and its k neighboring features are discarded.
3. Repeat this process for the remaining features until all of them are either selected or discarded.