
Exploratory Analysis: Feature Selection and Clustering

Davide Bacciu
Computational Intelligence & Machine Learning Group
Dipartimento di Informatica, Università di Pisa
[email protected]

Introduzione all'Intelligenza Artificiale - A.A. 2012/2013


Lecture Outline

1. Feature Selection: Introduction, Objective Functions, Search Strategies
2. Clustering: Introduction, k-means Clustering, Advanced Clustering
3. Conclusion


Exploratory Data Analysis

Discover structure in data:
- Find unknown patterns in the data that cannot be predicted using current expert knowledge
- Formulate new hypotheses about the causes of the observed phenomena

Finding informative attributes:
- Feature Extraction
- Feature Selection

Finding natural groups:
- Clustering


Feature Selection vs. Feature Extraction

Two approaches to dimensionality reduction:
- Feature Extraction - Create a new, lower-dimensional representation of the input data by transforming the existing features with a given function
- Feature Selection - Select a subset of the existing features without transforming the input data

Feature extraction generates new features by optimizing:
- Signal representation (PCA)
- Signal classification (LDA)

Feature selection looks at dimensionality reduction from a different perspective:
- Different methodologies (e.g. computational costs, ...)
- Different applications


Why Feature Selection?

The straightforward answer is: for several good reasons already discussed for feature extraction:
- Reduce problem complexity
- Reduce noise
- Find good data representations

So, why not always use feature extraction?
- It can be computationally expensive to generate new features
- We may not want to transform the data, canceling its semantics
- Data is not always numeric
- Data may fail to meet the theoretical assumptions underlying feature extraction
- Feature selection allows us to explicitly select good predictors, i.e. features that behave well on a specific supervised task


What About Projection?

Feature selection is rarely used for visualization.

Nevertheless, it actually implements a projection: it is equivalent to projecting the data onto the lower-dimensional linear subspace perpendicular to the removed features.


Some Use Cases

Case I - Lung cancer
- Features are biomedical information or aspects of a patient's medical history
- Which features best predict whether lung cancer will develop?

Case II - Image Understanding
- Samples are images and features are pixels
- Which regions of an image are more likely to provide useful/discriminative information?

We seek a compact data representation that is interpretable and that tells us which features are relevant.


Feature Selection

Definition - A process that chooses a D'-dimensional subset of all the features according to an objective function:

x = (x_1, x_2, ..., x_D)  --select i_1, ..., i_D'-->  y = (x_{i_1}, x_{i_2}, ..., x_{i_D'})

Three characterizing aspects:
- Subset ⇒ no creation of new features
- Objective function ⇒ measure of subset optimality
- Process ⇒ need a subset search strategy
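The selection step in the definition above amounts to plain indexing; here is a minimal sketch, where the sample values and the selected indices are made up for illustration:

```python
def select_features(x, indices):
    """Keep only the features of x at the given (zero-based) positions."""
    return [x[i] for i in indices]

x = [0.5, 1.2, -3.0, 7.1, 0.0]     # a sample with D = 5 features
y = select_features(x, [0, 3])     # selected subset (D' = 2) -> [0.5, 7.1]
```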


Search Strategy and Objective Function

Feature selection requires:
- A search strategy to select candidate subsets
- An objective function to evaluate these candidates

Objective function: evaluates candidate subsets and returns a measure of their optimality that is used to select new candidates.

Search strategy: exhaustive evaluation of all 2^D feature subsets is O(2^D), so a smart strategy is needed to direct the subset selection process.


Objective Functions - Two Approaches

Filters - Evaluate feature subsets by their information content, e.g. interclass distance, statistical dependence, or information-theoretic measures.

Wrappers - Use a pattern classifier that evaluates feature subsets by their predictive accuracy (e.g. recognition rate on validation data) using re-sampling or cross-validation.


Filter Approach

Basic idea: assign a heuristic score to each feature to filter out the useless ones.

- Separate the feature selection phase from pattern recognition/classifier learning
- Evaluate features by measuring their informative content:
  - Does the individual feature seem to help prediction?
  - Is it redundant?
  - Is it reliable?
- The scoring metric can be unsupervised or exploit supervised information


Measuring Feature Relevance

Distance metrics
- Measure separability with respect to some target attribute (e.g. a class)
- Select those features which have the largest separability

Correlation metrics
- Good features are highly correlated with the class but uncorrelated with each other

Information-theoretic measures
- Measure the reduction in uncertainty (entropy) of a target variable (e.g. the class) given the feature
- E.g. mutual information

Consistency measures
- Find a minimum number of features that separate classes as consistently as the full set
- An inconsistency is two samples from different classes having the same feature values
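As a sketch of one such relevance measure, a correlation-based filter can score each feature by the absolute Pearson correlation of its column with the class label; the toy dataset below is hypothetical and chosen only to make one feature informative and the other noise-like:

```python
import math

def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def filter_scores(data, labels):
    """Score each feature column by |correlation with the class label|."""
    n_features = len(data[0])
    cols = [[row[d] for row in data] for d in range(n_features)]
    return [abs(pearson(col, labels)) for col in cols]

# Toy dataset: feature 0 tracks the label, feature 1 behaves like noise.
data = [[1.0, 0.3], [0.9, 0.8], [0.1, 0.2], [0.0, 0.9]]
labels = [1, 1, 0, 0]
scores = filter_scores(data, labels)   # feature 0 gets the higher score
```

Ranking features by these scores gives the weight-and-rank scheme discussed below; note that, as a univariate measure, it ignores redundancy between features.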


Wrapper Approach

Basic idea: select those features that make the chosen supervised learning model perform best.

- Feature selection relies on the preselected supervised learning model
- The learning model is re-trained each time a new subset is selected
- Quality of the feature subset is measured by the empirical error of the learned model
- Robust validation strategies are needed to ensure generalization
- Selected features are, typically, effective class predictors for the model
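A minimal wrapper evaluation can be sketched as follows, assuming (hypothetically) a 1-nearest-neighbour classifier and a single hold-out split rather than full cross-validation; the data is made up so that feature 0 is informative and feature 1 is not:

```python
def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction under squared Euclidean distance."""
    return min(((sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
                for tx, ty in zip(train_X, train_y)))[1]

def wrapper_score(X, y, subset, split):
    """Hold-out accuracy of the classifier restricted to `subset` features:
    the first `split` samples train, the rest validate."""
    proj = [[row[i] for i in subset] for row in X]
    tr_X, va_X = proj[:split], proj[split:]
    tr_y, va_y = y[:split], y[split:]
    hits = sum(nn_predict(tr_X, tr_y, x) == t for x, t in zip(va_X, va_y))
    return hits / len(va_y)

# Toy data: feature 0 tracks the class, feature 1 is uninformative.
X = [[0.0, 5.0], [0.1, 5.1], [1.0, 5.0], [0.9, 5.1],   # training part
     [0.05, 5.05], [0.95, 5.0]]                        # validation part
y = [0, 0, 1, 1, 0, 1]
wrapper_score(X, y, [0], split=4)   # the informative feature scores higher
```

A search strategy would call `wrapper_score` on each candidate subset, which is exactly why wrappers are slow: every evaluation retrains (here, re-indexes) the classifier.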


Filters vs. Wrappers (I)

Filter Approaches

Pros
- Fast execution - Non-iterative dataset processing which can execute much faster than classifier training
- Generality - Evaluate intrinsic properties of the data, rather than interactions with a particular classifier, hence solutions will be good for a larger family of classifiers

Cons
- Select large subsets - Since filter objective functions are generally monotonic, the filter tends to select the full feature set as the optimal solution. This forces the user to set an arbitrary cutoff on the number of features to be selected


Filters vs. Wrappers (II)

Wrapper Approaches

Pros
- Accuracy - Achieve better recognition rates since the selected features are tuned to the classifier
- Test generalization - Use cross-validation measures of predictive accuracy to avoid overfitting

Cons
- Slow - Must train a classifier for each feature subset
- Lack of generality - Solutions are tied to the bias of the classifier used in the evaluation function


Characterization of a Search Routine

Search starting point
- Empty set
- Full set
- Random subset

Search directions
- Sequential selection/elimination
- Random generation - Perform a randomized exploration of the search space where the next direction is sampled from a given probability distribution (e.g. genetic algorithms, simulated annealing, ...)

Search strategy
- Exhaustive/complete search
- Heuristic search
- Nondeterministic search


Sequential Feature Ranking

The simplest strategy:
- Weight and rank each feature
- Select the top-K ranked features
- No need for subset search

Advantages
- O(D) complexity
- Easy to implement

Disadvantages
- How to determine the top-K threshold?
- Does not consider feature correlation
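The weight-and-rank scheme reduces to a sort over per-feature scores; a minimal sketch, where the relevance scores are hypothetical:

```python
def top_k_features(scores, k):
    """Rank features by score (descending) and keep the top-k indices."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

relevance = [0.10, 0.80, 0.30, 0.75, 0.05]   # hypothetical per-feature scores
top_k_features(relevance, 2)                 # -> [1, 3]
```

The choice of k is exactly the threshold problem noted above: nothing in the ranking itself tells us where to cut.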


Sequential Search (I)

Greedy search algorithms add or remove single features sequentially.

Sequential Forward Selection
- Starting from the empty set, sequentially add the feature x_d that results in the highest objective function J(X_D' ∪ {x_d}) when combined with the features X_D' that have already been selected
- Performs best when the optimal subset has a small number of features
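The forward-selection loop can be sketched directly from the description above; the objective J below is a toy function (made up for illustration) in which features 0 and 1 are individually strong but redundant with each other:

```python
def sfs(num_features, J, target_size):
    """Greedy Sequential Forward Selection: start from the empty set and
    repeatedly add the feature whose inclusion maximizes J."""
    selected = set()
    while len(selected) < target_size:
        remaining = [d for d in range(num_features) if d not in selected]
        best = max(remaining, key=lambda d: J(selected | {d}))
        selected.add(best)
    return selected

# Toy objective: each feature has an individual merit, but features 0 and 1
# are redundant, so keeping both is penalized (all numbers are made up).
merit = [0.90, 0.85, 0.30]

def J(subset):
    score = sum(merit[d] for d in subset)
    if {0, 1} <= subset:
        score -= 0.8   # redundancy penalty
    return score

sfs(3, J, 2)   # picks feature 0 first, then prefers 2 over the redundant 1
```

Swapping the `while` loop to start from the full set and remove the least harmful feature gives Sequential Backward Elimination with the same structure.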


Sequential Search (II)

Sequential Backward Elimination
- Starting from the full set, sequentially remove the feature x_d ∈ X_D' that results in the smallest decrease of J(X_D' \ {x_d})
- Performs best when the optimal subset has a large number of features

Bi-directional Search
- Forward and backward searches are performed in parallel
- They are forced to converge to the same solution by ensuring that:
  - Features selected in the forward pass are not removed by the backward search
  - Features removed in the backward search are not selected in the forward pass


Search Strategies Cookbook


Unsupervised Feature Selection

- So far, feature selection approaches seem to exploit only relevance measures relying on supervised class information
- Is there any unsupervised feature selection approach? The answer is... yes (surprised?)
- Feature redundancy and relevance are measured with respect to clusters instead of classes

Definition (Clustering)
The process of organizing objects into natural groups whose members are similar in a way determined by a given metric.


Clustering = Seeking a natural grouping

What is a natural grouping? Candidate groupings: family, school, females vs. males.

Credit goes to Eamonn Keogh @ UCR


The Clustering Problem (I)

Clustering objective
- Maximize intra-cluster similarity
- Minimize inter-cluster similarity

Prototype-based clustering
- Find a representative c_i (prototype) for each cluster i
- Minimize the distance w.r.t. the cluster members

Results depend on the choice of the distance metric:
- Euclidean: ||c_i - x_n||^2
- Mahalanobis: (c_i - x_n)^T S^{-1} (c_i - x_n)
- ...
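The two metrics can be sketched in a few lines; this is an illustrative implementation (the inverse covariance matrix S^{-1} is passed in precomputed, row by row):

```python
def euclidean_sq(c, x):
    """Squared Euclidean distance ||c - x||^2."""
    return sum((a - b) ** 2 for a, b in zip(c, x))

def mahalanobis_sq(c, x, S_inv):
    """Squared Mahalanobis distance (c - x)^T S^{-1} (c - x);
    S_inv is the inverse covariance matrix, given row by row."""
    d = [a - b for a, b in zip(c, x)]
    Sd = [sum(row[j] * d[j] for j in range(len(d))) for row in S_inv]
    return sum(di * si for di, si in zip(d, Sd))

euclidean_sq([0.0, 0.0], [3.0, 4.0])                      # 25.0
mahalanobis_sq([0.0, 0.0], [3.0, 4.0], [[1, 0], [0, 1]])  # identity S: 25.0
```

With the identity covariance, Mahalanobis reduces to Euclidean; a non-identity S^{-1} rescales (and decorrelates) the axes before measuring distance.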


The Clustering Problem (II)

It is not always easy to define what is a cluster and what is not.

Hierarchical Clustering
- Prototype-based
- Finds a hierarchy of nested clusters
- Useful to convey a multi-resolution view of data, e.g. bio-medical data:
  - Clusters ⇒ tumors
  - Sub-clusters ⇒ tumor subtypes


The Clustering Problem (III)

As we have seen with the Swiss roll in feature extraction, some data might lie on particularly nasty surfaces.

Prototype-based clustering may not be effective with non-linearly separable clusters.


K-means Clustering

- The simplest clustering algorithm
- Lots of limitations
- Still widely used (easy to understand and implement)

The algorithm in brief:
- Receives as input k, i.e. the number of clusters to seek in the data
- Starts by picking k points at random as cluster prototypes (centroids)
- Assigns each sample to the nearest prototype and recomputes the cluster centroids


The Algorithm

Algorithm k-means(X, K)
  N ← |X|
  for i = 1 to K do
    c[i] ← rand()
  end for
  repeat  {update prototypes}
    tot[i] ← 0, c_new[i] ← 0 for all i = 1, ..., K
    for n = 1 to N do
      winner ← argmin_{i=1,...,K} ||c[i] - x_n||^2
      tot[winner] ← tot[winner] + 1
      c_new[winner] ← c_new[winner] + x_n
    end for
    for i = 1 to K do
      c[i] ← c_new[i] / tot[i]
    end for
  until MaxIterations reached or the prototypes move less than ε
  return cluster prototypes c
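The pseudocode translates almost line by line into Python; this sketch initializes the prototypes from randomly sampled data points (a common variant of the pure `rand()` initialization) and keeps empty clusters in place rather than dividing by zero:

```python
import random

def kmeans(points, K, max_iters=100, eps=1e-6):
    """Plain k-means: pick K prototypes, then alternate nearest-prototype
    assignment and centroid update until the prototypes stop moving."""
    dim = len(points[0])
    c = [list(p) for p in random.sample(points, K)]  # init from data points
    for _ in range(max_iters):
        tot = [0] * K                        # samples won by each prototype
        c_new = [[0.0] * dim for _ in range(K)]
        for x in points:
            # winner = argmin_i ||c[i] - x||^2
            winner = min(range(K),
                         key=lambda i: sum((a - b) ** 2
                                           for a, b in zip(c[i], x)))
            tot[winner] += 1
            for d in range(dim):
                c_new[winner][d] += x[d]
        for i in range(K):
            if tot[i] > 0:
                c_new[i] = [v / tot[i] for v in c_new[i]]
            else:
                c_new[i] = c[i]              # keep empty clusters in place
        shift = sum((a - b) ** 2
                    for ci, ni in zip(c, c_new) for a, b in zip(ci, ni))
        c = c_new
        if shift < eps:
            break
    return c

# Two well-separated 2D blobs; on most initializations one prototype
# converges onto each blob.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
prototypes = kmeans(points, 2)
```

Because the result depends on the random initialization, k-means is usually restarted several times and the run with the lowest within-cluster error is kept.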


k-Means Example (I)

Input data X


k-Means Example (II)

Initialize K = 3 prototypes


k-Means Example (III)

Assign samples to the nearest prototype/cluster


k-Means Example (IV)

Compute the updated prototypes c'_i


k-Means Example (V)

Update cluster assignment


Cluster number estimate

- In practice, we seldom know the cluster number K
- Trial-and-error to find the most suitable K
- Use algorithms that estimate the cluster number

Several approaches, but no killer algorithm:
- Agglomerative Clustering - Start with every sample as a separate cluster and iteratively merge existing clusters until an error criterion is satisfied
- Partitional Clustering - Start with a single cluster and keep splitting until an error criterion is satisfied

The error is often composed of:
- A similarity term favoring aggregation of samples into clusters
- A penalization term discouraging the creation of too many clusters
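The two-term error above can be sketched with a simple penalized criterion; the within-cluster SSE values and the penalty weight below are hypothetical, standing in for the output of repeated clustering runs:

```python
def choose_k(sse_by_k, penalty):
    """Pick the cluster number K minimizing error(K) = SSE(K) + penalty * K:
    the SSE term rewards fit, the penalty term discourages too many clusters."""
    return min(sse_by_k, key=lambda k: sse_by_k[k] + penalty * k)

# Hypothetical within-cluster SSE obtained by running a clusterer per K:
sse = {1: 100.0, 2: 30.0, 3: 12.0, 4: 10.0, 5: 9.5}
choose_k(sse, penalty=5.0)   # -> 3: further splits no longer pay off
```

Without the penalty term, the SSE alone would keep decreasing with K, which is why a complexity penalty (or an elbow heuristic) is needed to stop the splitting.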


Clustering and Feature Selection

Cluster labels can be used for performing feature selection when class information is not available.

Filter approach
- Step 1: Partition the data using a clustering algorithm
- Step 2: Run a filter model evaluating feature relevance w.r.t. how much an attribute helps in separating the clusters

Wrapper approach
- Step 1: Select a set of features
- Step 2: Evaluate whether they optimize the error criterion set by the clustering algorithm

Feature selection can be used to help clustering algorithms too!


Take-home Messages

Feature selection
- Extracts a subset of informative input features
- Preserves the data semantics
- Faster than feature extraction

Two main strategies
- Filters - Separate feature selection from pattern recognition
- Wrappers - Select features that work best with a given learning model

Clustering
- Finding natural groups in the data
- Cluster number estimation ⇒ no definitive answer
- Can be used to support feature selection when class information is not available