Feature Selection using Mutual Information
SYDE 676 Course Project
Eric Hui
November 28, 2002



Page 2: Outline

Introduction … prostate cancer project
Definition of ROI and Features
Estimation of PDFs … using Parzen Density Estimation
Feature Selection … using MI-Based Feature Selection
Evaluation of Selection … using Generalized Divergence
Conclusions

Page 3: Ultrasound Image of Prostate

Page 4: Prostate Outline

Page 5: “Guesstimated” Cancerous Region

Page 6: Regions of Interest (ROI)

Cancerous ROIs

Benign ROIs

Page 7: Features as Mapping Functions

Cancerous ROIs

Benign ROIs

Mapping from image space to feature space…

[Diagram: cancerous and benign ROIs mapped onto a feature axis X0.]

Page 8: Parzen Density Estimation

Histogram bins: bad estimation with the limited data available!

Parzen density estimation: a reasonable approximation with limited data.

[Plots over the feature axis X0: the samples, a histogram estimate, and a Parzen estimate.]
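For illustration, here is a minimal sketch of a one-dimensional Parzen estimate with a Gaussian window; the bandwidth h, the synthetic sample data, and the helper name are assumptions for the example, not values from the project:

```python
import numpy as np

def parzen_pdf(samples, x_grid, h):
    """Parzen window density estimate with a Gaussian kernel of width h."""
    samples = np.asarray(samples, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    u = (x_grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian window
    return kernels.mean(axis=1) / h  # average of the per-sample kernels

# With only 20 samples a histogram is ragged, but the Parzen estimate is smooth.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=1.0, scale=0.5, size=20)  # limited data on the feature axis
grid = np.linspace(-1.0, 3.0, 200)
p_hat = parzen_pdf(x0, grid, h=0.3)
```

Each sample contributes one smooth bump, which is why the estimate stays reasonable even when there are too few samples to fill histogram bins.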

Page 9: Features

Gray-Level Difference Matrix (GLDM): Contrast, Mean, Entropy, Inverse Difference Moment (IDM), Angular Second Moment (ASM)

Fractal Dimension (FD)

Linearized Power Spectrum: Slope, Y-Intercept
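The slides list the GLDM statistics but not their formulas; the sketch below uses the standard gray-level difference histogram definitions for an 8-bit ROI. The displacement (dy, dx), the 256 gray levels, and the exact IDM form are assumptions:

```python
import numpy as np

def gldm_features(roi, dy=0, dx=1, levels=256):
    """Texture statistics of the gray-level difference histogram of an 8-bit ROI."""
    roi = np.asarray(roi, dtype=int)  # assumes gray levels in [0, levels)
    h, w = roi.shape
    # Absolute gray-level differences between pixel pairs at displacement (dy, dx).
    diff = np.abs(roi[dy:h, dx:w] - roi[0:h - dy, 0:w - dx])
    p = np.bincount(diff.ravel(), minlength=levels) / diff.size  # P(d)
    d = np.arange(levels)
    nz = p > 0
    return {
        "contrast": float(np.sum(d**2 * p)),
        "mean": float(np.sum(d * p)),
        "entropy": float(-np.sum(p[nz] * np.log2(p[nz]))),
        "idm": float(np.sum(p / (1.0 + d**2))),  # inverse difference moment
        "asm": float(np.sum(p**2)),              # angular second moment
    }
```

The fractal dimension and linearized power spectrum features follow the same pattern: each maps an ROI to a handful of scalars that can then be treated as random variables X.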

Page 10: P(X|C=Cancerous), P(X|C=Benign), and P(X)

Page 11: Entropy and Mutual Information

Mutual information I(C;X) measures the degree of interdependence between X and C.

Entropy H(C) measures the degree of uncertainty of C.

I(X;C) = H(C) − H(C|X), so I(X;C) ≤ H(C); the class entropy is an upper bound on the mutual information (see the sketch below).
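Numerically, I(C;X) follows directly from the estimated densities. Below is a minimal sketch for two classes with known priors and class-conditional PDFs P(X|C) sampled on a regular grid; the function name, the grid-based integration, and the use of base-2 logs (bits) are illustrative choices, not the project's stated procedure:

```python
import numpy as np

def mutual_information(p_x_given_c, priors, dx):
    """I(C;X) = H(C) - H(C|X), from class-conditional PDFs on a grid."""
    priors = np.asarray(priors, dtype=float)[:, None]    # P(C=c) as a column
    p_x_given_c = np.asarray(p_x_given_c, dtype=float)   # rows: P(X=x | C=c)
    p_joint = priors * p_x_given_c                       # P(C=c, X=x)
    p_x = p_joint.sum(axis=0)                            # P(X=x)
    h_c = -np.sum(priors * np.log2(priors))              # H(C), in bits
    post = np.divide(p_joint, p_x, out=np.zeros_like(p_joint), where=p_x > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(post > 0, post * np.log2(post), 0.0)
    h_c_given_x = -np.sum(p_x * terms.sum(axis=0)) * dx  # H(C|X), numeric integral
    return h_c - h_c_given_x
```

Because H(C|X) ≥ 0, the result can never exceed H(C), which is exactly the upper bound quoted above.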

Page 12: Results: Mutual Information I(C;X)

Feature         I(C;X)    % of H(C)
GLDM Contrast   0.51152   87%
GLDM Mean       0.51152   87%
GLDM Entropy    0.57265   98%
GLDM IDM        0.32740   56%
GLDM ASM        0.58069   99%
FD              0.02127   4%
PSD Slope       0.27426   47%
PSD Y-int       0.38622   66%

Page 13: Feature Images – GLDM

[Feature images: Contrast, Mean, Entropy, Inverse Difference Moment, Angular Second Moment, and all features combined.]

Page 14: Feature Images – Fractal Dimension

[Feature image: Fractal Dimension.]

Page 15: Feature Images – PSD

[Feature images: linearized PSD slope and y-intercept, each computed horizontally, vertically, and in both directions, plus all features combined.]

Page 16: Interdependence between Features

It is expensive to compute all the features, and some features might be similar to each other.

Thus, we need to measure the interdependence between features, I(Xi; Xj); a sketch of one such estimate follows.
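The project estimates these pairwise terms from Parzen densities; as a simpler stand-in, here is a plug-in estimate of I(Xi; Xj) from a 2-D histogram of paired feature samples. The bin count and the helper name are assumptions, and the histogram estimator is a deliberate simplification, not the method used on the slides:

```python
import numpy as np

def pairwise_mi(x, y, bins=16):
    """Plug-in estimate of I(X;Y) in bits from paired feature samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                # joint probabilities P(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal P(x)
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal P(y)
    nz = p_xy > 0                             # only sum where P(x, y) > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))
```

Evaluating this for every feature pair yields a symmetric matrix like the one on the next page.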

Page 17: Results: Interdependence between Features

           Contrast  Mean    Entropy  IDM     ASM     FD      PSD Slope  PSD Y-int
Contrast   n/a       0.1971  0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Mean       0.1971    n/a     0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Entropy    0.1973    0.1973  n/a      1.1012  1.5323  0.0335  0.0888     0.9615
IDM        0.8935    0.8935  1.1012   n/a     0.2046  0.2764  0.4227     0.1184
ASM        1.0261    1.0261  1.5323   0.2046  n/a     0.1353  0.4904     0.1355
FD         0.0354    0.0354  0.0335   0.2764  0.1353  n/a     0.0541     0.2753
PSD Slope  0.0988    0.0988  0.0888   0.4227  0.4904  0.0541  n/a        1.0338
PSD Y-int  1.1055    1.1055  0.9615   0.1184  0.1355  0.2753  1.0338     n/a

Page 18: Mutual Information Based Feature Selection (MIFS)

1. Select the first feature with the highest I(C;X).

2. Select the next feature with the highest I(C;X) − β · Σ_{S ∈ Selected} I(X;S).

3. Repeat until the desired number of features has been selected (a greedy sketch of this loop follows).
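A minimal greedy sketch of this loop, assuming the relevance values I(C;Xi) (page 12) and the pairwise redundancies I(Xi;Xj) (page 17) are already computed; the names and array layout are illustrative:

```python
import numpy as np

def mifs(relevance, redundancy, beta, n_select):
    """Greedy MIFS: maximize I(C;X) - beta * sum of I(X;S) over selected S."""
    relevance = np.asarray(relevance, dtype=float)
    selected = [int(np.argmax(relevance))]         # step 1: highest I(C;X)
    while len(selected) < n_select:
        best_i, best_score = None, -np.inf
        for i in range(relevance.size):
            if i in selected:
                continue
            penalty = sum(redundancy[i][j] for j in selected)
            score = relevance[i] - beta * penalty  # step 2: MIFS criterion
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)                    # step 3: repeat until done
    return selected
```

With β = 0 the penalty vanishes and features are ranked purely by I(C;X); a larger β steers the selection away from mutually redundant features, as the next pages illustrate.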

Page 19: Mutual Information Based Feature Selection (MIFS)

This method takes into account both:

the interdependence between class and features, and
the interdependence between selected features.

The parameter β controls the amount of interdependence between selected features.

Page 20: Varying β in MIFS

Candidate features: {X1, X2, X3, …, X8}

β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}

Page 21: Generalized Divergence J

If the features are “biased” towards a class, J is large.

A good set of features should have a small J.

J = E_x[ (P(X, Cancerous) − P(X, Benign)) · log( P(X, Cancerous) / P(X, Benign) ) ], where X = (X1, X2, …, XF) is the vector of selected features.
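Assuming the reconstruction above is right (the garbled formula reads as a symmetric divergence between the two joint densities), J can be evaluated on the same grid as the Parzen estimates. The function name and the masking of zero-probability points are implementation choices:

```python
import numpy as np

def generalized_divergence(p_cancer, p_benign, dx):
    """J = E_x[(P(x,Cancerous) - P(x,Benign)) * log(P(x,Cancerous)/P(x,Benign))]."""
    p1 = np.asarray(p_cancer, dtype=float)
    p2 = np.asarray(p_benign, dtype=float)
    nz = (p1 > 0) & (p2 > 0)  # skip grid points where the log ratio is undefined
    return float(np.sum((p1[nz] - p2[nz]) * np.log(p1[nz] / p2[nz])) * dx)
```

Note that the difference and the log ratio always share a sign, so every grid point contributes non-negatively to J.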

Page 22: Results: J with respect to β

First feature selected: GLDM ASM. Second feature selected:

β     Feature        J
0     GLDM Entropy   0.6553
0.5   PSD Y-int      0.2970
1     PSD Y-int      0.2970

Page 23: Conclusions

Mutual Information Based Feature Selection (MIFS): select features by maximizing
I(C;X) − β · Σ_{S ∈ Selected} I(X;S)

Generalized Divergence:
J = E_x[ (P(X, Cancerous) − P(X, Benign)) · log( P(X, Cancerous) / P(X, Benign) ) ], X = (X1, X2, …, XF)

[Diagram: class C connected to features X1, X2, …, XN; maximize the mutual information between C and each selected feature while minimizing the mutual information among the selected features.]

Candidate features: {X1, X2, X3, …, X8}

β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}

Page 24: Questions and Comments

…