Feature Selection using Mutual Information, SYDE 676 Course Project, Eric Hui, November 28, 2002
1
Feature Selection using Mutual Information
SYDE 676 Course Project
Eric Hui
November 28, 2002
2
Outline
Introduction … prostate cancer project
Definition of ROI and Features
Estimation of PDFs … using Parzen Density Estimation
Feature Selection … using MI Based Feature Selection
Evaluation of Selection … using Generalized Divergence
Conclusions
3
Ultrasound Image of Prostate
4
Prostate Outline
5
“Guesstimated” Cancerous Region
6
Regions of Interest (ROI)
Cancerous ROIs
Benign ROIs
7
Features as Mapping Functions
Cancerous ROIs
Benign ROIs
Mapping from image space to feature space…
8
Parzen Density Estimation
Histogram bins: bad estimation with limited data available!
Parzen density estimation: reasonable approximation with limited data.
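To make the contrast concrete, here is a minimal sketch of a 1-D Parzen (kernel) density estimate with a Gaussian window. The sample values and the window width h are hypothetical, not taken from the project data; the point is only that averaging a smooth kernel over few samples gives a usable density where coarse histogram bins would not.

```python
import math

def parzen_estimate(x, samples, h):
    """Parzen density estimate at x using a Gaussian window of width h.

    Averages one Gaussian kernel centred on each training sample, which
    yields a smooth estimate even with very limited data (unlike
    histogram bins, which become jagged and sparse)."""
    total = 0.0
    for s in samples:
        u = (x - s) / h
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(samples) * h)

# Toy 1-D feature values from a handful of ROIs (hypothetical numbers).
samples = [0.9, 1.1, 1.0, 1.3, 0.8]
density = parzen_estimate(1.0, samples, h=0.2)
```

Because each kernel integrates to one, the estimate itself integrates to one, so it is a valid density no matter how few samples are available.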
9
Features
Gray-Level Difference Matrix (GLDM): Contrast, Mean, Entropy, Inverse Difference Moment (IDM), Angular Second Moment (ASM)
Fractal Dimension (FD)
Linearized Power Spectrum: Slope, Y-Intercept
10
P(X|C=Cancerous), P(X|C=Benign), and P(X)
11
Entropy and Mutual Information
Mutual information I(C;X) measures the degree of interdependence between X and C.
Entropy H(C) measures the degree of uncertainty of C.
I(X;C) = H(C) − H(C|X). I(X;C) ≤ H(C) is the upper bound.
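The identity I(X;C) = H(C) − H(C|X) and the bound I(X;C) ≤ H(C) can be checked on a small discretized joint distribution. The joint table below is hypothetical (a binned feature against the two classes), not project data.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;C) = H(C) - H(C|X) for a joint table joint[x][c],
    where rows are feature bins and columns are classes."""
    # Class marginal P(C=c)
    p_c = [sum(row[c] for row in joint) for c in range(len(joint[0]))]
    h_c = entropy(p_c)
    # Conditional entropy H(C|X) = sum_x P(X=x) * H(C | X=x)
    h_c_given_x = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            h_c_given_x += p_x * entropy([p / p_x for p in row])
    return h_c - h_c_given_x

# Hypothetical joint P(X=x, C=c): three feature bins,
# columns = (cancerous, benign).
joint = [[0.30, 0.05],
         [0.10, 0.10],
         [0.05, 0.40]]
i_xc = mutual_information(joint)
```

A feature independent of the class gives I(X;C) = 0; a feature that determines the class exactly reaches the upper bound H(C).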
12
Results: Mutual Information I(C;X)

Feature        I(C;X)   % of H(C)
GLDM Contrast  0.51152  87%
GLDM Mean      0.51152  87%
GLDM Entropy   0.57265  98%
GLDM IDM       0.32740  56%
GLDM ASM       0.58069  99%
FD             0.02127  4%
PSD Slope      0.27426  47%
PSD Y-int      0.38622  66%
13
Feature Images - GLDM
Panels: Contrast, Mean, Entropy, Inverse Difference Moment, Angular Second Moment, All features
14
Feature Images – Fractal Dim.
Fractal Dimension
15
Feature Images - PSD
Panels: Linearized PSD Slope (Horizontal), Linearized PSD y-intercept (Horizontal), Linearized PSD Slope (Vertical), Linearized PSD y-intercept (Vertical), Linearized PSD Slope (Both), Linearized PSD y-intercept (Both), All features
16
Interdependence between Features
Expensive to compute all features. Some features might be similar to each other.
Thus, need to measure the interdependence between features: I(Xi; Xj)
17
Results: Interdependence between Features

           Contrast  Mean    Entropy  IDM     ASM     FD      PSD Slope  PSD Y-int
Contrast   n/a       0.1971  0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Mean       0.1971    n/a     0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Entropy    0.1973    0.1973  n/a      1.1012  1.5323  0.0335  0.0888     0.9615
IDM        0.8935    0.8935  1.1012   n/a     0.2046  0.2764  0.4227     0.1184
ASM        1.0261    1.0261  1.5323   0.2046  n/a     0.1353  0.4904     0.1355
FD         0.0354    0.0354  0.0335   0.2764  0.1353  n/a     0.0541     0.2753
PSD Slope  0.0988    0.0988  0.0888   0.4227  0.4904  0.0541  n/a        1.0338
PSD Y-int  1.1055    1.1055  0.9615   0.1184  0.1355  0.2753  1.0338     n/a
18
Mutual Information Based Feature Selection (MIFS)
1. Select the first feature with the highest I(C;X).
2. Select the next feature with the highest:
   I(C;X) − β · Σ_{S ∈ Selected} I(X;S)
3. Repeat until the desired number of features has been selected.
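The three steps above can be sketched as a greedy loop. Plugging in the I(C;X) values from slide 12 and the I(Xi;Xj) entries involving GLDM ASM from slide 17 (the only pairs needed when selecting two features) reproduces the second-feature choices reported on slide 22. The function and variable names are illustrative, not from the original project.

```python
def mifs(candidates, mi_class, mi_pair, beta, k):
    """Greedy MIFS: repeatedly pick the feature maximizing
    I(C;X) - beta * sum of I(X;S) over already-selected features S."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(x):
            redundancy = sum(mi_pair[frozenset((x, s))] for s in selected)
            return mi_class[x] - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# I(C;X) per feature (slide 12).
mi_class = {"GLDM Contrast": 0.51152, "GLDM Mean": 0.51152,
            "GLDM Entropy": 0.57265, "GLDM IDM": 0.32740,
            "GLDM ASM": 0.58069, "FD": 0.02127,
            "PSD Slope": 0.27426, "PSD Y-int": 0.38622}
# I(X; GLDM ASM) per feature (slide 17); sufficient for k = 2,
# since GLDM ASM is always selected first.
mi_pair = {frozenset((f, "GLDM ASM")): v for f, v in [
    ("GLDM Contrast", 1.0261), ("GLDM Mean", 1.0261),
    ("GLDM Entropy", 1.5323), ("GLDM IDM", 0.2046),
    ("FD", 0.1353), ("PSD Slope", 0.4904), ("PSD Y-int", 0.1355)]}

s_beta0 = mifs(list(mi_class), mi_class, mi_pair, beta=0.0, k=2)
s_beta1 = mifs(list(mi_class), mi_class, mi_pair, beta=1.0, k=2)
```

With these numbers the second feature comes out as GLDM Entropy at β = 0 and PSD Y-int at β = 0.5 and β = 1, matching slide 22: a nonzero β rejects GLDM Entropy because of its high redundancy (1.5323) with the already-selected GLDM ASM.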
19
Mutual Information Based Feature Selection (MIFS)
This method takes into account both:
the interdependence between class and features, and
the interdependence between selected features.
The parameter β controls how strongly interdependence between selected features is penalized.
20
Varying β in MIFS
Candidate features: {X1, X2, X3, …, X8}
β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}
21
Generalized Divergence J
If the features are “biased” towards a class, J is large.
A good set of features should have small J.
J = E_X{ [P(X, Cancerous) − P(X, Benign)] · log [ P(X, Cancerous) / P(X, Benign) ] }, where X = (X1, X2, …, XF)
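A discrete sketch of the expectation above, taking the two joint distributions over a handful of feature-space bins. The binned probabilities are hypothetical; the sketch only shows that J is zero when the two distributions coincide and grows as they pull apart.

```python
import math

def generalized_divergence(p_cancer, p_benign):
    """Discrete version of
    J = E{ [P(X,Cancerous) - P(X,Benign)] * log(P(X,Cancerous)/P(X,Benign)) }
    summed over feature-space bins."""
    j = 0.0
    for pc, pb in zip(p_cancer, p_benign):
        if pc > 0 and pb > 0:
            # Each term is non-negative: the difference and the log
            # always share the same sign.
            j += (pc - pb) * math.log(pc / pb)
    return j

# Hypothetical binned probabilities over four feature bins.
p_cancer = [0.4, 0.3, 0.2, 0.1]
p_benign = [0.1, 0.2, 0.3, 0.4]
j = generalized_divergence(p_cancer, p_benign)
```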
22
Results: J with respect to β
First feature selected: GLDM ASM
Second feature selected: …

β    Feature       J
0    GLDM Entropy  0.6553
0.5  PSD Y-int     0.2970
1    PSD Y-int     0.2970
23
Conclusions
Mutual Info. Based Feature Selection (MIFS):
select features to maximize I(C;X) − β · Σ_{S ∈ Selected} I(X;S)
(diagram: maximize I(C;Xi) with the class C, minimize I(Xi;Xj) among features X1, X2, …, XN)
Generalized Divergence:
J = E_X{ [P(X, Cancerous) − P(X, Benign)] · log [ P(X, Cancerous) / P(X, Benign) ] }
Candidate features: {X1, X2, X3, …, X8}
β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}
24
Questions and Comments
……