Feature Selection using Mutual Information, SYDE 676 Course Project, Eric Hui, November 28, 2002
1
Feature Selection using Mutual Information
SYDE 676 Course Project
Eric Hui
November 28, 2002
2
Outline
Introduction … prostate cancer project
Definition of ROI and Features
Estimation of PDFs … using Parzen Density Estimation
Feature Selection … using MI Based Feature Selection
Evaluation of Selection … using Generalized Divergence
Conclusions
3
Ultrasound Image of Prostate
4
Prostate Outline
5
“Guesstimated” Cancerous Region
6
Regions of Interest (ROI)
Cancerous ROIs
Benign ROIs
7
Features as Mapping Functions
Cancerous ROIs
Benign ROIs
Mapping from image space to feature space…
8
Parzen Density Estimation
Histogram bins: bad estimation with limited data available!
Parzen density estimation: reasonable approximation with limited data.
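To make the contrast concrete, here is a minimal sketch of a 1-D Parzen (kernel) density estimate with a Gaussian window. The sample values and the window width h are hypothetical, not taken from the project data; the point is only that averaging a smooth kernel over few samples gives a usable density where coarse histogram bins would not.

```python
import math

def parzen_estimate(x, samples, h):
    """Parzen density estimate at x using a Gaussian window of width h.

    Averages one Gaussian kernel centred on each training sample, which
    yields a smooth estimate even with very limited data (unlike
    histogram bins, which become jagged and sparse)."""
    total = 0.0
    for s in samples:
        u = (x - s) / h
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(samples) * h)

# Toy 1-D feature values from a handful of ROIs (hypothetical numbers).
samples = [0.9, 1.1, 1.0, 1.3, 0.8]
density = parzen_estimate(1.0, samples, h=0.2)
```

Because each kernel integrates to one, the estimate itself integrates to one, so it is a valid density no matter how few samples are available.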
9
Features
Gray-Level Difference Matrix (GLDM): Contrast, Mean, Entropy, Inverse Difference Moment (IDM), Angular Second Moment (ASM)
Fractal Dimension (FD)
Linearized Power Spectrum: Slope, Y-Intercept
10
P(X|C=Cancerous), P(X|C=Benign), and P(X)
11
Entropy and Mutual Information
Mutual information I(C;X) measures the degree of interdependence between X and C.
Entropy H(C) measures the degree of uncertainty of C.
I(X;C) = H(C) − H(C|X). I(X;C) ≤ H(C) is the upper bound.
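The identity I(X;C) = H(C) − H(C|X) and the bound I(X;C) ≤ H(C) can be checked on a small discretized joint distribution. The joint table below is hypothetical (a binned feature against the two classes), not project data.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;C) = H(C) - H(C|X) for a joint table joint[x][c],
    where rows are feature bins and columns are classes."""
    # Class marginal P(C=c)
    p_c = [sum(row[c] for row in joint) for c in range(len(joint[0]))]
    h_c = entropy(p_c)
    # Conditional entropy H(C|X) = sum_x P(X=x) * H(C | X=x)
    h_c_given_x = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            h_c_given_x += p_x * entropy([p / p_x for p in row])
    return h_c - h_c_given_x

# Hypothetical joint P(X=x, C=c): three feature bins,
# columns = (cancerous, benign).
joint = [[0.30, 0.05],
         [0.10, 0.10],
         [0.05, 0.40]]
i_xc = mutual_information(joint)
```

A feature independent of the class gives I(X;C) = 0; a feature that determines the class exactly reaches the upper bound H(C).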
12
Results: Mutual Information I(C;X)

Feature        I(C;X)   % of H(C)
GLDM Contrast  0.51152  87%
GLDM Mean      0.51152  87%
GLDM Entropy   0.57265  98%
GLDM IDM       0.32740  56%
GLDM ASM       0.58069  99%
FD             0.02127  4%
PSD Slope      0.27426  47%
PSD Y-int      0.38622  66%
13
Feature Images - GLDM
Panels: Contrast, Mean, Entropy, Inverse Difference Moment, Angular Second Moment, All features
14
Feature Images – Fractal Dim.
Fractal Dimension
15
Feature Images - PSD
Panels: Linearized PSD Slope (Horizontal), Linearized PSD y-intercept (Horizontal), Linearized PSD Slope (Vertical), Linearized PSD y-intercept (Vertical), Linearized PSD Slope (Both), Linearized PSD y-intercept (Both), All features
16
Interdependence between Features
Expensive to compute all features. Some features might be similar to each other.
Thus, need to measure the interdependence between features: I(Xi; Xj)
17
Results: Interdependence between Features

           Contrast  Mean    Entropy  IDM     ASM     FD      PSD Slope  PSD Y-int
Contrast   n/a       0.1971  0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Mean       0.1971    n/a     0.1973   0.8935  1.0261  0.0354  0.0988     1.1055
Entropy    0.1973    0.1973  n/a      1.1012  1.5323  0.0335  0.0888     0.9615
IDM        0.8935    0.8935  1.1012   n/a     0.2046  0.2764  0.4227     0.1184
ASM        1.0261    1.0261  1.5323   0.2046  n/a     0.1353  0.4904     0.1355
FD         0.0354    0.0354  0.0335   0.2764  0.1353  n/a     0.0541     0.2753
PSD Slope  0.0988    0.0988  0.0888   0.4227  0.4904  0.0541  n/a        1.0338
PSD Y-int  1.1055    1.1055  0.9615   0.1184  0.1355  0.2753  1.0338     n/a
18
Mutual Information Based Feature Selection (MIFS)
1. Select the first feature with the highest I(C;X).
2. Select the next feature with the highest:
   I(C;X) − β · Σ_{S ∈ Selected} I(X;S)
3. Repeat until the desired number of features has been selected.
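The three steps above can be sketched as a greedy loop. Plugging in the I(C;X) values from slide 12 and the I(Xi;Xj) entries involving GLDM ASM from slide 17 (the only pairs needed when selecting two features) reproduces the second-feature choices reported on slide 22. The function and variable names are illustrative, not from the original project.

```python
def mifs(candidates, mi_class, mi_pair, beta, k):
    """Greedy MIFS: repeatedly pick the feature maximizing
    I(C;X) - beta * sum of I(X;S) over already-selected features S."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(x):
            redundancy = sum(mi_pair[frozenset((x, s))] for s in selected)
            return mi_class[x] - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# I(C;X) per feature (slide 12).
mi_class = {"GLDM Contrast": 0.51152, "GLDM Mean": 0.51152,
            "GLDM Entropy": 0.57265, "GLDM IDM": 0.32740,
            "GLDM ASM": 0.58069, "FD": 0.02127,
            "PSD Slope": 0.27426, "PSD Y-int": 0.38622}
# I(X; GLDM ASM) per feature (slide 17); sufficient for k = 2,
# since GLDM ASM is always selected first.
mi_pair = {frozenset((f, "GLDM ASM")): v for f, v in [
    ("GLDM Contrast", 1.0261), ("GLDM Mean", 1.0261),
    ("GLDM Entropy", 1.5323), ("GLDM IDM", 0.2046),
    ("FD", 0.1353), ("PSD Slope", 0.4904), ("PSD Y-int", 0.1355)]}

s_beta0 = mifs(list(mi_class), mi_class, mi_pair, beta=0.0, k=2)
s_beta1 = mifs(list(mi_class), mi_class, mi_pair, beta=1.0, k=2)
```

With these numbers the second feature comes out as GLDM Entropy at β = 0 and PSD Y-int at β = 0.5 and β = 1, matching slide 22: a nonzero β rejects GLDM Entropy because of its high redundancy (1.5323) with the already-selected GLDM ASM.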
19
Mutual Information Based Feature Selection (MIFS)
This method takes into account both:
the interdependence between class and features, and
the interdependence between selected features.
The parameter β controls how strongly interdependence between selected features is penalized.
20
Varying β in MIFS
Candidate features: {X1, X2, X3, …, X8}
β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}
21
Generalized Divergence J
If the features are “biased” towards a class, J is large.
A good set of features should have small J.
J = E_X{ [P(X, Cancerous) − P(X, Benign)] · log [ P(X, Cancerous) / P(X, Benign) ] }, where X = (X1, X2, …, XF)
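A discrete sketch of the expectation above, taking the two joint distributions over a handful of feature-space bins. The binned probabilities are hypothetical; the sketch only shows that J is zero when the two distributions coincide and grows as they pull apart.

```python
import math

def generalized_divergence(p_cancer, p_benign):
    """Discrete version of
    J = E{ [P(X,Cancerous) - P(X,Benign)] * log(P(X,Cancerous)/P(X,Benign)) }
    summed over feature-space bins."""
    j = 0.0
    for pc, pb in zip(p_cancer, p_benign):
        if pc > 0 and pb > 0:
            # Each term is non-negative: the difference and the log
            # always share the same sign.
            j += (pc - pb) * math.log(pc / pb)
    return j

# Hypothetical binned probabilities over four feature bins.
p_cancer = [0.4, 0.3, 0.2, 0.1]
p_benign = [0.1, 0.2, 0.3, 0.4]
j = generalized_divergence(p_cancer, p_benign)
```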
22
Results: J with respect to β
First feature selected: GLDM ASM
Second feature selected: …

β    Feature       J
0    GLDM Entropy  0.6553
0.5  PSD Y-int     0.2970
1    PSD Y-int     0.2970
23
Conclusions
Mutual Info. Based Feature Selection (MIFS):
select features to maximize I(C;X) − β · Σ_{S ∈ Selected} I(X;S)
(diagram: maximize I(C;Xi) with the class C, minimize I(Xi;Xj) among features X1, X2, …, XN)
Generalized Divergence:
J = E_X{ [P(X, Cancerous) − P(X, Benign)] · log [ P(X, Cancerous) / P(X, Benign) ] }
Candidate features: {X1, X2, X3, …, X8}
β = 0:   S = {X2, X3}
β = 0.5: S = {X2, X7}
β = 1:   S = {X2, X4}
24
Questions and Comments
……