Methods for Intelligent Systems
Lecture Notes on Feature Projection
2009 - 2010
Simone Tognetti [email protected]
Department of Electronics and Information, Politecnico di Milano



Feature Projection

•  Goal: reduce the number of features D to improve classification accuracy
•  Use a linear combination of the original features
•  Projects the data into a different space, in which the classes might be better separated
•  The resulting features may lose their original meaning

\bar{x} = Hx, \qquad H : \mathbb{R}^D \rightarrow \mathbb{R}^M

\bar{x} = \begin{bmatrix} h_{11} & h_{12} & \dots & h_{1D} \\ h_{21} & h_{22} & \dots & h_{2D} \\ \vdots & & \ddots & \vdots \\ h_{M1} & h_{M2} & \dots & h_{MD} \end{bmatrix} x

[Figure: the D-dimensional input x is mapped by H to the M-dimensional \bar{x}, and the classifier f(\bar{x}) produces the scalar output y.]
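As a minimal sketch of this projection in numpy (the matrix H below is an arbitrary random one, purely to show the mechanics; PCA and LDA, introduced next, are two principled ways of choosing it):

```python
import numpy as np

# Minimal sketch: a linear feature projection x_bar = H x.
# H here is a hypothetical M x D matrix, for illustration only.
rng = np.random.default_rng(0)
D, M = 5, 2                       # original and reduced dimensionality
H = rng.standard_normal((M, D))   # any M x D matrix defines a projection
x = rng.standard_normal(D)        # one D-dimensional sample

x_bar = H @ x                     # projected sample, now M-dimensional
print(x_bar.shape)                # (2,)
```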


Signal Classification vs Signal Representation

•  Signal Representation (PCA)
–  Information associated with the data distribution (i.e., mean and variance)
–  No relationship with the classification problem
–  Data should have similar variances
•  Signal Classification (LDA)
–  Information associated with discrimination capabilities (inter-class distance)
–  Tendency to over-fit the training data, with poor generalization abilities


Principal Component Analysis (PCA)

•  Hp: the data follow a multi-dimensional Gaussian distribution
•  Goal: find the principal components of the distribution, i.e., the directions that account for the maximum variance of the data

•  Covariance matrix:

\Sigma_x = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1D} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2D} \\ \vdots & & \ddots & \vdots \\ \sigma_{D1} & \sigma_{D2} & \dots & \sigma_D^2 \end{bmatrix}, \qquad \sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]

•  Decomposition of \Sigma_x: solve \Sigma_x \varphi - \lambda \varphi = 0
–  Eigenvectors \varphi_i are the principal components of the distribution
–  Eigenvalues \lambda_i are the variances of the data along the principal components
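A short sketch of this decomposition in numpy (synthetic data, variable names mine); `np.linalg.eigh` is appropriate because \Sigma_x is symmetric:

```python
import numpy as np

# Sketch: estimate the covariance matrix and decompose it.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0., 0.],
                            cov=[[3., 1.], [1., 2.]], size=1000)
Sigma_x = np.cov(X, rowvar=False)   # samples in rows -> rowvar=False

# eigh returns eigenvalues in ascending order; reverse to put the
# direction of maximum variance first.
eigvals, eigvecs = np.linalg.eigh(Sigma_x)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
print(eigvals)    # variances along the principal components (~[3.6, 1.4])
print(eigvecs)    # columns are the principal components
```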


PCA and vectors

•  Samples represented in a 2-D vector format: x = x_1 \varphi_1 + x_2 \varphi_2, where \varphi_1, \varphi_2 are the axis vectors
•  Samples in a D-dimensional space:

x = \sum_{i=1}^{D} x_i \varphi_i

•  Suppose we want to represent the data in an M-dimensional (M < D) space
–  Replace the remaining features with constant values b_i:

\hat{x}(M) = \sum_{i=1}^{M} x_i \varphi_i + \sum_{i=M+1}^{D} b_i \varphi_i

•  We have an error:

\Delta x(M) = x - \hat{x}(M) = \sum_{i=M+1}^{D} (x_i - b_i) \varphi_i

•  Mean squared error:

\varepsilon^2(M) = E[\|\Delta x(M)\|^2] = \sum_{i=M+1}^{D} E[(x_i - b_i)^2]


PCA and mean squared error

•  Find the b_i that minimize \varepsilon^2(M):

\frac{\partial}{\partial b_i} E[(x_i - b_i)^2] = -2(E[x_i] - b_i) = 0 \quad\Rightarrow\quad b_i = E[x_i]

•  Substituting, and using x_i = \varphi_i^T x:

\varepsilon^2(M) = \sum_{i=M+1}^{D} E[(x_i - E[x_i])^2] = \sum_{i=M+1}^{D} E[\varphi_i^T (x - E[x])(x - E[x])^T \varphi_i] = \sum_{i=M+1}^{D} \varphi_i^T \Sigma_x \varphi_i

•  Adding the orthonormality constraint with Lagrange multipliers \lambda_i:

\varepsilon^2(M) = \sum_{i=M+1}^{D} \varphi_i^T \Sigma_x \varphi_i + \sum_{i=M+1}^{D} \lambda_i (1 - \varphi_i^T \varphi_i)

•  Find the \varphi_i that minimize \varepsilon^2(M):

\frac{\partial}{\partial \varphi_i} \varepsilon^2(M) = 2(\Sigma_x \varphi_i - \lambda_i \varphi_i) = 0 \quad\Rightarrow\quad \Sigma_x \varphi_i = \lambda_i \varphi_i

•  At the minimum, \varepsilon^2(M) = \sum_{i=M+1}^{D} \lambda_i, so the error is minimized by discarding the components with the smallest eigenvalues
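A numerical check of this result, as a sketch in numpy with synthetic Gaussian data (variable names mine): with b_i = E[x_i] and \varphi_i the eigenvectors of \Sigma_x, the mean squared reconstruction error of an M-dimensional representation equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0., 0., 0.],
                            cov=[[4., 1., 0.], [1., 2., 0.5], [0., 0.5, 1.]],
                            size=100_000)
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)
lam, Phi = np.linalg.eigh(Sigma)       # ascending eigenvalues
lam, Phi = lam[::-1], Phi[:, ::-1]     # largest variance first

M = 2                                  # keep the first M components
coords = (X - mu) @ Phi                # coordinates in the eigenbasis
X_hat = mu + coords[:, :M] @ Phi[:, :M].T   # discarded coords set to their mean (0)
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse, lam[M:].sum())              # the two values should nearly match
```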


PCA step by step

1.  Given the dataset x
2.  Compute the covariance matrix \Sigma_x
3.  Solve the characteristic equation \Sigma_x \varphi - \lambda \varphi = 0
4.  Choose the first M eigenvectors, corresponding to the largest eigenvalues
5.  Project the data:

H = [\varphi_1 | \varphi_2 | \dots | \varphi_M]^T, \qquad \bar{x} = Hx
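The five steps translate almost line by line into numpy. The sketch below is my own packaging (the function name and argument layout are not from the notes), with X holding one sample per row:

```python
import numpy as np

def pca_project(X: np.ndarray, M: int) -> np.ndarray:
    """Project N x D data onto its first M principal components."""
    # 1. given the dataset X (N samples, D features)
    # 2. compute the covariance matrix
    Sigma_x = np.cov(X, rowvar=False)
    # 3. solve the characteristic equation (eigh: Sigma_x is symmetric)
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)
    # 4. choose the first M eigenvectors, largest eigenvalues first
    order = np.argsort(eigvals)[::-1]
    H = eigvecs[:, order[:M]].T      # M x D, rows phi_1 .. phi_M
    # 5. project the data (x_bar = H x, as in the notes; data is often
    #    centered first, which only shifts the projected coordinates)
    return X @ H.T                   # N x M projected samples
```

Up to the sign of each eigenvector, which is arbitrary, this reproduces the H of the example on the next slide.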


PCA Example

•  Dataset:

x = \{(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)\}, \qquad \mu = (5, 5)

•  Covariance matrix:

\Sigma_x = \begin{bmatrix} 7.1429 & 4.8571 \\ 4.8571 & 4.0000 \end{bmatrix}

•  Decomposition of the covariance matrix: \Sigma_x \varphi - \lambda \varphi = 0

\begin{vmatrix} 7.1429 - \lambda & 4.8571 \\ 4.8571 & 4.0000 - \lambda \end{vmatrix} = 0 \quad\Rightarrow\quad \lambda_1 = 10.6764, \quad \lambda_2 = 0.4664

H = [\varphi_1 | \varphi_2] = \begin{bmatrix} -0.8086 & -0.5883 \\ -0.5883 & 0.8086 \end{bmatrix}


PCA Example (2)

•  Projection onto the principal components
•  Ratio of component variance to total variance:

\frac{\lambda_i}{\sum_{i=1}^{D} \lambda_i}: \qquad \frac{\lambda_1}{\sum_{i=1}^{D} \lambda_i} = 0.9581, \qquad \frac{\lambda_2}{\sum_{i=1}^{D} \lambda_i} = 0.0419
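The numbers of this example can be reproduced with a few lines of numpy (note that `np.cov` uses the unbiased 1/(N-1) estimate, which is what the figures above imply; eigenvector signs are arbitrary and may come out flipped):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4),
              (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)
print(X.mean(axis=0))               # [5. 5.]

Sigma_x = np.cov(X, rowvar=False)
print(Sigma_x)                      # [[7.1429 4.8571], [4.8571 4.0000]]

lam, Phi = np.linalg.eigh(Sigma_x)
lam, Phi = lam[::-1], Phi[:, ::-1]  # largest eigenvalue first
print(lam)                          # [10.6764  0.4664]
print(lam / lam.sum())              # [0.9581  0.0419]
```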


PCA Notes

•  PCA aims to decompose the covariance matrix \Sigma_x
•  \Sigma_x is estimated under the assumption of a Gaussian distribution
–  If the data are not normally distributed, PCA only de-correlates the features
•  PCA does not use class labels to project the data
•  The projection depends only on the structure of the data
•  By stretching one of the principal components, the distribution of the data does not change
•  There is no guarantee that the principal components best separate the classes


Fisher’s Linear Discriminant Analysis (LDA)

•  LDA is a linear projection
•  The projection maximizes inter-class separability (between different classes) and minimizes intra-class scatter (within the same class)
•  The projection is found by solving an optimization problem
–  Which measure of separability should be optimized?


Fisher’s Linear Discriminant Analysis (LDA)

•  A possible measure of separability: the sample mean
–  Sample mean for class \omega_i:

\mu_i = \frac{1}{N_i} \sum_{x \in \omega_i} x

–  Projected mean:

\tilde{\mu}_i = \frac{1}{N_i} \sum_{x \in \omega_i} W^T x = W^T \mu_i

–  Measure of separation:

J(W) = |\tilde{\mu}_1 - \tilde{\mu}_2|

•  The distance between the projected means alone is not enough: it does not take into account the standard deviation


Fisher’s Linear Discriminant Analysis (LDA)

•  Within-class scatter matrix:

S_w = \sum_{\omega_i \in Y} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T

•  Between-class scatter matrix:

S_b = \sum_{\omega_i \in Y} N_i (\mu - \mu_i)(\mu - \mu_i)^T

•  Projected scatter and separation:

\tilde{s}_i^2 = \sum_{x \in \omega_i} (W^T x - W^T \mu_i)^2 = \sum_{x \in \omega_i} W^T (x - \mu_i)(x - \mu_i)^T W = W^T S_{\omega_i} W

\tilde{s}_1^2 + \tilde{s}_2^2 = W^T S_w W

(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (W^T \mu_1 - W^T \mu_2)^2 = W^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T W = W^T S_b W

•  Measure of separation:

J(W) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{W^T S_b W}{W^T S_w W}

–  Projected means are well separated
–  Projected variances are small


Fisher’s Linear Discriminant Analysis (LDA)

•  Find the W that maximizes J(W):

\frac{\partial}{\partial W} J(W) = \frac{\partial}{\partial W} \left[ \frac{W^T S_b W}{W^T S_w W} \right] = 0

(W^T S_w W) \frac{\partial}{\partial W}(W^T S_b W) - (W^T S_b W) \frac{\partial}{\partial W}(W^T S_w W) = 0

(W^T S_w W) \, 2 S_b W - (W^T S_b W) \, 2 S_w W = 0

–  Dividing by (W^T S_w W):

\frac{W^T S_w W}{W^T S_w W} S_b W - \frac{W^T S_b W}{W^T S_w W} S_w W = 0

S_b W - J S_w W = 0

S_w^{-1} S_b W - J W = 0

–  Solving this generalized eigenvector problem yields the W that maximizes J


LDA step by step

1.  Given the dataset x
2.  Compute the class statistics:

S_{\omega_i} = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T

3.  Compute the within-class scatter matrix:

S_w = \sum_{i=1..|Y|} S_{\omega_i}

4.  Compute the between-class scatter matrix:

S_b = \sum_{\omega_i \in Y} N_i (\mu - \mu_i)(\mu - \mu_i)^T

5.  Solve the generalized eigenvalue problem:

S_w^{-1} S_b W - J W = 0
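A sketch of these steps in numpy (function name and layout are mine; X holds one sample per row, y the class labels):

```python
import numpy as np

def lda_project(X: np.ndarray, y: np.ndarray):
    """Return eigenvalues J and projection matrix W of inv(S_w) S_b."""
    mu = X.mean(axis=0)                  # overall mean
    D = X.shape[1]
    S_w = np.zeros((D, D))               # within-class scatter
    S_b = np.zeros((D, D))               # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]                   # samples of class omega_i
        mu_c = Xc.mean(axis=0)           # 2. class statistics
        S_w += (Xc - mu_c).T @ (Xc - mu_c)   # 3. accumulate S_wi into S_w
        d = (mu - mu_c).reshape(-1, 1)
        S_b += len(Xc) * (d @ d.T)           # 4. N_i (mu - mu_i)(mu - mu_i)^T
    # 5. generalized eigenvalue problem: inv(S_w) S_b W = J W
    # (inv(S_w) S_b is not symmetric, hence eig; its eigenvalues are real,
    #  so tiny imaginary round-off is dropped with .real)
    J, W = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
    order = np.argsort(J.real)[::-1]
    return J.real[order], W.real[:, order]
```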


LDA example

1.  Dataset:

x = \{(4, 1), (2, 4), (2, 3), (3, 6), (4, 4), (9, 10), (6, 8), (9, 5), (8, 7), (10, 8)\}
y = \{1, 1, 1, 1, 1, 2, 2, 2, 2, 2\}

2.  Compute the class statistics, S_{\omega_i} = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T:

–  Class 1: \mu_1 = [3, 3.6], \quad S_{\omega_1} = \begin{bmatrix} 4 & -2 \\ -2 & 13.2 \end{bmatrix}

–  Class 2: \mu_2 = [8.4, 7.6], \quad S_{\omega_2} = \begin{bmatrix} 9.2 & -0.2 \\ -0.2 & 13.2 \end{bmatrix}

3.  Compute the within-class scatter matrix:

S_w = S_{\omega_1} + S_{\omega_2} = \begin{bmatrix} 13.2 & -2.2 \\ -2.2 & 26.4 \end{bmatrix}


LDA example (2)

4.  Compute the between-class scatter matrix:

S_b = \sum_{\omega_i \in Y} N_i (\mu - \mu_i)(\mu - \mu_i)^T = \begin{bmatrix} 72.9 & 54 \\ 54 & 40 \end{bmatrix}

5.  Solve the generalized eigenvalue problem S_w^{-1} S_b W - J W = 0:

|S_w^{-1} S_b - \lambda I| = \begin{vmatrix} 5.9462 - \lambda & 4.4046 \\ 2.5410 & 1.8822 - \lambda \end{vmatrix} = 0 \quad\Rightarrow\quad \lambda_1 = 7.8284, \quad \lambda_2 = 0

W = \begin{bmatrix} 0.9196 & -0.5952 \\ 0.3930 & 0.8036 \end{bmatrix}
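Using the `lda_project` sketch from the step-by-step slide, the example's numbers can be checked (eigenvector signs are arbitrary and may come out flipped):

```python
import numpy as np

X = np.array([(4, 1), (2, 4), (2, 3), (3, 6), (4, 4),
              (9, 10), (6, 8), (9, 5), (8, 7), (10, 8)], dtype=float)
y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

J, W = lda_project(X, y)
print(J)         # [7.8284  0.    ]
print(W[:, 0])   # ~ [0.9196  0.3930] (the discriminant direction)
```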


Notes on LDA

•  Produces only C−1 projections (C = number of classes)
•  Hp: unimodal distribution of the data within each class
–  If the data are highly nonlinear, the resulting projection is suboptimal
•  Non-parametric Linear Discriminant Analysis removes the unimodal assumption
–  S_b is computed with local information, through a K-NN
–  S_b results in a full-rank matrix, and the projection is made over more than C−1 dimensions
•  LDA fails when the discriminative information is contained in the variance rather than in the mean
–  Encode the variance as a new feature!


Other dimensionality reduction techniques

•  Kernel PCA
•  Independent Component Analysis (ICA)
•  Multilayer Perceptron
•  Self-organizing maps (SOMs)
•  Sammon's map
•  Support vector machines (SVM)
–  Margin maximization
