5 Discriminant Analysis

5.1 Introduction

a Example: Iris species.
Goal: a rule to identify the correct species for any plant.
Example bank bills: identify forged bills!

b General Model: classes ki, variables (characteristics) X(j) of observation i.
Xi ∼ Fki; the Fk are parametric, usually normal distributions.

c True class ki: fixed, unknown → incidental parameter

d ... or random variable Ki. The model includes P⟨Ki = k⟩ = πk;
Fk = conditional distribution of Xi given Ki = k.
e Fk, πk given → rule x ↦ K̂⟨x⟩: "Identification Analysis".
In applications: estimate Fk from training data.
f Simplest model: Xi ∼ Nm⟨µki, Σ⟩

µ̂k = X̄k = (1/nk) ∑_{i: ki=k} Xi

Σ̂ = 1/(n−g) ∑_{k=1}^{g} ∑_{i: ki=k} (Xi − X̄k)(Xi − X̄k)ᵀ
   = 1/(n−g) ∑_i (Xi − X̄ki)(Xi − X̄ki)ᵀ
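These estimates are the group means and the pooled covariance matrix. A minimal R sketch of the formulas (an illustration with the iris data, not from the original slides):

R> g <- 3; n <- nrow(iris)
R> X <- as.matrix(iris[, 1:4]); k <- iris$Species
R> mu.hat <- rowsum(X, k) / as.vector(table(k))   # group means mu.hat_k
R> R <- X - mu.hat[k, ]                           # residuals Xi - Xbar_ki
R> Sigma.hat <- crossprod(R) / (n - g)            # pooled covariance, divisor n - g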
5.2 Classification According to Known Distributions

a New observation x0 → "estimate" k0!
Decision between g possibilities, based on the data x0.

b Normal data with equal Σ: X0 ∼ N⟨µk, Σ⟩. Begin with the case Σ = I.
k̂ = argmin_k d⟨x0, µk; Σ⟩
c General case? → Maximum Likelihood → k̂0 = argmax_k fk⟨x0⟩
X0 ∼ N⟨µk, Σ⟩ →
d²⟨x0, µ1; Σ⟩ − d²⟨x0, µ2; Σ⟩ > 0 ⟹ k̂0 = 2

h⟨x0⟩ = d²⟨x0, µ1; Σ⟩ − d²⟨x0, µ2; Σ⟩
      = (x0 − µ1)ᵀ Σ⁻¹ (x0 − µ1) − (x0 − µ2)ᵀ Σ⁻¹ (x0 − µ2)
      = 2 (µ2 − µ1)ᵀ Σ⁻¹ x0 + µ1ᵀ Σ⁻¹ µ1 − µ2ᵀ Σ⁻¹ µ2
      = α + βᵀ x0

k̂⟨x0⟩ = 1 if h⟨x0⟩ < 0,  2 if h⟨x0⟩ > 0

h: Fisher's linear discriminant function
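The coefficients α and β follow directly from this expansion. A hedged sketch, reusing mu.hat and Sigma.hat from the sketch in 5.1f and picking two of the classes as the hypothetical classes 1 and 2:

R> mu1 <- mu.hat[2, ]; mu2 <- mu.hat[3, ]     # e.g. versicolor, virginica
R> beta <- 2 * solve(Sigma.hat, mu2 - mu1)
R> alpha <- sum(mu1 * solve(Sigma.hat, mu1)) - sum(mu2 * solve(Sigma.hat, mu2))
R> h <- function(x) alpha + sum(beta * x)     # k.hat = 2 if h(x) > 0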
d Estimate the parameters!
R> library(MASS); lda(Species ~ ., data = iris)
[Figure: scatterplot of l.Petal.Length vs. l.Petal.Width for the species versicolor and virginica, and histogram of the values of the discriminant function with frequencies for the two species.]
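Predictions and discriminant scores can be obtained from the fitted object; a brief usage sketch:

R> fit <- lda(Species ~ ., data = iris)
R> pred <- predict(fit)
R> head(pred$class)   # predicted species
R> head(pred$x)       # discriminant scores LD1, LD2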
e Logistic Regression. Linear discriminant function ↔ linear regression function.
Regression: "predict" the target variable Y from explanatory variables x! Here:
Y = class number = binary variable.
Random: Y, characterized by π = P⟨Y=1⟩; π = function of x.
Y = 1: the observation belongs to class 2.
The probabilities of Y=1 and Y=0 are proportional to fk⟨x⟩!

log⟨ P⟨Y=1⟩ / P⟨Y=0⟩ ⟩ = log⟨ f2⟨x⟩ / f1⟨x⟩ ⟩ = h⟨x⟩ = α + βᵀx

= "probability / complementary probability" = odds.
Logistic regression: log odds = linear function of the X(j).
f → Direct estimation of α and β.
No assumptions about the explanatory variables X(j)
→ as flexible as multiple regression!
Allows for factors, transformations, interactions!
R> glm(Species ~ ., data = d.iris, family = binomial)
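A hedged, self-contained version of this call, assuming d.iris is the two-class subset of the iris data (versicolor vs. virginica):

R> d.iris <- droplevels(subset(iris, Species != "setosa"))
R> fit <- glm(Species ~ ., data = d.iris, family = binomial)
R> coef(fit)   # estimates of alpha and beta
R> # note: these two classes are nearly separable, so glm may warn about
R> # fitted probabilities of 0 or 1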
g More than 2 classes, equal Σ → choose the class with minimal Mahalanobis distance d²⟨x0, µk; Σ⟩.

h Σ = I → Mahalanobis distance = ordinary (Euclidean) distance.
3 classes → the plane through the class centers suffices for classification.
(Distance from this plane: if it is large, the observation does not fit into any class!)
g classes: (g−1)-dimensional space → g−1 discriminant functions. Σ ≠ I analogous.
i [Figure: scatterplot of the observations in the plane of discriminant function 1 vs. discriminant function 2, labeled by the species setosa, versicolor and virginica.]

Coefficients β̂ of the discriminant functions:

         Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
D.f. 1       8.70          9.07       -20.779       -3.529
D.f. 2      -9.85        -15.18        -0.713        0.313
k Unequal covariance matrices

d²⟨x, µ2; Σ2⟩ − d²⟨x, µ1; Σ1⟩ − c
  = (x − µ2)ᵀ Σ2⁻¹ (x − µ2) − (x − µ1)ᵀ Σ1⁻¹ (x − µ1) − c = 0

quadratic equation in x → quadratic discriminant analysis.
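In R, this is available alongside lda; a brief sketch:

R> library(MASS); qda(Species ~ ., data = iris)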
l Model Selection.
• Check the assumptions (multivariate normal distribution!)
• Select explanatory variables.
• Model non-linear relationships and interactions.
  (Interactions contradict the normal distribution assumption.)
5.3 Error Rates

a Diagnostic tests in medicine → diseased and healthy individuals.
2 kinds of error:
• healthy subjects classified as ill
  → loss of confidence, useless treatment
• diseased subjects classified as healthy
  → may miss a life-saving treatment!
b Example: vessel constriction; predict it from the volume and the rate of the heartbeat.

[Figure: scatterplot of log(Vol) vs. log(Rate), observations labeled as constricted or healthy.]
c Types of Error. Test "positive" → classified as ill.

truth \ test result   healthy, "negative"              ill, "positive"
healthy               o.k.                             wrong positive: false alarm
ill                   wrong negative: missed alarm     o.k.

d Sensitivity and Specificity.

Sensitivity = #{ill & positive} / #{ill}
Specificity = #{healthy & negative} / #{healthy}

Sensitivity = (conditional) probability of a diseased subject being classified as such
Specificity = (conditional) probability of a healthy subject being classified as such
e Variable threshold.

Sensitivity = P⟨k̂=2 | K=2⟩ = P⟨h⟨X⟩ > c | K=2⟩
Specificity = P⟨k̂=1 | K=1⟩ = P⟨h⟨X⟩ < c | K=1⟩

[Figure: sensitivity and specificity as functions of the threshold c on the discriminant function.]
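The trade-off between the two can be traced directly from discriminant scores; a hedged sketch with hypothetical objects h.scores (values of h⟨X⟩) and K (true classes 1 or 2):

R> sens <- function(c) mean(h.scores[K == 2] > c)
R> spec <- function(c) mean(h.scores[K == 1] < c)
R> cs <- seq(min(h.scores), max(h.scores), length = 101)
R> plot(cs, sapply(cs, sens), type = "l")   # sensitivity falls as c grows
R> lines(cs, sapply(cs, spec), lty = 2)     # specificity rises as c grows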
f Choose the threshold pragmatically!
Refine the decision: allow for a region of indeterminacy!

g Assume you get a "positive" test result. How do you react?

rate of wrong positives = #{wrong positives} / #{positives}
rate of wrong negatives = #{wrong negatives} / #{negatives}

Conditional probability of being healthy, given a positive test
→ "false alarm". Probability P⟨k=1 | k̂=2⟩.
Conditional probability of having the disease, given a negative test
→ "missed alarm". Probability P⟨k=2 | k̂=1⟩.
These probabilities are only well defined if k is random!
h → New model: K instead of k.
The distribution of K is given by π1 = P⟨K=1⟩ and
the "prevalence" π2 = P⟨K=2⟩ = 1 − π1.
P⟨K=1 | k̂=2⟩ is calculated by Bayes' theorem:

P⟨K=1 | k̂=2⟩ = P⟨K=1 and k̂=2⟩ / P⟨k̂=2⟩
             = P⟨k̂=2 | K=1⟩ π1 / ( P⟨k̂=2 | K=1⟩ π1 + P⟨k̂=2 | K=2⟩ π2 )
i Example. Given:
• the sensitivity P⟨k̂=2 | K=2⟩ = 0.95
• the specificity P⟨k̂=1 | K=1⟩ = 0.9
• the prevalence π2 = 0.01.

P⟨K=1 | k̂=2⟩ = (0.1 · 0.99) / (0.1 · 0.99 + 0.95 · 0.01)
             = 0.099 / (0.099 + 0.0095) = 0.912

For rare diseases the false alarm rate is high!
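The same calculation in R, as a one-line check of the numbers above:

R> sens <- 0.95; spec <- 0.9; prev <- 0.01
R> (1 - spec) * (1 - prev) / ((1 - spec) * (1 - prev) + sens * prev)  # 0.9124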
j Error rates. "Theoretical":

Q = π1 P⟨k̂⟨X⟩=2 | X ∼ F⟨θ1⟩⟩ + π2 P⟨k̂⟨X⟩=1 | X ∼ F⟨θ2⟩⟩

The error rate has to be estimated! (A general consideration...)

k Xi ∼ Nm⟨µKi, I⟩ with µ1 = −(∆/2)·[1, 0]ᵀ, µ2 = +(∆/2)·[1, 0]ᵀ
[Figure: two scatterplots of samples from the two classes in the plane of X(1) and X(2).]

π1 = π2 = 1/2: k̂ = 1 if X(1) < 0, k̂ = 2 otherwise.
→ Q = Φ⟨−∆/2⟩. In general: ∆² = (µ2 − µ1)ᵀ Σ⁻¹ (µ2 − µ1)
l Estimate the parameters of the model
→ parametrically estimated error rate Q̂ = Φ⟨−∆̂/2⟩.
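A sketch of this plug-in estimate in R (hypothetical estimates mu1.hat, mu2.hat, Sigma.hat assumed, e.g. from the sketch in 5.1f):

R> Delta2 <- mahalanobis(mu2.hat, mu1.hat, Sigma.hat)  # squared Mahalanobis distance
R> Q.hat <- pnorm(-sqrt(Delta2) / 2)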
m Apparent Error Rate: the relative frequency of misclassifications in the training data,
Qapp = ( #{i | k̂⟨X1i⟩ = 2} + #{i | k̂⟨X2i⟩ = 1} ) / n.
Too optimistic!

n Determine the error rate using new data ("test data")!
Or: split the data set randomly into training and test data.
Adequate if the data source is "endless" (data mining).
o Cross validation
Estimate the decision rule without using observation i
→ the probability of misclassifying i is estimated without bias.
Is observation i correctly classified?
Repeat this for all observations → rates of classification errors:

Qcv = ( #{i | k̂[−i]⟨X1i⟩ = 2} + #{i | k̂[−i]⟨X2i⟩ = 1} ) / n

→ Compare with resampling, jackknife!
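Leave-one-out cross validation is built into lda; a brief sketch:

R> fit.cv <- lda(Species ~ ., data = iris, CV = TRUE)
R> mean(fit.cv$class != iris$Species)   # cross-validated error rate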
Messages Discriminant Analysis
• Classical methods of discriminant analysis:
  • 2 classes, equal Σ → linear DA, one discriminant function
    → logistic regression
  • g ≥ 3 classes, equal Σ → linear DA, g−1 discriminant functions
    → multinomial regression
  • unequal Σ → quadratic discriminant analysis
• 2 classes, variable threshold c for the linear discriminant function
  → sensitivity and specificity are used for choosing c.
• Error rates: the naive apparent error rate is too optimistic
  → cross validation!
5.4 Further Methods

a The assumption was X ∼ Nm⟨µh, Σh⟩.
Inappropriate for large data sets! Use more detailed information!

b Nearest Neighbors. For a new observation X0 to be classified,
find ℓ ≥ 1 nearest neighbors (in the training data).
Simple rule: majority vote of the class numbers among the nearest neighbors.
Problem: which metric is appropriate?
R> library(class); knn(...); knn1(...)
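A self-contained sketch (an illustration with a random split of the iris data, not necessarily the course's example):

R> library(class)
R> set.seed(1)
R> tr <- sample(nrow(iris), 100)
R> pred <- knn(train = iris[tr, 1:4], test = iris[-tr, 1:4], cl = iris$Species[tr], k = 3)
R> mean(pred != iris$Species[-tr])   # test error rate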
[Figure: nearest-neighbor example with vegetation data; variables include Caluvulg (Calluna vulgaris), Nardstri (Nardus stricta) and Festrubr (Festuca rubra).]
[Figure: scatterplot of the discriminant scores LD1 vs. LD2, observations labeled by their class numbers (2, 3, 4).]
c Neural Networks
General regression problem: input Xi, output Yi.
One-hidden-layer feed-forward neural network:

Y(k) = gk⟨ αk + ∑ℓ wℓk g̃ℓ⟨ α̃ℓ + ∑j w̃jℓ X(j) ⟩ ⟩

Use the logistic function for g and g̃!
For discriminant analysis: need a rule to convert the output Y into k̂,
e.g. K̂ = argmax_k⟨Y(k)⟩
R> library(nnet); nnet(...)
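A minimal sketch of such a network for the iris classification (illustrative settings; size = number of hidden neurons):

R> library(nnet)
R> set.seed(1)
R> fit <- nnet(Species ~ ., data = iris, size = 3, maxit = 500, trace = FALSE)
R> k.hat <- predict(fit, iris, type = "class")   # argmax rule over the outputs Y(k)
R> mean(k.hat != iris$Species)                   # apparent error rate (too optimistic!)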
[Figure: diagram of a one-hidden-layer feed-forward network: input neurons x(j), weights α(k)j leading to hidden neurons z(k) (each computing h̃ of α0 plus a weighted sum), weights βk leading to the output neuron y (computed as γ0 + γ1·h̃ of β0 plus a weighted sum).]
d Flexible Discriminant Analysis. Preliminary remark: for 2 groups, consider
logistic regression.
Less appropriate: using Least Squares even with binary Y
→ the estimated regression function = the linear discriminant function!
Several classes: the same holds for multinomial regression
→ the subspace of βᵀX equals
the subspace of the discriminant functions.
Flexible DA: use any more flexible regression method
instead of the linear one (with the original X's).
R> library(mda); fda(..., method = mars) or method = bruto
Literature: Hastie, Tibshirani and Buja (1994)
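A runnable illustration (using the iris data, with mars as the flexible regression method):

R> library(mda)
R> fit <- fda(Species ~ ., data = iris, method = mars)
R> head(predict(fit, iris))   # predicted classes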
e Mixture Discriminant Analysis.
R> library(mda); mda(...)
Hastie and Tibshirani (1994)

f Classification and Regression Trees (CART) (2 groups)
Split the observations into 2 groups on the basis of the
most discriminating single variable.
Split each group again (if successful) into 2 groups
using a single variable. Repeat!
→ decision tree
R> library(tree); tree(...) or
R> library(rpart); rpart(...)
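A small sketch of a classification tree (an illustration with the iris data):

R> library(rpart)
R> fit <- rpart(Species ~ ., data = iris)
R> plot(fit); text(fit)   # draw the decision tree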
g General Remark
Neural networks and the other methods
are general and flexible. BUT:
• danger of over-fitting the data, except in large data sets.
  Trick to avoid that: select the model on the basis of cross validation.
• no direct graphical display and interpretation: a "black box".
h Boosting. Idea: a (too) simple classification method can be improved by "recycling":
1. Estimate the rule as given → classification k̂(0)⟨xi⟩.
2. Determine the wrongly classified observations.
   Re-estimate the rule using increased weights for the misclassified observations
   → classification k̂(1)⟨xi⟩.
Repeat step 2 as often as useful.
Boosted rule: (weighted) majority vote among the classifications k̂(ℓ).
Friedman, Hastie and Tibshirani (2000)

i Bagging: bootstrap aggregating.
Determine the rule many times by bootstrapping the training data
→ majority vote.
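A minimal bagging sketch with classification trees (an illustration under assumed defaults, not the course's implementation):

library(rpart)
bag.predict <- function(formula, data, newdata, B = 50) {
  votes <- replicate(B, {
    idx <- sample(nrow(data), replace = TRUE)          # bootstrap sample
    fit <- rpart(formula, data = data[idx, ])
    as.character(predict(fit, newdata, type = "class"))
  })
  # majority vote over the B bootstrapped rules, for each new observation
  apply(votes, 1, function(v) names(which.max(table(v))))
}
# usage: bag.predict(Species ~ ., iris, newdata = iris)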
L Literature: Ripley (1996) treats these methods, except the last two, with a focus
on applications. Sometimes lacks precision.