Center for Big Data Analytics and Discovery Informatics
Artificial Intelligence Research Laboratory

Fall 2018, Vasant G Honavar

Evaluating Classifier Performance

Vasant Honavar
Artificial Intelligence Research Laboratory
Informatics Graduate Program
Computer Science and Engineering Graduate Program
Bioinformatics and Genomics Graduate Program
Neuroscience Graduate Program
Center for Big Data Analytics and Discovery Informatics
Huck Institutes of the Life Sciences
Institute for Cyberscience
Clinical and Translational Sciences Institute
Northeast Big Data Hub
Pennsylvania State University

[email protected]
http://faculty.ist.psu.edu/vhonavar
http://ailab.ist.psu.edu

Why Evaluate Classifiers?

•  To know how well a classifier can be expected to perform when it is put to use
•  To choose the best model from among a set of alternatives

Evaluating a Classifier

•  How can we measure the performance of classifiers?
•  How well can a classifier be expected to perform on novel data, i.e., data not seen during training?
•  We can estimate the performance (e.g., accuracy, sensitivity) of the classifier using an evaluation data set (not used for training)
•  How close is the estimated performance to the true performance?

Classification error

•  Error = classifying a record as belonging to one class when it belongs to another class.
•  Error rate = percent of misclassified samples out of the total samples in the validation data

Naïve Baseline

Naïve baseline: classify all samples as belonging to the most prevalent class

•  We hope to do better than the naïve baseline
•  When the goal is to identify high-value but rare outcomes, we may do well by doing worse than the naïve baseline in terms of accuracy

Estimating Classifier Performance

$N$: Total number of instances in the data set
$TP_j$: Number of true positives for class $j$
$FP_j$: Number of false positives for class $j$
$TN_j$: Number of true negatives for class $j$
$FN_j$: Number of false negatives for class $j$

$$\mathrm{Accuracy}_j = \frac{TP_j + TN_j}{N} = P(\text{label} = c_j \wedge \text{class} = c_j) + P(\text{label} \neq c_j \wedge \text{class} \neq c_j)$$

Perfect classifier ↔ Accuracy = 1
Popular measure
Biased in favor of the majority class!
Should be used with caution!

Classifier Learning -- Measuring Performance

                 Class
Label        C1           ¬C1
C1           TP = 55      FP = 5
¬C1          FN = 10      TN = 30

$$N = TP + FN + TN + FP = 55 + 10 + 30 + 5 = 100$$

$$\mathrm{sensitivity}_1 = \frac{TP}{TP + FN} = \frac{55}{55 + 10} = \frac{55}{65}$$

$$\mathrm{specificity}_1 = \frac{TP}{TP + FP} = \frac{55}{55 + 5} = \frac{55}{60}$$

$$\mathrm{accuracy}_1 = \frac{TP + TN}{N} = \frac{55 + 30}{100} = \frac{85}{100}$$

$$\mathrm{falsealarm}_1 = \frac{FP}{TN + FP} = \frac{5}{30 + 5} = \frac{5}{35}$$
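The worked example (TP = 55, FP = 5, FN = 10, TN = 30) can be checked with a few lines of Python. This is a minimal sketch: the function name `binary_metrics` and the dictionary keys are illustrative choices, not part of the slides, and "specificity" follows the slides' definition TP / (TP + FP), i.e. precision.

```python
from fractions import Fraction

def binary_metrics(tp, fp, fn, tn):
    """Return the four measures from the worked example as exact fractions."""
    n = tp + fn + tn + fp
    return {
        "sensitivity": Fraction(tp, tp + fn),   # also called recall
        "specificity": Fraction(tp, tp + fp),   # the slides' "specificity", i.e. precision
        "accuracy": Fraction(tp + tn, n),
        "false_alarm": Fraction(fp, tn + fp),
    }

metrics = binary_metrics(tp=55, fp=5, fn=10, tn=30)
for name, value in metrics.items():
    # Fractions print in lowest terms, e.g. 55/65 is shown as 11/13
    print(f"{name}: {value} = {float(value):.3f}")
```

Exact fractions are used so that the results can be compared directly with the hand calculation.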

When One Class is More Important than Another

In many cases it is more important to identify members of a specific target class

–  Tax fraud
–  Credit default
–  Response to promotional offer
–  Detecting electronic network intrusion
–  Predicting delayed flights
–  Diagnosing cancer
–  Predicting nuclear reactor meltdown

In such cases, we may tolerate greater overall error, in return for better predictions of the more important class

Measuring Classifier Performance: Sensitivity

$$\mathrm{Sensitivity}_j = \frac{TP_j}{TP_j + FN_j} = \frac{\mathrm{Count}(\text{label} = c_j \wedge \text{class} = c_j)}{\mathrm{Count}(\text{class} = c_j)} = P(\text{label} = c_j \mid \text{class} = c_j)$$

Perfect classifier → Sensitivity = 1
Probability of correctly labeling members of the target class
Also called recall or hit rate

Measuring Classifier Performance: Specificity

$$\mathrm{Specificity}_j = \frac{TP_j}{TP_j + FP_j} = \frac{\mathrm{Count}(\text{label} = c_j \wedge \text{class} = c_j)}{\mathrm{Count}(\text{label} = c_j)} = P(\text{class} = c_j \mid \text{label} = c_j)$$

Perfect classifier → Specificity = 1
Also called precision
Probability that a positive prediction is correct

Measuring Performance: Precision, Recall, and False Alarm Rate

$$\mathrm{Precision}_j = \mathrm{Specificity}_j = \frac{TP_j}{TP_j + FP_j}$$

$$\mathrm{Recall}_j = \mathrm{Sensitivity}_j = \frac{TP_j}{TP_j + FN_j}$$

$$\mathrm{FalseAlarm}_j = \frac{FP_j}{TN_j + FP_j} = \frac{\mathrm{Count}(\text{label} = c_j \wedge \text{class} = \neg c_j)}{\mathrm{Count}(\text{class} = \neg c_j)} = P(\text{label} = c_j \mid \text{class} = \neg c_j)$$

Perfect classifier → Precision = 1
Perfect classifier → Recall = 1
Perfect classifier → False Alarm Rate = 0

Measuring Performance – Correlation Coefficient

$$CC_j = \frac{(TP_j \times TN_j) - (FP_j \times FN_j)}{\sqrt{(TP_j + FN_j)(TP_j + FP_j)(TN_j + FP_j)(TN_j + FN_j)}}, \qquad -1 \le CC_j \le 1$$

$$CC_j = \frac{1}{N} \sum_{d_i \in D} \frac{(jlabel_i - \overline{jlabel})(jclass_i - \overline{jclass})}{\sigma_{JLABEL}\,\sigma_{JCLASS}}$$

where $jlabel_i = 1$ iff the classifier assigns $d_i$ to class $c_j$ (0 otherwise), $jclass_i = 1$ iff the true class of $d_i$ is class $c_j$ (0 otherwise), the overlined quantities are means over $D$, $\sigma_{JLABEL}$ and $\sigma_{JCLASS}$ are the corresponding standard deviations over $D$, and $N = |D|$.
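The two forms of the correlation coefficient can be compared numerically. The sketch below, assuming the counts of the earlier worked example (TP = 55, FP = 5, FN = 10, TN = 30), computes the count-based form and the Pearson correlation of the 0/1 indicator vectors, using population standard deviations and a 1/N factor. The function names are illustrative, and returning 0.0 for a zero denominator is a convention assumed here, not something stated in the slides.

```python
import math

def correlation_coefficient(tp, fp, fn, tn):
    """Count-based form: (TP*TN - FP*FN) / sqrt of the four marginal totals."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0  # assumed convention

def pearson_from_indicators(labels, classes):
    """Pearson correlation of two 0/1 vectors (population standard deviations)."""
    n = len(labels)
    mean_l = sum(labels) / n
    mean_c = sum(classes) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(labels, classes)) / n
    sd_l = math.sqrt(sum((l - mean_l) ** 2 for l in labels) / n)
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in classes) / n)
    return cov / (sd_l * sd_c)

tp, fp, fn, tn = 55, 5, 10, 30
# One entry per instance: assigned label (1 = labeled c_j) and true class (1 = is c_j)
labels = [1] * tp + [1] * fp + [0] * fn + [0] * tn
classes = [1] * tp + [0] * fp + [1] * fn + [0] * tn

cc_counts = correlation_coefficient(tp, fp, fn, tn)
cc_pearson = pearson_from_indicators(labels, classes)
print(round(cc_counts, 4), round(cc_pearson, 4))
```

Both computations give the same value, which is why the coefficient can be read either from the confusion matrix or from the per-instance indicators.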

Beware of terminological confusion in the literature!

•  Some bioinformatics authors use “accuracy” incorrectly to refer to recall, i.e., sensitivity, or precision, i.e., specificity
•  In medical statistics, specificity sometimes refers to sensitivity for the negative class, i.e., $\frac{TN_j}{TN_j + FP_j}$
•  Some authors use false alarm rate to refer to the probability that a positive prediction is incorrect, i.e., $\frac{FP_j}{FP_j + TP_j} = 1 - \mathrm{Precision}_j$

When you write
•  provide the formula in terms of TP, TN, FP, FN
When you read
•  check the formula in terms of TP, TN, FP, FN

Measuring Classifier Performance

•  TP, FP, TN, FN provide the relevant information
•  No single measure tells the whole story
•  A classifier with 98% accuracy can be useless if 98% of the population does not have cancer and the 2% that do are misclassified by the classifier
•  Use of multiple measures recommended
•  Beware of terminological confusion!

Micro-averaged performance measures

Performance on a random sample

$$\mathrm{MicroAverage\ CC} = \frac{\left(\sum_j TP_j\right) \times \left(\sum_j TN_j\right) - \left(\sum_j FP_j\right) \times \left(\sum_j FN_j\right)}{\sqrt{\left(\sum_j TP_j + \sum_j FN_j\right)\left(\sum_j TP_j + \sum_j FP_j\right)\left(\sum_j TN_j + \sum_j FP_j\right)\left(\sum_j TN_j + \sum_j FN_j\right)}}$$

$$\mathrm{MicroAverage\ Precision} = \frac{\sum_j TP_j}{\sum_j TP_j + \sum_j FP_j}$$

$$\mathrm{MicroAverage\ Recall} = \frac{\sum_j TP_j}{\sum_j TP_j + \sum_j FN_j}$$

$$\mathrm{MicroAverage\ FalseAlarm} = 1 - \mathrm{MicroAverage\ Precision}$$

(here false alarm is used in the sense of the probability that a positive prediction is incorrect, $FP / (FP + TP)$)

$$\mathrm{MicroAverage\ Accuracy} = \frac{\sum_j TP_j}{N}$$

Etc.

•  Micro averaging gives equal importance to each sample
•  Classes with large number of instances dominate

Macro-averaged performance measures

$$\mathrm{MacroAverage\ CorrelationCoeff} = \frac{1}{M} \sum_j \mathrm{CorrelationCoeff}_j$$

$$\mathrm{MacroAverage\ Specificity} = \frac{1}{M} \sum_j \mathrm{Specificity}_j$$

$$\mathrm{MacroAverage\ Sensitivity} = \frac{1}{M} \sum_j \mathrm{Sensitivity}_j$$

Macro averaging gives equal importance to each of the $M$ classes
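The difference between micro and macro averaging is easiest to see on an imbalanced problem. The per-class counts below are hypothetical, made up only for illustration (one large class A and two small classes B and C); the variable names are likewise illustrative.

```python
# Hypothetical per-class counts (TP, FP, FN) for a 3-class problem; not from the slides
counts = {
    "A": (90, 10, 5),
    "B": (8, 3, 7),
    "C": (2, 2, 3),
}

sum_tp = sum(tp for tp, _, _ in counts.values())
sum_fp = sum(fp for _, fp, _ in counts.values())
sum_fn = sum(fn for _, _, fn in counts.values())

# Micro averaging: pool the counts over classes, then compute the measure once
micro_precision = sum_tp / (sum_tp + sum_fp)
micro_recall = sum_tp / (sum_tp + sum_fn)

# Macro averaging: compute the measure per class, then average over the M classes
m = len(counts)
macro_precision = sum(tp / (tp + fp) for tp, fp, _ in counts.values()) / m
macro_recall = sum(tp / (tp + fn) for tp, _, fn in counts.values()) / m

print(f"micro precision {micro_precision:.3f}, macro precision {macro_precision:.3f}")
print(f"micro recall    {micro_recall:.3f}, macro recall    {macro_recall:.3f}")
```

With these counts the micro averages are dominated by class A, while the macro averages are pulled down by the weaker results on the small classes B and C.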

Cutoff for classification

Most machine learning algorithms classify via a 2-step process:
For each sample,
1.  Compute probability of belonging to class “1”
2.  Compare to cutoff value, and classify accordingly

•  Default cutoff value is 0.50
     If >= 0.50, classify as “1”
     If < 0.50, classify as “0”
•  Can use different cutoff values for trading off one measure against another (more on this later)
•  Question: How would this work in the case of K nearest neighbor?

Cutoff Table

•  If cutoff is 0.50: 13 samples are classified as “1”
•  If cutoff is 0.80: seven samples are classified as “1”

Actual Class   Prob. of "1"      Actual Class   Prob. of "1"
1              0.996             1              0.506
1              0.988             0              0.471
1              0.984             0              0.337
1              0.980             1              0.218
1              0.948             0              0.199
1              0.889             0              0.149
1              0.848             0              0.048
0              0.762             0              0.038
1              0.707             0              0.025
1              0.681             0              0.022
1              0.656             0              0.016
0              0.622             0              0.004
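The effect of the cutoff can be checked directly on the 24 rows of the cutoff table. In this sketch a sample is labeled "1" when its probability is at or above the cutoff; counting the rows this way gives 13 samples at cutoff 0.50 and 7 at cutoff 0.80. The helper name `confusion_at` is an illustrative choice.

```python
# Rows of the cutoff table: (actual class, estimated probability of class "1")
cutoff_table = [
    (1, 0.996), (1, 0.988), (1, 0.984), (1, 0.980), (1, 0.948), (1, 0.889),
    (1, 0.848), (0, 0.762), (1, 0.707), (1, 0.681), (1, 0.656), (0, 0.622),
    (1, 0.506), (0, 0.471), (0, 0.337), (1, 0.218), (0, 0.199), (0, 0.149),
    (0, 0.048), (0, 0.038), (0, 0.025), (0, 0.022), (0, 0.016), (0, 0.004),
]

def confusion_at(table, cutoff):
    """Return (TP, FP, FN, TN) when samples with probability >= cutoff are labeled "1"."""
    tp = sum(1 for actual, p in table if p >= cutoff and actual == 1)
    fp = sum(1 for actual, p in table if p >= cutoff and actual == 0)
    fn = sum(1 for actual, p in table if p < cutoff and actual == 1)
    tn = sum(1 for actual, p in table if p < cutoff and actual == 0)
    return tp, fp, fn, tn

for cutoff in (0.50, 0.80):
    tp, fp, fn, tn = confusion_at(cutoff_table, cutoff)
    print(f"cutoff {cutoff:.2f}: {tp + fp} classified as 1 "
          f"(TP={tp}, FP={fp}, FN={fn}, TN={tn})")
```

Raising the cutoff from 0.50 to 0.80 removes both false positives but turns four more true positives into false negatives, which is the trade-off the cutoff controls.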

Receiver Operating Characteristic (ROC) Curve

•  The confusion matrix, and hence the previous measures of classifier performance, are threshold dependent
•  We can often trade off recall versus precision, e.g., by adjusting the classification threshold θ
•  Is there a threshold-independent measure of classifier performance?
     –  The ROC curve is a plot of Sensitivity against False Alarm Rate. The False Alarm Rate is the same as (1 − Specificity) when specificity is taken in the medical-statistics sense, TN / (TN + FP). The curve characterizes this tradeoff for a given classifier
     –  The ROC curve is obtained by plotting sensitivity against (1 − specificity) while varying the classification threshold
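As a sketch of how an ROC curve is traced, the code below sweeps the classification threshold over the scores of the earlier cutoff table and records one (false alarm rate, sensitivity) point per threshold, with false alarm rate = FP / (TN + FP). The area under the curve is then estimated with the trapezoid rule. The function names and the choice of the trapezoid rule are assumptions made for illustration.

```python
# (actual class, estimated probability of class "1"), taken from the cutoff table
scored = [
    (1, 0.996), (1, 0.988), (1, 0.984), (1, 0.980), (1, 0.948), (1, 0.889),
    (1, 0.848), (0, 0.762), (1, 0.707), (1, 0.681), (1, 0.656), (0, 0.622),
    (1, 0.506), (0, 0.471), (0, 0.337), (1, 0.218), (0, 0.199), (0, 0.149),
    (0, 0.048), (0, 0.038), (0, 0.025), (0, 0.022), (0, 0.016), (0, 0.004),
]

def roc_points(samples):
    """Return (false alarm rate, sensitivity) pairs, one per distinct threshold."""
    positives = sum(1 for actual, _ in samples if actual == 1)
    negatives = len(samples) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is labeled "1"
    for threshold in sorted({p for _, p in samples}, reverse=True):
        tp = sum(1 for actual, p in samples if p >= threshold and actual == 1)
        fp = sum(1 for actual, p in samples if p >= threshold and actual == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

points = roc_points(scored)
auc = area_under_curve(points)
print(f"{len(points)} ROC points, area under curve = {auc:.4f}")
```

The curve starts at (0, 0), where nothing is labeled "1", and ends at (1, 1), where everything is.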

Receiver Operating Characteristic (ROC) Curve

[Figure: ROC curve]

Measuring Performance of Classifiers – ROC curves

•  ROC curves offer a more complete picture of the performance of the classifier as a function of the classification threshold
•  A classifier h is better than another classifier g if ROC(h) dominates ROC(g)
•  ROC(h) dominates ROC(g) → Area ROC(h) > Area ROC(g)

[Figure: ROC curves on the unit square, both axes running from 0 to 1]

ROC Curve

[Figure: ROC curve]

Misclassification Costs May Differ

•  The cost of making a misclassification error may be higher for one class than the other(s)
•  Looked at another way, the benefit of making a correct classification may be higher for one class than the other(s)

Example – Response to Promotional Offer

•  Suppose we send an offer to 1000 people, with 1% average response rate
•  “1” = response, “0” = nonresponse
•  “Naïve rule” (classify everyone as “0”) has error rate of 1% (seems good)
•  Using machine learning, suppose we can correctly classify eight 1’s as 1’s
•  But at the cost of misclassifying twenty 0’s as 1’s and two 1’s as 0’s.

Confusion Matrix

Error rate = (2 + 20) / 1000 = 2.2% (higher than naïve rate)

             Predict as 1    Predict as 0
Actual 1     8               2
Actual 0     20              970

Introducing Costs & Benefits

Suppose:
•  Profit from a “1” is $10
•  Cost of sending offer is $1
Then:
•  Under naïve rule, all are classified as “0”, so no offers are sent: no cost, no profit
•  Under DM (data mining) predictions, 28 offers are sent.
     8 respond with profit of $10 each
     20 fail to respond, cost $1 each
     972 receive nothing (no cost, no profit)

Profit Matrix

             Predict as 1    Predict as 0
Actual 1     $80             0
Actual 0     ($20)           0
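The promotional-offer example can be reproduced from its confusion matrix (8, 2, 20, 970) and the stated profit of $10 per response and cost of $1 per unanswered offer. The sketch below recomputes the error rates and the entries of the profit matrix; the variable names are illustrative.

```python
# Confusion matrix of the promotional-offer example
tp, fn = 8, 2      # actual 1: predicted as 1, predicted as 0
fp, tn = 20, 970   # actual 0: predicted as 1, predicted as 0
n = tp + fn + fp + tn

profit_per_response = 10        # dollars, from the example
cost_per_unanswered_offer = 1   # dollars, from the example

naive_error_rate = (tp + fn) / n   # naive rule labels everyone "0"
model_error_rate = (fn + fp) / n

profit_from_responders = tp * profit_per_response       # the $80 cell
loss_from_nonresponders = fp * cost_per_unanswered_offer  # the ($20) cell
net_profit = profit_from_responders - loss_from_nonresponders
naive_profit = 0  # no offers sent: no cost, no profit

print(f"error rate: naive {naive_error_rate:.1%}, model {model_error_rate:.1%}")
print(f"net profit: naive ${naive_profit}, model ${net_profit}")
```

The model has the higher error rate but the higher profit, which is the sense in which doing worse than the naive baseline on accuracy can still be the better choice.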

Evaluating a Classifier

•  What we have done so far is to estimate the classifier’s performance on some available data.
•  How well can a classifier be expected to perform on novel data?
•  Performance estimated on training data is often optimistic relative to performance on novel data
•  We can estimate the performance (e.g., accuracy, sensitivity) of the classifier using evaluation data (not used for training)
•  How close is the estimated performance to the true performance?

Evaluation of a classifier with limited data

•  Holdout method – use part of the data for training, and the rest for testing
•  We may be lucky or unlucky – training data or test data may not be representative
•  Solution – Run multiple experiments with disjoint training and test data sets in which each class is represented in roughly the same proportion as in the entire data set

Classifier evaluation

[Figure sequence: labeled data (columns Data and Label) split into training data and testing data]

1.  Split the labeled data into training data and testing data
2.  Train a classifier on the training data to obtain a model
3.  Pretend like we don’t know the labels of the testing data
4.  Use the model to classify the testing data
5.  Compare predicted labels to actual labels
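The evaluation steps can be sketched end to end on synthetic data. Everything specific here is an assumption made for illustration: the one-dimensional Gaussian data, the 70/30 split, and the toy nearest-mean learner `train_nearest_mean` stand in for whatever data and learning algorithm are actually being evaluated.

```python
import random

def train_nearest_mean(training):
    """Toy learner (illustrative): remember the mean feature value of each class."""
    means = {}
    for label in (0, 1):
        values = [x for x, y in training if y == label]
        means[label] = sum(values) / len(values)
    # The model assigns a sample to the class whose mean is closer
    return lambda x: 1 if abs(x - means[1]) < abs(x - means[0]) else 0

rng = random.Random(0)
# Synthetic labeled data: class 0 centered at 0.0, class 1 centered at 2.0
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(2.0, 1.0), 1) for _ in range(50)]
rng.shuffle(data)

# Holdout split: 70% for training, the remaining 30% for testing
split = len(data) * 7 // 10
training, testing = data[:split], data[split:]

model = train_nearest_mean(training)

# Pretend the test labels are unknown, classify, then compare with the actual labels
predicted = [model(x) for x, _ in testing]
actual = [y for _, y in testing]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(testing)
print(f"holdout accuracy on {len(testing)} test samples: {accuracy:.2f}")
```

A different random seed gives a different split and usually a slightly different accuracy, so a single holdout estimate should be read with that variability in mind.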

Comparing algorithms

[Figure: test data with actual labels 1, 0; model 1 predicts 1, 1; model 2 predicts 1, 0]

Is model 2 better than model 1?

[Figure: for each model, the predicted labels and the actual labels go into an evaluation, giving score 1 for model 1 and score 2 for model 2]

model 2 better if score 2 > score 1

When would we want to do this type of comparison?

Is model 2 better?

Model 1: 85% accuracy       Model 2: 80% accuracy
Model 1: 85.5% accuracy     Model 2: 85.0% accuracy
Model 1: 0% accuracy        Model 2: 100% accuracy

Comparing scores: significance

•  Just comparing scores on one data set isn’t enough!
•  We don’t just want to know which system is better on one particular data set, we want to know if model 1 is better than model 2 in general
•  Put another way, we want to be confident that the difference is real and not just due to random chance

How do we know how variable a model’s accuracy is?

Variance

Variance of performance

•  We need multiple accuracy scores!
•  How can we get them?

Repeated experimentation

Instead of one evaluation with a particular split of training and test data, run multiple evaluations, with different splits of training and test data

[Figure: the same labeled data (columns Data and Label) shown under several splits, with different rows marked as evaluation and as train in each split]

K-fold cross validation

•  Break the training data into K equal-sized parts (split 1, split 2, split 3, …)
•  Repeat for all parts/splits: train on K−1 parts, evaluate on the other
•  Each split yields its own evaluation score (score 1, score 2, score 3, …)

K-fold cross validation

•  Better utilization of labeled data
•  More robust: don’t just rely on one evaluation set to evaluate the approach (or for optimizing parameters)
•  Multiplies the computational overhead by K (have to train K models instead of just one)
•  10 is the most common choice of K

Estimating the performance of a classifier

K-fold cross-validation

Partition the data (multi)set S into K equal parts S_1 .. S_K with roughly the same class distribution as S.

Errorc = 0
For i = 1 to K do
{
    S_Test ← S_i;  S_Train ← S − S_i;
    α ← Learn(S_Train)
    Errorc ← Errorc + Error(α, S_Test)
}
Error ← Errorc / K;  Output(Error)
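The K-fold procedure above translates fairly directly into Python. This is a sketch under stated assumptions: samples are `(features, label)` pairs, the folds are stratified by dealing each class round-robin into the K parts, and `learn` and `error` are caller-supplied functions corresponding to Learn and Error in the pseudocode. The majority-class learner at the end is only a stand-in for demonstration.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(samples, k, seed=0):
    """Partition samples into k parts with roughly the same class distribution."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for members in by_class.values():
        rng.shuffle(members)
        for sample in members:       # deal round-robin so every fold gets its share
            folds[position % k].append(sample)
            position += 1
    return folds

def cross_validate(samples, k, learn, error):
    """Average test error over k folds, following the pseudocode step by step."""
    folds = stratified_folds(samples, k)
    errorc = 0.0
    for i in range(k):
        s_test = folds[i]
        s_train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        alpha = learn(s_train)
        errorc += error(alpha, s_test)
    return errorc / k

def learn_majority(training):
    """Stand-in learner: always predict the most frequent training label."""
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: majority

def error_rate(model, test):
    return sum(model(x) != y for x, y in test) / len(test)

# Hypothetical data set: 60 samples of class 0 and 40 of class 1
samples = [(i, 0) for i in range(60)] + [(i, 1) for i in range(40)]
cv_error = cross_validate(samples, k=10, learn=learn_majority, error=error_rate)
print(f"10-fold cross-validation error: {cv_error:.2f}")
```

Because the folds are stratified, each test fold here holds 6 samples of class 0 and 4 of class 1, so the majority-class stand-in misclassifies 4 of every 10 test samples.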

Estimating classifier performance

Recommended procedure
•  Use K-fold cross-validation (K = 5 or 10) to estimate performance measures (accuracy, precision, recall, points on ROC curve, etc.) and 95% confidence intervals around the mean
•  Compute mean values of performance estimates and standard deviations of performance estimates
•  Report mean values of performance estimates and their standard deviations or 95% confidence intervals around the mean
•  Be skeptical – repeat experiments several times with different random splits of data into K folds!

Leave-one-out cross validation

•  K-fold cross validation where K = number of samples
•  aka “jackknifing”
•  pros/cons?
•  when would we use this?

Leave-one-out cross-validation

•  K-fold cross validation with K = n where n is the total number of samples available
•  n experiments – using n−1 samples for training and the remaining sample for testing
•  Leave-one-out cross-validation does not guarantee the same class distribution in training and test data!

Extreme case: 50% class 1, 50% class 2
Predict majority class label in the training data
True error – 50%; Leave-one-out error estimate – 100%!
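The extreme case is easy to reproduce. In the sketch below the data set has 10 samples of class 1 and 10 of class 2 (the size is an arbitrary choice for illustration), and the classifier always predicts the majority label of its training data. Holding out one sample leaves its class in the minority, so every held-out sample is misclassified.

```python
from collections import Counter

def majority_label(labels):
    """Return the most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]

# Balanced data set: 50% class 1, 50% class 2 (only the labels matter here)
labels = [1] * 10 + [2] * 10

errors = 0
for i, held_out in enumerate(labels):
    training = labels[:i] + labels[i + 1:]   # the other n - 1 samples
    # Removing one sample of a class leaves that class with 9 against 10
    if majority_label(training) != held_out:
        errors += 1

loo_error = errors / len(labels)
print(f"leave-one-out error estimate: {loo_error:.0%}")
```

On new data drawn with the same 50/50 class balance, a classifier that always predicts one class would be wrong about half the time, which is the 50% true error quoted above.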

Leave-one-out cross validation

•  Can be very expensive if training is slow and/or if there are a large number of examples
•  Useful in domains with limited training data: maximizes the data we can use for training
•  Some classifiers permit the estimation of leave-1-out performance measure without actually having to train K = n models

Comparing systems: sample 1

split       model 1    model 2
1           87         88
2           85         84
3           83         84
4           80         79
5           88         89
6           85         85
7           83         81
8           87         86
9           88         89
10          84         85
average:    85         85

Is model 2 better than model 1?

Comparing systems: sample 2

split       model 1    model 2
1           87         87
2           92         88
3           74         79
4           75         86
5           82         84
6           79         87
7           83         81
8           83         92
9           88         81
10          77         85
average:    82         85

Is model 2 better than model 1?

Comparing systems: sample 3

split       model 1    model 2
1           84         87
2           83         86
3           78         82
4           80         86
5           82         84
6           79         87
7           83         84
8           83         86
9           85         83
10          83         85
average:    82         85

Is model 2 better than model 1?

Page 56: Center for Big Data Analytics and Discovery Informatics Artificial … · 2018. 9. 9. · Center for Big Data Analytics and Discovery Informatics Artificial Intelligence Research

Center for Big Data Analytics and Discovery Informatics Artificial Intelligence Research Laboratory

Fall2018 VasantGHonavar

Comparing systems

Sample 3:

split     model 1   model 2
1         84        87
2         83        86
3         78        82
4         80        86
5         82        84
6         79        87
7         83        84
8         83        86
9         85        83
10        83        85
average   82        85

Sample 2:

split     model 1   model 2
1         87        87
2         92        88
3         74        79
4         75        86
5         82        84
6         79        87
7         83        81
8         83        92
9         88        81
10        77        85
average   82        85

What's the difference?


Comparing systems

Sample 3:

split     model 1   model 2
1         84        87
2         83        86
3         78        82
4         80        86
5         82        84
6         79        87
7         83        84
8         83        86
9         85        83
10        83        85
average   82        85
stddev    2.3       1.7

Sample 2:

split     model 1   model 2
1         87        87
2         92        88
3         74        79
4         75        86
5         82        84
6         79        87
7         83        81
8         83        92
9         88        81
10        77        85
average   82        85
stddev    5.9       3.9
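The averages and standard deviations in these tables can be reproduced with a few lines of Python. This is only a sketch: the accuracies are copied from the sample 2 and sample 3 tables, the helper name `summarize` is made up for illustration, and the sample standard deviation (dividing by n − 1) is assumed because it matches the rounded values on the slide.

```python
from statistics import mean, stdev

# Per-split accuracies (%), copied from the sample 2 and sample 3 tables.
sample2 = {
    "model 1": [87, 92, 74, 75, 82, 79, 83, 83, 88, 77],
    "model 2": [87, 88, 79, 86, 84, 87, 81, 92, 81, 85],
}
sample3 = {
    "model 1": [84, 83, 78, 80, 82, 79, 83, 83, 85, 83],
    "model 2": [87, 86, 82, 86, 84, 87, 84, 86, 83, 85],
}

def summarize(scores):
    """Return {model: (average, sample standard deviation)}."""
    return {name: (mean(vals), stdev(vals)) for name, vals in scores.items()}

summary2 = summarize(sample2)
summary3 = summarize(sample3)

for label, summary in (("sample 2", summary2), ("sample 3", summary3)):
    for name, (avg, sd) in summary.items():
        print(f"{label} {name}: average={avg:.0f} stddev={sd:.1f}")
```

Both samples have the same averages; only the spread across splits differs, which is the point of the comparison.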


Comparing systems: sample 4

split     model 1   model 2
1         80        82
2         84        87
3         89        90
4         78        82
5         90        91
6         81        83
7         80        80
8         88        89
9         76        77
10        86        88
average   83        85
stddev    4.9       4.7

Is model 2 better than model 1?


Comparing systems: sample 4

split     model 1   model 2   model 2 – model 1
1         80        82        2
2         84        87        3
3         89        90        1
4         78        82        4
5         90        91        1
6         81        83        2
7         80        80        0
8         88        89        1
9         76        77        1
10        86        88        2
average   83        85
stddev    4.9       4.7

Is model 2 better than model 1?


Model 2 is ALWAYS at least as good as model 1 (better on 9 of the 10 splits, tied on split 7)


How do we decide if model 2 is better than model 1?


Statistical tests

Setup:
–  Assume some default hypothesis about the data that you'd like to disprove, called the null hypothesis
–  e.g. model 1 and model 2 are not statistically different in performance

Test:
–  Calculate a test statistic from the data (often assuming something about the data)
–  Based on this statistic, with some probability we can reject the null hypothesis, that is, conclude that it is unlikely to hold


t-test

Determines whether two samples come from the same underlying distribution or not; specifically, it tests whether their means differ


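t-test

Null hypothesis: model 1 and model 2 accuracies are no different, i.e. they come from the same distribution

Result: a p-value, the probability of seeing a difference in accuracies at least this large by random chance alone if the null hypothesis were true (low values are better)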


Calculating t-test

For our setup, we'll do what's called a "paired t-test"
–  The values can be thought of as pairs, where they were calculated under the same conditions
–  In our case, the same train/test split
–  Gives more power than the unpaired t-test (we have more information)

For almost all experiments, we'll do a "two-tailed" version of the t-test

http://en.wikipedia.org/wiki/Student's_t-test
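A paired, two-tailed t-test can be sketched with the Python standard library alone. The function names (`t_density`, `two_tailed_p`, `paired_t_test`) are made up for illustration, and the p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an implementation choice for this sketch rather than anything prescribed by the slides. In practice a statistics package would normally be used instead.

```python
import math
from statistics import mean, stdev

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the probability mass in [-|t|, |t|].

    The mass is computed with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    inside = 2 * (h / 3) * total  # density is symmetric around 0
    return max(0.0, 1.0 - inside)

def paired_t_test(scores_1, scores_2):
    """Paired t-test on per-split scores; returns (t statistic, two-tailed p)."""
    diffs = [b - a for a, b in zip(scores_1, scores_2)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)

# Per-split accuracies copied from the sample 1 and sample 3 tables.
sample1_m1 = [87, 85, 83, 80, 88, 85, 83, 87, 88, 84]
sample1_m2 = [88, 84, 84, 79, 89, 85, 81, 86, 89, 85]
sample3_m1 = [84, 83, 78, 80, 82, 79, 83, 83, 85, 83]
sample3_m2 = [87, 86, 82, 86, 84, 87, 84, 86, 83, 85]

t1, p1 = paired_t_test(sample1_m1, sample1_m2)
t3, p3 = paired_t_test(sample3_m1, sample3_m2)
print(f"sample 1: t={t1:.3f} p={p1:.3f}")
print(f"sample 3: t={t3:.3f} p={p3:.4f}")
```

Pairing works on the per-split differences, so variation that both models share on a given split cancels out before the test statistic is computed.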


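p-value

•  The result of a statistical test is often a p-value
•  p-value: the probability, assuming the null hypothesis holds, of seeing a result at least as extreme as the one observed. It is not the probability that the null hypothesis holds. Informally: if the null hypothesis were true and we re-ran this experiment multiple times (say on different data), how often would we see a difference this large and so reject the null hypothesis incorrectly?
•  Common thresholds for calling a result "significant": 0.05 (95% confident), 0.01 (99% confident) and 0.001 (99.9% confident)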


Comparing systems: sample 1

split     model 1   model 2
1         87        88
2         85        84
3         83        84
4         80        79
5         88        89
6         85        85
7         83        81
8         87        86
9         88        89
10        84        85
average   85        85

Is model 2 better than model 1?

They are the same with: p = 1


Comparing systems: sample 2

split     model 1   model 2
1         87        87
2         92        88
3         74        79
4         75        86
5         82        84
6         79        87
7         83        81
8         83        92
9         88        81
10        77        85
average   82        85

Is model 2 better than model 1?

They are the same with: p = 0.15


Comparing systems: sample 3

split     model 1   model 2
1         84        87
2         83        86
3         78        82
4         80        86
5         82        84
6         79        87
7         83        84
8         83        86
9         85        83
10        83        85
average   82        85

Is model 2 better than model 1?

They are the same with: p = 0.007


Comparing systems: sample 4

split     model 1   model 2
1         80        82
2         84        87
3         89        90
4         78        82
5         90        91
6         81        83
7         80        80
8         88        89
9         76        77
10        86        88
average   83        85

Is model 2 better than model 1?

They are the same with: p = 0.001


Statistical tests on test data

[Figure: Labeled Data (data with labels) is split into All Training Data and Test Data; All Training Data is split into Training Data and Development Data, where we use cross-validation with a t-test.]

Can we do that here?


Bootstrap resampling

test set t with n samples

do m times:
-  sample n examples with replacement from the test set to create a new test set t'
-  evaluate model(s) on t'

calculate t-test (or other statistical test) on the collection of m results
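The resampling loop above can be sketched as follows. Everything named here is hypothetical: `bootstrap_scores`, the toy labels, and the two prediction lists are invented for illustration, and accuracy is assumed as the evaluation measure. Each resample is shared by all models so that the resulting score lists are paired.

```python
import random

def bootstrap_scores(labels, predictions_by_model, m=1000, seed=0):
    """Resample the test set m times with replacement and score each model.

    Returns {model name: [accuracy on t'_1, ..., accuracy on t'_m]}.
    """
    rng = random.Random(seed)
    n = len(labels)
    scores = {name: [] for name in predictions_by_model}
    for _ in range(m):
        # n draws with replacement from the test set indices: the new test set t'
        idx = [rng.randrange(n) for _ in range(n)]
        for name, preds in predictions_by_model.items():
            correct = sum(preds[i] == labels[i] for i in idx)
            scores[name].append(correct / n)
    return scores

# Toy test set (hypothetical labels and predictions, for illustration only).
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
pred_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0]

scores = bootstrap_scores(labels, {"A": pred_a, "B": pred_b}, m=200, seed=0)
print(len(scores["A"]), len(scores["B"]))
```

The two lists `scores["A"]` and `scores["B"]` are the m paired results on which the t-test (or another analysis) would then be computed.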


Bootstrap resampling

[Figure: Test Data is sampled with replacement to create Test'1, Test'2, ..., Test'm.]

[Figure: model A is evaluated on Test'1, Test'2, ..., Test'm, giving A score 1, A score 2, ..., A score m.]

[Figure: model B is evaluated on Test'1, Test'2, ..., Test'm, giving B score 1, B score 2, ..., B score m.]

[Figure: A score 1, ..., A score m and B score 1, ..., B score m feed into a paired t-test (or other analysis).]


Experimentation good practices

Never look at your test data!

During development
–  Compare different models/hyperparameters on development data
–  Use cross-validation to get more consistent results
–  If you want to be confident with results, use a t-test and look for p ≤ 0.05 (or even lower)

For final evaluation, use bootstrap resampling combined with a t-test to compare models


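Estimating the performance of a classifier

The true error of a hypothesis h with respect to a target function f and an instance distribution D is

$$\mathrm{Error}_D(h) \equiv \Pr_{x \in D}\left[ f(x) \neq h(x) \right]$$

The sample error of a binary classifier h with respect to a target function f and a sample S of instances drawn according to D is

$$\mathrm{Error}_S(h) \equiv \frac{1}{|S|} \sum_{x \in S} \delta(f(x), h(x))$$

$$\delta(a, b) = 1 \text{ iff } a \neq b; \qquad \delta(a, b) = 0 \text{ otherwise}$$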


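Estimating classifier performance

$$\mathrm{Domain}(X) = \{a, b, c, d\}$$

$$D(X) = \left\{ \tfrac{1}{8}, \tfrac{1}{2}, \tfrac{1}{8}, \tfrac{1}{4} \right\}$$

x      a   b   c   d
h(x)   0   1   1   0
f(x)   1   1   0   0

$$\mathrm{error}_D(h) = \Pr_D\left[ h(x) \neq f(x) \right] = D(X = a) + D(X = c) = \tfrac{1}{8} + \tfrac{1}{8} = \tfrac{1}{4}$$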
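The worked example can be checked numerically. In this sketch the values of D, h and f are the ones read from the slide (h and f disagree exactly on a and c); the function names `true_error` and `sample_error` and the random sample of size 100 are assumptions made for illustration.

```python
import random
from fractions import Fraction

domain = ["a", "b", "c", "d"]
D = {"a": Fraction(1, 8), "b": Fraction(1, 2), "c": Fraction(1, 8), "d": Fraction(1, 4)}
h = {"a": 0, "b": 1, "c": 1, "d": 0}   # hypothesis
f = {"a": 1, "b": 1, "c": 0, "d": 0}   # target function

def true_error(h, f, D):
    """Error_D(h): total probability mass of instances where h and f disagree."""
    return sum(prob for x, prob in D.items() if h[x] != f[x])

def sample_error(h, f, sample):
    """Error_S(h): fraction of the sample on which h and f disagree."""
    return sum(h[x] != f[x] for x in sample) / len(sample)

err_d = true_error(h, f, D)

# Draw a sample S of 100 instances according to D and measure the sample error.
rng = random.Random(0)
S = rng.choices(domain, weights=[float(D[x]) for x in domain], k=100)
err_s = sample_error(h, f, S)

print("true error:", err_d)
print("sample error on one sample of size 100:", err_s)
```

The true error is exact, while the sample error changes from one random sample to the next, which is what the following slides quantify.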


Evaluating the performance of a classifier

•  Sample error estimated from training data is an optimistic estimate

$$\mathrm{Bias} = E\left[ \mathrm{Error}_S(h) \right] - \mathrm{Error}_D(h)$$

•  For an unbiased estimate, h must be evaluated on an independent sample S (which is not the case if S is the training set!)
•  Even when the estimate is unbiased, it can vary across samples!
•  If h misclassifies 8 out of 100 samples

$$\mathrm{Error}_S(h) = \frac{8}{100} = 0.08$$

How close is the sample error to the true error?


How close is the estimated error to the true error?

•  Choose a sample S of size n according to distribution D
•  Measure $\mathrm{Error}_S(h)$

$\mathrm{Error}_S(h)$ is a random variable (outcome of a random experiment)

Given $\mathrm{Error}_S(h)$, what can we conclude about $\mathrm{Error}_D(h)$?

More generally, given the estimated performance of a hypothesis, what can we say about its actual performance?


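Evaluating performance when we can afford to test on a large independent test set

The true error of a hypothesis h with respect to a target function f and an instance distribution D is

$$\mathrm{Error}_D(h) \equiv \Pr_{x \in D}\left[ f(x) \neq h(x) \right]$$

The sample error of a classifier h with respect to a target function f and a sample S of instances drawn according to D is

$$\mathrm{Error}_S(h) \equiv \frac{1}{|S|} \sum_{x \in S} \delta(f(x), h(x))$$

$$\delta(a, b) = 1 \text{ iff } a \neq b; \qquad \delta(a, b) = 0 \text{ otherwise}$$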



How close is estimated accuracy to its true value?

Question: How close is p (the true probability) to $\hat{p}$ (the observed proportion)?

This problem is an instance of a well-studied problem in statistics
•  The problem of estimating the proportion of a population that exhibits some property, given the observed proportion over a random sample of the population.
•  In our case, the property of interest is that h correctly (or incorrectly) classifies a sample.
•  Testing h on a single random sample x drawn according to D amounts to performing a random experiment which succeeds if h correctly classifies x and fails otherwise.


How close is estimated accuracy to its true value?

The output of a classifier whose true accuracy is p can be viewed as a binary random variable which corresponds to the outcome of a Bernoulli trial with a success rate p (the probability of correct prediction)

The number of successes r observed in N trials is a random variable Y which follows the Binomial distribution (written here with n = N)

$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$$


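Error_S(h) is a Random Variable

Probability of observing r misclassified examples in a sample of size n (here p is the probability of misclassifying a single example, i.e. $p = \mathrm{Error}_D(h)$):

$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$$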


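Recall basic statistics

Consider a random experiment with discrete valued outcomes $y_1, y_2, \ldots, y_M$

The expected value of the corresponding random variable Y is

$$E(Y) \equiv \sum_{i=1}^{M} y_i \Pr(Y = y_i)$$

The variance of Y is

$$\mathrm{Var}(Y) \equiv E\left[ (Y - E[Y])^2 \right]$$

The standard deviation of Y is

$$\sigma_Y \equiv \sqrt{\mathrm{Var}(Y)}$$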


How close is estimated accuracy to its true value?

The mean of a Bernoulli trial with success rate p is p

Variance = p(1 − p)

If N trials are taken from the same Bernoulli process, the observed success rate $\hat{p}$ has the same mean p and variance $\frac{p(1-p)}{N}$

For large N, the distribution of $\hat{p}$ (approximately) follows a Gaussian distribution
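A small simulation can make the mean and variance of the observed success rate concrete. The values p = 0.75, N = 100 and 5000 repetitions are arbitrary choices for illustration, and `observed_success_rates` is a hypothetical helper name.

```python
import random
from statistics import mean, pvariance

def observed_success_rates(p, n_trials, repeats, seed=0):
    """Repeat an experiment of n_trials Bernoulli(p) trials; return each observed rate."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n_trials)) / n_trials
        for _ in range(repeats)
    ]

p, N = 0.75, 100
rates = observed_success_rates(p, N, repeats=5000)

print("mean of observed rates:", round(mean(rates), 4), "vs p =", p)
print("variance of observed rates:", round(pvariance(rates), 6),
      "vs p(1-p)/N =", p * (1 - p) / N)
```

With many repetitions the empirical mean and variance of the observed rate should settle near p and p(1 − p)/N respectively.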


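Binomial Probability Distribution

$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$$

Probability P(r) of r heads in n coin flips, if p = Pr(heads)

•  Expected, or mean value of X, E[X], is

$$E[X] \equiv \sum_{i=0}^{n} i\,P(i) = np$$

•  Variance of X is

$$\mathrm{Var}(X) \equiv E\left[ (X - E[X])^2 \right] = np(1-p)$$

•  Standard deviation of X, $\sigma_X$, is

$$\sigma_X \equiv \sqrt{E\left[ (X - E[X])^2 \right]} = \sqrt{np(1-p)}$$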
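The closed forms for the mean and variance can be checked directly against the probability mass function. The values n = 100 and p = 0.08 are illustrative choices, and `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(r, n, p):
    """P(r) = n! / (r! (n - r)!) * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 100, 0.08
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)
expected = sum(r * q for r, q in enumerate(pmf))
variance = sum((r - expected) ** 2 * q for r, q in enumerate(pmf))

print("sum of P(r):", total)
print("E[X]:", expected, "vs np =", n * p)
print("Var(X):", variance, "vs np(1-p) =", n * p * (1 - p))
print("sigma_X:", math.sqrt(variance))
```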


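Estimators, Bias, Variance, Confidence Interval

$$\mathrm{Error}_S(h) = \frac{r}{n}, \qquad \mathrm{Error}_D(h) = p$$

$$\sigma_{\mathrm{Error}_S(h)} = \sqrt{\frac{p(1-p)}{n}}$$

$$\sigma_{\mathrm{Error}_S(h)} = \sqrt{\frac{\mathrm{Error}_D(h)\,(1 - \mathrm{Error}_D(h))}{n}}$$

$$\sigma_{\mathrm{Error}_S(h)} \approx \sqrt{\frac{\mathrm{Error}_S(h)\,(1 - \mathrm{Error}_S(h))}{n}}$$

An N% confidence interval for some parameter p is an interval that is expected with probability N% to contain p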


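Normal distribution approximates binomial

$\mathrm{Error}_S(h)$ follows a Binomial distribution, with
•  mean $\mu_{\mathrm{Error}_S(h)} = \mathrm{Error}_D(h)$
•  standard deviation

$$\sigma_{\mathrm{Error}_S(h)} = \sqrt{\frac{\mathrm{Error}_D(h)\,(1 - \mathrm{Error}_D(h))}{n}}$$

We can approximate this by a Normal distribution with the same mean and variance when np(1 − p) ≥ 5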
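For the earlier example of 8 misclassified samples out of 100, the standard deviation and the rule of thumb can be evaluated as below. This is a sketch: the function names are hypothetical, and the unknown Error_D(h) is replaced by the observed Error_S(h), as in the approximate formula for the standard deviation.

```python
import math

def sample_error_std(error_s, n):
    """Approximate std dev of Error_S(h), plugging Error_S(h) in for Error_D(h)."""
    return math.sqrt(error_s * (1 - error_s) / n)

def normal_approx_ok(p, n):
    """Rule of thumb from the slides: n p (1 - p) >= 5."""
    return n * p * (1 - p) >= 5

r, n = 8, 100            # 8 misclassified out of 100
error_s = r / n
sigma = sample_error_std(error_s, n)

print(f"Error_S(h) = {error_s:.2f}, sigma ~= {sigma:.4f}")
print("normal approximation reasonable:", normal_approx_ok(error_s, n))
```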


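Normal distribution

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Expected, or mean value of X is given by E[X] = µ
Variance of X is given by Var(X) = σ²
Standard deviation of X is given by σ_X = σ

The probability that X will fall in the interval (a, b) is given by

$$\int_a^b p(x)\,dx$$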


How close is the estimated accuracy to its true value?

Let the probability that a Gaussian random variable X, with zero mean and unit variance, takes a value between −z and z be Pr[−z ≤ X ≤ z] = c

Pr[X ≥ z]   z
0.001       3.09
0.005       2.58
0.01        2.33
0.05        1.65
0.10        1.28


How close is the estimated accuracy to its true value?

But $\hat{p}$ does not have zero mean and unit variance so we normalize to get

$$\Pr\left[ -z < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z \right] = c$$


How close is the estimated accuracy to its true value?

To find confidence limits: Given a particular confidence figure c, use the table to find the z corresponding to the probability ½(1 − c). Use linear interpolation for values not in the table

$$p = \frac{\hat{p} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat{p}}{n} - \dfrac{\hat{p}^2}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}$$
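The confidence limits can be computed as sketched below. The table values are the ones from the slides; the names `Z_TABLE`, `z_for_confidence` and `confidence_interval`, and the example of an observed accuracy of 0.92 on 100 test samples, are assumptions made for illustration. Linear interpolation between table entries is coarse, so confidence levels that fall between entries give only rough values of z.

```python
import math

# One-tailed table from the slides: (Pr[X >= z], z), sorted by tail probability.
Z_TABLE = [(0.001, 3.09), (0.005, 2.58), (0.01, 2.33), (0.05, 1.65), (0.10, 1.28)]

def z_for_confidence(c, eps=1e-12):
    """Look up z for the tail probability (1 - c) / 2, interpolating linearly."""
    tail = (1 - c) / 2
    if tail < Z_TABLE[0][0] - eps or tail > Z_TABLE[-1][0] + eps:
        raise ValueError("confidence level outside the range covered by the table")
    for (p0, z0), (p1, z1) in zip(Z_TABLE, Z_TABLE[1:]):
        if tail <= p1 + eps:
            return z0 + (z1 - z0) * (tail - p0) / (p1 - p0)

def confidence_interval(p_hat, n, c):
    """Confidence limits for the true proportion p, given the observed p_hat."""
    z = z_for_confidence(c)
    center = p_hat + z * z / (2 * n)
    half = z * math.sqrt(p_hat / n - p_hat**2 / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - half) / denom, (center + half) / denom

# Example: 92 of 100 test samples classified correctly, 90% confidence.
low, high = confidence_interval(0.92, 100, 0.90)
print(f"z = {z_for_confidence(0.90):.2f}")
print(f"90% confidence interval for the true accuracy: [{low:.3f}, {high:.3f}]")
```

The interval is not symmetric around the observed proportion; the z² terms pull both limits toward ½, which matters most when n is small.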