Evaluating Methods - Tandy Warnow (tandy.cs.illinois.edu/method-evaluation.pdf)
Evaluating Methods
Tandy Warnow
You’ve designed a new method! Now what?
To evaluate a new method:
• Establish theoretical properties.
• Evaluate on data.
• Compare the new method to other methods.
How do you do this?
General Issues
• So far we have computed trees and we have computed alignments.
• How can we quantify accuracy or error? What datasets should we use?
• What are the issues?
Basic criteria
• Sensitivity = true positive rate = recall rate = TP/(TP+FN)
• Precision = positive predictive value (PPV) = TP/(TP+FP)
• Specificity = true negative rate = TN/(TN+FP)
• False Discovery Rate = 1 - PPV
True positives, false positives, etc.
• For these criteria, we need to understand the concepts of
  – true positive,
  – false positive,
  – true negative, and
  – false negative
• In other words, we need to have a “yes/no” classifier.
Simple example: HIV testing
• Sample space: HIV tests (ELISA)
  – True positive: the test comes out positive and the person does have HIV
  – True negative: the test comes out negative and the person does not have HIV
  – False positive: the test comes out positive but the person does not have HIV
  – False negative: the test comes out negative but the person does have HIV
Hypothetical Example
• The population is 1,000 samples
• 10 of them have the disease, 990 do not
• The test is positive on 20: 9 of the 10 with the disease, and 11 of the 990 who do not have the disease
  – TP = 9, FP = 11, TN = 979, FN = 1
  – Sensitivity = TP/(TP+FN) = 9/10 = 90%
  – Specificity = TN/(TN+FP) = 979/990 = 98.9%
  – Precision = TP/(TP+FP) = 9/20 = 45%
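
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch, not part of the slides: the function name `basic_criteria` and the dictionary keys are illustrative assumptions, and the formulas are the ones listed under "Basic criteria".

```python
def basic_criteria(tp, fp, tn, fn):
    """Compute the basic criteria from the four confusion-matrix counts.

    The function name and the keys of the returned dict are hypothetical;
    the formulas follow the "Basic criteria" slide.
    """
    sensitivity = tp / (tp + fn)   # true positive rate, recall
    precision = tp / (tp + fp)     # positive predictive value (PPV)
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "false_discovery_rate": 1 - precision,  # 1 - PPV
    }


# Counts from the hypothetical example: 1,000 samples, 10 with the disease,
# 20 positive tests (9 true positives and 11 false positives).
metrics = basic_criteria(tp=9, fp=11, tn=979, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run on the counts above, the sketch should reproduce the 90% sensitivity, 98.9% specificity, and 45% precision from the slide, and it also reports the False Discovery Rate (1 - PPV) of 55%.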
Hypothetical Example
• The population is 1,000 samples
• 10 of them have the disease, 990 do not
• The test is positive on 20: 9 of the 10 with the disease, and 11 of the 990 who do not have the disease
  – What is the false positive rate?
  – FP rate = number of false positives divided by the total number of positive test results, so FP/(FP+TP) = 11/20 = 55%. This is the False Discovery Rate (1 - PPV); the conventional false positive rate, 1 - specificity = FP/(FP+TN), is 11/990 ≈ 1.1%.
  – What is the false negative rate?
  – FN rate = number of false negatives divided by the total number of negative test results, so FN/(FN+TN) = 1/980 ≈ 0.1%. The conventional false negative rate, 1 - sensitivity = FN/(FN+TP), is 1/10 = 10%.
Performance criteria
• Running time
• Space
• Statistical performance issues (e.g., statistical consistency and sequence length requirements)
• “Topological accuracy” with respect to the underlying true tree, typically studied in simulation.
• Accuracy with respect to a mathematical score (e.g., tree length or likelihood score) on real data
Statistical Consistency
[Figure: plot of error against amount of data]
FN: false negative (missing edge)
FP: false positive (incorrect edge)
[Figure: tree comparison with edges marked FN and FP; 50% error rate]
Alignment Error/Accuracy
• SPFN: percentage of homologies in the true alignment that are not recovered (false negative homologies)
• SPFP: percentage of homologies in the estimated alignment that are false (false positive homologies)
• TC: total number of columns correctly recovered
• SP-score: percentage of homologies in the true alignment that are recovered
• Pairs score: 1 - (average of SPFN and SPFP)
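
The alignment criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: an alignment is assumed to be a dict mapping sequence names to equal-length gapped strings with "-" as the gap character, a homology is taken to be a pair of residues placed in the same column, TC is assumed to count only columns containing at least two residues, and scores are returned as proportions between 0 and 1 rather than percentages. All function names and the toy alignments are hypothetical.

```python
from itertools import combinations


def columns(alignment):
    """Return each column as a tuple of (sequence name, residue index) pairs,
    skipping gap characters ("-")."""
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    next_index = dict.fromkeys(names, 0)  # residues seen so far per sequence
    cols = []
    for i in range(lengths.pop()):
        col = []
        for name in names:
            if alignment[name][i] != "-":
                col.append((name, next_index[name]))
                next_index[name] += 1
        cols.append(tuple(col))
    return cols


def homologies(alignment):
    """Return the set of residue pairs that share a column."""
    pairs = set()
    for col in columns(alignment):
        pairs.update(combinations(col, 2))
    return pairs


def alignment_error(true_aln, est_aln):
    """Compute SPFN, SPFP, SP-score, TC, and pairs score.

    Assumes both alignments contain at least one homology.
    """
    true_pairs = homologies(true_aln)
    est_pairs = homologies(est_aln)
    shared = true_pairs & est_pairs
    spfn = 1 - len(shared) / len(true_pairs)
    spfp = 1 - len(shared) / len(est_pairs)
    # Assumption: only columns with at least two residues count toward TC.
    true_cols = {c for c in columns(true_aln) if len(c) >= 2}
    est_cols = {c for c in columns(est_aln) if len(c) >= 2}
    return {
        "SPFN": spfn,
        "SPFP": spfp,
        "SP-score": len(shared) / len(true_pairs),
        "TC": len(true_cols & est_cols),
        "pairs score": 1 - (spfn + spfp) / 2,
    }


# Hypothetical toy alignments of two sequences (ACG and ACTG).
true_aln = {"A": "AC-G", "B": "ACTG"}
est_aln = {"A": "ACG-", "B": "ACTG"}
print(alignment_error(true_aln, est_aln))
```

In the toy example each alignment contains three homologies and two of them are shared, so SPFN and SPFP are both 1/3, the SP-score and pairs score are both 2/3, and two columns are recovered exactly.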
Other Alignment Estimation Criteria
• Tree topology error
• Tree branch length error
• Gap length distribution
• Insertion/deletion ratio
• Alignment length
• Number of indels
Studying Methods
• The point is to evaluate a new method in comparison to prior methods.
• You need to do this on data, not just using theorems.
• How do you do this?
Benchmarks
• Simulations: can control everything, and true alignment is not disputed
  – Different simulators
• Biological: can’t control anything, and reference alignment and reference tree might not be correct. Alignment benchmarks are also somewhat problematic, for various reasons:
  – BAliBASE, HomFam, PREFAB
  – CRW (Comparative RNA Web Site)
Simulation Studies
[Excerpt from “Brief introduction to phylogenetic estimation”, p. 24]
True tree and alignment (tree leaves: S1, S2, S3, S4):
S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-CT-------GACCGC--
S4 = -------TCAC--GACCGACA
Unaligned Sequences:
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA
Estimated tree and alignment (tree leaves: S1, S4, S3, S2):
S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-C--T-----GACCGC--
S4 = T---C-A-CGACCGA----CA
Compare: the estimated tree and alignment against the true tree and alignment.
Figure 1.6 A simulation study protocol. Sequences are evolved down a model tree under a process that includes insertions and deletions; hence, the true alignment and true tree are known. An alignment and tree are estimated on the generated sequences, and then compared to the true alignment and true tree.
The Robinson-Foulds (RF) distance between two trees is the number of non-trivial bipartitions that are present in one or the other tree but not in both trees.
Each of these ways of quantifying error in an estimated tree can be normalized to produce a proportion between 0 and 1 (equivalently, a percentage between 0 and 100). For example, the FN error rate would be the percentage of the non-trivial model tree bipartitions that are not present in the estimated tree, and the FP error rate would be the percentage of the non-trivial bipartitions in the estimated tree that are not present in the model tree. Finally, the Robinson-Foulds error rate is the RF distance divided by 2n - 6, where n is the number of leaves in the model tree; note that 2n - 6 is the maximum possible RF distance between two trees on the same set of n leaves.
Figure 1.7 provides an example of this comparison; note that the model tree (called the true tree in the figure) is rooted, but the inferred tree is unrooted. To compute the tree error, we unroot the true tree, and treat it only as an unrooted tree. Since both trees are binary (i.e., each non-leaf node has degree three), there are only two internal edges. Each of the two trees has the non-trivial bipartition separating S1,S2 from S3,S4,S5, but each tree also has a bipartition that is not in the other tree. Hence, the RF distance between the two trees is 2, out of a maximum possible of 4, and so the RF error rate is 50%. Note also that there is one true positive edge and one false positive edge in the inferred tree, so that the inferred tree has FN and FP rates of 50%.
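
The bipartition-based error rates described above can be sketched in Python. This is a minimal illustration under stated assumptions: trees are written as nested tuples of leaf names and treated as unrooted, all function names are hypothetical, and because Figure 1.7 is not reproduced here, the two five-leaf topologies are made up to match the described situation (both contain S1,S2 | S3,S4,S5 and each has one bipartition the other lacks).

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does not contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Non-trivial: at least two leaves on each side of the edge.
        if 2 <= len(below) <= len(all_leaves) - 2:
            found.add(all_leaves - below if anchor in below else below)
        return below

    visit(tree)
    return found


def tree_error(true_tree, est_tree):
    """Compute FN, FP, and RF counts and rates.

    Assumes both trees have the same leaf set (at least four leaves) and
    at least one non-trivial bipartition each.
    """
    if leaf_set(true_tree) != leaf_set(est_tree):
        raise ValueError("trees must have the same leaf set")
    true_bp = bipartitions(true_tree)
    est_bp = bipartitions(est_tree)
    fn = len(true_bp - est_bp)  # missing edges
    fp = len(est_bp - true_bp)  # incorrect edges
    n = len(leaf_set(true_tree))
    return {
        "FN": fn,
        "FP": fp,
        "RF": fn + fp,
        "FN rate": fn / len(true_bp),
        "FP rate": fp / len(est_bp),
        "RF rate": (fn + fp) / (2 * n - 6),
    }


# Hypothetical topologies, not taken from Figure 1.7.
true_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
est_tree = ((("S1", "S2"), "S4"), ("S3", "S5"))
print(tree_error(true_tree, est_tree))
```

For these two trees the sketch finds one missing and one incorrect bipartition, so the RF distance is 2 out of a maximum of 2n - 6 = 4, and the FN, FP, and RF rates are each 50%, in line with the worked example in the text.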
Designing a simulation study
• Consider the realism of the simulator.
• Consider whether the conditions are too easy or too difficult to be helpful.
• Consider the competing methods to explore.
• Consider statistical significance.
• Be concerned with repeatability.
Data
• Biological data:
  – How reliable are the reference alignments and trees?
• Simulated data:
  – How realistic are the simulation conditions?
Simulators
• Sequence evolution down a tree:
  – Indels? If so, what lengths?
  – Substitutions under what model?
  – How many substitutions? How many indels?
  – How is the tree topology and set of branch lengths defined?
  – Is the tree ultrametric?
  – How many leaves in the tree (i.e., # sequences)?
  – How long are the sequences?
Methods
• Are you picking the best competing methods?
• Are you running them in the best way?
Criteria
• Are you using criteria that are considered appropriate by the research community?
• If you are using new criteria, justify these criteria (and probably use the standard criteria anyway).
Repeatability
• Provide full details about how you ran the analyses so that the same experiment could be done by the person reading the paper.
• Save your data and make them available to the readers.
Writing Papers
Read
• Appendix C in Computational Phylogenetics for guidelines about writing papers about computational methods.
• “How to write your first paper” – on my homepage
• “Commonly encountered challenges in research ethics” – on my homepage