Assessment of FM and FM/TBM modeling in CASP13
Luciano A. Abriata and Matteo Dal Peraro
Laboratory for Biomolecular Modeling (LBM), Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)
Acknowledgements
• CASP13 (and CASP12) organizers
  - Andriy, John, Torsten, Krzysztof, Maya and Anna!
  - Prediction Center
  - Previous assessors and all predictors
• Great and constructive experience
• Exciting to witness 2 consecutive huge transformations in protein structure prediction
CASP13 tertiary structure track: 32 FM & 13 FM/TBM (+4 FM-special)
Details on classification by Lisa Kinch & Andriy Kryshtafovych
# groups/servers: 107/39
# models (1/5): 7542/35982
Part 1: EU-specific evaluations
Strategy for EU-specific evaluations
Target EUs, models and tables from the Prediction Center
~half of the models for initial inspection, represented by top GDT_TS models
Clustering at 3 Å
Web app for interactive navigation of model clusters: 6 main scores, others available too
JS + HTML
Designation of best cluster(s) for each EU
Visual inspection
Model(s) designated best for each EU
Further evaluation of models in best cluster, if worthwhile
Chin-Hsien Tai, Hongjun Bai, Todd J. Taylor, and Byungkook Lee*, Proteins 2013
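The clustering step in the strategy above can be illustrated with a minimal sketch. The slides give only the 3 Å cutoff; the algorithm (greedy "leader" clustering), the use of a precomputed pairwise distance matrix, and all names and numbers below are assumptions made for illustration, not the assessors' actual procedure.

```python
def cluster_models(names, gdt_ts, dist, cutoff=3.0):
    """Greedy leader clustering (an assumed scheme, not the assessors' method).

    Models are visited from highest to lowest GDT_TS. Each model joins the
    first existing cluster whose representative lies within `cutoff` Å;
    otherwise it starts a new cluster and becomes its representative.
    """
    order = sorted(range(len(names)), key=lambda i: gdt_ts[i], reverse=True)
    leaders = []   # indices of cluster representatives, in creation order
    clusters = {}  # representative name -> list of member names
    for i in order:
        for lead in leaders:
            if dist[i][lead] <= cutoff:
                clusters[names[lead]].append(names[i])
                break
        else:
            # No representative is close enough: open a new cluster.
            leaders.append(i)
            clusters[names[i]] = [names[i]]
    return clusters


# Invented example data: four hypothetical models and their pairwise
# distances in Å (symmetric matrix, zero diagonal).
names = ["model_a", "model_b", "model_c", "model_d"]
gdt_ts = [60.0, 55.0, 40.0, 30.0]
dist = [
    [0.0, 1.5, 6.0, 7.0],
    [1.5, 0.0, 5.5, 6.5],
    [6.0, 5.5, 0.0, 2.0],
    [7.0, 6.5, 2.0, 0.0],
]
clusters = cluster_models(names, gdt_ts, dist)
```

With this scheme each cluster is represented by its top GDT_TS member, which matches the idea of inspecting clusters through their best-scoring models first.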
CASP12-like web app: facilitates assessment, and is easily opened to the public
http://lucianoabriata.altervista.org/papersdata/casp12fmassessment/casp12-fm-fmtbm-assessment-3Aclusters.html
NEW: more scores, shows servers in distinct color, and built auxiliary web apps also for models clustered at 1 Å and for analysis with no splitting
Clear best
These look best but they aren't
Unclear best, visual inspection of multiple models
Very bad
Very good
Excellent
Examples of correlation plots
→ GDT_TS & QCS turn out to be the two most informative scores, in our experience
* For QCS see Cong et al., Bioinformatics 2011
Many good models
Always high r
Different scores propose different best models
GDT_TS & QCS indeed grouped separately in analysis by Olechnovic et al., Bioinformatics 2018
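The two observations above (scores can correlate strongly across models, yet nominate different best models) can be sketched as follows. The correlation coefficient is assumed here to be Pearson's r, and the model names and score values are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def best_by_score(models):
    """Return, for each score name, the model with the highest value."""
    score_names = next(iter(models.values())).keys()
    return {s: max(models, key=lambda m: models[m][s]) for s in score_names}


# Invented scores for three hypothetical models of one evaluation unit.
models = {
    "model_a": {"GDT_TS": 40.0, "QCS": 60.0},
    "model_b": {"GDT_TS": 37.0, "QCS": 70.0},
    "model_c": {"GDT_TS": 20.0, "QCS": 30.0},
}
r = pearson_r(
    [m["GDT_TS"] for m in models.values()],
    [m["QCS"] for m in models.values()],
)
best = best_by_score(models)
```

In this toy case the two scores are positively correlated but disagree on the top model, which is the situation where visual inspection guided by more than one score becomes necessary.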
Examples of correlation plots
Top GDT_TS; Target; Top QCS, second GDT_TS
HHscore 13.98, LGA 73.5
Neff/L (HHblits) 0.01
T0991-D1 (FM)
TS366_3 (top by DFM, designated best)
Importance of guiding visual assessment by multiple scores
GDT_TS 37.4, QCS 68.8, DFM 0.82
Target
T1010-D1 (FM)
TS117_1
T0990-D3 (FM)
Target
TS043_1 (designated best)
HHscore 2.76, LGA 23.8, Neff/L (HHblits) 0.2
HHscore 0.69, LGA 39.5, Neff/L (HHblits) 0.07
Several very hard targets with folds captured
GDT_TS 50, QCS 80
Only two very difficult EUs with no best model
T0981-D2 (FM)
All scores low; here the model of highest GDT_TS looks reasonable but is missing the last strand, which is separated in sequence. And models that are complete are too bad…
T0989-D2 (FM)
Long extended N-terminus and C-terminal beta hairpin, none is well positioned; but the central beta sheet is quite good in some models.
HHscore 14, LGA 63.8, Neff/L (HHblits) 0.07
HHscore 4.9, LGA 55.1, Neff/L (HHblits) 0.01
Impact of progress in CASP13:
Examples of “FM-special” targets for which full models were very good
Example: T0953s2 (D1: FM/TBM, D2 & D3: FM)
TS117_4 (top by TM, 2.53 Å RMSD over 61% of sequence)
TS224_3 (top by GDT_TS)
Target by EU (D1, D2, D3)
Example: T1000 (D1: TBM, not evaluated; D2: FM)
TS043_1 (top by GDT_TS, scores quite good by all metrics)
GDT_TS 69.5, QCS 90.7
Parts missing in experimental target structure
There's an X-ray structure of D1
89% of residues < 2 Å
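Figures such as "89% of residues < 2 Å" and GDT_TS values are both built from per-residue deviations between model and target. The sketch below assumes a list of Cα distances (in Å) obtained after a single, already computed superposition; the function names and the distances are hypothetical. The real GDT_TS, as computed by LGA, searches over many superpositions to maximize each fraction, so this fixed-superposition version can only underestimate it.

```python
def fraction_within(distances, cutoff):
    """Fraction of residues whose deviation is at most `cutoff` Å."""
    return sum(d <= cutoff for d in distances) / len(distances)


def gdt_ts_fixed_superposition(distances):
    """GDT_TS-like score (0 to 100) from one fixed superposition.

    Averages the fractions of residues within 1, 2, 4 and 8 Å. This is a
    simplification: it does not optimize the superposition per cutoff.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    return 100.0 * sum(fraction_within(distances, c) for c in cutoffs) / len(cutoffs)


# Invented per-residue Cα deviations for a four-residue toy example.
distances = [0.5, 1.5, 3.0, 9.0]
coverage_2A = fraction_within(distances, 2.0)
score = gdt_ts_fixed_superposition(distances)
```

A statement like "2.53 Å RMSD over 61% of sequence" is a related but different summary: it reports the RMSD of the best-fitting subset of residues rather than counting residues under fixed cutoffs.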
Notable progress in CASP13:
12 hard EUs modeled at near-atomistic resolution by many groups
T0968s2-D1 (FM)
TS043_1-D1 (12 models)
2.33 Å over full sequence (115 residues)
HHscore 19, LGA 50, Neff/L (HHblits) 1.23
GDT_TS 80 & QCS 90
T0970-D1 (FM/TBM)
TS043_2-D1 (5 models plus 4 from TS347)
2.78 Å over 89% of sequence (total 96 residues)
HHscore 17, LGA 67, Neff/L (HHblits) 1.61
GDT_TS 80 & QCS 90
T1001-D1 (FM)
TS222_4-D1 (106 models)
2.32 Å over full sequence (139 residues)
HHscore 11, LGA 55, Neff/L (HHblits) 0.04
GDT_TS 74 & QCS 93
T1008-D1 (FM/TBM)
TS281_1-D1 (126 models)
1.14 Å over full sequence (77 residues)
HHscore 61, LGA 74, Neff/L (HHblits) 0.01
GDT_TS 91 & QCS 95
NMR/MD
Part 2: Rankings
Ranking based on Z-scores of GDT_TS & QCS
Ranking = sum of Z-scores combined from GDT_TS & QCS (as these are by far the two most informative scores to guide visual assessment) on all models submitted as #1, for TBM/FM, FM and FM_sp target EUs, and considering sum of Z-score > -2.
Ranking is very robust: scores with GDT_TS only or QCS only return the same top groups.
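A Z-score-based ranking of this kind can be sketched as below. The slides do not spell out the exact procedure, so several choices here are assumptions: Z-scores are computed per evaluation unit and per score over the groups' first models, values below -2 are floored at -2 (one possible reading of the "> -2" condition), and the floored Z-scores are summed over both scores and all EUs. Group names, target names and score values are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each group's value within one target and one score."""
    mu = mean(values.values())
    sd = pstdev(values.values())
    if sd == 0:
        return {group: 0.0 for group in values}
    return {group: (v - mu) / sd for group, v in values.items()}


def rank_groups(scores_by_target, floor=-2.0):
    """Sum floored Z-scores over all targets and scores; best group first.

    `floor` implements the assumed treatment of Z-scores below -2.
    """
    totals = {}
    for per_score in scores_by_target.values():
        for values in per_score.values():
            for group, z in z_scores(values).items():
                totals[group] = totals.get(group, 0.0) + max(z, floor)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented model-1 scores for three hypothetical groups on two hypothetical EUs.
scores_by_target = {
    "EU_1": {
        "GDT_TS": {"group_1": 80.0, "group_2": 50.0, "group_3": 20.0},
        "QCS": {"group_1": 90.0, "group_2": 70.0, "group_3": 50.0},
    },
    "EU_2": {
        "GDT_TS": {"group_1": 60.0, "group_2": 45.0, "group_3": 30.0},
        "QCS": {"group_1": 85.0, "group_2": 75.0, "group_3": 65.0},
    },
}
ranking = rank_groups(scores_by_target)
```

Restricting the inner loop to a single score name is a simple way to check the robustness claim, that is, whether GDT_TS alone or QCS alone returns the same top groups.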
All groups
Servers
Notable highlights: groups not in top 5 who provided the only best models for some targets (upon visual evaluation)
• ZHOU-SPOT for T0998-D1: alone and clearly better than runners-up
• Jones-UCL for T1010-D1: alone and clearly better than runners-up
• RaptorX-DeepModeller for T0949
• KIAS-Gdansk for T0957s1-D1
• BAKER for T0975-D1
• Venclovas for T0991-D1
QCS 78
QCS 81
Part 3: Progress
Progress in Free Modeling (FM/TBM not considered)
Notes:
- Exact definition of FM EUs might vary from year to year
- CASP12 and CASP13 EUs are of roughly similar difficulty
Median +/- median deviation for GDT_TS of best models
[Plot: y axis "GDT_TS of all best models" (0 to 100); x axis CASP editions 1 to 12 (1994 to 2016)]
Boxplots of GDT_TS distributions for best models
[Plot: y axis "Median GDT_TS of best models" (10 to 60); x axis "Year" (1992 to 2016); annotations: "Coevolution-based contact prediction methods in literature", "CASP13"]
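The per-edition summary used in the progress plots can be sketched as follows. The slides do not define "median deviation"; this sketch assumes it means the median absolute deviation from the median, and the GDT_TS values are invented for illustration.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) of a list of numbers."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Invented GDT_TS values of the best models for five hypothetical FM EUs.
best_gdt_ts = [20.0, 30.0, 40.0, 50.0, 90.0]
med, mad = median_and_mad(best_gdt_ts)
```

Both statistics are robust to a few exceptionally easy or hard targets, which makes them a reasonable choice when comparing editions whose target sets differ.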
Global analyses 2 - Progress
Machine Learning for molecular modeling
Possible sources of improvement: alignment depth, existing templates, domain size?
CASP12
• From CASP12 to CASP13, significant improvement in performance
• Do some predictors have access to special, close metagenomics databases?
CASP13
Key conclusions from CASP13 on the tertiary structure prediction track
• Yet another significant improvement in prediction quality, mainly due to the rise of machine learning methods combined with coevolution-based contact prediction
• Near-atomistic resolution of the backbone reached for some very difficult EUs (<150 residues) by many groups!
• Predictions are so good that splitting EUs is in some cases not necessary
• Alignment depth allows for better top models than in CASP12, but methods now seem to need fewer sequences
• Templates of poor sequence similarity might be better identified than in CASP12
• Remaining limitations: domain size and alignment depth