CASP13: Predicting Contacts (predictioncenter.org)

Post on 17-Jul-2020


CASP13

Predicting Contacts

Assessor: András Fiser
Department of Systems and Computational Biology
Department of Biochemistry

Possible questions

•  Does contact prediction accuracy correlate with that of structure modeling?
•  How well did you do among yourselves?
•  How well did you do compared to previous CASPs?

•  Some insight analysis:
–  Are you capturing the same set of contacts?
–  Are there particular types of contacts that you are getting accurately?
–  How important is the quality of sequence information?

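Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

FM and TBM/FM: G043, G322, G089, G145, G224, G261, G354, G498, G197, G460, G324, G135, G196, G055, G418, G117, G208, G274, G086, G192, G071, G222, G044

FM: G043, G322, G089, G145, G224, G498, G261, G354, G197, G324, G196, G208, G460, G135, G055, G117, G418, G366, G192, G274, G086, G457, G044

Contacts (same order as the FM list): X, X, G089 (20), X, G224 (11), G498 (1), X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X

Contacts only: G498 (6), G032*, G180*, G323*, G491*, G106*, G164 (46), G189*, G352*, G125*, G224 (5), G036*, G392*, G351 (54), G122 (67), G386*, G475*, G154*, G292*, G089 (3), G430*, G041 (63), G091*

Numbers in parentheses give a group's rank in the other category; X and * mark groups with no rank there.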

Best structure prediction (out of 98) vs. Best contact predictions (out of 46), with related groups annotated

FM and TBM/FM: G043, G322, G089, G145, G224, G261, G354, G498, G197, G460, G324, G135, G196, G055, G418, G117, G208, G274, G086, G192, G071, G222, G044

FM: G043, G322, G089, G145, G224, G498, G261, G354, G197, G324, G196, G208, G460, G135, G055, G117, G418, G366, G192, G274, G086, G457, G044

Contacts (same order as the FM list): X, X (G036) (12), G089 (20), X (G032) (2), G224 (11), G498 (1), X (G180, G32) (3, 2), X (G229) (39), X, X (G498), X, X, X, X, X, X (G491), X, X, X, X, X, X, X

Contacts only: G498 (6), G032* (2) G322, G180* (2) G322, G323* (2) G322, G491* (16) G117, G106*, G164 (46), G189*, G352*, G125*, G224 (5), G036* (2) or (50) G116, G392*, G351 (54), G122 (67), G386*, G475*, G154*, G292*, G089 (3), G430*, G041 (63), G091*


Difficult to establish a clear relation between contact and structure prediction -> we do not know how well one could perform with a "top" contact prediction

Some top-performing structure prediction groups did not submit contact predictions -> we do not know if they have better contact predictions than others

Among groups that have submitted both structure and contact predictions: surprising inconsistencies! It is important to know how to use contact information! And/or contact information is not as important as one thought

Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

Contacts: X, X, 089 (20), X, 224 (11), 498 (1), X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X

Contacts only: 498 (6), 032*, 180*, 323*, 491*, 106*, 164 (46), 189*, 352*, 125*, 224 (5), 036*, 392*, 351 (54), 122 (67), 386*, 475*, 154*, 292*, 089 (3), 430*, 041 (63), 091*

89 submitted: 30/31 targets…)

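Are we predicting different contacts? Jaccard distance ("1 - Intersection over Union")

$$d_J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}, \qquad 0 \ (\text{same}) \le d_J \le 1 \ (\text{different})$$

The top L/5 contacts are compared, where L is the length of the sequence.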
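The Jaccard distance above can be sketched in a few lines of Python. This is a minimal illustration, not the assessor's code: the function names, the `(i, j, probability)` tuple layout, and the toy predictions are assumptions made for the example.

```python
def top_contacts(predictions, seq_len, divisor=5):
    """Return the top L/divisor residue pairs, ranked by predicted probability.

    predictions: iterable of (i, j, probability) tuples (assumed layout).
    """
    n = max(1, seq_len // divisor)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    # Store pairs in a canonical order so (i, j) and (j, i) compare equal.
    return {(min(i, j), max(i, j)) for i, j, _ in ranked[:n]}


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; 0 means identical sets, 1 means disjoint."""
    union = a | b
    if not union:
        return 0.0  # two empty lists are treated as identical
    return (len(union) - len(a & b)) / len(union)


# Hypothetical predictions from two groups for a 20-residue target (top L/5 = 4).
group_a = [(1, 15, 0.9), (2, 14, 0.8), (3, 18, 0.7), (4, 12, 0.6), (5, 11, 0.1)]
group_b = [(15, 1, 0.9), (2, 14, 0.7), (6, 19, 0.6), (7, 20, 0.5), (3, 18, 0.2)]

set_a = top_contacts(group_a, seq_len=20)
set_b = top_contacts(group_b, seq_len=20)
print(jaccard_distance(set_a, set_b))
```

Two groups can have similar precision and still show a large distance here, which is what the question "Are we predicting different contacts?" probes.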

[Figure: Jaccard distances among the "Contacts only" groups listed above. Labelled methods: (RRMD), (RRMD-plus), Deltacontact, Gammacontact, Tripletres, Tripletres_AT]

Performance using different criteria

Models (FM or FM+TBM) x Contacts (top 10 or L/5 or L/2 or L or FL) x probability (0 or 0.5) x contact definition (medium/long; long; extra long) => 60 combinations, evaluated by either: F1; Precision/Recall; Z-score sum or Z-score average, etc.
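The count of 60 follows from multiplying the options: 2 x 5 x 2 x 3. A short sketch that enumerates the evaluation settings; the string labels are taken from the slide, while the variable names are illustrative assumptions.

```python
from itertools import product

# Option lists as given on the slide; variable names are illustrative only.
models = ["FM", "FM+TBM"]
list_sizes = ["top10", "L/5", "L/2", "L", "FL"]
probability_cutoffs = [0.0, 0.5]
contact_ranges = ["medium/long", "long", "extra long"]

combinations = list(product(models, list_sizes, probability_cutoffs, contact_ranges))

print(len(combinations))
for combo in combinations[:3]:
    print(combo)
```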

Long/medium contacts (FM only), sum Z-score (>0)

[Figure panels: Long/medium: top 10; Long/medium: L/5; Long/medium: L/2; Long/medium: L]

032 and 323 are the same
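One way to read "sum Z-score (>0)" is: convert each target's scores to Z-scores across groups, then add up only the positive values per group. The sketch below follows that reading. It assumes the population standard deviation and uses made-up group and target names; the slides do not state the exact procedure.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-scores of a list of values (population standard deviation assumed)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # all groups tied on this target
    return [(v - mu) / sigma for v in values]


def summed_positive_z(scores_by_target):
    """Sum, per group, the Z-scores that are above zero.

    scores_by_target: {target: {group: score}} (hypothetical layout).
    """
    totals = {}
    for group_scores in scores_by_target.values():
        groups = list(group_scores)
        for group, z in zip(groups, zscores([group_scores[g] for g in groups])):
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return totals


# Made-up scores for three groups on two targets.
example = {
    "T1": {"A": 30.0, "B": 20.0, "C": 10.0},
    "T2": {"A": 10.0, "B": 10.0, "C": 10.0},
}
print(summed_positive_z(example))
```

Clipping negative Z-scores at zero means a group is not penalised for a poor target, only rewarded for good ones.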

Long contacts (FM only), sum Z-score (>0)

[Figure panels: Long: L; Long: L/2; Long: L/5; Long: top 10]

Extra long contacts only (FM only), Z-score >0

[Figure panels: Extra long: top 10; Extra long: L/5; Extra long: L/2; Extra long: L]

Improvement in contact prediction accuracy over CASP10-13 meetings

[Figure: average precision (axis 0 to 70) for long contacts, L/5 lists, shown for CASP10, CASP11, CASP12 and CASP13]

CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant
CASP12: 31 groups, 24 non-redundant
CASP13: 44 groups, 34 non-redundant
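The quantity plotted is the precision of the top L/5 long-range predictions. A minimal sketch of that calculation is given below. The sequence-separation cutoff of 24 residues for "long" contacts is an assumption based on the usual CASP convention, and the function name and data layout are hypothetical.

```python
def precision_top_l_over_5(predicted, native_contacts, seq_len, min_sep=24):
    """Percent of the top L/5 long-range predictions that are native contacts.

    predicted: list of (i, j, probability) tuples (assumed layout).
    native_contacts: set of (i, j) pairs with i < j.
    min_sep: minimum sequence separation for a "long" contact (assumed 24).
    """
    long_range = [(i, j, p) for i, j, p in predicted if abs(i - j) >= min_sep]
    long_range.sort(key=lambda t: t[2], reverse=True)
    top = long_range[: max(1, seq_len // 5)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in native_contacts for i, j, _ in top)
    return 100.0 * hits / len(top)


# Toy example; a small min_sep is used only to keep the data short.
predicted = [(1, 5, 0.9), (2, 8, 0.8), (3, 9, 0.7), (1, 2, 0.99)]
native = {(1, 5), (3, 9)}
print(precision_top_l_over_5(predicted, native, seq_len=10, min_sep=3))
```

Averaging this value over targets and groups gives one "average precision" bar per CASP round under these assumptions.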

T0953s1d1

Good F-score: 63.4

Poor F-score: 14.64

Best TS model (G43), cyan, GDT_TS 54.48
Contact model (G164), green, GDT_TS 41.05
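F-scores such as those quoted for this target combine precision and recall into one number. The sketch below assumes the standard F1 definition (harmonic mean of precision and recall) scaled to 0-100; the contact sets are invented for illustration and are not data from T0953s1d1.

```python
def f_score(predicted, native):
    """F1 score (0-100) of a predicted contact set against the native contacts."""
    true_positives = len(predicted & native)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(native)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Invented contact sets: 2 of 4 predictions are correct, 2 of 3 native contacts found.
predicted_contacts = {(1, 30), (2, 40), (5, 50), (7, 60)}
native_contacts = {(1, 30), (2, 40), (9, 70)}
print(round(f_score(predicted_contacts, native_contacts), 2))
```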

Relationship between sequence profile depth and success (F-score) of predicting contacts

•  Less reliant on sequence profiles.

[Figure: number of hits (0 to 1000) against F-score (20 to 60); legend: Psiblast, HHBlits]

Limited signal coming from sequence


20.7 23 10 36032.9 252 25 13845.7 14 1 4917 46 3728 50 17 33

31.6 40 22 3739.5 46 37 3621.9 669 37 4034.4 591 4131 89 46 168

33.3 38 20 6020.5 3021 1905 51132 172 172 457

25.4 609 183 46725 6130 129 465

34.7 132 31 20134.7 91 111 20116 30 6 917 30 6 55

35.7 278 17 34723 19 9 35

20.6 38 23 16218.3 38 23 16936 194 1 300

24.4 194 1 2746.4 58 31 45

18 58 31 5425 58 31 51

53.7 1 1 029 14 14 36

24.2 1266 1028 47843.7 1 1 219.1 3752 50032.1 4 3 923 584 110 43036 1 1 019 7 4 18

51.4 21 6 12351.4 77 13 12330 231 126 267

64.3 302 85 44164.3 545 343 44127.2 1730 68 47018.2 629 1163 5018.2 3755 38 5032 1380 53 465

Column headers (repeated for two side-by-side blocks): Fscore, e-5, e-20, Neff
Header groups: Blast Blast+HHblits

What is what?

Green: parallel (parallel with diagonal) + diffuse (helical)
Blue: anti-parallel (orthogonal with diagonal) + compact (strand)

Performance vs. secondary structure interactions

[Figure: F-score (axis 0 to 70) by interaction type: E-E (β-β), H-H (α-α), E-H (β-α), C-C (coil-coil), E/H-C (β/α-coil), with a random model shown for comparison]

Topology dependence of success rates, Class level

[Figure: mean F-score (axis 0 to 30) by class: E (all-β), H (all-α), M (α/β)]

Correlation with size

[Figure: F-score*100 accuracy (axis 0 to 60) against protein length (axis 0 to 500)]

R = 0.32

Without this single point: R = 0.19
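The drop from R = 0.32 to R = 0.19 shows how much a single point can drive a correlation coefficient. The sketch below computes Pearson's R with and without one point. The data are synthetic and chosen only to show the effect; they are not the values behind the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def pearson_r_without(xs, ys, index):
    """Pearson R after dropping the point at the given index."""
    kept_x = [x for k, x in enumerate(xs) if k != index]
    kept_y = [y for k, y in enumerate(ys) if k != index]
    return pearson_r(kept_x, kept_y)


# Synthetic protein lengths and F-scores; the last point is a deliberate outlier.
lengths = [100, 150, 200, 250, 480]
f_scores = [30, 20, 28, 22, 55]

r_all = pearson_r(lengths, f_scores)
r_without_outlier = pearson_r_without(lengths, f_scores, index=4)
print(r_all, r_without_outlier)
```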

Conclusions

•  Contact prediction methods have made a major advance over the last two years
•  A lot of different subsets of correct contacts can be made and used successfully in 3D modeling
•  It is difficult to directly correlate predicted contacts with 3D predictions because of ambiguity and lack of overlap between categories, but:
–  Best 3D predictors have either even superior contact predictions or better ways to use contact information
–  From the few examples where both contacts and 3D structures were predicted, we see strong inconsistencies: it is important to know how to use contact information
•  Often very few homologous sequences were available, but very good contact predictions were made
–  Less emphasis on co-variance based methods (supported by the abstracts of invited groups)

Acknowledgement

CASP and Prediction Center at UC Davis, Davis, USA:
Andriy Kryshtafovych
Bohdan Monastyrskyy
Krzysztof Fidelis
CASP organizers

Albert Einstein College of Medicine, New York, USA:
Rojan Shrestha
Eduardo Fajardo
Nelson Gil
