CASP 13 Predicting Contacts
Assessor: András Fiser, Department of Systems and Computational Biology, Department of Biochemistry

Post on 17-Jul-2020

Page 1: CASP 13 Predicting Contacts - predictioncenter.org · • It is difficult to directly correlate predicted contacts with 3D predictions because of ambiguity and lack of overlap between

CASP13

Predicting Contacts

Assessor: András Fiser
Department of Systems and Computational Biology
Department of Biochemistry

Page 2:

Possible questions

•  Does contact prediction accuracy correlate with that of structure modeling?
•  How well did you do among yourselves?
•  How well did you do compared to previous CASPs?
•  Some insight analysis:
   –  Are you capturing the same set of contacts?
   –  Are there particular types of contacts that you are getting accurately?
   –  How important is the quality of sequence information?

Page 3:

Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

FM and TBM/FM: G043 G322 G089 G145 G224 G261 G354 G498 G197 G460 G324 G135 G196 G055 G418 G117 G208 G274 G086 G192 G071 G222 G044

FM: G043 G322 G089 G145 G224 G498 G261 G354 G197 G324 G196 G208 G460 G135 G055 G117 G418 G366 G192 G274 G086 G457 G044

Contacts: X X G089(20) X G224(11) G498(1) X X X X X X X X X X X X X X X X X

Contacts only: G498(6) G032* G180* G323* G491* G106* G164(46) G189* G352* G125* G224(5) G036* G392* G351(54) G122(67) G386* G475* G154* G292* G089(3) G430* G041(63) G091*

Page 4:

Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

FM and TBM/FM: G043 G322 G089 G145 G224 G261 G354 G498 G197 G460 G324 G135 G196 G055 G418 G117 G208 G274 G086 G192 G071 G222 G044

FM: G043 G322 G089 G145 G224 G498 G261 G354 G197 G324 G196 G208 G460 G135 G055 G117 G418 G366 G192 G274 G086 G457 G044

Contacts: X X(G036)(12) G089(20) X(G032)(2) G224(11) G498(1) X(G180,G32)(3,2) X(G229)(39) X X(G498) X X X X X X(G491) X X X X X X X

Contacts only: G498(6) G032*(2)G322, G180*(2)G322 G323*(2)G322 G491*(16)G117 G106* G164(46) G189* G352* G125* G224(5) G036*(2)or(50)G116 G392* G351(54) G122(67) G386* G475* G154* G292* G089(3) G430* G041(63) G091*

Page 5:

Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

Difficult to establish a clear relation between contact and structure prediction -> we do not know how well one could perform with a “top” contact prediction

Some top performing structure prediction groups did not submit contact predictions -> we do not know if they have a better contact prediction than others

Page 6:

Among groups that have submitted both structure and contact prediction: Surprising inconsistencies!!
It is important to know how to use contact information!
And/Or
Contact information is not as important as one thought

Best structure prediction (out of 98) vs. Best contact predictions (out of 46)

Contacts: X X 089(20) X 224(11) 498(1) X X X X X X X X X X X X X X X X X

Contacts only: 498(6) 032* 180* 323* 491* 106* 164(46) 189* 352* 125* 224(5) 036* 392* 351(54) 122(67) 386* 475* 154* 292* 089(3) 430* 041(63) 091*

89 submitted: 30/31 targets…)

Page 7:

Are we predicting different contacts? Jaccard distance (“1 - Intersection over Union”)

$$d_J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} = 1 - \frac{|A \cap B|}{|A \cup B|}$$

$0$ (same) $\le d_J \le 1$ (different)

Top L/5 contacts, where L is the length of the sequence

Contacts only: 498(6) 032* 180* 323* 491* 106* 164(46) 189* 352* 125* 224(5) 036* 392* 351(54) 122(67) 386* 475* 154* 292* 089(3) 430* 041(63) 091*

Figure labels: (RRMD), (RRMD-plus), Deltacontact, Gammacontact, Tripletres, Tripletres_AT
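The Jaccard distance between two groups' contact lists can be sketched in a few lines of Python. This is a minimal illustration, not the assessor's code: the function names `top_contacts` and `jaccard_distance`, the `(i, j, probability)` tuple layout, and the toy predictions are all assumptions made for the example.

```python
def top_contacts(predictions, seq_len, fraction=5):
    """Return the top L/fraction residue pairs, ranked by predicted probability.

    predictions: iterable of (i, j, probability) tuples (hypothetical layout).
    Pairs are stored as (min, max) so that (i, j) and (j, i) compare equal.
    """
    n_top = max(1, seq_len // fraction)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    return {(min(i, j), max(i, j)) for i, j, _ in ranked[:n_top]}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0 means identical sets, 1 means disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty lists are treated as identical
    return (len(union) - len(a & b)) / len(union)


# Toy data for a 20-residue target, so L/5 = 4 contacts per group.
group_a = [(1, 10, 0.9), (2, 12, 0.8), (3, 15, 0.7), (4, 18, 0.6), (5, 20, 0.1)]
group_b = [(10, 1, 0.95), (2, 12, 0.85), (6, 16, 0.75), (7, 19, 0.65), (9, 20, 0.2)]

a = top_contacts(group_a, seq_len=20)
b = top_contacts(group_b, seq_len=20)
print(round(jaccard_distance(a, b), 3))  # 2 shared pairs out of 6 in the union -> 0.667
```

Computing this for every pair of groups gives a distance matrix that can be clustered to see which methods capture similar contacts.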

Page 8:

Performance using different criteria

Models (FM or FM+TBM) × Contacts (top 10 or L/5 or L/2 or L or FL) × probability (0 or 0.5) × contact definition (medium/long; long; extra long) => 60 combinations, each evaluated by F1, Precision/Recall, Z-score sum or Z-score average, etc.
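The count of 60 follows from 2 × 5 × 2 × 3. The sketch below enumerates the settings and shows one way to compute precision, recall and F1 for a single setting. The constant and function names are hypothetical, and the scoring function assumes that predicted and native contacts are given as sets of residue pairs already filtered to the chosen list size, probability cutoff and contact range.

```python
from itertools import product

# The four evaluation axes named on the slide.
MODEL_SETS = ("FM", "FM+TBM")
LIST_SIZES = ("top10", "L/5", "L/2", "L", "FL")
PROB_CUTOFFS = (0.0, 0.5)
CONTACT_RANGES = ("medium/long", "long", "extra long")


def evaluation_settings():
    """Enumerate every combination of the four axes."""
    return list(product(MODEL_SETS, LIST_SIZES, PROB_CUTOFFS, CONTACT_RANGES))


def precision_recall_f1(predicted, native):
    """Score one prediction set against the native contact set."""
    true_positives = len(predicted & native)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(native) if native else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(len(evaluation_settings()))  # 60
```

Each of the 60 settings would then be scored per target and per group, and the per-group scores combined by one of the listed summaries.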

Page 9:

Long/medium contacts, FM only, Z-score > 0

Page 10:

Long/medium contacts (FM only), sum Z-score (> 0)

[Figure panels: Long/medium: top 10; Long/medium: L/5; Long/medium: L/2; Long/medium: L]

032 and 323 are the same
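The slides do not spell out how the summed Z-score is computed. The sketch below shows one common reading, stated here as an assumption: for each target, convert the per-group scores to Z-scores across groups, then add up only the positive values for each group. The group and target names and the scores are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(scores_by_group):
    """Z-score of each group's score on one target, relative to all groups."""
    values = list(scores_by_group.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {group: 0.0 for group in scores_by_group}
    return {group: (score - mu) / sigma for group, score in scores_by_group.items()}


def sum_positive_z(per_target_scores):
    """Sum, per group, the Z-scores that are above zero (assumed protocol)."""
    totals = {}
    for scores_by_group in per_target_scores.values():
        for group, z in z_scores(scores_by_group).items():
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return totals


# Hypothetical F-scores for three groups on two targets.
per_target = {
    "target_1": {"A": 60.0, "B": 40.0, "C": 20.0},
    "target_2": {"A": 30.0, "B": 50.0, "C": 10.0},
}
ranking = sorted(sum_positive_z(per_target).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Ignoring negative Z-scores rewards groups for their successes without penalizing an occasional poor target, which is why the choice of threshold can change the ranking.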

Page 11:

Long contacts (FM only), sum Z-score (> 0)

[Figure panels: Long: L; Long: L/2; Long: L/5; Long: top 10]

Page 12:

Extra long contacts only (FM only), Z-score > 0

[Figure panels: Extra long: top 10; Extra long: L/5; Extra long: L/2; Extra long: L]

Page 13:

Improvement in contact prediction accuracy over CASP10-13 meetings

[Figure: average precision (axis 0 to 70), long contacts, L/5 lists; series shown: CASP10]

CASP10: 23 groups, 15 non-redundant

Page 14:

Improvement in contact prediction accuracy over CASP10-13 meetings

[Figure: average precision (axis 0 to 70), long contacts, L/5 lists; series shown: CASP11, CASP10]

CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant

Page 15:

Improvement in contact prediction accuracy over CASP10-13 meetings

[Figure: average precision (axis 0 to 70), long contacts, L/5 lists; series shown: CASP12, CASP11, CASP10]

CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant
CASP12: 31 groups, 22 non-redundant

Page 16:

Improvement in contact prediction accuracy over CASP10-13 meetings

[Figure: average precision (axis 0 to 70), long contacts, L/5 lists; series shown: CASP13, CASP12, CASP11, CASP10]

CASP10: 23 groups, 15 non-redundant
CASP11: 28 groups, 22 non-redundant
CASP12: 31 groups, 24 non-redundant
CASP13: 44 groups, 34 non-redundant
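The quantity plotted, precision of the top L/5 long-range contacts averaged over targets, can be sketched as follows. Two assumptions are made that the slides do not state: "long" is taken to mean a sequence separation of at least 24 residues (the usual CASP convention), and "average" is taken as a plain mean over targets. The function names and toy data are hypothetical.

```python
def long_range_precision(predictions, native, seq_len, min_sep=24, fraction=5):
    """Precision (in percent) of the top L/fraction long-range predictions.

    predictions: iterable of (i, j, probability); native: iterable of (i, j).
    min_sep=24 is an assumed definition of a long-range contact.
    """
    long_preds = [(i, j, p) for i, j, p in predictions if abs(i - j) >= min_sep]
    long_preds.sort(key=lambda t: t[2], reverse=True)
    top = long_preds[: max(1, seq_len // fraction)]
    if not top:
        return 0.0
    native_pairs = {(min(i, j), max(i, j)) for i, j in native}
    hits = sum((min(i, j), max(i, j)) in native_pairs for i, j, _ in top)
    return 100.0 * hits / len(top)


def average_precision(targets):
    """Mean of the per-target precisions; targets is a list of argument tuples."""
    values = [long_range_precision(*target) for target in targets]
    return sum(values) / len(values)


# Toy 50-residue target: (3, 10) is too close in sequence and is ignored.
preds = [(1, 30, 0.9), (2, 40, 0.8), (3, 10, 0.95), (5, 45, 0.1)]
native = [(1, 30), (5, 45)]
print(round(long_range_precision(preds, native, seq_len=50), 1))  # 2 of 3 correct -> 66.7
```

Note that "average precision" here is a mean of precisions, not the ranking metric of the same name used in information retrieval.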

Page 17:

T0953s1d1

Good F-score: 63.4

Poor F-score: 14.64

Best TS model (G43), cyan, GDT_TS 54.48
Contact model (G164), green, GDT_TS 41.05

Page 18:

Relationship between sequence profile depth and success (F-score) of predicting contacts

[Figure: scatter plot of number of hits (axis 0 to 1000) against F-score (axis 20 to 60); legend: Psiblast, HHBlits]

•  Less reliant on sequence profiles.

Page 19:

Limited signal coming from sequence

20.7 23 10 36032.9 252 25 13845.7 14 1 4917 46 3728 50 17 33

31.6 40 22 3739.5 46 37 3621.9 669 37 4034.4 591 4131 89 46 168

33.3 38 20 6020.5 3021 1905 51132 172 172 457

25.4 609 183 46725 6130 129 465

34.7 132 31 20134.7 91 111 20116 30 6 917 30 6 55

35.7 278 17 34723 19 9 35

20.6 38 23 16218.3 38 23 16936 194 1 300

24.4 194 1 2746.4 58 31 45

18 58 31 5425 58 31 51

53.7 1 1 029 14 14 36

24.2 1266 1028 47843.7 1 1 219.1 3752 50032.1 4 3 923 584 110 43036 1 1 019 7 4 18

51.4 21 6 12351.4 77 13 12330 231 126 267

64.3 302 85 44164.3 545 343 44127.2 1730 68 47018.2 629 1163 5018.2 3755 38 5032 1380 53 465

Columns (repeated in two side-by-side blocks): Fscore, e-5, e-20, Neff; column groups: Blast, Blast + HHblits

Page 20:

What is what?

Green: parallel (parallel with diagonal) + diffuse (helical)
Blue: anti-parallel (orthogonal with diagonal) + compact (strand)

Page 21:

Performance vs. secondary structure interactions

[Figure: F-score (axis 0 to 70) by secondary-structure interaction type: E−E (β-β), H−H (α-α), E−H (β-α), C−C (coil-coil), E/H−C (β/α-coil), with a random model]
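A breakdown like this one needs each contact to be assigned to an interaction type before scoring. The sketch below is one possible way to do that, assuming a three-state secondary structure string (E, H, C) indexed by 1-based residue number; the function names and the toy target are hypothetical, and the slides do not say how the assessor assigned secondary structure.

```python
def interaction_type(ss_i, ss_j):
    """Map the secondary structure states of two residues to a class label."""
    pair = frozenset((ss_i, ss_j))
    if pair == {"E"}:
        return "E-E"
    if pair == {"H"}:
        return "H-H"
    if pair == {"E", "H"}:
        return "E-H"
    if pair == {"C"}:
        return "C-C"
    return "E/H-C"  # one residue in a strand or helix, the other in coil


def f_score_by_type(predicted, native, secondary_structure):
    """F-score (in percent) per interaction type; residues are 1-based."""
    def split(pairs):
        groups = {}
        for i, j in pairs:
            label = interaction_type(secondary_structure[i - 1], secondary_structure[j - 1])
            groups.setdefault(label, set()).add((min(i, j), max(i, j)))
        return groups

    pred_groups, native_groups = split(predicted), split(native)
    scores = {}
    for label in set(pred_groups) | set(native_groups):
        pred = pred_groups.get(label, set())
        nat = native_groups.get(label, set())
        # F1 written as 2*TP / (|predicted| + |native|)
        scores[label] = 100.0 * 2 * len(pred & nat) / (len(pred) + len(nat))
    return scores


# Toy 20-residue target: strand, coil, helix, strand.
ss = "EEEEECCCCCHHHHHEEEEE"
native = {(1, 20), (2, 19), (11, 15), (3, 12)}
predicted = {(1, 20), (3, 18), (11, 15), (7, 13)}
print(f_score_by_type(predicted, native, ss))
```

Types with no predicted and no native contacts are simply absent from the result, so a missing key should not be read as a score of zero.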

Page 22:

Topology dependence of success rates, Class level

[Figure: mean F-score (axis 0 to 30) by fold class: E (all-β), H (all-α), M (α/β)]

Page 23:

Correlation with size

[Figure: F-score*100 (accuracy; axis 0 to 60) against protein length (axis 0 to 500)]

R = 0.32

Without this single point: R = 0.19
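The drop from R = 0.32 to R = 0.19 shows how much one point can drive a correlation computed on a small set of targets. The sketch below illustrates the same check with a plain Pearson correlation, computed with and without a chosen point. The function names are hypothetical and the data are invented for the example; they are not the CASP13 values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def pearson_r_without(xs, ys, index):
    """Recompute the correlation after dropping the point at `index`."""
    kept_x = [x for k, x in enumerate(xs) if k != index]
    kept_y = [y for k, y in enumerate(ys) if k != index]
    return pearson_r(kept_x, kept_y)


# Invented data: one long target with a high F-score dominates the trend.
lengths = [100, 150, 200, 250, 480]
f_scores = [30, 20, 35, 25, 60]
print(round(pearson_r(lengths, f_scores), 2))
print(round(pearson_r_without(lengths, f_scores, index=4), 2))
```

Reporting both values, as the slide does, makes clear that the apparent length dependence rests largely on a single target.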

Page 24:

Conclusions

•  Contact prediction methods have made a major advance over the last two years
•  A lot of different subsets of correct contacts can be made and used successfully in 3D modeling
•  It is difficult to directly correlate predicted contacts with 3D predictions because of ambiguity and lack of overlap between categories, but:
   –  Best 3D predictors have either even superior contact predictions or better ways to use contact information
   –  From the few examples when both contacts and 3D structures were predicted we see strong inconsistencies: it is important to know how to use contact information
•  Often very few homologous sequences were available, but very good contact predictions were made
   –  Less emphasis on co-variance based methods (supported by the abstracts of invited groups)

Page 25:

Acknowledgement

CASP and Prediction Center at UC Davis, Davis, USA:
Andriy Kryshtafovych
Bohdan Monastyrskyy
Krzysztof Fidelis
CASP organizers

Albert Einstein College of Medicine, New York, USA:
Rojan Shrestha
Eduardo Fajardo
Nelson Gil