prediction of structural and functional features in ... · chou- fasman algorithm conformational...

36
Prediction of structural and functional features in proteins starting from the residue sequence

Upload: others

Post on 09-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Prediction of structural and functional features in proteins starting from the

residue sequence

Page 2: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Computational Approach

From Sequence to StructureMNPNQKIITIGSVCMTIGMANLILQIGNIISIWISHSIQLGNQN

QIETCNQSVITYENNTWVNQTYVNISNTNFAAGQSVVSVKLAGNSSLCPVSGWAIYSK

DNSVRIGSKGDVFVIREPFISCSPLECRTFFLTQGALLNDKHSNGTIKDRSPYRTLMS

CPIGEVPSPYNSRFESVAWSASACHDGINWLTIGISGPDNGAVAVLKYNGIITDTIKS

WRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFRIEKGKIVKSVEMNAPNYHY

EECSCYPDSSEITCVCRDNWHGSNRPWVSFNQNLEYQIGYICSGIFGDNPRPNDKTGS

CGPVSSNGANGVKGFSFKYGNGVWIGRTKSISSRNGFEMIWDPNGWTGTDNNFSIKQD

IVGINEWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKENTIWTSGSSISFCGVNS

DTVGWSWPDGAELPFTID

Tertiary Predictions:

1. Comparative/Homology

Modeling

2. Fold Recognition

3. De Novo Protein

Structure Prediction

Page 3: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

From Sequence to Structure: Summary

Comparative/Homology modelling requires:1) the availability of a template2) high sequence identity between target and template

Multiple sequence alignment and HMM are able to extend the applicability domain of comparative modelling (remote homology)

Example from the practicum: starting from the seed you adopted for modelling the Kunitz domain, how many similar domain can you recognize in SwissProtwith simple sequence search? How many with your (or the PFAM) HMM?

Page 4: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

From Sequence to Structure: going a step further

What if similarity methods (simple or profile-based) fail (i.e. no suitable template can be detected in the PDB) ?

What are the possible scenarios?

Page 5: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

From Sequence to Structure: going a step further

What if similarity methods (simple or profile-based) fail (i.e. no suitable template can be detected in the PDB) ?

What are the possible scenarios?

1) Suitable templates DO NOT EXIST in the PDB

2) There are possible templates in the PDB, but they CANNOT BE RECOGNIZED.

Page 6: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

From Sequence to Structure: going a step further

What if similarity methods (simple or profile-based) fail (i.e. no suitable template can be detected in the PDB) ?

What are the possible scenarios?

1) Suitable templates DO NOT EXIST in the PDB Ab Initio Methods are required

2) There are possible templates in the PDB, but they CANNOT BE RECOGNIZED.

Fold recognition/Threading methods can be adopted

Page 7: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Ab initio prediction of protein structure – concept

Difficult because search space is huge. Much larger conformational space

Goal: Predict Structure only given its amino acid sequenceIn theory: Lowest Energy Conformation

• Go from sequence to structure by sampling the conformational space in a reasonable manner and select a native-like conformation using a good discrimination function

Difficult for sequences larger that 150aa

Rosetta (David Baker lab) one of best (CASP evaluation)

Page 8: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Marc Berjanskii

Page 9: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Marc Berjanskii

Page 10: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Fold recognition

• Proteins that do not have similar sequences sometimes have similar three-dimensional structures (such as B-barrel TIM fold)

• A sequence whose structure is not known is fitted directly (or “threaded”) onto a known structure and the “goodness of fit” is evaluated using a discriminatory function

3.6 Å5% ID

NK-lysin (1nkl) Bacteriocin T102/as48 (1e68)

Page 11: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Generalization of comparative modeling method

• Homology Modeling: Align sequence to sequence• Threading: Align sequence to structure (templates)For each alignment, the probability that that each amino acid residue would occur in such an environment is calculated based on observed preferences in determined structures.

Rationale:• Limited number of basic folds found in nature• Amino acid preferences for different structural environments provides sufficient information to choose the best-fitting protein fold (structure)

Protein Threading, Fold Recognition

Page 12: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

From Sequence to Structure: going a step further

1) Suitable templates DO NOT EXIST in the PDB Ab Initio Methods are required

2) There are possible templates in the PDB, but they CANNOT BE RECOGNIZED.

Fold recognition/Threading methods can be adopted

Any idea on how to recognize a fold in the PDB despite the undetectable sequence similarity?

Page 13: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Secondary Structure Prediction

Even if sequence looses any detectable similarity, secondary structure (and other features such as solventaccessibility profile, disulfide bonds…) should be more conserved

Page 14: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Prediction of different features can help fold recognition and threading

Does the sequence

“fit” on any of a

library of known

3D structures?

>C562_RHOSH

TQEPGYTRLQITLHWAIAGL…

Page 15: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Covalent structure

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

Ct

3D structure

Secondary structure

EEEE..HHHHHHHHHHHH....HHHHHHHH.EEEE...........

MAPPING PROBLEMS: Secondary structure

Page 16: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

position of Trans Membrane Segments along the sequenceTopography

Porin (Rhodobacter capsulatus)

Bacteriorhodopsin(Halobacterium salinarum)

-barrel -helices

Outer Membrane Inner Membrane

ALALMLCMLTYRHKELKLKLKK ALALMLCMLTYRHKELKLKLKK ALALMLCMLTYRHKELKLKLKK

MAPPING PROBLEMS: Topology of transmembrane proteins

Page 17: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

First generation methodsSingle residue statistics

Propensity scales

For each residue

•The association between each residue and the different features is statistically evaluated

•Physical and chemical features of residues

A propensity value for any structure can be associated to any residue

HOW?

Page 18: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Secondary structure: Chou-Fasman propensity scale

Given a set of known structures we can count how manytimes a residue is associated to a structure.

Example: ALAKSLAKPSDTLAKSDFREKWEWLKLLKALACCKLSAAL

hhhhhhhhccccccccccccchhhhhhhhhhhhhhhhhhh

N(A,h) = 7, N(A,c) =1, N= 40

P(A,h) = 7/40, P(A,h) = 1/40

Is that enough for estimating a propensity?

Page 19: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Secondary structure: Chou-Fasman propensity scale

Given a set of known structures we can count how manytimes a residue is associated to a structure.

Example: ALAKSLAKPSDTLAKSDFREKWEWLKLLKALACCKLSAAL

hhhhhhhhccccccccccccchhhhhhhhhhhhhhhhhhh

N(A,h) = 7, N(A,c) =1, N= 40

P(A,h) = 7/40, P(A,h) = 1/40

We need to estimate how much independent the residue-to-structure association is.

P(h) = 27/40, P(c) = 13/40 P(A) = 8/40

Page 20: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Secondary structure: Chou-Fasman propensity scale

Given a set of known structures we can count how manytimes a residue is associated to a structure.

Example: ALAKSLAKPSDTLAKSDFREKWEWLKLLKALACCKLSAAL

hhhhhhhhccccccccccccchhhhhhhhhhhhhhhhhhh

N(A,h) = 7, N(A,c) =1, N= 40

P(A,h) = 7/40, P(A,c) = 1/40

P(h) = 27/40, P(c) = 13/40, P(A) = 8/40

If the structure is independent of the residue:P(A,h) = P(A)P(h)

The ratio P(A,h)/P(A)P(h) is the propensity

Page 21: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =
Page 22: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Assignment of Amino Acids

Page 23: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Given a LARGE set of examples, a propensity value can becomputed for each residue and each structure type

Name P(H) P(E)

Alanine 1,42 0,83

Arginine 0,98 0,93

Aspartic Acid 1,01 0,54

Asparagine 0,67 0,89

Cysteine 0,70 1,19

Glutamic Acid 1,51 0,37

Glutamine 1,11 1,10

Glycine 0,57 0,75

Histidine 1,00 0,87

Isoleucine 1,08 1,60

Leucine 1,21 1,30

Lysine 1,14 0,74

Methionine 1,45 1,05

Phenylalanine 1,13 1,38

Proline 0,57 0,55

Serine 0,77 0,75

Threonine 0,83 1,19

Tryptophan 1,08 1,37

Tyrosine 0,69 1,47

Valine 1,06 1,70

Secondary structure: Chou-Fasman propensity scale

Page 24: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Given a new sequence a secondary structure prediction canbe obtained by plotting the propensity values for eachstructure, residue by residue

Considering three secondary structures (H,E,C), the overall accuracy, as evaluated on an uncorrelated set of sequences with known structure, is very lowQ3 = 50/60 %

T S P T A E L M R S T GP(H) 69 77 57 69 142 151 121 145 98 77 69 57

P(E) 147 75 55 147 83 37 130 105 93 75 147 75

Secondary structure: Chou-Fasman propensity scale

Page 25: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

http://www.expasy.ch/cgi-bin/protscale.pl

Secondary structure: Chou-Fasman propensity scale

Page 26: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

CHOU- FASMAN ALGORITHMConformational parameter: Pα ,Pβ and Pt for each amino acid i

Pi,x = f i,x / < f x > = (n i,x / n i )/ (n x / N)

Nucleation sites and extension

Clusters of four helical formers out of six propagated by four residues

4

if < Pα > = ∑ Pα / 4 1.001

Clusters of three β-formers out of five propagated by four residues

4

if < Pβ > = ∑ Pβ / 4 1.001

Clusters of four turn residues

if Pt = f j ☓ f j+1 ☓ f j+2☓ f j+3 > 0.75 ☓ 10 –4

Specifics thresholds for < Pα > , < Pβ > and < Pt > and their relatives

values decide for the prediction

Page 27: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Chou-Fasman

• First widely used procedure• If propensity in a window of six residues (for a

helix) is above a certain threshold the helix is chosen as secondary structure.

• If propensity in a window of five residues (for a beta strand) is above a certain threshold then beta strand is chosen.

• The segment is extended until the average propensity in a 4 residue window falls below a value.

• Output-helix, strand or turn.

Page 28: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Transmembrane alpha-helices: Kyte-Doolittle scale

It is computed taking into consideration the octanol-waterpartition coefficient, combined with the propensity of theresidues to be found in known transmembrane helices

Ala: 1.800 Arg: -4.500

Asn: -3.500 Asp: -3.500

Cys: 2.500 Gln: -3.500

Glu: -3.500 Gly: -0.400

His: -3.200 Ile: 4.500

Leu: 3.800 Lys: -3.900

Met: 1.900 Phe: 2.800

Pro: -1.600 Ser: -0.800

Thr: -0.700 Trp: -0.900

Tyr: -1.300 Val: 4.200

Page 29: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Second generation methods: GOR

The structure of a residue in a protein strongly depends on the sequence context

It is possible to estimate the influence of a residue in determining the structure of a residue close along the sequence. Usually windows from -8/8 to -13/13 are considered.

Coefficients P(A,s,i) estimate the contribution of the residue A in determining the structure s for a residue that is i positions apart along the sequence

Page 30: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

GOR method

• Garnier, Osguthorpe & Robson

• Assumes amino acids up to 8 residues on each side influence the ss of the central residue.

• Frequency of amino acids at the central position in the window, and at -1, .... -8 and +1,....+8 is determined for a, b and turns (later other or coils) to give three 17 x 20 scoring matrices.

• Calculate the score that the central residue is one type of ss and not another.

• Correctly predicts ~64%.

Page 31: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Scoring matrix

( | )log , 8, ,8

( )

i i jij

ss

i

P ss aaS j

p ss

T R G Q L I R E A Y E D Y R H F S S E C P F I P

i-4 i-3 i-2 i-1 i i+1 i+2 i+3 i+4….

- 4 -3 -2 -1 0 1 2 3 4 …

A .. .. .. .. .. .. .. .. ..

B .. .. .. .. .. .. .. .. ..

Page 32: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

GOR : Information function

• Information function, I(Sj;Rj) :

( | )( ; ) log

( )

j j

j j

j

P S RI S R

p S

Information that sequence Rj contains about structure Sj

I = 0 : no information

I > 0 : Rj favors Sj

I < 0 : Rj dislikes Sj

= one of three secondary structure (H, E,C) at position jS j

= one of the 20 amino acids at position jR j

( | ) = conditional probability for observing having j j j jp S R S R

( ) = prior probability of having j jp S S

Page 33: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

GOR: Formulation(1)• Secondary structure should depend on the

whole sequence, R

• Simplification (1) : only local sequences (window size = 17) are considered

8 8( ; ) ( ; , , , , )j i j j jI S I S R R R R

Simplification (2) : each residue position is statistically independent

For independent event, just add up the information

8

8 8

8

( ; , , , , ) ( ; )i j j j j j m

m

I S R R R I S R

=

Page 34: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

I(Sj;R1,R2,…..Rlast) ≃ ∑ I(Sj;Rj+m)m = +8

m = – 8

Page 35: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =
Page 36: Prediction of structural and functional features in ... · CHOU- FASMAN ALGORITHM Conformational parameter: Pα , Pβ and P t for each amino acidi P i,x = f i,x / < f x > =

Second generation methods: GOR

Q3 = 60-65 % (Considering three secondary structures (H,E,C), and evaluating the overall accuracy on an uncorrelated set of sequences with known structure)

The contribution of each position in the window is independent of the other ones. No correlation among the positions in the window is taken in to account.