Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003


Page 1: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices

Yan Liu

Sep 29, 2003

Page 2: Protein Secondary Structure

- Dictionary of Protein Secondary Structure (DSSP): assignment based on hydrogen bonding patterns and geometrical constraints
- 7 DSSP labels for protein secondary structure (PSS):
    - Helix types: H (alpha-helix), G (3/10 helix)
    - Sheet types: E (extended strand, participates in beta ladder), B (isolated beta-bridge)
    - Coil types: T (turn), S (bend), _ (coil)
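Predictors of this kind usually work with three states rather than the full DSSP alphabet. The sketch below collapses the labels using the helix / sheet / coil grouping listed on this slide. The function name and the choice to map any unlisted label to coil are assumptions made for illustration, not something the slides specify.

```python
# Assumed 3-state reduction, following the grouping on the slide:
# helix types -> H, sheet types -> E, coil types -> C.
DSSP_TO_3STATE = {
    "H": "H", "G": "H",            # helix types
    "E": "E", "B": "E",            # sheet types
    "T": "C", "S": "C", "_": "C",  # coil types
}


def reduce_dssp(labels: str) -> str:
    """Collapse a DSSP label string to the three states H, E and C.

    Labels not listed on the slide are mapped to C (an assumption).
    """
    return "".join(DSSP_TO_3STATE.get(ch, "C") for ch in labels)


reduced = reduce_dssp("HHGG_EEBTS")
```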

Page 3: Protein Secondary Structure Prediction

- Given a protein sequence: APAFSVSPASGA
- Predict its secondary structure sequence: CCEEEEECCCC
- Application: provide constraints for tertiary structure prediction, or serve as part of fold recognition

Page 4: Related Work

- Standard SS prediction methods: PHD (Rost & Sander 1993)
- Multiple sequence profiles
    - Based on the observation that conserved regions are functionally important and/or buried in the protein core
    - Benner & Gerloff demonstrated that the degree of solvent accessibility can be predicted with reasonable accuracy
- Two-layered feed-forward neural networks

Page 5: PSIPRED

- Generation of a sequence profile
    - Position-specific scoring matrices
- Prediction of initial secondary structure
    - Standard feed-forward back-propagation networks
- Filtering the predicted structures

Page 6: Position-specific scoring matrices (PSSM) -1

- PSSM (Altschul et al., 1997), or profiles
    - Given a protein sequence of length N, together with its multiple sequence alignment
    - Construct an N x 20 matrix
- Score definition:

$$S_i = \log \frac{Q_i}{P_i}$$

- Different methods for estimating $Q_i$:

$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$$

    - $\alpha = N_c - 1$, $\beta = 10$
    - $f_i$: weighted observed frequencies
    - Other estimation:

$$g_i = \sum_j \frac{f_j}{P_j}\, q_{ij}, \qquad q_{ij} = P_i P_j e^{\lambda s_{ij}}, \qquad \sum_{i,j} q_{ij} = 1$$
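To make the score definition concrete, here is a minimal sketch that computes the scores for a single alignment column. It reads P_i as background frequencies and q_ij as substitution-matrix target frequencies, following Altschul et al. (1997). The three-letter alphabet, the numbers in `P`, `q` and `f`, the use of the natural logarithm and the function name are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def pssm_column(f, P, q, n_c, beta=10.0):
    """Scores S_i for one alignment column.

    f   : weighted observed frequencies f_i
    P   : background frequencies P_i (assumed meaning of P_i)
    q   : target frequencies q_ij (assumed meaning of q_ij)
    n_c : N_c, so that alpha = N_c - 1
    """
    alpha = n_c - 1.0
    k = len(P)
    # Pseudocount frequencies: g_i = sum_j (f_j / P_j) * q_ij
    g = [sum(f[j] / P[j] * q[i][j] for j in range(k)) for i in range(k)]
    # Mix observed and pseudocount frequencies:
    # Q_i = (alpha * f_i + beta * g_i) / (alpha + beta)
    Q = [(alpha * f[i] + beta * g[i]) / (alpha + beta) for i in range(k)]
    # S_i = log(Q_i / P_i), natural log here (base is an assumption)
    S = [math.log(Q[i] / P[i]) for i in range(k)]
    return g, Q, S


# Hypothetical toy data: 3-letter alphabet, q sums to 1 with marginals P.
P = [0.5, 0.3, 0.2]
q = [
    [0.30, 0.12, 0.08],
    [0.12, 0.14, 0.04],
    [0.08, 0.04, 0.08],
]
f = [0.8, 0.2, 0.0]  # the third letter is never observed in this column

g, Q, S = pssm_column(f, P, q, n_c=3)
```

In this toy column the third letter still receives a non-zero Q_i through the pseudocount term, so its score is finite (and negative) rather than undefined.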

Page 7: Position-specific scoring matrices (PSSM) -2

- Advantages
    - A more sensitive scoring system
    - Improved estimation of the probabilities with which amino acids occur at pattern positions
    - Relatively precise definition of the boundaries of important motifs
- Disadvantages
    - Too sensitive to biases in the sequence data banks
    - Prone to erroneously incorporating repetitive sequences into the profiles

Page 8: PSSM in PSIPRED

- Input to the neural networks:
    - The PSSM from PSI-BLAST after three iterations
    - Window size set to 15
    - Scaled to the 0-1 range by the standard logistic function
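The input encoding can be sketched as follows: each raw PSSM score is squashed with the logistic function, and a window of 15 positions centred on the residue of interest is flattened into one vector. Using 21 values per position (20 scores plus a terminus flag) matches the 21 * 15 = 315 inputs given on the architecture slide. The function names and the zero padding beyond the chain ends are assumptions of this sketch.

```python
import math

WINDOW = 15
HALF = WINDOW // 2


def logistic(x):
    """Standard logistic function, maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def window_inputs(pssm, center):
    """Build the 15 * 21 = 315 inputs for the residue at index `center`.

    pssm: list of rows, one per residue, each with 20 raw PSSM scores.
    Positions beyond either chain end get 20 zeros and a terminus
    flag of 1.0 (the zero padding is an assumption).
    """
    features = []
    for pos in range(center - HALF, center + HALF + 1):
        if 0 <= pos < len(pssm):
            features.extend(logistic(s) for s in pssm[pos])
            features.append(0.0)  # inside the chain
        else:
            features.extend([0.0] * 20)
            features.append(1.0)  # window spans a chain terminus
    return features


# Hypothetical all-zero PSSM for a 20-residue chain.
toy_pssm = [[0] * 20 for _ in range(20)]
x = window_inputs(toy_pssm, 0)
```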

Page 9: Neural network architecture-1

- Two-stage neural networks
    - 1st stage: sequence-to-structure mapping
        - 315 inputs: 21 * 15
        - 75 hidden units: 5 * 15
    - 2nd stage: structure-to-structure mapping
        - 60 inputs: 4 * 15
        - 60 hidden units: 4 * 15
    - In both stages, one extra input per window position indicates that the window spans a chain terminus
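The second stage takes the first-stage outputs rather than the PSSM as its input. A minimal sketch, assuming the first stage emits three values per residue (helix, strand, coil), is shown below. The three-output assumption, the function name and the zero padding are not stated on the slide.

```python
def second_stage_inputs(stage1_out, center, window=15):
    """Build the 15 * 4 = 60 inputs for the structure-to-structure network.

    stage1_out: one (helix, strand, coil) triple per residue, assumed to
    be the first-stage outputs. The fourth value per position flags
    window positions that fall beyond a chain terminus.
    """
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(stage1_out):
            features.extend(stage1_out[pos])
            features.append(0.0)  # inside the chain
        else:
            features.extend([0.0, 0.0, 0.0])
            features.append(1.0)  # beyond the chain terminus
    return features


# Hypothetical first-stage outputs for a 10-residue chain.
toy_out = [(0.1, 0.2, 0.7)] * 10
y = second_stage_inputs(toy_out, 9)
```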

Page 10: Neural network architecture-2

- Training parameters
    - Momentum term: 0.9
    - Learning rate: 0.005
    - To prevent overfitting: leave 10% of the training set for validation
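For readers unfamiliar with the momentum term, the following sketch shows one common form of the gradient-descent update using the two values from this slide. The exact update rule used for PSIPRED is not given here, so this particular formulation and the quadratic toy objective are assumptions.

```python
MOMENTUM = 0.9
LEARNING_RATE = 0.005


def momentum_step(weights, velocity, grads):
    """One gradient-descent step with a momentum term.

    v <- MOMENTUM * v - LEARNING_RATE * grad
    w <- w + v
    """
    new_velocity = [MOMENTUM * v - LEARNING_RATE * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective (hypothetical): f(w) = w^2, so the gradient is 2 * w.
w, v = [1.0], [0.0]
w, v = momentum_step(w, v, [2.0 * w[0]])
w_after_one = w[0]
w, v = momentum_step(w, v, [2.0 * w[0]])
w_after_two = w[0]
```

The second step moves further than the first because the previous update is carried over with weight 0.9.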

Page 11: Experimental results

- Training and testing data
    - Collected to remove structural similarity
    - CATH applied to detect homologous protein sequences
    - A total of 187 protein sequences, split into sets of 62, 62 and 63
- Three-way cross-validation
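The three-way cross-validation can be sketched as below: the 187 chains are divided into three folds, and each fold serves once as the test set while the other two are used for training. How chains were actually assigned to folds is not stated on the slide, so the contiguous split and the function names here are hypothetical.

```python
def three_way_folds(items):
    """Split items into three contiguous folds of near-equal size.

    Any remainder goes to the last folds, so 187 items give 62, 62, 63.
    """
    base, extra = divmod(len(items), 3)
    sizes = [base + (1 if k >= 3 - extra else 0) for k in range(3)]
    folds, start = [], 0
    for size in sizes:
        folds.append(items[start:start + size])
        start += size
    return folds


def cross_validation_rounds(folds):
    """Yield (train, test) pairs, holding out each fold once."""
    for k, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


chains = list(range(187))  # stand-ins for the 187 protein chains
folds = three_way_folds(chains)
rounds = list(cross_validation_rounds(folds))
```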

Page 12: Experimental results

- Per-chain results
    - Distribution of Q3 and SOV (left)
    - Avg Q3: 76.0%
    - Avg SOV: 73.5%
- Per-residue results
    - Q3: 76.5%
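Q3 is the percentage of residues whose three-state label is predicted correctly. The per-chain and per-residue figures differ because the first averages Q3 over chains while the second pools all residues, so long chains weigh more. A minimal sketch with made-up example strings follows; SOV is not shown.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_chain_q3(pairs):
    """Average of the Q3 values computed separately for each chain."""
    return sum(q3(pred, obs) for pred, obs in pairs) / len(pairs)


def per_residue_q3(pairs):
    """Q3 over all residues pooled across chains."""
    correct = sum(p == o for pred, obs in pairs for p, o in zip(pred, obs))
    total = sum(len(obs) for _, obs in pairs)
    return 100.0 * correct / total


# Hypothetical (predicted, observed) pairs for two chains.
pairs = [("CCEE", "CCEC"), ("HHHHHHHH", "HHHHHHHH")]
chain_avg = per_chain_q3(pairs)
residue_avg = per_residue_q3(pairs)
```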

Page 13: Experimental results

- Ranked first in CASP3
    - Avg Q3: 73.4% (69.0% for the second-ranked method, 66.7% for PHD)
    - Avg SOV: 71.9% (65.7% for the second-ranked method, 63.8% for PHD)
- Also ranked first in CASP4 (Dec 2000)

Page 14: Conclusion

- PSIPRED is by far the best method for secondary structure prediction
- The differences between PHD and PSIPRED:
    - Position-specific scoring matrices
    - Training data