Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices

Yan Liu

Sep 29, 2003

Protein Secondary Structure

Dictionary of Secondary Structure of Proteins (DSSP): assigns secondary structure based on hydrogen bonding patterns and geometrical constraints

7 DSSP labels for protein secondary structure (PSS):

Helix types: H (alpha-helix), G (3/10 helix)

Sheet types: E (extended strand, participates in beta ladder), B (isolated beta-bridge)

Coil types: T, _, S (coil)
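The grouping of DSSP labels into helix, sheet and coil can be written as a small lookup table. The following is a minimal sketch; the three class letters H/E/C are a common convention assumed here, and the function name `reduce_dssp` is hypothetical.

```python
# Reduce the seven DSSP labels listed above to three classes.
# The output letters H (helix), E (sheet), C (coil) are an assumed convention.
DSSP_TO_3STATE = {
    "H": "H", "G": "H",            # helix types
    "E": "E", "B": "E",            # sheet types
    "T": "C", "_": "C", "S": "C",  # coil types
}

def reduce_dssp(labels: str) -> str:
    """Map a string of DSSP labels to a three-state string of equal length."""
    return "".join(DSSP_TO_3STATE[ch] for ch in labels)

print(reduce_dssp("HGEBT_S"))  # prints HHEECCC
```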

Protein Secondary Structure Prediction

Given a protein sequence: APAFSVSPASGA

Predict its secondary structure sequence: CCEEEEECCCC

Application

Provide constraints for tertiary structure predictions, or serve as part of fold recognition

Related Work

Standard SS prediction method: PHD (Rost & Sander 1993)

Multiple sequence profiles

Based on the observation that conserved regions are functionally important and/or buried in the protein core

Benner & Gerloff demonstrated that the degree of solvent accessibility can be predicted with reasonable accuracy

Two-layered feed-forward neural networks

PSIPRED

Generation of a sequence profile: position-specific scoring matrices

Prediction of initial secondary structure: standard feed-forward back-propagation networks

Filtering the predicted structures

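Position-specific scoring matrices (PSSM) -1

PSSM (Altschul et al., 1997), or profiles

Given a protein sequence of length N, together with its multiple sequence alignment

Construct an N x 20 matrix

Score definition: $S_i = \log(Q_i / P_i)$, where $Q_i$ is the estimated probability of residue $i$ at the position and $P_i$ is its background probability

Different methods for estimating $Q_i$:

$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}$, with $\alpha = N_C - 1$, $\beta = 10$

$f_i$: weighted observed frequencies

Other estimation:

$g_i = \sum_j \frac{f_j}{P_j} q_{ij}$, where $q_{ij} = P_i P_j e^{\lambda s_{ij}}$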
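To make the score definition concrete, here is a minimal sketch of scoring a single alignment column. It assumes the pseudocount estimate of Altschul et al. (1997), Q_i = (alpha f_i + beta g_i) / (alpha + beta) with g_i = sum_j (f_j / P_j) q_ij, and uses the natural logarithm. The uniform background and the toy target-frequency matrix `q` are illustrative assumptions, not real substitution-matrix data, and `pssm_column` is a hypothetical name.

```python
import numpy as np

def pssm_column(f, P, q, n_c, beta=10.0):
    """Log-odds scores for one alignment column.

    f   : weighted observed residue frequencies, shape (20,)
    P   : background residue probabilities, shape (20,)
    q   : target frequencies q_ij, shape (20, 20), with column sums equal to P
    n_c : number of different residue types seen in the column
    """
    alpha = n_c - 1.0
    g = q @ (f / P)                          # g_i = sum_j q_ij * f_j / P_j
    Q = (alpha * f + beta * g) / (alpha + beta)
    return np.log(Q / P)                     # S_i = log(Q_i / P_i)

# Toy data (assumptions): uniform background, and a q matrix that
# puts half of its mass on identities and half on independent pairs.
P = np.full(20, 1.0 / 20)
q = 0.5 * np.diag(P) + 0.5 * np.outer(P, P)

f = np.zeros(20)
f[0], f[1] = 0.8, 0.2                        # column with two residue types
scores = pssm_column(f, P, q, n_c=2)
print(scores.argmax())                       # index of the best-scoring residue
```

With few distinct residues in a column, alpha is small and the pseudocount term g dominates; as the column becomes more diverse, the observed frequencies f carry more weight.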

Position-specific scoring matrices (PSSM) -2

Advantage

A more sensitive scoring system

Improved estimation of the probabilities with which amino acids occur at pattern positions

Relatively precise definition of the boundaries of important motifs

Disadvantage

Too sensitive to biases in the sequence data banks

Prone to erroneously incorporating repetitive sequences into the profiles

PSSM in PSIPRED

Input to neural networks:

The PSSM from PSI-BLAST after three iterations

Window size set to 15

Scaled to the 0-1 range by the standard logistic function
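The input encoding can be sketched as follows: each PSSM score is squashed by the logistic function 1 / (1 + e^(-x)), and a window of 15 positions is cut around every residue. The 21st value per position is assumed here to be a flag marking window positions that fall beyond a chain terminus, which would give the 21 * 15 = 315 inputs; the function name `encode_windows` is hypothetical.

```python
import numpy as np

WINDOW = 15  # window size from the slide

def encode_windows(pssm: np.ndarray) -> np.ndarray:
    """Turn an (N, 20) PSSM into an (N, 315) matrix of network inputs."""
    n = pssm.shape[0]
    half = WINDOW // 2
    scaled = 1.0 / (1.0 + np.exp(-pssm))   # logistic scaling to the 0-1 range
    # Pad both ends; column 20 is the assumed "beyond the terminus" flag.
    padded = np.zeros((n + 2 * half, 21))
    padded[half:half + n, :20] = scaled
    padded[:half, 20] = 1.0
    padded[half + n:, 20] = 1.0
    # One flattened 15 x 21 window per residue.
    return np.stack([padded[i:i + WINDOW].ravel() for i in range(n)])

X = encode_windows(np.zeros((12, 20)))     # toy all-zero PSSM of length 12
print(X.shape)                             # (12, 315)
```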

Neural network architecture-1

Two-stage neural networks

1st stage: sequence-to-structure mapping

315 inputs: 21 * 15

75 hidden units: 5 * 15

2nd stage: structure-to-structure mapping

60 inputs: 4 * 15 (3 class scores per position, plus an extra input to indicate that the window spans a chain terminus)

60 hidden units: 4 * 15
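The layer sizes above can be checked with a small forward-pass sketch. This is not the PSIPRED implementation: the weights are random and untrained, the three output units per network (helix, strand, coil) and the sigmoid activations are assumptions, and all function names are hypothetical.

```python
import numpy as np

WINDOW = 15
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_net(n_in, n_hidden, n_out=3):
    """Random, untrained weights for a network with one hidden layer."""
    return (rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

def forward(net, x):
    w1, b1, w2, b2 = net
    hidden = sigmoid(x @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

def window_stage1_outputs(scores):
    """(N, 3) stage-1 outputs -> (N, 60) stage-2 inputs (15 x (3 + flag))."""
    n, half = scores.shape[0], WINDOW // 2
    padded = np.zeros((n + 2 * half, 4))
    padded[half:half + n, :3] = scores
    padded[:half, 3] = 1.0                 # flag: position beyond a terminus
    padded[half + n:, 3] = 1.0
    return np.stack([padded[i:i + WINDOW].ravel() for i in range(n)])

stage1 = make_net(315, 75)                 # sequence-to-structure
stage2 = make_net(60, 60)                  # structure-to-structure

def predict(x_stage1):
    """Run both stages and return one of H/E/C per residue."""
    out1 = forward(stage1, x_stage1)
    out2 = forward(stage2, window_stage1_outputs(out1))
    return "".join("HEC"[k] for k in out2.argmax(axis=1))

x = rng.random((12, 315))                  # toy inputs for a 12-residue chain
print(len(predict(x)))                     # 12
```

The second stage sees the first stage's predictions for neighbouring residues, which is what lets it smooth out isolated, implausible assignments.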

Neural network architecture-2

Training parameters

Momentum term: 0.9

Learning rate: 0.005

Prevent overfitting: leave 10% of the training set for validation
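As a rough illustration of these training parameters, the sketch below shows a classical momentum update and a 10% hold-out split. The exact update rule and stopping criterion used for PSIPRED are not given on the slide, so this form of momentum and the helper names are assumptions.

```python
import numpy as np

MOMENTUM = 0.9
LEARNING_RATE = 0.005

def momentum_step(w, velocity, grad):
    """One gradient-descent step with a classical momentum term."""
    velocity = MOMENTUM * velocity - LEARNING_RATE * grad
    return w + velocity, velocity

def split_validation(n_examples, frac=0.10, seed=0):
    """Return (train_idx, val_idx), holding out `frac` of the examples."""
    idx = np.random.default_rng(seed).permutation(n_examples)
    n_val = int(round(frac * n_examples))
    return idx[n_val:], idx[:n_val]

# Toy usage: minimise w^2, whose gradient is 2w.
w, v = np.array([1.0]), np.zeros(1)
for _ in range(500):
    w, v = momentum_step(w, v, 2.0 * w)

train_idx, val_idx = split_validation(100)
print(len(train_idx), len(val_idx))        # 90 10
```

In practice the held-out 10% would be scored after each epoch, and training would stop once the validation error stops improving.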

Experimental results

Training and testing data

Collected to remove structural similarity: CATH applied to detect homologous protein sequences

A total of 187 protein sequences, split into sets of 62, 62 and 63

Three-way cross-validation
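The three-way cross-validation over 187 chains can be sketched as below, with each set of 62, 62 or 63 chains serving once as the test set. The assignment of chains to sets here is by index only, as a stand-in; the actual split was constrained by the CATH-based similarity filtering, and `three_way_folds` is a hypothetical name.

```python
def three_way_folds(n_chains=187, sizes=(62, 62, 63)):
    """Yield (train_indices, test_indices) for each of the three folds."""
    assert sum(sizes) == n_chains
    indices = list(range(n_chains))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in three_way_folds():
    print(len(train), len(test))           # 125 62, 125 62, 124 63
```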

Experimental results

Per-chain results

Distribution of Q3 and SOV (figure, left)

Avg Q3: 76.0%

Avg SOV: 73.5%

Per-residue results

Q3: 76.5%
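Q3 is the percentage of residues whose three-state label is predicted correctly. The sketch below shows why the per-chain average and the per-residue figure can differ: the first weights every chain equally, the second weights every residue equally. The example chains are made up for illustration, and SOV (a segment-overlap measure) is not implemented here.

```python
def q3(pred: str, true: str) -> float:
    """Percentage of positions where predicted and true states agree."""
    if len(pred) != len(true):
        raise ValueError("prediction and reference must have equal length")
    matches = sum(p == t for p, t in zip(pred, true))
    return 100.0 * matches / len(true)

def per_chain_q3(pairs):
    """Mean of the Q3 values computed separately for each chain."""
    return sum(q3(p, t) for p, t in pairs) / len(pairs)

def per_residue_q3(pairs):
    """Q3 over all residues pooled across chains."""
    matches = sum(p == t for pred, true in pairs for p, t in zip(pred, true))
    total = sum(len(true) for _, true in pairs)
    return 100.0 * matches / total

# Made-up (predicted, observed) pairs for two short chains.
pairs = [("CCEEEEECCCC", "CCEEEECCCCC"), ("HHHH", "HHHC")]
print(round(per_chain_q3(pairs), 2), round(per_residue_q3(pairs), 2))
```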

Experimental results

Ranked first in CASP3

Avg Q3: 73.4% (69.0% for the second-ranked method, 66.7% for PHD)

Avg SOV: 71.9% (65.7% for the second-ranked method, 63.8% for PHD)

Also ranked first in CASP4 (Dec 2000)

Conclusion

PSIPRED is by far the best method for secondary structure prediction in the CASP3 and CASP4 evaluations

The difference between PHD and PSIPRED:

Position-specific scoring matrices

Training data
