Secondary Structure Prediction
Protein Analysis Workshop 2006
Bioinformatics group, Institute of Biotechnology, University of Helsinki
Alain Schenkel
Chris Wilton
Overview
Review of protein structure.
Introduction to structure prediction:
• Different approaches.
• Prediction of 1D strings of structural elements.
Server/software review:
• COILS, MPEx, …
• The PredictProtein metaserver.
Proteins
Proteins play a crucial role in virtually all biological processes, with a broad range of functions.
The activity of an enzyme or the function of a protein is governed by its three-dimensional structure.
(Figure: H11_MOUSE, histocompatibility antigen; VE2_BPV1, bovine papillomavirus E2 DNA-binding domain.)
20 amino acids - the building blocks
Clickable map at: http://www.russell.embl-heidelberg.de/aas/
The Amino Acids - hydrophobic
The Amino Acids - polar
The Amino Acids - charged
Secondary Structure: α-helix
Alpha-helix: 4₁₃ (hydrogen bond from residue i to residue i+4, closing a 13-atom ring)
Very seldom: 3₁₀ (3₁₀-helix) and 5₁₆ (π-helix)
Secondary Structure: α-helix
• 3.6 residues per turn.
• Axial dipole moment.
• Backbone hydrogen-bonded.
• Often found at protein surfaces.
• Typically contains no Proline or Glycine ("helix breakers").
Secondary Structure: β-sheets
Secondary Structure: β-sheets
• Strands are parallel or antiparallel.
• Side chains alternate between the two faces of the sheet.
• Connecting loops often contain polar amino acids.
Secondary Structure: β-sheets
Terminology
Primary structure: The sequence of amino acid residues
FTPAVHAFLDKFLAS …
Terminology
Secondary structure:
• A first level of structural organization.
• Provides rigidity.
• The structural form adopted by each amino-acid residue:
  H: helix (alpha)
  E: extended (beta strand)
  T: turn (often Proline)
  C: coil (random, unstructured)
Terminology
Secondary structure elements (SSE):
• Stretches of residues in H conformation are helical SSEs.
• Stretches of residues in E conformation are beta-strand SSEs.
• Stretches of residues in C conformation are loops or coil.
• Turns (T) are isolated residues, usually Proline or Glycine.
• Alternative 3-state notation: L for everything other than H and E.
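The 3-state reduction in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming the input uses only the single-letter codes listed above; the function name `to_three_state` is hypothetical.

```python
def to_three_state(ss: str) -> str:
    """Reduce a secondary structure string to 3 states:
    keep H and E, map every other code (T, C, ...) to L."""
    return "".join(state if state in ("H", "E") else "L" for state in ss)


print(to_three_state("CCHHHHTTEEEEC"))  # LLHHHHLLEEEEL
```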
Secondary Structure Elements
Example: one helix, one beta strand, three loops.
Primary:   MSEGEDDFPRKRTPWCFDDEHMC
Secondary: CCHHHHHHCCCCEEEEEECCCCC
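Reading SSEs off a per-residue string amounts to finding runs of identical states. A minimal sketch using `itertools.groupby` on the example above; the helper name `sse_segments` and the 0-based, end-exclusive coordinates are choices made for this illustration.

```python
from itertools import groupby


def sse_segments(ss: str) -> list[tuple[str, int, int]]:
    """Split a secondary structure string into (state, start, end) runs.
    start is 0-based, end is exclusive."""
    segments = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        segments.append((state, pos, pos + length))
        pos += length
    return segments


segs = sse_segments("CCHHHHHHCCCCEEEEEECCCCC")
print(segs)  # [('C', 0, 2), ('H', 2, 8), ('C', 8, 12), ('E', 12, 18), ('C', 18, 23)]

# Count the elements: 1 helix, 1 strand, 3 loops, as stated in the example.
counts = {s: sum(1 for state, _, _ in segs if state == s) for s in "HEC"}
print(counts)  # {'H': 1, 'E': 1, 'C': 3}
```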
Terminology
Tertiary structure:
• The full 3D structure of a single polypeptide chain.
• Secondary structure elements pack together to form a structural core.
• This arrangement is called a protein "fold".
Terminology
Quaternary structure:
• How several fully folded protein chains pack together to form a fully functional protein.
• Example: 1jch (ribosome inhibitor).
1jch is a PDB identifier: the Protein Data Bank is the principal repository for solved structures.
Example: 1jch has 4 chains
The elongated 2-helix structures in the center are called coiled-coils.
Structural classification of folds
For example (CATH): alpha, beta, alpha+beta, alpha/beta, irregular.
More on structural classification next week.
Biochemical classification of folds
Globular proteins:
• in an aqueous environment,
• compact fold,
• hydrophobic core and polar surfaces.
Membrane proteins:
• attached to or across the cell membrane,
• hydrophobic surface within the membrane.
Fibrous proteins:
• structural role,
• repeats of regular or atypical SSEs, or irregular structure.
(Figure: examples of fibrous, globular (2 domains) and transmembrane proteins.)
INTRODUCTION TO
STRUCTURE PREDICTION
Why is 3D Structure Important?
• A prerequisite for understanding function:
  • processes of molecular recognition,
  • e.g. DNA recognition by 2bop.
• Catalytic mechanisms of enzymes:
  • often require key residues to be close together in 3D space.
• Structure is often preserved under evolution when sequence is not.
• Drug design.
Structure Prediction
GPSRYIVDL… ?
Approaches to structure prediction
• Ab initio: from physical principles only.
• De novo: knowledge-based potentials derived from the PDB.
• Fold recognition: thread the sequence through known structures to assess compatibility.
• Homology modeling: use a sequence alignment to infer a possible template structure.
More on homology modeling next week.
Prediction in One-Dimension
Simplification: project the 3D structure onto strings of structural assignments, e.g.:
• coiled-coils,
• membrane helices,
• solvent accessibility, where each residue is buried (b) or exposed (e): …eeebbbbeebbbbee…
• secondary structure elements: …HHHLLLEEEEEELLEEE…
If accurate, such strings can be used to improve predictions of 3D structures (e.g. in fold recognition).
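A 1D accessibility string like the one above is typically obtained by thresholding each residue's relative solvent accessibility (its accessible surface area divided by the maximum for that residue type). A minimal sketch, assuming a 25% cutoff and hypothetical input values; published methods use various cutoffs, and `accessibility_string` is a hypothetical name.

```python
def accessibility_string(relative_accessibility: list[float], threshold: float = 0.25) -> str:
    """Encode each residue as 'b' (buried: below the assumed threshold) or 'e' (exposed)."""
    return "".join("b" if acc < threshold else "e" for acc in relative_accessibility)


toy_values = [0.60, 0.45, 0.30, 0.05, 0.10, 0.02, 0.12, 0.50, 0.40]  # hypothetical
print(accessibility_string(toy_values))  # eeebbbbee
```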
A Flow Chart for Structure Prediction
http://speedy.embl-heidelberg.de/gtsp/flowchart2.html
Structure Prediction
Why is structure prediction, and in particular ab initio prediction, a difficult problem?
• Many degrees of freedom: atoms of all residues and of the solvent.
• The number of possible conformations grows exponentially with the number of residues.
• Remote noncovalent interactions complicate matters.
• Stability is a delicate problem.
• All possible conformations cannot be searched exhaustively.
A folding protein does not try all conformations! (Levinthal paradox)
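The exponential growth behind the Levinthal paradox can be made concrete with a back-of-the-envelope calculation. All three inputs below (3 states per residue, 100 residues, 10^13 conformations tried per second) are illustrative assumptions, not values from the slides.

```python
CONFORMATIONS_PER_RESIDUE = 3   # assumed backbone states per residue
N_RESIDUES = 100                # assumed chain length
SAMPLING_RATE = 1e13            # assumed conformations tried per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_conformations = CONFORMATIONS_PER_RESIDUE ** N_RESIDUES   # exact integer, about 5e47
years = n_conformations / SAMPLING_RATE / SECONDS_PER_YEAR  # about 1.6e27 years

print(f"{n_conformations:.2e} conformations, {years:.1e} years to try them all")
```

Real proteins fold in milliseconds to seconds, so folding cannot be an exhaustive search.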
Basic Principle of Folding (globular protein)
Pack hydrophobic side chains into the interior of the molecule, away from solvent. So,
• hydrophobic residues are predominantly within a central structural core, with tight (crystal-like) packing;
• hydrophilic residues are predominantly on the protein surface, exposed to solvent.
But the main chain is highly polar: its N-H and C=O groups must be hydrogen-bonded inside the core, which forces the formation of SSEs there. So,
• core residues tend to be in SSEs;
• loops are on the outside of the protein.
Protein Structure and Evolution
The rate of evolution of genomic DNA sequence reflects the degree of functional constraint.
Protein-coding regions evolve much more slowly than non-coding regions because of the:
• need to maintain a stable 3D protein structure,
• need to maintain vital biological function.
Rates of Protein Sequence Evolution
Sequences of highly constrained structures evolve very slowly (e.g. histones).
Less constrained ones evolve more quickly (e.g. immunoglobulins).
In general, a mutation can change the structure, but many mutations change it only slightly or not at all.
=> Structure is better conserved than sequence.
Evolution of SSEs and Loops
Residues in the hydrophobic core (SSEs) are constrained by the need for tight packing:
• changes are rarely accepted, so evolution is slow.
Residues on the surface (loops) are less constrained (they simply need to be hydrophilic):
• amino-acid substitutions are less restricted, so evolution is quicker.
Evolution of Key Residues
Residues with key functional roles will be conserved.
• E.g. active-site residues involved in catalysis.
• BUT: gene duplication can lead to a change of function without a change of structure.
Residues with a key structural role also tend to be conserved, e.g.:
• GLY: high conformational flexibility => tight turns, …
• PRO: the side chain bonds back to the backbone => tight turns.
• CYS: disulfide bridges.
Structure Prediction by Homology
Multiple sequence/structure alignments measure differences in the evolutionary rates of residues, and thus:
• contain more information than a single sequence for applications such as homology modeling and secondary structure prediction,
• give the location of conserved regions and motifs, of residues buried in the protein core or exposed to solvent, and of important secondary structures.
More on homology modeling next week.
Secondary Structure Prediction
Three generations:
Single-residue statistical analysis:
• For each amino acid type, assign its 'propensity' to be in a helix, sheet, or coil.
• Limited accuracy: ~55-60% on average.
• E.g. Chou-Fasman (1974), no longer used.
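The first-generation idea can be sketched as a table lookup: each residue gets the state with the highest propensity, optionally averaged over a small window. The propensity values below are toy numbers for illustration only, NOT the published Chou-Fasman table, and this omits Chou-Fasman's nucleation and extension rules; `predict_by_propensity` is a hypothetical name.

```python
# Toy propensities (helix H, strand E, coil C); illustrative, NOT Chou-Fasman's values.
PROPENSITY = {
    "A": {"H": 1.4, "E": 0.8, "C": 0.8},
    "E": {"H": 1.5, "E": 0.4, "C": 0.9},
    "L": {"H": 1.2, "E": 1.3, "C": 0.6},
    "V": {"H": 1.0, "E": 1.7, "C": 0.6},
    "G": {"H": 0.6, "E": 0.8, "C": 1.6},
    "P": {"H": 0.6, "E": 0.6, "C": 1.5},
}
NEUTRAL = {"H": 1.0, "E": 1.0, "C": 1.0}  # fallback for residues missing from the toy table


def predict_by_propensity(seq: str, half_window: int = 0) -> str:
    """Assign each residue the state with the highest (window-averaged) propensity.
    half_window=0 is a pure single-residue prediction; ties go to the first of H, E, C."""
    prediction = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_window), min(len(seq), i + half_window + 1)
        window = [PROPENSITY.get(aa, NEUTRAL) for aa in seq[lo:hi]]
        scores = {s: sum(p[s] for p in window) / len(window) for s in "HEC"}
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


print(predict_by_propensity("AAAAGGGGVVVV"))  # HHHHCCCCEEEE
```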
Secondary Structure Prediction
Segment-based statistics:
• Look for correlations within windows of 11-21 amino acids.
• Many algorithms have been tried, e.g. GOR II, COMBINE.
• Accuracy < 70%.
• Best performing: Neural Networks:
  • Input: a number of protein sequences with their known secondary structure.
  • Output: a trained network that predicts secondary structure elements for given query sequences.
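As a sketch of how a query enters such a network, each residue can be represented by a one-hot encoded window of its neighbours. The window length of 13 (within the 11-21 range above), the 21 inputs per position (20 amino acids plus a flag for positions beyond the chain ends), and the name `encode_windows` are assumptions for illustration; the network and its training are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(seq: str, window: int = 13) -> list[list[float]]:
    """One row per residue: the one-hot encoded window centred on that residue.
    Each window position uses 21 inputs: 20 amino acids + 1 'no residue' flag."""
    if window % 2 != 1:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    rows = []
    for centre in range(len(seq)):
        row = []
        for j in range(centre - half, centre + half + 1):
            bits = [0.0] * 21
            if 0 <= j < len(seq) and seq[j] in INDEX:
                bits[INDEX[seq[j]]] = 1.0
            else:
                bits[20] = 1.0  # beyond a chain end, or a non-standard residue letter
            row.extend(bits)
        rows.append(row)
    return rows


X = encode_windows("MSEGEDDFPR")
print(len(X), len(X[0]))  # 10 273  (10 residues, 13 x 21 inputs each)
```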
Neural Networks
(Figure from B. Rost, 1999: a window of the query sequence is fed into the trained network, whose 3-state output is the prediction for the central residue.)
Secondary Structure Prediction
Using information from evolution:
• Compute a sequence profile from a multiple sequence alignment.
• Use the profile instead of the query as input to the Neural Network.
• 6-8 percentage points increase in accuracy over a Neural Network alone.
• E.g.:
  • PHD/PROF: alignments by MaxHom (B. Rost, 1996/2000),
  • PSI-PRED: alignments from PSI-BLAST (D.T. Jones, 1999).
• Accuracy: 72% ± 11%.
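A sequence profile of the kind described above can be sketched as per-column amino-acid frequencies of a gapped alignment. This is a simplification: profiles from PSI-BLAST or MaxHom additionally use sequence weighting and pseudocounts, which are omitted here. The toy alignment and the name `frequency_profile` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Per-column amino-acid frequencies of a gapped multiple sequence alignment.
    Gaps ('-') are ignored; an all-gap column gets all-zero frequencies."""
    if not alignment or any(len(s) != len(alignment[0]) for s in alignment):
        raise ValueError("need at least one sequence, all of the same aligned length")
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(s[col] for s in alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append({aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS})
    return profile


msa = ["MKVLA", "MRVLG", "MKI-A"]  # hypothetical toy alignment
prof = frequency_profile(msa)
print(prof[1]["K"], prof[1]["R"])  # column 2: K in 2 of 3 sequences, R in 1 of 3
```

Each row of `prof` (20 numbers) would replace the one-hot vector of the corresponding residue as network input.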
Accuracy is measured as
$$Q_3 = \frac{\text{number of correctly predicted secondary structure states}}{\text{total number of residues}} \times 100\%$$
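The Q3 definition translates directly into code. A minimal sketch, assuming the predicted and observed strings use the same 3-state alphabet and are aligned residue by residue; `q3` is a hypothetical helper name and the strings are made up.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# 9 of the 11 residues agree, so Q3 is about 81.8%.
print(q3("LHHHHLLEEEL", "LLHHHHLEEEL"))
```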
Accuracy Illustration
(Figure: PSIPRED benchmark on a set of 187 chains (D.T. Jones, 1999), annotated "Your query could be here!!")
In particular, accuracy can be as low as 50% for a given query => use many different methods and compare their answers.
Other Structural Features
There are other structural features that one can try to predict:
coiled-coils, membrane helices, solvent accessibility, globularity, disulfide bridges, conformational switches, …
POPULAR SERVERS FOR DEALING WITH SECONDARY STRUCTURES
• Coiled-coils
• Transmembrane helices
• Secondary structure
• Metaservers
Prediction of coiled-coils
Coiled-coils are generally solvent-exposed, multi-stranded helix structures.
Helix periodicity and solvent exposure impose a special pattern of heptad repeats:
… abcdefg …, with hydrophobic residues at positions a and d and hydrophilic residues at the other positions.
(Figure: helical diagram of 2 interacting helices in a two-stranded coiled-coil, from the Wikipedia Leucine zipper article.)
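The heptad pattern suggests a simple check: try all 7 possible registers and see which one places hydrophobic residues at a and d most consistently. This is a simplified count-based illustration, not the COILS algorithm (which scores windows with a position-specific matrix); the hydrophobic set and the name `best_heptad_register` are assumptions.

```python
HYDROPHOBIC = set("AILMFVW")  # assumed hydrophobic set, illustration only
HEPTAD = "abcdefg"


def best_heptad_register(seq: str) -> tuple[int, float]:
    """Return (offset, fraction) for the register whose a/d positions are most often
    hydrophobic. Residue i is assigned heptad position HEPTAD[(i + offset) % 7]."""
    best_offset, best_fraction = 0, -1.0
    for offset in range(7):
        core = [aa for i, aa in enumerate(seq) if HEPTAD[(i + offset) % 7] in "ad"]
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best_fraction:
            best_offset, best_fraction = offset, fraction
    return best_offset, best_fraction


print(best_heptad_register("LEELKKE" * 3))         # (0, 1.0): L at every a and d
print(best_heptad_register("KE" + "LEELKKE" * 3))  # (5, 1.0): same pattern, shifted by 2
```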
The COILS server at EMBnet
Compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score.
By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
Options:
• scoring matrices,
• window size (the score may vary),
• weighting options.
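The score-to-probability step can be sketched as a Bayesian comparison of two score distributions. This assumes, as a simplification, that both distributions are Gaussian and that coiled-coil windows are rarer than globular ones by a prior ratio; all numeric parameters below are hypothetical placeholders, not the values fitted by COILS, and the function names are made up.

```python
import math


def gaussian(x: float, mean: float, sd: float) -> float:
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def coiled_coil_probability(score: float, cc_mean: float, cc_sd: float,
                            glob_mean: float, glob_sd: float, prior_ratio: float) -> float:
    """P(coiled coil | score); prior_ratio = expected globular windows per coiled-coil window."""
    g_cc = gaussian(score, cc_mean, cc_sd)
    g_glob = gaussian(score, glob_mean, glob_sd)
    return g_cc / (g_cc + prior_ratio * g_glob)


# Hypothetical parameters: coiled-coil scores centred higher than globular ones.
params = dict(cc_mean=1.6, cc_sd=0.25, glob_mean=0.8, glob_sd=0.25, prior_ratio=30.0)
for s in (1.0, 1.5, 2.0):
    print(s, round(coiled_coil_probability(s, **params), 3))
```

With these placeholders, the probability rises steeply as the score moves from the globular towards the coiled-coil distribution.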
COILS Limitations
The program works well for parallel two-stranded structures that are solvent-exposed, but runs into progressively more problems as more helices are added, when the helices are antiparallel, and as the helices get shorter.
The program fails entirely on buried structures.
COILS Demo
Let us submit the sequence below to the COILS server at EMBnet:
http://www.ch.embnet.org/software/COILS_form.html
>1jch_A
VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQIAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVPMSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQGGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNYERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPMAGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAENNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKGRKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL
Settings used: MTIDK matrix, no weights, all window lengths.
• Frame probabilities at each residue.
• Columns: window sizes of 14, 21 and 28 aa.
(Figure annotation: high-probability heptads.)
Transmembrane Region Prediction
Transmembrane regions:
• usually contain residues with hydrophobic side chains (the surface within the membrane must be hydrophobic),
• are usually ~20 residues long, and can be up to 30 if the helix does not cross the membrane perpendicularly.
Methods:
• Hydropathy plots (historical; better methods are now available),
• Threading (TMpred, MEMSAT), Hidden Markov Model (TMHMM), Neural Network (PHDhtm).
Hydropathy plots (Kyte-Doolittle) compute an average hydropathy value for each position in the query sequence; a window length of 19 is usually chosen for membrane-spanning region prediction.
• Do the peaks lie between 1 and 2 on the hydropathy scale?
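A Kyte-Doolittle plot is a sliding-window average of per-residue hydropathy values. A minimal sketch using the standard Kyte-Doolittle scale and the window of 19 mentioned above; the cutoff of 1.6 is an assumed, commonly quoted value, and the helper names and test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Average hydropathy of every full window; value i belongs to the window
    centred on residue i + window // 2 (0-based). Non-standard letters raise KeyError."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_centres(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based centres of windows whose average exceeds the (assumed) threshold."""
    half = window // 2
    return [i + half for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


toy = "KKKK" + "L" * 19 + "KKKK"  # hypothetical: one hydrophobic stretch between charged ends
print(candidate_tm_centres(toy))  # [9, 10, 11, 12, 13, 14, 15, 16, 17]
```

On a real sequence such as RCEM_RHOVI, runs of consecutive centres above the cutoff would mark candidate membrane-spanning helices.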
Hydropathy Plot Servers
Let us submit the sequence below to Membrane Explorer (http://blanco.biomol.uci.edu/mpex/, also available as the standalone MPEx) or to Grease (http://fasta.bioch.virginia.edu/fasta/grease.htm):
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK
TMpred
Method summary:
Scans a candidate sequence for matches to a sequence scoring matrix, obtained by aligning the sequences of all transmembrane alpha-helical regions that are known from structures.
These sequences are collected in a database called TMBase.
Remark: the authors do not recommend this method for genomic sequences; automatic methods such as TMHMM or PHDhtm are recommended instead.
TMpred Server
Let us submit RCEM_RHOVI again, this time to the TMpred server at EMBnet:
http://www.ch.embnet.org/software/TMPRED_form.html
>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis.
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK
Annotation for RCEM_RHOVI
UniProt entry for RCEM_RHOVI:
• Chain M of the photosynthetic reaction center.
• Integral membrane protein.
Can we see the predicted helices in the structure? Let's try at SCOP.
The PSIPRED Server
Let's submit the sequence below to http://bioinf.cs.ucl.ac.uk/psipred/:
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGCNVTRKPTVFTRVSAYISWINNVIASN
The server offers:
• Secondary structure prediction (PSIPRED)
• Transmembrane topology prediction (MEMSAT)
• Fold recognition (GenTHREADER)
(see later for comparison with solved structure)
PSIPRED PREDICTION RESULTS
Key
Conf: Confidence (0=low, 9=high)Pred: Predicted secondary structure (H=helix, E=strand, C=coil) AA: Target sequence
# PSIPRED HFORMAT (PSIPRED V2.5 by David Jones)
Conf: 978999999997404555676678816988988788877499999934884158982897
Pred: CHHHHHHHHHHHHHCCCCCCCCCCCCEECCEECCCCCCCCEEEEEEECCCCCEEEEEEEE
AA: MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTL
(residues 1-60)
Conf: 138734320122478742368754345663179827995679998026888865344411
Pred: CCCCEEEEECCCCCCCCCEEEEEEEEEEEECCCCCEEEEEEEEEEECCCCCCCCCCCCCH
AA: IRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDI
(residues 61-120)
Conf: 010005863201367530113433210010268995234110254467622168863110
Pred: HHEECCCCCCEEEEEEEECCCCCCCCCCCCEEEEEEECCCCCCCCCCCCCCEEEEEEEEE
AA: ALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVD
(residues 121-180)
Conf: 024554202566567752773344343221110467438998993899999972376889
Pred: CHHHHHHHCCCCCCCCCEEEECCCCCCCCCEEECCCCEEEEECCEEEEEEEEEECCCCCC
AA: YAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGC
(residues 181-240)
Conf: 88988779999687678899886049
Pred: CCCCCCEEEEEHHHHHHHHHHHHHCC
AA: NVTRKPTVFTRVSAYISWINNVIASN
(residues 241-266)
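The horizontal output above is easy to post-process. The Python sketch below assumes each block has one `Conf:`, one `Pred:` and one `AA:` line, as in the HFORMAT listing, joins the blocks into per-residue (residue, state, confidence) triples, and then lists helix (H) and strand (E) segments with their mean confidence. The function names are hypothetical; this is not part of the PSIPRED distribution.

```python
from itertools import groupby


def parse_horiz(text):
    """Turn PSIPRED horizontal output into (aa, state, conf) triples.

    Assumes each block has lines starting with 'Conf:', 'Pred:' and 'AA:'
    (leading spaces allowed); ruler lines and '#' comments are skipped.
    """
    conf, pred, aa = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Conf:"):
            conf.append(line[len("Conf:"):].strip())
        elif line.startswith("Pred:"):
            pred.append(line[len("Pred:"):].strip())
        elif line.startswith("AA:"):
            aa.append(line[len("AA:"):].strip())
    conf, pred, aa = "".join(conf), "".join(pred), "".join(aa)
    if not len(conf) == len(pred) == len(aa):
        raise ValueError("Conf, Pred and AA strings differ in length")
    return [(a, s, int(c)) for a, s, c in zip(aa, pred, conf)]


def segments(residues, states="HE"):
    """List (state, start, end, mean_conf) per helix/strand run, 1-based inclusive."""
    out, pos = [], 1
    for state, group in groupby(residues, key=lambda r: r[1]):
        run = list(group)
        if state in states:
            mean_conf = sum(r[2] for r in run) / len(run)
            out.append((state, pos, pos + len(run) - 1, round(mean_conf, 1)))
        pos += len(run)
    return out
```

Applied to the first block (residues 1-60), `segments(parse_horiz(block))` starts with `('H', 2, 14, 7.5)`: the N-terminal helix, whose confidence drops to 4 and then 0 at its last two residues. Filtering segments on mean confidence is one simple way to use the Conf line.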
Meta-Servers
A meta-server is a server which either
• gives you predictions from several methods run in parallel, in one browser window, e.g. PredictProtein: http://predictprotein.org
• or makes predictions by combining several methods (consensus), e.g. 3D-Jury: http://bioinfo.pl/meta and GeneSilico: http://www.genesilico.pl/meta
The PredictProtein meta-server
Sequence motif search:
• ProSite, ProDom, SEG.
One-Dim structure prediction:
• secondary structure,
• transmembrane helices,
• solvent accessibility,
• globularity,
• disulfide bridges,
• conformational switches.
Links to a multitude of other servers (numerous links also from 3D-Jury).
Motif Search at PP
• SEG: finds low-complexity regions.
• ProSite: database of functional motifs, i.e., biologically relevant short sequence patterns.
• ProDom: a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
More on domains and protein family classification next week (ADDA, Pfam, etc.).
ProSite: http://au.expasy.org/prosite/
ProDom: http://protein.toulouse.inra.fr/prodom/current/html/home.php
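SEG itself is more elaborate (it triggers on low-complexity windows and then extends and trims candidate segments). As a much simpler illustration of what "low complexity" means, the Python sketch below computes the Shannon entropy of the residue composition in a sliding window and masks every window that falls below a cutoff. The window of 12 residues and the cutoff of 2.2 bits are assumptions chosen to resemble commonly quoted SEG defaults; results will not match SEG exactly.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def low_complexity(seq, window=12, cutoff=2.2):
    """1-based (start, end) ranges covered by windows with entropy below cutoff."""
    masked = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < cutoff:
            masked[i:i + window] = [True] * window
    ranges, start = [], None
    for i, flag in enumerate(masked + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            ranges.append((start + 1, i))
            start = None
    return ranges
```

For example, a run of 12 glutamines flanked on each side by 12 distinct residues is reported as `(8, 29)`: window-based masking spills a few residues past the run itself, which is one reason SEG refines its segment boundaries.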
One-Dim predictions at PP
Use information from evolution:
• The sequence database is scanned for similar sequences (BLAST, PSI-BLAST).
• Multiple sequence alignment profiles are generated by weighted dynamic programming (MaxHom).
The PROF (improved PHD) series:
• PROFsec (PHDsec): secondary structure,
• PROFacc (PHDacc): solvent accessibility,
• PHDhtm: transmembrane helices.
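To make "multiple sequence alignment profile" concrete, the Python sketch below turns an alignment into per-column amino acid frequencies, optionally with a pseudocount. It deliberately treats all sequences equally and ignores gaps; real pipelines (MaxHom for PHD/PROF, PSI-BLAST for PSIPRED) weight sequences and build richer profiles, so this only illustrates the data structure. Methods of the PHD/PROF family read profile columns like these, over a window of neighbouring residues, as neural-network input.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(alignment, pseudocount=0.0):
    """Column-wise amino acid frequencies for equal-length aligned sequences.

    Gaps ('-') and non-standard letters are ignored, and every sequence counts
    equally (no sequence weighting; a simplification of real profile methods).
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total if total else 0.0
                        for aa in AMINO_ACIDS})
    return profile
```

For the toy alignment `['AC', 'AD', 'A-']`, column 1 is pure alanine (frequency 1.0) and column 2 splits evenly between C and D; the gap is not counted. A small pseudocount keeps unseen residues from getting a frequency of exactly zero.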
Meta-PP
PredictProtein can automatically submit a query to other servers:
Secondary structure prediction:
• Psi-Pred, SAM-T02, Jpred, …
Membrane helix prediction:
• TMHMM, …
Tertiary structure prediction:
• Homology: Swiss-Model, 3D-Jigsaw, …
• Threading: Superfamily, AGAPE, …
• Inter-residue contact prediction: CMAPpro, …
PredictProtein Demo
Let's submit the same sequence again, this time to http://predictprotein.org/
>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGCNVTRKPTVFTRVSAYISWINNVIASN
For a list of mirror sites: http://predictprotein.org/newwebsite/doc/mirrors.html
Let's explore the results here.
Comparison with solved structure
DSSP: ??????????????????????????CBTCEECCTTTCTTEEEEEEEETTEEEEEEEEEEEETTEEEECSGGGCSCCCEE
PSIP: .HHHHHHHHHHHHH............EE..EE........EEEEEEE.....EEEEEEEE....EEEEE.........EE
PROF: ..HHHHHHHHHHH............EEEE.EE.......EEEEEEEE......EEEEEEEE...EEEEEEEEE.....EE
DSSP: EEESCSBTTSCCSCCEEEEEEEEEECTTCCTTCGGGCCCCEEEEESSCCCCBTTBCCCCCCCTTCCCCTTCCEEEEESCB
PSIP: EEEEEEEEEE.....EEEEEEEEEEE.............HHHEE......EEEEEEEE............EEEEEEE...
PROF: EEEEEEE........EEEEEEEEEEE.............EEEEEE........EEEEEE............EEEEEEEE.
DSSP: SSTTCCBCSBCEEEECCEECHHHHTSTTTTGGGSCTTEEEECCSSSSBCCTTCTTCEEEEEETTEEEEEEEEEECBTTBS
PSIP: ...........EEEEEEEEE.HHHHHHH.........EEEE.........EEE....EEEEE..EEEEEEEEEE......
PROF: ..........EEEEEEEEE..................EEEE...............EEEEEE...EEEEEEEE.......
DSSP: SBTTBCEEEEEGGGSHHHHHHHHHTC
PSIP: ......EEEEEHHHHHHHHHHHHH..
PROF: .......EEEEHHHHHHHHHHHH...
ELA1_PIG Elastase-1 has a solved structure: 1EST
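DSSP: secondary structure assignment from PDB coordinates (Kabsch and Sander, 1983):
• H = alpha helix
• B = residue in isolated beta-bridge
• E = extended strand, participates in beta ladder
• G = 3-helix (3/10 helix)
• I = 5-helix (pi helix)
• T = hydrogen-bonded turn
• S = bend
• C = none of the above (coil; a blank in the original DSSP output)
• ? = residue not present in the solved structure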
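The agreement in the comparison above is usually summarised as Q3, the fraction of residues whose predicted three-state class (H, E, C) matches the observed one. Because DSSP has eight states, they must first be reduced to three. The Python sketch below assumes one common reduction (H, G, I to helix; E, B to strand; everything else to coil), treats '.' in the PSIP/PROF lines as coil, and skips '?' residues that are absent from the solved structure. Other reductions exist and give somewhat different numbers.

```python
# Assumed 8-to-3 state reduction; some studies map G, I or B to coil instead.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp):
    """Map DSSP letters to H/E/C, keeping '?' so those residues can be skipped."""
    return "".join(c if c == "?" else DSSP_TO_3.get(c, "C") for c in dssp)


def to_hec(pred):
    """Normalise a prediction string: '.' (coil in the comparison lines) -> 'C'."""
    return pred.replace(".", "C")


def q3(dssp, pred):
    """Fraction of observed residues whose predicted state matches reduced DSSP."""
    if len(dssp) != len(pred):
        raise ValueError("strings must be aligned and of equal length")
    pairs = [(o, p) for o, p in zip(reduce_dssp(dssp), to_hec(pred)) if o != "?"]
    if not pairs:
        raise ValueError("no residues with an observed structure")
    return sum(o == p for o, p in pairs) / len(pairs)
```

On the last block of the comparison (residues 241-266), `q3("SBTTBCEEEEEGGGSHHHHHHHHHTC", "......EEEEEHHHHHHHHHHHHH..")` gives 23/26, about 0.88, for PSIPRED; the mismatches are the two isolated beta-bridges (B) and one bend (S). Computing Q3 over the whole chain, for both methods, puts a number on the visual impression.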
Conclusions
Both predictions (PSIPRED and PROF) agree well with each other and are fairly accurate compared with the solved structure.
But: the next prediction may not be as good.
=> Compare predictions from different methods to check whether there is a consensus.
Use servers that automatically combine different methods (3D-Jury, ...).
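A quick way to check for a consensus is a per-residue majority vote over aligned three-state predictions. The Python sketch below marks residues without a strict majority with '-' and also reports the fraction of residues on which all methods agree. It is a hypothetical illustration only; 3D-Jury, for example, builds its consensus by comparing 3D models from several servers, not by voting on secondary structure letters.

```python
from collections import Counter


def _normalise(predictions):
    """Convert '.' (coil) to 'C' and check that all strings are aligned."""
    seqs = [p.replace(".", "C") for p in predictions]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    return seqs


def consensus(predictions):
    """Per-residue strict-majority state; '-' where no state has a majority."""
    seqs = _normalise(predictions)
    out = []
    for column in zip(*seqs):
        state, votes = Counter(column).most_common(1)[0]
        out.append(state if votes * 2 > len(column) else "-")
    return "".join(out)


def agreement(predictions):
    """Fraction of residues on which every method predicts the same state."""
    seqs = _normalise(predictions)
    same = sum(len(set(column)) == 1 for column in zip(*seqs))
    return same / len(seqs[0])
```

For instance, `consensus(["HHE", "HEE", "HCC"])` returns `"H-E"`. With only two methods, such as PSIPRED and PROF, a strict majority means both must agree, so the '-' positions are exactly the residues worth a closer look.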
Benchmarks
LiveBench http://bioinfo.pl/meta/livebench.pl
CASP (Critical Assessment of protein Structure Prediction) http://predictioncenter.gc.ucdavis.edu/
CAFASP (Critical Assessment of Fully Automated Structure Prediction) http://www.cs.bgu.ac.il/~dfisher/CAFASP5/index.html
References
Documentation:
• COILS: http://www.ch.embnet.org/software/coils/COILS_doc.html
• TMPred: http://www.ch.embnet.org/software/tmbase/TMBASE_doc.html
• MPEx: http://blanco.biomol.uci.edu/mpex/MPEXdoc.html
Articles:
• B. Rost: Evolution teaches neural networks. In: Scientific Applications of Neural Nets, eds. J.W. Clark, T. Lindenau, M.L. Ristig, pp. 207-223 (1999).
• D.T. Jones: Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202 (1999).
• B. Rost: Prediction in 1D: secondary structure, membrane helices, and accessibility. In: Structural Bioinformatics (see Books).
Books:
• P.E. Bourne, H. Weissig: Structural Bioinformatics. Wiley-Liss, 2003.
• A. Tramontano: Protein Structure Prediction. Wiley-VCH, 2006.