chapter 15
![Page 1: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/1.jpg)
Chapter 15
Structure Prediction: Threading
![Page 2: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/2.jpg)
Motivation
Given a protein sequence, can you predict its molecular structure?
We want to avoid repeated X-ray crystallography, but we still want accuracy
You could use sequence alignment, but what do you do with the gapped regions?
More complex methods are only justified if they can be shown to perform better than simpler methods
Simpler methods are only justified if they can perform better than basic sequence alignment
![Page 3: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/3.jpg)
First Step
Some structure comparison methods use secondary structures of the new sequence
Predict location of secondary structure elements along the protein’s backbone and the degree of residue burial
Supervised learning has been shown to perform well in this task
![Page 4: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/4.jpg)
Artificial Neural Network
[Figure: neural network diagram; label: "Predicts structure at this point"]
![Page 5: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/5.jpg)
Danger
You may train the network on your training set, but it may not generalize to other data
Perhaps we should train several ANNs and then let them vote on the structure
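The voting idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `majority_vote`, the H/E/C labels, and the tie-breaking rule are choices made here, not part of the slides, and each string stands in for the per-residue output of one trained network.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine per-residue predictions from several models by majority vote.

    per_model_predictions: list of equal-length strings, one per model,
    using assumed labels H (helix), E (strand), C (coil).
    """
    lengths = {len(p) for p in per_model_predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    # zip(*...) walks the predictions column by column, i.e. residue by residue
    for column in zip(*per_model_predictions):
        # On a tie, the label that appears first in the column wins
        label, _ = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Three hypothetical networks disagree at positions 3 and 4
predictions = ["HHHECC", "HHEECC", "HHHCCC"]
print(majority_vote(predictions))
```

A vote like this only helps if the networks make different errors, for example because they were trained on different subsets of the data or from different starting weights.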
![Page 6: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/6.jpg)
Profile network from HeiDelberg
Uses the protein family (a multiple alignment is used as input) instead of just the new sequence
On the first level, a window of length 13 around the residue is used
The window slides down the sequence, making a prediction for each residue
The input includes the frequency of amino acids occurring at each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment)
The second level takes the predictions from first-level networks that are centered on neighboring residues
The third level does a jury selection
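As a rough illustration of the first-level input, the sketch below turns a multiple alignment into per-column amino acid frequencies and then slides a window of length 13 over them. The names `column_frequencies` and `window_inputs`, the zero padding at the sequence ends, and the toy alignment are assumptions made for this example; the actual PHD encoding may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Per-column amino acid frequencies for a list of aligned sequences."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c in AMINO_ACIDS]  # skip gap characters
        total = len(residues) or 1  # avoid dividing by zero on all-gap columns
        profile.append([residues.count(a) / total for a in AMINO_ACIDS])
    return profile


def window_inputs(profile, window=13):
    """Yield one flattened input vector per residue position."""
    half = window // 2
    empty = [0.0] * len(AMINO_ACIDS)  # assumed padding beyond the sequence ends
    for center in range(len(profile)):
        vector = []
        for pos in range(center - half, center + half + 1):
            vector.extend(profile[pos] if 0 <= pos < len(profile) else empty)
        yield vector


# Toy alignment with 5 sequences, as in the slide's example
alignment = ["ACDEF", "ACDEF", "ACDKF", "AC-EF", "GCDEF"]
inputs = list(window_inputs(column_frequencies(alignment)))
print(len(inputs), len(inputs[0]))  # one vector per residue, 13 * 20 values each
```

Each vector would be the input to the first-level network when it predicts the state of the residue at the window's center.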
![Page 7: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/7.jpg)
PHD
[Figure: PHD network levels; labels: "Predicts 4", "Predicts 6", "Predicts 5"]
![Page 8: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/8.jpg)
Threading
Threading matches structure to sequence
True threading considers 3D spatial interactions
![Page 9: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/9.jpg)
3D-1D Matching (Bowie et al.)
Convert the 3D structure into a string
Include α-helix, β-sheet or neither
Include buried or solvent accessible (6 levels)
Total of 3×6=18 distinct states
With $P_{a:j}$ = probability of finding amino acid (a) in environment (j) and $P_a$ = probability of finding (a) anywhere

$$s_{aj} = \log\left(\frac{P_{a:j}}{P_a}\right)$$
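The score above can be estimated from counts. The following is a minimal sketch, assuming the training data is available as a list of (amino acid, environment) observations; the function name `score_table`, the environment labels, and the toy counts are hypothetical, and no pseudocounts are applied, so unobserved pairs simply get no score.

```python
import math
from collections import Counter


def score_table(observations):
    """Return s[(a, j)] = log(P(a:j) / P(a)) from (amino_acid, environment) pairs."""
    pair_counts = Counter(observations)
    aa_counts = Counter(a for a, _ in observations)
    env_counts = Counter(j for _, j in observations)
    total = len(observations)
    scores = {}
    for (a, j), n in pair_counts.items():
        p_a_in_j = n / env_counts[j]   # P(a:j): frequency of a within environment j
        p_a = aa_counts[a] / total     # P(a): frequency of a anywhere
        scores[(a, j)] = math.log(p_a_in_j / p_a)
    return scores


# Toy counts: valine mostly buried, aspartate mostly exposed
observations = (
    [("V", "buried_helix")] * 3 + [("D", "buried_helix")] * 1
    + [("V", "exposed_coil")] * 1 + [("D", "exposed_coil")] * 3
)
scores = score_table(observations)
print(scores[("V", "buried_helix")] > 0, scores[("D", "buried_helix")] < 0)
```

A positive score means the amino acid is found in that environment more often than expected from its overall frequency, and a negative score means less often.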
![Page 10: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/10.jpg)
3D-1D
The information value scores were calculated on a training set of multiple alignments and used as a profile for each column
When applied to the globin family, the profile clearly distinguished myoglobins from nonglobins but not from other globins
![Page 11: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/11.jpg)
Methods using 3D interactions
Residues that have large separation in the sequence may end up next to each other when the protein is folded.
Define a measure of contact between residues (two atoms within 5Å) and count frequency of contact between all pairs in PDB
Use measure in alignment to evaluate cost, or to select the best alignment
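The contact definition can be sketched as follows. This is an illustration only: the input format (a list of residue name and atom coordinate pairs), the function names, and the `min_separation` parameter that skips sequence neighbors are assumptions made here, and reading real PDB files is left out.

```python
import math
from collections import Counter
from itertools import combinations


def in_contact(atoms_a, atoms_b, cutoff=5.0):
    """True if any atom of one residue lies within `cutoff` angstroms of the other."""
    return any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)


def contact_counts(residues, cutoff=5.0, min_separation=2):
    """Count contacts per unordered residue-type pair in one structure.

    residues: list of (residue_name, [(x, y, z), ...]) in sequence order.
    min_separation: assumed filter that skips residues adjacent in the chain,
    since they are always close in space.
    """
    counts = Counter()
    for (i, (name_i, atoms_i)), (k, (name_k, atoms_k)) in combinations(
        enumerate(residues), 2
    ):
        if k - i < min_separation:
            continue
        if in_contact(atoms_i, atoms_k, cutoff):
            counts[tuple(sorted((name_i, name_k)))] += 1
    return counts


# Toy chain with one atom per residue
residues = [
    ("A", [(0.0, 0.0, 0.0)]),
    ("L", [(3.8, 0.0, 0.0)]),
    ("V", [(3.0, 3.0, 0.0)]),
    ("D", [(20.0, 0.0, 0.0)]),
]
print(contact_counts(residues))
```

Summing such counters over many structures gives the pair contact frequencies that the slide describes.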
![Page 12: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/12.jpg)
3D interactions
![Page 13: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/13.jpg)
Potentials of mean force (POMF)
Since the notion of contact is somewhat arbitrary, a more general formulation can be tried
Derive an empirical function for the propensity of each of the 400 pairs of residues to be any given distance apart.
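One way to derive such an empirical function is to histogram observed distances for each residue pair and compare them with the histogram over all pairs. The sketch below does that under several assumptions: the observations are already extracted as (residue, residue, distance) triples, pairs are treated as unordered (so fewer than 400 distinct keys), the bin width is 1 Å, and the name `distance_propensity` is invented here. Published potentials often take a negative logarithm of this ratio to get an energy-like value; the ratio is kept here so that favorable distances show up as values above 1.

```python
from collections import Counter, defaultdict


def distance_propensity(observations, bin_width=1.0):
    """Propensity of each residue pair to be found in each distance bin.

    observations: iterable of (aa1, aa2, distance_in_angstroms).
    Returns {pair: {bin_index: pair_frequency / reference_frequency}}.
    """
    pair_hist = defaultdict(Counter)
    ref_hist = Counter()  # reference: all residue pairs pooled together
    for a, b, d in observations:
        bin_index = int(d // bin_width)
        pair_hist[tuple(sorted((a, b)))][bin_index] += 1
        ref_hist[bin_index] += 1
    ref_total = sum(ref_hist.values())
    propensity = {}
    for pair, hist in pair_hist.items():
        pair_total = sum(hist.values())
        propensity[pair] = {
            b: (n / pair_total) / (ref_hist[b] / ref_total)
            for b, n in hist.items()
        }
    return propensity


# Toy data: A-V pairs cluster near 5 angstroms, D-V pairs sit further apart
observations = [
    ("A", "V", 5.2), ("A", "V", 5.6), ("V", "A", 9.1),
    ("D", "V", 5.4), ("D", "V", 9.3), ("D", "V", 9.8),
]
table = distance_propensity(observations)
print(table[("A", "V")][5], table[("D", "V")][5])
```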
![Page 14: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/14.jpg)
Multiple Sequence Threading
Multiple Sequence Alignment
Align the most similar sequences to create a consensus sequence
Align consensus sequences to create the overall alignment
Use the same strategy with structures
Assume that conserved hydrophobic positions should pack in the core
This appears to be work in progress (1997)
![Page 15: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/15.jpg)
Example
Two small hydrophobic residues, alanine (A) and valine (V), both favor packing in the core of the protein
The POMF would have a peak around 5Å
Aspartate (D) and valine do not often pack together
The POMF will have a dip around 5Å
[Figure: two plots, POMF(A,V) and POMF(D,V), each showing probability against distance with 5Å marked]
![Page 16: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/16.jpg)
Sequence-Structure Alignment
For all known structures
Align the unknown sequence to that structure
Find the best alignment
Return the structure with the best global alignment
Unfortunately, we can't use dynamic programming: with pairwise interaction scores the problem is NP-complete
Heuristics must be used to explore the space
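The outer loop over known structures can be sketched with a deliberately simplified scoring scheme. The example below is not a true threading method: it assumes each template is reduced to a string of environment labels, scores only amino acid versus environment (no pairwise 3D interactions), and allows no gaps, so the hard search problem is avoided entirely. The function names, the one-letter environment codes (B for buried, E for exposed), and the score values are all hypothetical.

```python
def thread_score(sequence, environments, scores, default=0.0):
    """Sum of amino acid vs. environment scores for one ungapped placement."""
    return sum(scores.get((a, j), default) for a, j in zip(sequence, environments))


def best_ungapped_placement(sequence, environments, scores):
    """Slide the sequence along the template with no gaps; return (score, offset)."""
    best = (float("-inf"), None)
    for offset in range(len(environments) - len(sequence) + 1):
        window = environments[offset:offset + len(sequence)]
        score = thread_score(sequence, window, scores)
        if score > best[0]:
            best = (score, offset)
    return best


def recognize_fold(sequence, templates, scores):
    """Return (score, template_name, offset) for the best-scoring template."""
    results = []
    for name, environments in templates.items():
        score, offset = best_ungapped_placement(sequence, environments, scores)
        if offset is not None:  # template long enough to hold the sequence
            results.append((score, name, offset))
    return max(results) if results else None


# Hypothetical scores: valine prefers buried (B), aspartate prefers exposed (E)
scores = {("V", "B"): 1.0, ("D", "B"): -1.0, ("V", "E"): -0.5, ("D", "E"): 1.0}
templates = {"fold1": "EBEBE", "fold2": "EEEEE"}
print(recognize_fold("VDV", templates, scores))
```

A real method would replace `best_ungapped_placement` with a heuristic search over gapped alignments and would include pairwise terms such as contact or distance potentials in the score.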
![Page 17: Chapter 15](https://reader035.vdocuments.net/reader035/viewer/2022070416/568150cc550346895dbeef55/html5/thumbnails/17.jpg)
Evaluating Methods
Is the complexity worth it?
This is difficult to judge without a benchmark
Few comparative studies have been performed
When they have been performed, authors of competing methods have complained that wrong parameters were used …
Critical Assessment of Structure Prediction (CASP, 1994) releases the sequences of proteins whose structures have been solved but not yet published
All methods submit their predictions
Predictions are analyzed based on fold recognition, modeling accuracy and alignment accuracy
No one method or approach is obviously superior