![Page 1: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/1.jpg)
1
Protein Structure Similarity
![Page 2: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/2.jpg)
2
Secondary Structure Elements:
helices, strands/sheets & loops
![Page 3: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/3.jpg)
3
Structure Prediction/Determination
Computational tools
- Homology, threading
- Molecular dynamics
Experimental tools
- NMR spectroscopy
- X-ray crystallography
![Page 4: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/4.jpg)
4
Protein Structure Determination (1)
X-ray diffraction crystallography
![Page 5: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/5.jpg)
5
Protein Structure Determination (2)
Nuclear magnetic resonance spectroscopy
![Page 6: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/6.jpg)
6
Protein Data Bank
- 1990: 250 new structures
- 1999: 2,500 new structures
- 2000: >20,000 structures total
- 2004: ~30,000 structures total
![Page 7: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/7.jpg)
7
Protein Data Bank
Structures have been determined for only about 10% of known protein sequences
Protein Structure Initiative (PSI)
![Page 8: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/8.jpg)
8
Structure Similarity
- Refers to how well (or poorly) the 3D folded structures of proteins can be aligned
- Expected to reflect functional similarities (interaction with other molecules)
Proteins in the TIM barrel fold family
![Page 9: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/9.jpg)
9
Alignment of 1xis and 1nar (TIM-Barrels)
Alignment computed by DALI
Figure labels: 1xis, 1nar; helix axes; ribbon format; backbone format
Visualization: Sayle, R. RasMol. A protein visualization tool. http://www.umass.edu/microbio/rasmol/index2.htm
![Page 10: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/10.jpg)
10
Structure Similarity
- Refers to how well (or poorly) the 3D folded structures of proteins can be aligned
- Is expected to reflect functional similarities (interaction with other molecules)
- 2000: ~20,000 structures in PDB, ~4,000 different folds (1:5 ratio)
![Page 11: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/11.jpg)
11
![Page 12: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/12.jpg)
12
![Page 13: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/13.jpg)
13
Structure Similarity
- Refers to how well (or poorly) the 3D folded structures of proteins can be aligned
- Is expected to reflect functional similarities (interaction with other molecules)
- 2000: ~20,000 structures in PDB, ~4,000 different folds (1:5 ratio)
- Three possible reasons:
  - evolution
  - physical constraints (e.g., few ways to maximize hydrophobic interactions)
  - limits in techniques used for structure determination
- Given a new structure, the probability is high that it is similar to an existing one
![Page 14: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/14.jpg)
14
Sequence → Structure → Function
sequence similarity
Why Compare Folded Protein Structures?
- Low sequence similarity may yield very similar structures
- Sometimes high sequence similarity yields different structures
![Page 15: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/15.jpg)
15
Alignment of 1xis and 1nar (TIM-Barrels)
1xis and 1nar have only 7% sequence identity, but approximately 70% of the residues are structurally similar
![Page 16: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/16.jpg)
16
Sequence → Structure → Function
sequence similarity
structure similarity
Why Compare Folded Protein Structures?
- Low sequence similarity may yield very similar structures
- Sometimes high sequence similarity yields different structures
- Structure comparison is expected to provide more pertinent information about functional (dis-)similarity among proteins, especially with non-evolutionary relationships or non-detectable evolutionary relationships
![Page 17: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/17.jpg)
17
Ill-Posed Problem, Multiple Terminology
- (Dis-)similarity analysis
- Structure comparison
- Alignment, superposition, matching
- Classification

- Applications
- Definitions and issues
- Methods
![Page 18: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/18.jpg)
18
A Few Web Sites
- Protein Data Bank (PDB): http://www.rcsb.org/pdb/
- Protein classification:
  - SCOP: http://scop.berkeley.edu/
  - CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
- Protein alignment:
  - DALI: http://www.ebi.ac.uk/dali/
  - LOCK: http://motif.stanford.edu/lock2/
![Page 19: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/19.jpg)
19
Application #1: Find Global Similarities Among Protein Structures
- Given two protein structures, find the largest similar substructures
- For example, a substructure is a subset of Cα atoms or a subset of secondary structure elements in each molecule
- Several possible similarity measures
- Variants: 1-to-1, 1-to-many, many-to-many (PDB)
- Must be automatic (and fast)
![Page 20: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/20.jpg)
20
Application #2: Classify Proteins
- Many proteins, but relatively few distinct fold families [Chothia, 1992; Holm and Sander, 1996; Brenner et al., 1997]
- Hierarchical classification
- Insight into functions and structure stabilization
- Basis for homology and threading
- Manual classification: SCOP [Murzin et al., 1995]
![Page 21: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/21.jpg)
21
Application #2: Classify Proteins
- Increasing size of PDB; automatic classifiers: CATH [Orengo et al., 1997]; Pclass [Singh et al.]; FSSP [Holm and Sander]
- Class: similar secondary structure content
- Fold: SSEs in similar arrangement
- Family: clear evolutionary relationship
![Page 22: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/22.jpg)
22
Manual vs. Automatic Classification
![Page 23: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/23.jpg)
23
Application #3: Find Motif in Protein Structure
Given a protein structure and a motif (e.g., a small collection of atoms corresponding to a binding site)
Find whether the motif matches a substructure of the protein
Variant: One motif against many proteins
Active sites of 1PIP and 5PAD. Only 3 amino-acids participate in the motif
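The motif question above can be sketched in code under strong simplifying assumptions: each residue is reduced to one typed point, and a match is declared when all pairwise distances agree within a tolerance (so mirror images are not distinguished). The name `find_motif`, the `tol` parameter, and the residue labels and coordinates in the usage note are hypothetical illustrations; they are not taken from 1PIP or 5PAD.

```python
import itertools
import math


def find_motif(motif, protein, tol):
    """Return index tuples into `protein` that match `motif`.

    motif, protein: lists of (type_label, (x, y, z)).
    A tuple matches when types agree position by position and every
    pairwise distance differs from the motif's by at most `tol`.
    Brute force: only practical for very small motifs.
    """
    m = len(motif)
    hits = []
    for combo in itertools.permutations(range(len(protein)), m):
        # Type constraint: the k-th motif element must have the same type.
        if any(protein[p][0] != motif[k][0] for k, p in enumerate(combo)):
            continue
        # Geometry constraint: compare all pairwise distances.
        geometry_ok = all(
            abs(
                math.dist(motif[k][1], motif[l][1])
                - math.dist(protein[combo[k]][1], protein[combo[l]][1])
            )
            <= tol
            for k, l in itertools.combinations(range(m), 2)
        )
        if geometry_ok:
            hits.append(combo)
    return hits
```

For example, with a hypothetical three-residue motif `[("CYS", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASN", (0, 4, 0))]`, the function reports every triple of protein residues with the same types and the same 3-4-5 distance pattern. Running one motif against many proteins is a loop over this call.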
![Page 24: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/24.jpg)
24
Application #4: Find Pharmacophore
Given:
- A small collection (5-10) of small flexible ligands with similar activity (hence, assumed to bind at the same protein site)
- Low-energy conformations (several dozen to a few hundred) for each ligand
Find a substructure (pharmacophore) that occurs in at least one conformation of each ligand
Key problem in drug design when the binding site is unknown
![Page 25: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/25.jpg)
25
Application #4: Find Pharmacophore
Inhibitors of thermolysin: 1TLP, 4TMN, 5TMN, 6TMN
Clusters of low-energy conformations of 1TLP
The 4 ligands overlapped with their pharmacophore matched
![Page 26: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/26.jpg)
26
Application #5: Search for Ligands Containing a Pharmacophore
Given:
- A database containing several hundred thousand, or more, small ligands
- A pharmacophore P
Find all ligands that have a low-energy conformation containing P
Data mining of pharmaceutical databases (lead generation)
S.M. LaValle, P.W. Finn, L.E. Kavraki, and J.C. Latombe. A Randomized Kinematics-Based Approach to Pharmacophore-Constrained Conformational Search and Database Screening. J. of Computational Chemistry, 21(9):731-747, July 2000
![Page 27: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/27.jpg)
27
- Applications
- Definitions and issues
- Methods
![Page 28: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/28.jpg)
28
3D Molecular Structure
Collection of (possibly typed) atoms or groups of atoms in some given 3D relative placement
The placement of a group of atoms is defined by the position of a reference point (e.g., the center of an atom) and the orientation of a reference direction
The type can be the atom ID, the amino-acid ID, etc…
![Page 29: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/29.jpg)
29
Matching of Structures
Two structures A and B match iff:
1. Correspondence: There is a one-to-one map between their elements
2. Alignment: There exists a rigid-body transform T such that the RMSD between the elements in A and those in T(B) is less than some threshold ε
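As a minimal sketch of the alignment condition, the following computes the RMSD between two point lists that are already paired by index and already superposed, then compares it with a threshold. The names `rmsd`, `matches`, and `epsilon` are illustrative choices, and the code assumes the transform T has been applied to the second list beforehand.

```python
import math


def rmsd(a_points, b_points):
    """Root-mean-square deviation between two equally long lists of 3D
    points, where a_points[i] corresponds to b_points[i]."""
    if len(a_points) != len(b_points) or not a_points:
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for a, b in zip(a_points, b_points):
        # Squared Euclidean distance between the paired points.
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.sqrt(total / len(a_points))


def matches(a_points, transformed_b_points, epsilon):
    """True if the RMSD is strictly below the threshold epsilon."""
    return rmsd(a_points, transformed_b_points) < epsilon
```

For two points each displaced by exactly 1 unit, `rmsd` returns 1.0, so `matches` holds for `epsilon=1.5` but not for `epsilon=1.0`.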
![Page 30: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/30.jpg)
30
Complete Match
![Page 31: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/31.jpg)
31
Alignment of 3adk and 1gky
But a complete match is rarely possible:
- The molecules have different sizes
- Their shapes are only locally similar
Both matching and non-matching secondary structure elements
![Page 32: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/32.jpg)
32
Partial Match
- Notion of support σ of the match: the match is between σ(A) and σ(B)
- Dual problem:
  - What is the support?
  - What is the transform?
- Often several (many) possible supports
- Small supports → motifs
![Page 33: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/33.jpg)
33
Mathematical Relative
[Figure: two functions f and g compared by $\|f - g\|^2$ over a support s]
Over which support?
![Page 34: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/34.jpg)
34
![Page 35: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/35.jpg)
35
Multiple Partial Matches
![Page 36: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/36.jpg)
36
Distributed Support
[Figure: structures A and B, shown twice; labels: Gap, σ(A), σ(B)]
![Page 37: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/37.jpg)
37
What is Best?
[Figure: structures A and B, shown twice]
Should gaps be penalized?
![Page 38: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/38.jpg)
38
What About This?
[Figure: structures A and B]
Sequence along backbone is not preserved
![Page 39: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/39.jpg)
39
A similarity measure for partial matches is unlikely to satisfy the triangle inequality
![Page 40: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/40.jpg)
40
Scoring Issues
- Trade-off between size of σ and RMSD
- How should gaps be counted?
- Is there a “quality” of the correspondence? [The correspondence may, or may not, satisfy type and/or backbone sequence preferences]
- Should accessible surface be given more importance?
- Similarity measure may be different from the inverse of RMSD (though no consensus on best measure!)
- But RMSD is computationally very convenient!
![Page 41: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/41.jpg)
41
Examples
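RMSD dissimilarity measure (emphasizes differences → smaller support):

$$\min_T \sqrt{\frac{1}{|\sigma(T)|} \sum_{i \in \sigma(T)} \|a_i - T(b_i)\|^2}$$

STRUCTAL’s similarity measure (emphasizes similarities → larger support):

$$\max_T \left[ \sum_{i \in \sigma(T)} \frac{A}{1 + \|a_i - T(b_i)\|^2 / B} - N_{GAP} \right]$$

Here A and B denote constants of the score, and $N_{GAP}$ is the gap penalty.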
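The contrast between the two measures can be illustrated with a small sketch that works directly on the distances between already-paired, already-superposed elements. The function names are hypothetical, and the default constants (20, a 5 Å distance scale, and a penalty of 10 per gap) are assumptions chosen to resemble values commonly quoted for STRUCTAL-type scores; they are not stated in the slides. In terms of a score of the form A / (1 + d²/B), the code uses B = d0².

```python
import math


def rmsd_of_distances(distances):
    """RMSD over a support, given the distance of each matched pair."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def structal_like_score(distances, n_gaps=0, a=20.0, d0=5.0, gap_penalty=10.0):
    """STRUCTAL-type similarity: each matched pair contributes at most `a`,
    decaying with distance; each gap subtracts `gap_penalty`.
    All constants are illustrative assumptions."""
    pair_terms = sum(a / (1.0 + (d / d0) ** 2) for d in distances)
    return pair_terms - gap_penalty * n_gaps


# A tight 3-pair support versus the same support extended by 3 looser pairs.
small_support = [0.5, 0.5, 0.5]
large_support = small_support + [2.0, 2.0, 2.0]
```

Extending the support raises the RMSD (which favors the smaller support) but also raises the STRUCTAL-type score (which favors the larger one), which is the trade-off between support size and RMSD in concrete form.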
![Page 42: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/42.jpg)
42
Comparison of Similarity Measures
A.C.M. May. Toward more meaningful hierarchical classification of amino acids scoring functions. Protein Engineering, 12:707-712, 1999: reviews 37 protein structure similarity measures
The difficulty of defining a similarity score is probably due to the fact that structure comparison is an ill-posed problem with multiple solutions
![Page 43: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/43.jpg)
43
Bottom Line
- Finding an optimal partial match is NP-hard: no fast algorithm is guaranteed to give an optimal answer for any given measure [Godzik, 1996]
- Hence heuristic/approximate algorithms
- Probably not a single solution, but application-dependent solutions
- But there exist general algorithmic principles
![Page 44: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/44.jpg)
44
Computational Questions
Given a (dis)similarity measure and two proteins, compute the best match:
- Which support?
- Which correspondence?
- Which alignment transform?
![Page 45: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/45.jpg)
45
- Applications
- Definitions and issues
- Methods
![Page 46: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/46.jpg)
46
Find Global Similarities Among Protein Structures
Input: Two sets of features (atoms or groups of atoms) {a1,…,an} and {b1,…,bm} belonging to two different proteins A and B
Output:
- Maximal correspondence set C of pairs (ai,bj), where all ai and all bj are distinct
- Alignment transform T such that the RMSD of the pairs (ai,T(bj)) is less than a given threshold ε
Several possible outputs
Variant of the Largest Common Point Set problem [Akutsu and Halldorsson, 1994]
![Page 47: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/47.jpg)
47
Possible Correspondence Constraints
- Typed features: (ai,bj) is a possible correspondence pair iff Type(ai) = Type(bj)
- Ordered features: (ai,bj) and (ai’,bj’), where i’>i, are possible correspondence pairs iff j’>j [e.g., sequence along backbone]
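The two constraints can be checked mechanically on a candidate correspondence set. The sketch below assumes features are referred to by index and that a correspondence set is a list of `(i, j)` pairs; the function name and parameters are illustrative, not part of any of the tools named in these slides.

```python
def is_valid_correspondence(pairs, types_a=None, types_b=None, ordered=False):
    """Check a candidate correspondence set of (i, j) index pairs.

    - All i and all j must be distinct (one-to-one).
    - If both type lists are given, Type(a_i) must equal Type(b_j).
    - If ordered is True, i' > i must imply j' > j for every two pairs.
    """
    a_idx = [i for i, _ in pairs]
    b_idx = [j for _, j in pairs]
    if len(set(a_idx)) != len(pairs) or len(set(b_idx)) != len(pairs):
        return False
    if types_a is not None and types_b is not None:
        if any(types_a[i] != types_b[j] for i, j in pairs):
            return False
    if ordered:
        # After sorting by i, the j values must be strictly increasing.
        by_a = sorted(pairs)
        if any(j_next <= j_prev
               for (_, j_prev), (_, j_next) in zip(by_a, by_a[1:])):
            return False
    return True
```

A set such as `[(0, 3), (2, 1)]` is accepted without the ordering constraint but rejected with `ordered=True`, which corresponds to a match that does not preserve the sequence along the backbone.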
![Page 48: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/48.jpg)
48
Some Existing Software
Cα atoms:
- DALI [Holm and Sander, 1993]
- STRUCTAL [Gerstein and Levitt, 1996]
- MINAREA [Falicov and Cohen, 1996]
- CE [Shindyalov and Bourne, 1998]
- ProtDex [Aung, Fu and Tan, 2003]
Secondary structure elements and Cα atoms:
- VAST [Gibrat et al., 1996]
- LOCK [Singh and Brutlag, 1996]
- 3dSEARCH [Singh and Brutlag, 1999]
![Page 49: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/49.jpg)
49
RMSD ≠ Similarity
- But matches and RMSDs are not exactly what we need
- In general, we need to compute a similarity measure of the form $\max_T S(A, T(B))$, where S is more complex than RMSD
- Two-step approach:
  1. Compute best matches using RMSD
  2. Adjust transform to maximize similarity measure
![Page 50: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/50.jpg)
50
Computation of Best Matches
Two “simultaneous” subproblems:
- Find maximal correspondence set C
- Find alignment transform T
Chicken-and-egg issue: each subproblem is relatively simple:
- If we knew C, we could compute T
- If we knew T, we could get C by proximity
But the combination is hard!
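The "C from T by proximity" half of the problem can be sketched as follows, assuming T has already been applied to the second point set. The greedy nearest-first pairing and the `max_dist` cutoff are assumptions made for this illustration; the slides do not prescribe a specific pairing rule.

```python
import math


def correspondence_by_proximity(a_points, tb_points, max_dist):
    """Pair points of A with points of T(B) that lie within max_dist.

    Greedy strategy: consider candidate pairs from closest to farthest and
    accept a pair only if neither point has been used, so the result is
    one-to-one. Returns a sorted list of (i, j) index pairs.
    """
    candidates = []
    for i, a in enumerate(a_points):
        for j, b in enumerate(tb_points):
            d = math.dist(a, b)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()

    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

This all-pairs version costs O(nm) distance evaluations; a spatial grid or k-d tree would be the usual way to speed it up for large structures.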
![Page 51: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/51.jpg)
51
Computation of Best Matches
Computing T (given C) only requires computing 6 parameters: 3 for translation and 3 for rotation
![Page 52: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/52.jpg)
52
Find Alignment Transform
- Two sets of points A = {a1,…,an} and B = {b1,…,bn}
- Correspondence pairs (ai, bi)
- Find T = arg min_T RMSD(A,T(B))
- O(n) closed-form solution [Arun, Huang, and Blostein, 87] [Horn, 87] [Horn, Hilden, and Negahdaripour, 88]
![Page 53: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/53.jpg)
53
O(n) SVD-Based Algorithm
- T combines translation t and rotation R, such that T(bi) = t + R(bi)
- $\bar{b} = \frac{1}{n} \sum_{i=1}^{n} b_i$ [mean of the bi’s]
- Place the origin of the coordinate system at $\bar{b}$
- $\min_T \mathrm{RMSD}(A, T(B))$ simplifies to (up to some constants):

$$\min_{t,R} \; \sum_{i=1}^{n} \|a_i - t\|^2 \; - \; 2 \sum_{i=1}^{n} \langle a_i, R(b_i) \rangle$$

- t and R can be computed separately
- $t = \bar{a}$ [mean of the ai’s]
[Arun, Huang, and Blostein, 87]
![Page 54: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/54.jpg)
54
O(n) SVD-Based Algorithm
- $A_{3 \times n} = [a_1 - \bar{a}, \ldots, a_n - \bar{a}]$, $B_{3 \times n} = [b_1 - \bar{b}, \ldots, b_n - \bar{b}]$
- Compute the SVD of the 3×3 correlation matrix $BA^T$: $BA^T = UDV^T$, where D is a diagonal matrix with decreasing non-negative entries (singular values) along the diagonal
- If det(U)det(V) = 1 then S = I, else S = diag(1,1,-1)
- $R = VSU^T$
[Arun, Huang, and Blostein, 87]
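A minimal NumPy sketch of the SVD-based computation follows, under the convention that the transform maps B onto A, i.e. a_i ≈ R b_i + t. With the correlation matrix written as B Aᵀ = U D Vᵀ, maximizing Σ⟨a_i, R b_i⟩ = tr(R B Aᵀ) leads to the factor order V S Uᵀ used in the code. The function name is illustrative, and the translation is expressed in the original coordinates (t = ā − R b̄) rather than with the origin moved to b̄.

```python
import numpy as np


def best_rigid_transform(a, b):
    """Least-squares rigid transform for paired points.

    a, b: arrays of shape (n, 3); a[i] corresponds to b[i].
    Returns (R, t) minimizing sum_i ||a_i - (R b_i + t)||^2,
    with R a proper rotation (det R = +1).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_mean = a.mean(axis=0)
    b_mean = b.mean(axis=0)
    A = (a - a_mean).T  # 3 x n matrix of centered a_i
    B = (b - b_mean).T  # 3 x n matrix of centered b_i

    # SVD of the 3x3 correlation matrix B A^T = U D V^T.
    U, _, Vt = np.linalg.svd(B @ A.T)
    V = Vt.T

    # Flip the axis of the smallest singular value if needed, so that the
    # result is a rotation and not a reflection.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(V) < 0:
        S[2, 2] = -1.0

    R = V @ S @ U.T
    t = a_mean - R @ b_mean
    return R, t
```

The cost is linear in n for building the 3×3 matrix plus a constant-size SVD, which is why the method counts as an O(n) closed-form solution.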
![Page 55: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/55.jpg)
55
[Arun, Huang, and Blostein, 87]: rotation matrix
[Horn, 87]: quaternion
![Page 56: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/56.jpg)
56
Trial-and-Error Approach to Protein Structure Comparison
- Guess small correspondence set
- Compute T
- Apply T
- Update correspondence set (correspondence from proximity)
![Page 57: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/57.jpg)
57
Trial-and-Error Approach to Protein Structure Comparison
1. Set CS to a seed correspondence set (a small set sufficient to generate an alignment transform)
2. Compute the alignment transform T for CS and apply T to the second protein B
3. Update CS to include all pairs of features that are close to each other
4. If CS has changed, then return to Step 2; else return (CS,T)
![Page 58: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/58.jpg)
58
Trial-and-Error Approach to Protein Structure Comparison
- result = nil
- Iterate N times:
  1. Set CS to a seed correspondence set (a small set sufficient to generate an alignment transform)
  2. Compute the alignment transform T for CS and apply T to the second protein B
  3. Update CS to include all pairs of features that are close to each other
  4. If CS has changed, then return to Step 2; else result ← result ∪ {(CS,T)}
- Return result
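The loop can be sketched end to end with NumPy. This is an illustration under several assumptions that the slides leave open: seeds are supplied by the caller (one per iteration), "close" means within a cutoff `max_dist`, the update uses a greedy one-to-one nearest-first pairing, and a `max_rounds` cap guards against cycling. All function names are hypothetical.

```python
import numpy as np


def fit_transform(a, b):
    """Least-squares R, t with a_i ~ R b_i + t (SVD of B A^T = U D V^T)."""
    a_mean, b_mean = a.mean(axis=0), b.mean(axis=0)
    U, _, Vt = np.linalg.svd((b - b_mean).T @ (a - a_mean))
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = Vt.T @ S @ U.T
    return R, a_mean - R @ b_mean


def pairs_by_proximity(a, tb, max_dist):
    """Greedy one-to-one pairing of points within max_dist, nearest first."""
    d = np.linalg.norm(a[:, None, :] - tb[None, :, :], axis=2)
    candidates = sorted(
        (d[i, j], int(i), int(j)) for i, j in zip(*np.nonzero(d <= max_dist))
    )
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def refine(a, b, seed, max_dist, max_rounds=50):
    """Steps 1-4: alternate between fitting T and updating CS."""
    cs = sorted(seed)
    R, t = fit_transform(a[[i for i, _ in cs]], b[[j for _, j in cs]])
    for _ in range(max_rounds):
        new_cs = pairs_by_proximity(a, b @ R.T + t, max_dist)
        # Stop when CS is stable, or too small to define a transform.
        if new_cs == cs or len(new_cs) < 3:
            break
        cs = new_cs
        R, t = fit_transform(a[[i for i, _ in cs]], b[[j for _, j in cs]])
    return cs, (R, t)


def trial_and_error(a, b, seeds, max_dist):
    """Outer loop: one refinement per seed; collect every (CS, T)."""
    return [refine(a, b, seed, max_dist) for seed in seeds]
```

Each seed needs at least three non-collinear pairs to determine a rigid transform. Different seeds can converge to different supports, which is why the procedure returns a set of (CS, T) results rather than a single answer.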
![Page 59: 1 Protein Structure Similarity. 2 Secondary Structure Elements: helices strands/sheets & loops](https://reader030.vdocuments.net/reader030/viewer/2022032707/56649e455503460f94b39537/html5/thumbnails/59.jpg)
59
How to get seed correspondences?