Lecture 6: Protein Structure comparison
Computational Aspects of Molecular Structure
Instructor: Teresa Przytycka, PhD
Why compare structures?
• In evolution, structure is better preserved than sequence.
• Structure comparison gives a powerful method for searching for homologous proteins.
• Structure comparison allows us to study protein evolution.
• Structure comparison is used to classify structures.
Superposition of two structures
Structural similarity between Acetylcholinesterase and Calmodulin (Tsigelny et al., Prot. Sci., 2000, 9:180)
Estimating the quality of the alignment: Root Mean Square Distance (RMSD)
$$\mathrm{RMS}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d(a_i, b_{i'})^2}$$
A = a1 … an ; B = b1 … bm ; assume that a_i is aligned with b_{i'}, N is the number of aligned pairs, and d(a_i, b_{i'}) is the Euclidean distance between a_i and b_{i'}.
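The RMSD formula translates directly into code. A minimal Python sketch, assuming the two structures have already been optimally superimposed and the aligned pairs are given as two equal-length lists of (x, y, z) coordinates (the function name `rmsd` is ours; the superposition step is not shown):

```python
import math


def rmsd(coords_a, coords_b):
    """RMS distance between aligned, already superimposed points.

    coords_a[i] is a_i and coords_b[i] is its aligned partner b_i'.
    The optimal superposition itself is assumed to be done elsewhere.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    n = len(coords_a)  # N = number of aligned pairs
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / n)


# Two aligned pairs at distances 1 and 3: sqrt((1 + 9) / 2) = sqrt(5)
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 1, 0), (1, 3, 0)]))
```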
Problems with RMSD
A small local alignment error can propagate, so the quality of the alignment may be underestimated.
Finding the maximum common substructure is NP-hard
Goal: find the maximum subset of dots that are present in both sets in the same relative position.
In the example, we can superimpose 6 points.
NP-hard: only exponential-time algorithms are known.
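To see where the exponential cost comes from, here is a brute-force Python sketch (not the method of any particular program): it tries candidate sets of point correspondences, largest first, and accepts a set only if every pairwise distance agrees within a tolerance `tol` (a hypothetical parameter). Because only distances are checked, mirror-image matches are accepted too.

```python
import itertools
import math


def max_common_substructure(points_a, points_b, tol=0.5):
    """Largest one-to-one point matching whose pairwise distances agree within tol.

    Exhaustive search over candidate correspondence sets: exponential time.
    """
    candidates = [(i, j) for i in range(len(points_a)) for j in range(len(points_b))]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        if i == k or j == l:  # each point may be used only once
            return False
        d_a = math.dist(points_a[i], points_a[k])
        d_b = math.dist(points_b[j], points_b[l])
        return abs(d_a - d_b) <= tol

    # Try the largest possible matchings first.
    for size in range(min(len(points_a), len(points_b)), 0, -1):
        for subset in itertools.combinations(candidates, size):
            if all(compatible(p, q) for p, q in itertools.combinations(subset, 2)):
                return list(subset)
    return []


a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
b = [(10, 0, 0), (11, 0, 0), (10, 1, 0), (0, 0, 0)]
print(max_common_substructure(a, b))  # a 3-point match
```

Even this toy case enumerates thousands of candidate sets; the count grows combinatorially with the number of points.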
Methods
• Dynamic programming, similar to sequence alignment (we will discuss potential problems)
• Identify pairs of fragments (usually secondary structures) that are similar and try to glue them together into a consistent alignment
• Formulate it as an optimization problem and use algorithms such as simulated annealing, branch and bound, etc.
• Fast screening methods that filter the structure pairs to be compared by more elaborate algorithms
Dynamic programming and its limitations
• This is not a clean dynamic programming type of problem, but some programs (e.g. SSAP) use DP as a heuristic.
• Idea: the score for a pair of aligned residues is computed based on whether they are in the same context with respect to their (3D) environment.
• The environment is defined by the proximity to other nearby residues.
A and B have similar environments, so we align them… but after we remove x, the similarity is lost. We don't know who the neighbors are until the whole alignment is done!
[Figure: residues A and B with neighboring residues x, y, z]
Example: SSAP
A, B: two fragments of protein structure. The views from residue i (in A) and residue k (in B) can be compared by calculating the difference between the corresponding vectors, that is, the vectors from i to all other residues and from k to all other residues.
Double dynamic programming used by SSAP
• Using DP, find the optimal path. The score between two vectors is a/(b+δ), where δ is the difference between the vectors and a, b are constants learned on the PDB.
• Sum all optimal paths in the summary matrix (top).
• Other scores are added: solvent accessibility, torsion angle, volume.
• The relative weights of these contributions are optimized on a set of PDB structures.
• Do a second dynamic programming step on the summary matrix.
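A simplified double-DP sketch in Python. The assumptions are ours, not SSAP's: the "view" from a residue is its list of Cα distances to the other residues (SSAP uses vectors in a local frame), `a_const` and `b_const` are arbitrary, gaps are free, the low-level path is not forced through (i, k), and the extra terms (solvent accessibility, torsion angles, volume) are omitted.

```python
import math


def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]


def dp_align(score, gap=0.0):
    """Order-preserving alignment maximizing the summed score (Needleman-Wunsch style)."""
    if not score or not score[0]:
        return 0.0, []
    n, m = len(score), len(score[0])
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] - gap
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score[i - 1][j - 1],
                          f[i - 1][j] - gap, f[i][j - 1] - gap)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # traceback
        if f[i][j] == f[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return f[n][m], pairs[::-1]


def double_dp(coords_a, coords_b, a_const=1.0, b_const=0.1):
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n, m = len(coords_a), len(coords_b)
    summary = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for k in range(m):
            # Low level: compare the view from i (in A) with the view from k (in B).
            js = [j for j in range(n) if j != i]
            ls = [l for l in range(m) if l != k]
            if not js or not ls:
                continue
            low = [[a_const / (b_const + abs(da[i][j] - db[k][l])) for l in ls]
                   for j in js]
            _, path = dp_align(low)
            # Accumulate the optimal low-level path into the summary matrix.
            for p, q in path:
                summary[js[p]][ls[q]] += low[p][q]
    # High level: a second DP on the summary matrix gives the final alignment.
    _, alignment = dp_align(summary)
    return alignment


pts = [(float(x), 0.0, 0.0) for x in (0, 1, 3, 7, 15)]
print(double_dp(pts, pts))  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

The cost is O(n²m²) low-level cells, which is why the real program restricts which (i, k) pairs get a low-level DP.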
Going around the above problems and still using DP
• Method one: double dynamic programming
• Method two: “iterative” dynamic programming
1. Let the current alignment be any alignment.
2. For every residue, compute a vector describing its environment using the current alignment.
3. Find the best alignment using dynamic programming.
4. Iterate steps 2 and 3, using the computed alignment as the current one.
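The iterative scheme can be sketched in Python as follows. This illustrates the four steps under our own assumptions, not the algorithm of a specific program: the environment of residue i is compared with that of residue k through their distances to the currently aligned pairs (j, l), each pair contributing `a_const / (b_const + |d_A(i,j) - d_B(k,l)|)` (hypothetical constants), gaps are free, and the loop stops when the alignment no longer changes.

```python
import math


def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]


def dp_align(score):
    """Order-preserving alignment maximizing the summed score (free gaps)."""
    n, m = len(score), len(score[0])
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score[i - 1][j - 1], f[i - 1][j], f[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # traceback
        if f[i][j] == f[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def iterative_dp(coords_a, coords_b, initial, a_const=1.0, b_const=0.1, max_iter=20):
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    current = list(initial)  # step 1: any starting alignment
    for _ in range(max_iter):
        # Step 2: environment similarity measured against the current alignment.
        score = [[sum(a_const / (b_const + abs(da[i][j] - db[k][l]))
                      for j, l in current if j != i and l != k)
                  for k in range(len(coords_b))]
                 for i in range(len(coords_a))]
        new = dp_align(score)  # step 3
        if new == current:  # step 4: stop when nothing changes
            break
        current = new
    return current


pts = [(float(x), 0.0, 0.0) for x in (0, 1, 3, 7, 15)]
print(iterative_dp(pts, pts, initial=[(0, 0), (1, 1)]))
```

Starting from a partial alignment, one pass already recovers the full identity alignment for this toy self-comparison, and the second pass confirms it is a fixed point.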
SHEBA J. Jung, B. Lee (2000) Protein Engineering 535-543
STEP 1: Initial alignment. The scoring function for the first iteration of DP is:
a(i,i') = score for amino acid similarity + score for similarity of the secondary structures they belong to + similarity in water accessibility
Iterative improvement:
STEP 2: Superimpose the structures so that the distances between aligned residues are minimized.
STEP 3: Using DP, find the maximum number of aligned pairs whose distance is < 3.5 Å.
Iterate steps 2 and 3. REPEAT THE WHOLE PROCEDURE WITH A DIFFERENT INITIAL ALIGNMENT (change the first scoring function).
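Step 3 can be illustrated with a small DP in the style of longest common subsequence. A sketch under stated assumptions: the structures are already superimposed (step 2 is not shown), a pair may be aligned only if its distance is below `cutoff` (3.5 Å, as on the slide), and the DP maximizes the number of such pairs while preserving sequence order. SHEBA's actual implementation details may differ.

```python
import math


def max_close_pairs(coords_a, coords_b, cutoff=3.5):
    """Order-preserving alignment with the most pairs closer than `cutoff` (Å)."""
    n, m = len(coords_a), len(coords_b)
    close = [[math.dist(a, b) < cutoff for b in coords_b] for a in coords_a]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(f[i - 1][j], f[i][j - 1])
            if close[i - 1][j - 1]:  # only close pairs may be aligned
                best = max(best, f[i - 1][j - 1] + 1)
            f[i][j] = best
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # traceback
        if close[i - 1][j - 1] and f[i][j] == f[i - 1][j - 1] + 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 0.5, 0.0), (50.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
print(max_close_pairs(a, b))  # [(0, 0), (1, 2), (2, 3)]
```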
DALI
• Dali is based on the comparison of intra-molecular distance matrices.
• The original Dali (Holm L and Sander C 1993, Protein structure comparison by alignment of distance matrices, J. Mol. Biol. 233:123-38) used a simulated annealing algorithm.
• A more recent implementation, called DaliLite (Holm L and Park J, DaliLite workbench for protein structure comparison, Bioinformatics 16:566-7), uses a branch-and-bound strategy.
Contact matrix
d(i,j) = distance(Cα_i, Cα_j)
Idea: similar structures have similar contact matrices.
Contact matrix: an n × n matrix, where n = number of residues.
In the figure, pairs with d(i,j) below a certain threshold are gray and the rest are white.
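A minimal sketch of building and drawing a contact matrix from Cα coordinates. The 8 Å `threshold` is our own illustrative choice; the slide only says "a certain threshold".

```python
import math


def contact_matrix(ca_coords, threshold=8.0):
    """1 where the C-alpha distance d(i, j) is below `threshold` (Å), else 0."""
    return [[1 if math.dist(p, q) < threshold else 0 for q in ca_coords]
            for p in ca_coords]


def render(matrix):
    """Text picture of the matrix: '#' for gray (contact), '.' for white."""
    return "\n".join("".join("#" if c else "." for c in row) for row in matrix)


ca = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(render(contact_matrix(ca)))
# ##.
# ##.
# ..#
```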
Combinatorial Extension algorithm (CE) Shindyalov & Bourne, Protein Eng. 1998 739-747
• Identify all pairs of fragments that can be reasonably aligned without gaps: AFPs, aligned fragment pairs (length ≤ 8), found for example using contact map similarity (see next slides).
• Extend the fragments using a heuristic (no global optimization).
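One way to find candidate AFPs, sketched under our own assumptions: compare every fragment of length `frag_len` in A with every fragment in B by the mean absolute difference of their intra-fragment Cα distance matrices, and keep pairs below `max_dev` (a hypothetical cutoff). CE's actual similarity measure and thresholds are not reproduced here.

```python
import math


def distance_matrix(coords):
    return [[math.dist(p, q) for q in coords] for p in coords]


def find_afps(coords_a, coords_b, frag_len=8, max_dev=3.0):
    """Return (start_in_A, start_in_B, deviation) for gap-free fragment pairs."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    afps = []
    for i in range(len(coords_a) - frag_len + 1):
        for j in range(len(coords_b) - frag_len + 1):
            # Mean |d_A - d_B| over all residue pairs inside the two fragments.
            dev = sum(abs(da[i + p][i + q] - db[j + p][j + q])
                      for p in range(frag_len) for q in range(frag_len)) / frag_len ** 2
            if dev < max_dev:
                afps.append((i, j, dev))
    return afps


pts = [(float(x), 0.0, 0.0) for x in (0, 1, 3, 7, 15)]
print(find_afps(pts, pts, frag_len=3, max_dev=0.1))  # only the diagonal fragment pairs
```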
Dali solves an optimization problem
$$S(A,B) = \sum_{i}\sum_{j} \left(\theta - \Delta\left(d^A_{ij}, d^B_{ij}\right)\right) w\left(d^A_{ij}, d^B_{ij}\right)$$
• i, j: pairs of residues from the “core”, i.e. the aligned part
• Δ: deviation of the intramolecular Cα distances relative to their arithmetic mean, Δ = |d^A_ij − d^B_ij| / d*_ij with d*_ij = (d^A_ij + d^B_ij)/2
• θ: similarity threshold, set empirically to 0.2 (20%)
• w = exp(−d*²/r²) with r = 20 Å: down-weights the contribution of distant pairs
Find the set of aligned residue pairs that maximizes S(A,B).
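The Dali objective for a fixed alignment can be evaluated directly. A Python sketch of S(A,B) with the definitions above (θ = 0.2, r = 20 Å); treating each diagonal term i = j as contributing θ is our assumption, since Δ is undefined when the mean distance is zero. The hard part, searching over alignments to maximize S, is not shown.

```python
import math


def dali_score(coords_a, coords_b, alignment, theta=0.2, r=20.0):
    """Elastic similarity S(A, B) for a given core alignment.

    alignment: list of (residue index in A, residue index in B) pairs.
    """
    score = 0.0
    for ia, ib in alignment:
        for ja, jb in alignment:
            if (ia, ib) == (ja, jb):
                # Diagonal term: assumed to contribute theta (Delta = 0, w = 1).
                score += theta
                continue
            d_a = math.dist(coords_a[ia], coords_a[ja])
            d_b = math.dist(coords_b[ib], coords_b[jb])
            d_mean = (d_a + d_b) / 2  # arithmetic mean d*
            deviation = abs(d_a - d_b) / d_mean if d_mean > 0 else 0.0
            score += (theta - deviation) * math.exp(-(d_mean / r) ** 2)
    return score


a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
print(dali_score(a, a, [(0, 0), (1, 1)]), dali_score(a, stretched, [(0, 0), (1, 1)]))
```

A distance deviating by more than 20% of the mean makes its term negative, which is how θ acts as a similarity threshold.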
Finding all fragments
• Consider all possible pairs of 8×8 submatrices of the contact matrices. Such matrices are small enough that the problem can be solved optimally.
• Put the fragments together using a Monte Carlo algorithm (a slow process) in the older version; the new version uses branch and bound.
Remarks
• Another method, Combinatorial Extension (CE), also starts by identifying such short fragments, but puts them together using a variant of dynamic programming.
Methods based on Secondary Structure Alignments
Reducing the size of the representation of a protein fold
[Figure: successively coarser representations: all atoms, backbone atoms, Cα atoms, polygonal chain, secondary structure vectors]
Approach based on comparing secondary structure arrangement
Motivation:
1. Folds are often defined as arrangements of secondary structure elements (SSEs).
2. Why not compare the arrangement of SSEs rather than going down to the atomic level?
[Figure: 1EJ9, human topoisomerase I]
VAST- graph theoretical approach
• http://www2.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
• Treat each secondary structure as a vector whose direction and length correspond to the direction and length of the secondary structure. Attributes of such a vector include the type of secondary structure, the number of residues, etc.
• For each pair of secondary structures, describe their relative spatial position: distance, angle, etc.
• VAST finds the maximal subset of secondary structures that are in the same relative positions in the compared protein structures and in the same order within the structure.
Step 1: represent secondary structures as vectors
VAST: calculate (r_ik, z_ik)
For both the query and the target structure: for each SSE k, set the origin at the midpoint of k, then calculate r_ik and z_ik for the endpoints of every SSE i ≠ k.
[Figure: vector position relative to the xy plane, with z13 and r13 marked]
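A sketch of the (r, z) computation under our reading of the figure: z is the coordinate of an endpoint along the axis of SSE k (origin at the midpoint of k) and r is its distance from that axis. SSEs are given as (start, end) coordinate pairs; the function names are hypothetical, and VAST's exact frame conventions may differ.

```python
import math


def cylindrical_coords(sse_k, point):
    """(r, z) of `point` in a frame centred at the midpoint of SSE k, z along k."""
    start, end = sse_k
    mid = tuple((s + e) / 2 for s, e in zip(start, end))
    axis = tuple(e - s for s, e in zip(start, end))
    length = math.hypot(*axis)
    u = tuple(c / length for c in axis)  # unit vector along SSE k
    v = tuple(p - m for p, m in zip(point, mid))
    z = sum(a * b for a, b in zip(v, u))  # signed position along the axis
    r = math.hypot(*(a - z * b for a, b in zip(v, u)))  # distance from the axis
    return r, z


def sse_geometry(sses):
    """For every ordered pair i != k: (r_ik, z_ik) of both endpoints of SSE i."""
    return {(i, k): [cylindrical_coords(sses[k], end) for end in sses[i]]
            for k in range(len(sses)) for i in range(len(sses)) if i != k}


print(cylindrical_coords(((0, 0, 0), (0, 0, 10)), (3, 4, 8)))  # (5.0, 3.0)
```

Because r and z are measured in a frame attached to SSE k, they do not change when the whole protein is rotated or translated, so they can be compared between query and target directly.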
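VAST: create the comparison graph
[Figure: SSE diagrams of IL-4 and IL-6 (numbered SSEs, N and C termini marked) and the comparison graph built from them]
Nodes: compare r13 with r12 and z13 with z12
Arcs: compare φ16 with φ15; arcs must follow sequence order
Select the path with the highest “weights”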
VAST: Refinement
[Figure: aligned residues shown in red; the alignment is extended to the end of a strand]
Aligned SSEs guide the alignment of the Cα atoms
Alignments are allowed to extend beyond SSE boundaries
Refined alignment is computed via a Gibbs sampling algorithm (i.e. Monte Carlo)
VAST credits & www
http://www.ncbi.nlm.nih.gov/Structure/
• Steve Bryant • John Spouge • Jean-Francois Gibrat • Paul Thiessen • Tom Madej • Eric Sayers
Missed similarities: circular permutations
From: J. Jung, B.K. Lee, Protein Science 2001, 1881-1885
Geometric Hashing (Indexing / Fold invariants)
Hashing
Hash function: assigns to each object an index in the hash table.
[Figure: a set of objects is mapped by the hash function into a hash table; each entry holds the list of all the words with the given hash value]
Choosing a hash function for protein structures
[Figure: several structures, each with an unknown (“?”) hash value]
• Ideally: different folds get different hash values, and the same fold gets the same hash value.
• Problem: the “same” fold does not mean an identical structure.
• Modified goal: the same fold should get “similar” hash values.
Hashing for protein structures
• Given: a query structure and a database of structures.
• Goal: find a fast way of searching the database for similar structures.
• Idea: assign to each protein a list of features.
• Identify proteins that have the same (or similar) features.
• Example feature: the number of helices and strands in the structure. Proteins whose secondary structure composition is very different from that of the query protein are filtered out; in a subsequent phase, only proteins with similar secondary structure composition are compared.
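The helix/strand-count filter can be implemented as a hash table keyed by composition. A sketch with hypothetical protein names and tolerance `max_diff`: the key is (number of helices, number of strands), and a query looks up every key within `max_diff` of its own in each coordinate, so only those buckets are touched.

```python
from collections import defaultdict


def build_index(database):
    """Hash table: (n_helices, n_strands) -> list of protein names."""
    index = defaultdict(list)
    for name, (helices, strands) in database.items():
        index[(helices, strands)].append(name)
    return index


def composition_candidates(index, query, max_diff=1):
    """Names whose helix and strand counts are each within max_diff of the query's."""
    q_helices, q_strands = query
    hits = []
    for dh in range(-max_diff, max_diff + 1):
        for ds in range(-max_diff, max_diff + 1):
            hits.extend(index.get((q_helices + dh, q_strands + ds), []))
    return sorted(hits)


# Hypothetical database entries: name -> (helices, strands)
db = {"p1": (4, 0), "p2": (5, 1), "p3": (0, 8), "p4": (4, 2)}
print(composition_candidates(build_index(db), (4, 1)))  # ['p1', 'p2', 'p4']
```

The surviving candidates would then go to one of the more elaborate comparison methods above.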
Hashing function (key function)
• In general it is many-to-one: we accept that different folds may lead to the same key, but we want to minimize such overlaps.
Key function that describes the relative positions of secondary structure vectors
Assume d = 2
Ideally, all related triples are hashed into the neighborhood of the query's key; in practice there may be some false positives and false negatives.
Practical considerations
• The dimension d cannot be too large, or else finding all neighbors becomes costly.
• There are data structures designed for searching for neighbors in d-dimensional space.
• Examples of good hash (key) functions:
– angles between vectors
– distances between midpoints
• Agreement of the key function on three vectors is usually not enough to declare a possible similarity. We have to require a larger number of matches; how large depends on the size of the structure.
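Putting the key function, the neighborhood search and the vote threshold together, here is a toy geometric-hashing sketch. All choices are ours, not those of the cited methods: each sequence-ordered triple of SSE direction vectors gets a d = 2 key (angle between the first two vectors, angle between the last two, each quantized into `bin_size`-degree cells), neighbors are the adjacent grid cells, and a database structure is reported only if it collects at least `min_votes` hits.

```python
import itertools
import math
from collections import Counter, defaultdict


def angle_deg(u, v):
    """Angle between two direction vectors, in degrees."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def triple_keys(sse_vectors, bin_size=15.0):
    """Hypothetical d = 2 key for every sequence-ordered triple i < j < k."""
    keys = []
    for i, j, k in itertools.combinations(range(len(sse_vectors)), 3):
        a1 = angle_deg(sse_vectors[i], sse_vectors[j])
        a2 = angle_deg(sse_vectors[j], sse_vectors[k])
        keys.append((int(a1 // bin_size), int(a2 // bin_size)))
    return keys


def build_table(database, bin_size=15.0):
    """Hash table: quantized key -> names of structures having a triple with that key."""
    table = defaultdict(list)
    for name, vectors in database.items():
        for key in triple_keys(vectors, bin_size):
            table[key].append(name)
    return table


def search(table, query_vectors, min_votes=2, bin_size=15.0):
    """Vote for structures whose triples fall in the query keys' neighborhoods."""
    votes = Counter()
    for kx, ky in triple_keys(query_vectors, bin_size):
        for dx, dy in itertools.product((-1, 0, 1), repeat=2):  # adjacent cells
            for name in table.get((kx + dx, ky + dy), []):
                votes[name] += 1
    return sorted(name for name, count in votes.items() if count >= min_votes)


query = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
rotated = [(-y, x, z) for x, y, z in query]  # same fold, rotated 90° about z
other = [(1, 0, 0), (1, 0.05, 0), (1, 0, 0.05), (1, 0.05, 0.05)]
print(search(build_table({"same": rotated, "other": other}), query))  # ['same']
```

Angles are rotation-invariant, so the rotated copy lands in the same cells; a real system would use a richer key and a proper neighbor-search structure.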
Geometric hashing
• R. Nussinov and H.J. Wolfson. Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques. Proc. Natl. Acad. Sci. USA., 88:10495-10499, 1991.
• L. Holm and C. Sander. 3-d lookup: Fast protein structure database searches at 90 % reliability. Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology 179-187.
Projection to R^n
• Encode the structure as an n-dimensional real vector.
• Reduce the problem of comparing structures to computing the Euclidean distance between the vectors.
Problem: how to find a good encoding?
Idea
[Figure: are S and S′ the same? Apply the invariant I to both; comparing I(S) with I(S′) is easier]
Properties of an invariant
• S = S′ ⇒ I(S) = I(S′)
• I(S) = I(S′) ⇒ S = S′ (not always true)
• “Strength” of an invariant: how likely two different objects are to receive the same invariant (the less likely, the stronger the invariant)
“Shape descriptors” for polygonal lines
• Motivated by Vassiliev knot invariants
• Introduced by Rogen and Bohr (Math. Biosciences, 2002)
• Rogen and Fain (PNAS 2002)
Main idea
• Consider a polygonal line embedded in 3D.
• Consider a projection of the line onto a plane and count the crossings, with or without sign (each crossing is + or −).
• The number of crossings depends on the projection, but the average number of crossings over all possible projections is an invariant of the embedded line, and it can be computed easily. For a pair of segments i1, i2 this average is denoted W(i1,i2).
Writhe
The average over all projections of the “diagram” crossing number:
$$\mathrm{Wr}(\gamma) = \frac{1}{4\pi}\iint_{\gamma\times\gamma\,\setminus\,\mathrm{diagonal}} w(t_1,t_2)\,dt_1\,dt_2$$
For polygonal lines, the integral can be replaced by a sum over all pairs of intervals:
$$\mathrm{Wr}(\gamma) = \sum_{i_1,i_2} W(i_1,i_2)$$
where W(i1,i2) is the integral above restricted to the two intervals i1 and i2.
Both the average of signed crossings and of unsigned crossings can be computed
• Signed: I(1,2) = Wr(γ) = Σ over i1,i2 of W(i1,i2)
• Unsigned: I|1,2| = Σ over i1,i2 of |W(i1,i2)| (same as above, but the crossings are counted without sign)
[Figure: crossings between projections of two intervals, averaged over all projections]
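I(1,2) and I|1,2| can be computed in closed form, because for two straight segments the restricted integral W equals a signed solid angle divided by 4π. The sketch below uses the solid-angle formula of Klenin and Langowski for that step (our choice, not necessarily the one used in the cited papers) and sums over ordered pairs of non-adjacent segments; adjacent segments share a vertex, lie in one plane, and contribute zero.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(v):
    n = math.sqrt(_dot(v, v))
    return None if n == 0 else tuple(c / n for c in v)


def segment_pair_w(p1, p2, p3, p4):
    """W for segments p1->p2 and p3->p4: 1/(4*pi) times the Gauss integral
    restricted to the two segments, computed as a signed solid angle."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # degenerate (e.g. collinear) configuration
    omega = sum(math.asin(max(-1.0, min(1.0, _dot(normals[t], normals[(t + 1) % 4]))))
                for t in range(4))
    orientation = _dot(_cross(_sub(p4, p3), _sub(p2, p1)), r13)
    if orientation == 0:
        return 0.0  # coplanar segments contribute zero
    return (omega if orientation > 0 else -omega) / (4 * math.pi)


def writhe_invariants(points):
    """(I(1,2), I|1,2|) for the polygonal line through `points`."""
    segs = list(zip(points, points[1:]))
    signed = unsigned = 0.0
    for a in range(len(segs)):
        for b in range(len(segs)):
            if abs(a - b) <= 1:
                continue  # same or adjacent segment: contributes 0
            w = segment_pair_w(*segs[a], *segs[b])
            signed += w
            unsigned += abs(w)
    return signed, unsigned


chain = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (0, 0, 2)]
print(writhe_invariants(chain))
```

Mirroring the chain flips the sign of I(1,2) but leaves I|1,2| unchanged, which illustrates why both are useful as shape descriptors.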
Towards stronger invariants
• I(1,2): look at a pair of segments, compute the average crossing number for the pair, and sum over all pairs.
• Extending the concept (following Vassiliev knot invariants) to I(1,2)(3,4): consider 2 pairs of intervals at a time, compute the product W(i1,i2) W(i3,i4) for the two pairs, and sum over all possible pairs of pairs. One can also consider triplets, and so on.
PRIDE
• Carugo, Pongor 2002: consider the set C of distances between Cα atoms i and i+n. Build histograms of these distances and compare them (for several different values of n).
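A rough sketch of the PRIDE idea in Python. Assumptions, all ours: distances are binned in 1 Å bins (40 bins, the last one open-ended), histograms are normalized, n runs from 3 to 30, and two histograms are compared by their overlap (sum of bin-wise minima) averaged over n. PRIDE itself uses a statistical comparison of the distributions, which this sketch does not reproduce.

```python
import math


def distance_histogram(ca_coords, n, bin_width=1.0, n_bins=40):
    """Normalized histogram of C-alpha(i) to C-alpha(i+n) distances."""
    counts = [0] * n_bins
    for i in range(len(ca_coords) - n):
        d = math.dist(ca_coords[i], ca_coords[i + n])
        counts[min(int(d / bin_width), n_bins - 1)] += 1  # last bin is open-ended
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def pride_like_similarity(coords_a, coords_b, separations=range(3, 31)):
    """Mean histogram overlap (1 = identical distributions) over the separations n."""
    overlaps = []
    for n in separations:
        ha = distance_histogram(coords_a, n)
        hb = distance_histogram(coords_b, n)
        if sum(ha) > 0 and sum(hb) > 0:  # skip n longer than either chain
            overlaps.append(sum(min(x, y) for x, y in zip(ha, hb)))
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


zigzag = [(1.5 * i, 2.0 * (i % 2), 0.0) for i in range(40)]
line = [(3.8 * i, 0.0, 0.0) for i in range(40)]
print(pride_like_similarity(zigzag, zigzag), pride_like_similarity(zigzag, line))
```

No superposition or alignment is needed: each structure is summarized once, and comparing two summaries is cheap.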
LFF (Local Feature Frequency) Choi, Kwon, Kim 2003
• For each structure, subdivide the Cα–Cα distance matrix into submatrices corresponding to overlapping fragments.
• Select 100 such submatrices to be representative “models”.
• For every protein, compute the distribution of these selected patterns in the protein structure.
• To compare protein structures, compare these distributions.
[Figure: the representative models]
Count the number of occurrences of each model (here, the first) in the structure and report it at the corresponding position (here, the first) of the 100-long vector. Comparing structures is thus reduced to comparing vectors.
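A sketch of the LFF vector under our own assumptions: fragments of length `frag_len` are turned into flattened distance submatrices, each fragment is assigned to its nearest model (Euclidean distance), and two structures are compared by the Euclidean distance between their count vectors. How the 100 models are selected is not shown; in the usage example the two models are simply taken from toy fragments.

```python
import math


def fragment_matrices(ca_coords, frag_len):
    """Flattened C-alpha distance submatrix of every overlapping fragment."""
    mats = []
    for start in range(len(ca_coords) - frag_len + 1):
        frag = ca_coords[start:start + frag_len]
        mats.append([math.dist(p, q) for p in frag for q in frag])
    return mats


def lff_vector(ca_coords, models, frag_len):
    """Count, for each model, the fragments whose nearest model it is."""
    counts = [0] * len(models)
    for mat in fragment_matrices(ca_coords, frag_len):
        nearest = min(range(len(models)), key=lambda m: math.dist(mat, models[m]))
        counts[nearest] += 1
    return counts


def lff_distance(coords_a, coords_b, models, frag_len):
    """Compare two structures by the Euclidean distance of their count vectors."""
    return math.dist(lff_vector(coords_a, models, frag_len),
                     lff_vector(coords_b, models, frag_len))


# Toy usage with two hypothetical models: a straight and a bent 3-residue fragment.
line = [(3.8 * i, 0.0, 0.0) for i in range(6)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
models = fragment_matrices(line, 3)[:1] + fragment_matrices(bent, 3)[:1]
print(lff_vector(line, models, 3))  # [4, 0]
```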