Algorithms for Structural Comparison and Statistical Analysis of 3D Protein Motifs
Brian Y. Chen, Viacheslav Y. Fofanov, David M. Kristensen, Marek Kimmel, Olivier Lichtarge, Lydia E. Kavraki
Motivation
• Understanding the function of proteins is a fundamental goal of biology
• Experimental determination of protein function is expensive and time-consuming
• Algorithms for computational function prediction could guide and accelerate the protein function discovery process
A Computational Approach
• Comparative analysis
  – Focus: algorithms for comparative analysis
• What is similar about proteins with similar function?
  – Sequence: same components?
  – Geometry: same structure?
  – Dynamics: same motion?
(Same Chemistry)
What do we need?
• A motif for comparison
  – Representative of biological function
• An algorithm for comparison
  – Search for geometric and chemical similarity
• Statistical analysis
  – Classification of results
Outline
• Evolutionary Trace (ET)
  – A source of biologically relevant motifs, using evolutionary data
• Match Augmentation (MA)
  – An algorithm for identifying geometric similarity
• Statistical analysis
  – Statistically determined geometric thresholds
The Evolutionary Trace (ET)
Lichtarge et al., JMB 1996; Lichtarge et al., JMB 1997; Lichtarge et al., PNAS 1996; Sowa et al., NSB 2001
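Figure: the Evolutionary Trace. A multiple sequence alignment, its phylogenetic tree, and the protein structure are combined to locate a functional site (the structure panel is labeled rank 4).

Aligned sequences ("." marks a gap), grouped according to the tree: GTRIACK, GYRIGCK, GYRLCCL, GTRLFCL, GAKIYCL; AAA.ECW, AKK.DCW, AKR.DCW, AKY.ECW.

| Position  | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|-----------|---|---|---|---|---|---|---|
| Consensus | X | - | - | X | - | C | X |
| Rank      | 2 | - | - | 4 | - | 1 | 3 |

Position 6 (C) is invariant across the whole alignment and receives rank 1. A position receives rank n when it is invariant within each of the n groups obtained by cutting the tree into n groups; "X" marks these class-specific positions, and "-" marks positions that remain unranked.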
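The ranking rule in the figure can be reproduced with a short Python sketch. The nested tree partitions below are hypothetical: they were chosen only to agree with the grouping and ranks shown, and the sketch treats gaps like ordinary residues, which a real ET implementation may not do.

```python
from typing import List, Optional, Sequence

# Aligned sequences from the figure ("." marks a gap).
G_GROUP = ["GTRIACK", "GYRIGCK", "GYRLCCL", "GTRLFCL", "GAKIYCL"]
A_GROUP = ["AAA.ECW", "AKK.DCW", "AKR.DCW", "AKY.ECW"]

# Hypothetical nested partitions: PARTITIONS[n - 1] splits the alignment into
# n groups, as if the phylogenetic tree were cut into n groups.
PARTITIONS = [
    [G_GROUP + A_GROUP],
    [G_GROUP, A_GROUP],
    [["GTRIACK", "GYRIGCK"], ["GYRLCCL", "GTRLFCL", "GAKIYCL"], A_GROUP],
    [["GTRIACK", "GYRIGCK"], ["GYRLCCL", "GTRLFCL"], ["GAKIYCL"], A_GROUP],
]


def et_ranks(partitions: Sequence[Sequence[Sequence[str]]], length: int) -> List[Optional[int]]:
    """Rank of each position: the smallest n at which the residue is identical
    within every one of the n groups; None if no partition conserves it."""
    ranks: List[Optional[int]] = [None] * length
    for n, groups in enumerate(partitions, start=1):
        for pos in range(length):
            if ranks[pos] is None and all(
                len({seq[pos] for seq in group}) == 1 for group in groups
            ):
                ranks[pos] = n
    return ranks


ranks = et_ranks(PARTITIONS, 7)
# Consensus row: invariant residue for rank 1, "X" for class-specific, "-" otherwise.
consensus = ["-" if r is None else ("X" if r > 1 else G_GROUP[0][i]) for i, r in enumerate(ranks)]
print(ranks)                # [2, None, None, 4, None, 1, 3]
print(" ".join(consensus))  # X - - X - C X
```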
ET Clusters Are Functionally Relevant
http://imgen.bcm.tmc.edu/molgenlabs/lichtarge/trace_of_the_week/traces.html
Figure: ligand binding sites compared with ET clusters for three proteins (Trp1 domain of Hop, dihydropteroate synthase, galectin CRD). Structural epitope: yellow = ligand, blue = residues within 5 Å of the ligand. ET clusters: yellow = ligand, red = largest cluster, other colors = trace residues.
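As a minimal sketch of the structural-epitope definition in the legend, the hypothetical helper below selects residues with any atom within 5 Å of any ligand atom. The any-atom criterion, the inclusive cutoff, and the toy residue names and coordinates are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float, float]


def structural_epitope(
    residue_atoms: Dict[str, List[Point]],
    ligand_atoms: List[Point],
    cutoff: float = 5.0,
) -> Set[str]:
    """Residues with at least one atom within `cutoff` Å of any ligand atom."""
    return {
        residue
        for residue, atoms in residue_atoms.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms)
    }


# Hypothetical toy coordinates (Å); residue names are illustrative only.
residues = {
    "TYR12": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "GLY40": [(10.0, 0.0, 0.0)],
}
ligand = [(3.0, 4.0, 0.0)]
print(structural_epitope(residues, ligand))  # {'TYR12'}
```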
Geometric Motifs
• Trace clusters are functionally relevant
• A source for geometric motifs
• Geometric motifs → function?
  – Given a protein structure with:
    • the same amino acids
    • the same geometry and chemistry
  – Does the protein have the same function?
Geometric Comparison Algorithms
• Geometric Hashing
  – Wolfson H.J. et al. IEEE Comp. Sci. Eng., 4(4):10–21, 1997.
• JESS
  – Barker J.A. et al. Bioinformatics, 19(13):1644–9, 2003.
• PINTS
  – Stark A. et al. Journal of Molecular Biology, 326:1307–16, 2003.
• Many others
  – webFEATURE, DALI, CE, SSAP…
• Our method: Match Augmentation
  – Integrates structural and evolutionary data
  – Efficient application of hashing and depth-first search
PSB 2005
Geometric Comparison Strategy
• Biological input:
  – A structure of a functional site (motif)
  – A protein structure with unknown function (target)
• Geometric search:
  – Find target atoms that are geometrically similar to the motif atoms and carry compatible atom and amino acid labels (match)
• Output:
  – The match of atoms with the greatest geometric similarity
  – May identify a similar functional site in the target
Motifs: Structure & Evolution Data
• Structure of a functional site
  – Points in three dimensions (3D) taken from atom coordinates (motif points)
  – Labeled by residue and atom identity
  – Alternate residues from mutations
    • Support for complex active sites
  – Priority-ranked motif points
    • Ranked by functional relevance
Figure: a motif whose points are ranked 1 to 4; one point carries the alternate residues {G,C,T} and atom label C.
Input Data: Targets
• Targets
  – Points in 3D taken from atom coordinates of whole protein structures (target points)
  – Labeled by residue and atom identity
  – No alternate residues
  – No ranking
Figure: a target point labeled with the single residue {Y} and atom label C.
Search: Matching Criteria
• Geometric similarity
  – Corresponding points lie within a distance threshold when optimally superimposed
• Label compatibility
  – The target residue label is a member of the motif point's alternate residues
  – Atom labels are identical
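Figure: a motif point labeled {S,L,T} with atom C and a target point labeled {S} with atom C lie within the distance threshold (<) and have compatible labels.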
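The label criterion reduces to a one-line test. This Python sketch assumes hypothetical `MotifPoint` and `TargetPoint` records that mirror the labels described above; the geometric criterion is not shown here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MotifPoint:
    residues: FrozenSet[str]  # alternate residue labels, e.g. {"S", "L", "T"}
    atom: str                 # atom label, e.g. "C"
    rank: int                 # priority rank (1 = highest priority)


@dataclass(frozen=True)
class TargetPoint:
    residue: str  # exactly one residue label: targets have no alternates
    atom: str


def label_compatible(m: MotifPoint, t: TargetPoint) -> bool:
    """Target residue must be one of the motif's alternates; atom labels must be identical."""
    return t.residue in m.residues and t.atom == m.atom


motif_point = MotifPoint(frozenset({"S", "L", "T"}), "C", rank=1)
print(label_compatible(motif_point, TargetPoint("S", "C")))  # True
print(label_compatible(motif_point, TargetPoint("Y", "C")))  # False
```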
Matches
• Matches correlate motif points to target points
  – Bijection
  – Fulfill the geometric and label criteria
• Geometric similarity is measured by the Least Root Mean Squared Distance (LRMSD)
• The match we seek:
  – A bijection covering all motif points
  – The smallest LRMSD of all matches considered
Figure: a motif, a target, and the match between them.
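LRMSD is the root-mean-square distance between corresponding points after the rotation and translation that minimize it. The sketch below computes it in plain Python with Horn's quaternion method, using a small Jacobi eigenvalue routine so no external library is needed. The slides do not say which superposition method MA uses, so treat this as one standard way to obtain the same quantity.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _centered(points: Sequence[Point]) -> List[List[float]]:
    """Translate points so that their centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def _max_eigenvalue(a: List[List[float]], sweeps: int = 100) -> float:
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def lrmsd(motif: Sequence[Point], target: Sequence[Point]) -> float:
    """Least RMSD between corresponding points under the best rigid superposition."""
    if len(motif) != len(target) or not motif:
        raise ValueError("need two equally sized, non-empty point lists")
    x, y = _centered(motif), _centered(target)
    # S[a][b] = sum_i x_i[a] * y_i[b]: the 3x3 correlation matrix.
    s = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 key matrix; its largest eigenvalue is the maximal
    # correlation achievable by a proper rotation.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    return math.sqrt(max(0.0, (g - 2.0 * _max_eigenvalue(key)) / len(x)))


# A 90-degree rotation about z plus a translation should give LRMSD ~ 0.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved = [(5.0, -2.0, 3.0), (5.0, -1.0, 3.0), (4.0, -2.0, 3.0), (5.0, -2.0, 4.0)]
print(round(lrmsd(square, moved), 6))  # 0.0
print(lrmsd([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]))  # 1.0
```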
Match Augmentation at a Glance
Input → Seed Matching → Augmentation → Output
• Design principles:
  – Correlate high-ranking points first
  – Exhaustively test potential matches
  – Filter for the match with the lowest LRMSD
• Two phases:
  – Seed matching
  – Augmentation
Figure: seed matching (match the high-ranked points).
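A minimal Python sketch of seed matching, assuming a hash table keyed by (residue, atom) labels: each of the three highest-ranked motif points looks up its label-compatible target points, and candidate triples are kept when their pairwise distances agree with the motif's. The distance tolerance, the tuple layout, and the pairwise-distance test are assumptions, not the published MA criteria.

```python
import itertools
import math
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float, float]
MotifPoint = Tuple[FrozenSet[str], str, Point, int]  # (alternate residues, atom, xyz, rank)
TargetPoint = Tuple[str, str, Point]                  # (residue, atom, xyz)


def build_label_index(target: List[TargetPoint]) -> Dict[Tuple[str, str], List[int]]:
    """Hash table from (residue, atom) label to the indices of target points carrying it."""
    index: Dict[Tuple[str, str], List[int]] = {}
    for i, (residue, atom, _) in enumerate(target):
        index.setdefault((residue, atom), []).append(i)
    return index


def seed_matches(motif: List[MotifPoint], target: List[TargetPoint],
                 seed_size: int = 3, tol: float = 0.5) -> List[Tuple[int, ...]]:
    """Label-compatible assignments of the highest-ranked motif points whose
    pairwise distances agree with the motif's to within `tol` (hypothetical test)."""
    seeds = sorted(motif, key=lambda m: m[3])[:seed_size]
    index = build_label_index(target)
    # Candidates for each seed point: hash lookups over every alternate residue.
    candidates = [sorted({i for r in res for i in index.get((r, atom), [])})
                  for res, atom, _, _ in seeds]
    found = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):  # a match must be a bijection
            continue
        if all(abs(math.dist(seeds[a][2], seeds[b][2])
                   - math.dist(target[combo[a]][2], target[combo[b]][2])) <= tol
               for a, b in itertools.combinations(range(len(seeds)), 2)):
            found.append(combo)
    return found


# Hypothetical toy data: the target holds a translated copy of the motif plus decoys.
MOTIF = [
    (frozenset({"S", "T"}), "C", (0.0, 0.0, 0.0), 1),
    (frozenset({"H"}), "C", (3.0, 0.0, 0.0), 2),
    (frozenset({"D", "E"}), "C", (0.0, 4.0, 0.0), 3),
    (frozenset({"G"}), "C", (3.0, 4.0, 0.0), 4),
]
TARGET = [
    ("T", "C", (10.0, 10.0, 10.0)),
    ("H", "C", (13.0, 10.0, 10.0)),
    ("E", "C", (10.0, 14.0, 10.0)),
    ("G", "C", (13.0, 14.0, 10.0)),
    ("H", "C", (30.0, 0.0, 0.0)),    # right label, wrong geometry
    ("Y", "C", (11.0, 11.0, 10.0)),  # incompatible label
]
print(seed_matches(MOTIF, TARGET))  # [(0, 1, 2)]
```

Because any two congruent triangles in 3D can be superimposed by a proper rotation, comparing pairwise distances is a reasonable stand-in for superposition when the seed has exactly three points.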
Figure: augmentation (expand matches to the rest of the motif).
Filtering Completed Matches
• Augmentation implements a depth-first search
• Completed matches are stored in a heap of matches
• Final output: the match with the smallest LRMSD
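Figure: once there are no more points to match, each completed match (e.g., LRMSD 2.41) joins the heap of matches sorted by LRMSD; the final output is the match with the smallest LRMSD (0.87 in the example).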
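The heap bookkeeping can be sketched with Python's `heapq`. The helper name and the match tuples are hypothetical; only the two LRMSD values come from the figure.

```python
import heapq
import itertools
from typing import List, Tuple

heap: List[Tuple[float, int, Tuple[int, ...]]] = []
_tie = itertools.count()  # tie-breaker so equal LRMSDs never compare match tuples


def record_completed_match(lrmsd: float, match: Tuple[int, ...]) -> None:
    """Called when the depth-first search has no more motif points to match."""
    heapq.heappush(heap, (lrmsd, next(_tie), match))


# Hypothetical completed matches (target point indices); LRMSDs from the figure.
record_completed_match(2.41, (4, 9, 17, 22))
record_completed_match(0.87, (3, 8, 15, 21))
best_lrmsd, _, best_match = heap[0]  # a min-heap keeps the smallest LRMSD on top
print(best_lrmsd, best_match)  # 0.87 (3, 8, 15, 21)
```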
Match Augmentation Summary
• Hybrid algorithm
  – Seed matching: hashing
  – Augmentation: depth-first search
• Finds matches to motifs within target structures
  – Final output: the match with the smallest LRMSD
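The augmentation phase can be sketched as a recursive depth-first search that extends a seed match one motif point at a time, in rank order. This is a hypothetical simplification: a candidate target point is accepted when its distances to the already matched points agree with the motif's within a tolerance, and completed matches are compared by distance RMSD (dRMSD) rather than by LRMSD after superposition.

```python
import math
from typing import FrozenSet, List, Tuple

Point = Tuple[float, float, float]
MotifPoint = Tuple[FrozenSet[str], str, Point]  # listed in rank order, highest priority first
TargetPoint = Tuple[str, str, Point]


def augment(motif: List[MotifPoint], target: List[TargetPoint],
            seed: Tuple[int, ...], tol: float = 0.5) -> List[Tuple[int, ...]]:
    """Depth-first extension of a seed match to every motif point, in rank order."""
    completed: List[Tuple[int, ...]] = []

    def dfs(match: List[int]) -> None:
        k = len(match)
        if k == len(motif):  # no more points to match
            completed.append(tuple(match))
            return
        residues, atom, xyz = motif[k]
        for j, (t_res, t_atom, t_xyz) in enumerate(target):
            if j in match or t_res not in residues or t_atom != atom:
                continue
            # Hypothetical geometric test: distances to already matched points must agree.
            if all(abs(math.dist(xyz, motif[i][2]) - math.dist(t_xyz, target[match[i]][2])) <= tol
                   for i in range(k)):
                dfs(match + [j])

    dfs(list(seed))
    return completed


def drmsd(motif: List[MotifPoint], target: List[TargetPoint], match: Tuple[int, ...]) -> float:
    """Distance RMSD over all point pairs: a rotation-free stand-in for LRMSD."""
    diffs = [math.dist(motif[a][2], motif[b][2]) - math.dist(target[match[a]][2], target[match[b]][2])
             for a in range(len(match)) for b in range(a + 1, len(match))]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


MOTIF = [
    (frozenset({"S", "T"}), "C", (0.0, 0.0, 0.0)),
    (frozenset({"H"}), "C", (3.0, 0.0, 0.0)),
    (frozenset({"D", "E"}), "C", (0.0, 4.0, 0.0)),
    (frozenset({"G"}), "C", (3.0, 4.0, 0.0)),
]
TARGET = [
    ("T", "C", (10.0, 10.0, 10.0)),
    ("H", "C", (13.0, 10.0, 10.0)),
    ("E", "C", (10.0, 14.0, 10.0)),
    ("G", "C", (13.0, 14.0, 10.0)),
    ("H", "C", (30.0, 0.0, 0.0)),
    ("Y", "C", (11.0, 11.0, 10.0)),
    ("G", "C", (13.3, 14.0, 10.0)),  # slightly displaced alternative for the last point
]
matches = augment(MOTIF, TARGET, seed=(0, 1, 2))  # hypothetical seed from seed matching
best = min(matches, key=lambda m: drmsd(MOTIF, TARGET, m))
print(matches)  # [(0, 1, 2, 3), (0, 1, 2, 6)]
print(best)     # (0, 1, 2, 3)
```

Both completed matches pass the tolerance test, so the search is exhaustive within the seed; the displaced point at index 6 loses only at the final filtering step, because its dRMSD is larger.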
Testing MA on Biological data
• Data set
  – 12 motifs selected from residues surrounding enzymatic active sites
  – 73 targets, each evolutionarily related to one of the motifs
  – Details: www.cs.rice.edu/~brianyc/papers/PSB2005/
• Experimental protocol
  – Search for each motif within every target
    • Matches of evolutionarily related motif-target pairs are “HPs” (blue)
    • Matches of unrelated motif-target pairs are “NHPs” (red)
Match Augmentation Conclusions
• Match Augmentation is accurate
  – Identifies cognate active sites in 95.4% of evolutionarily related proteins
• Match Augmentation is very efficient
  – Matches can be found in a fraction of a second
Evaluating Statistical Significance
• Hypothesis testing framework:
  – H0: the motif and target are functionally unrelated
  – HA: the motif and target are functionally related
• Reject H0 for a given match only if the match is unusual under H0.
• Problem: how do we determine what is unusual under H0 for a given match?
The “Usual” H0 distribution
• The set of matches between the motif and all functionally unrelated targets
• Previous methods approximate this distribution:
  – JESS
    • Matches are compared to a reference population of motifs that is partially ordered by degree of occurrence
  – PINTS
    • Approximates the distribution of matches with an artificial curve parameterized by motif size and residue content
• MA can calculate this distribution explicitly by computing matches to the entire PDB
• JESS: Barker J.A. et al. Bioinformatics, 19(13):1644–9, 2003.
• PINTS: Stark A. et al. Journal of Molecular Biology, 326:1307–16, 2003.
A Distribution of Match LRMSDs
• LRMSD distribution of matches with the entire PDB
  – Almost all known protein structures
  – Almost no functional relation to our motifs
• A reasonable H0 distribution
Figure: histograms of match LRMSDs against the PDB, unsmoothed (LRMSD axis 0 to 4) and smoothed (LRMSD axis 0 to 5); y-axis: frequency.
How unusual is our match?
• We want: the probability of observing a match with lower LRMSD than the given match
Figure: estimated density of match LRMSDs (x-axis: LRMSD, 0 to 5; y-axis: frequency), with a vertical line at the LRMSD of the match being tested. A is the area under the curve to the left of the line; B is the total area under the curve.

$$p = \frac{A}{B} = \frac{\text{number of matches with lower LRMSD}}{\text{total number of matches}}$$
• Apply the p-value to reject H0
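Both readings of the p-value can be sketched in Python: counting null matches with lower LRMSD, or integrating a smoothed density up to the match LRMSD. The Gaussian kernel and its bandwidth are assumptions (the slides do not say how the distribution was smoothed), and the null LRMSDs below are made up.

```python
import math
from typing import Sequence


def empirical_p_value(match_lrmsd: float, null_lrmsds: Sequence[float]) -> float:
    """Fraction of matches to functionally unrelated targets with a lower LRMSD."""
    return sum(1 for v in null_lrmsds if v < match_lrmsd) / len(null_lrmsds)


def smoothed_p_value(match_lrmsd: float, null_lrmsds: Sequence[float],
                     bandwidth: float = 0.1) -> float:
    """A / B under a Gaussian kernel density estimate of the null LRMSDs.

    A density integrates to 1, so B = 1 and A / B is the estimate's CDF at the
    match LRMSD: the mean of the normal CDFs of the individual kernels.
    """
    z = [(match_lrmsd - v) / (bandwidth * math.sqrt(2.0)) for v in null_lrmsds]
    return sum(0.5 * (1.0 + math.erf(zi)) for zi in z) / len(null_lrmsds)


null = [1.5, 2.0, 2.5, 3.0, 3.5]  # hypothetical null LRMSDs
print(empirical_p_value(1.8, null))           # 0.2
print(round(smoothed_p_value(1.8, null), 3))  # 0.204
```

A Gaussian kernel also places a little probability mass below an LRMSD of 0, which a careful implementation might correct for.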
Statistical Significance
• Result: a data-driven statistical significance value (p-value)
  – No dependence on approximations, unlike previous work
• The p-value of a match is the probability of observing another match with lower LRMSD against a functionally unrelated target
• Apply the p-value to reject H0
• Do matches identifying cognate active sites (HPs) have low p-values? (i.e., can we reject H0 for HPs?)
Testing our Statistical Analysis
• Distributions of matches over the PDB can be calculated efficiently
  – 12:48 on a single machine, on average
• We do not have to scan the entire PDB to accurately determine the H0 distribution
  – A 5% random sample is accurate enough
  – Reduces sampling time to 0:38, on average
• Matches of cognate active sites (HPs) are statistically significant (low p-values)
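A sketch of estimating the p-value from a 5% random sample instead of the full null distribution. The null LRMSDs are synthetic (evenly spread over 0 to 5), so this illustrates the sampling procedure only and does not reproduce the reported accuracy.

```python
import random
from typing import Sequence


def empirical_p_value(match_lrmsd: float, null_lrmsds: Sequence[float]) -> float:
    """Fraction of null matches with a lower LRMSD."""
    return sum(1 for v in null_lrmsds if v < match_lrmsd) / len(null_lrmsds)


# Hypothetical null distribution: 20,000 LRMSDs spread evenly over [0, 5).
full_null = [5.0 * i / 20000 for i in range(20000)]
# 5% random sample, drawn with a fixed seed for reproducibility.
sample = random.Random(0).sample(full_null, k=len(full_null) // 20)

p_full = empirical_p_value(1.0, full_null)
p_sample = empirical_p_value(1.0, sample)
print(p_full)       # 0.2
print(len(sample))  # 1000
print(p_sample)     # close to 0.2; the sampling standard error is about 0.013
```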
Conclusions
• Match Augmentation is accurate and extremely efficient
  – Correctly identifies cognate active sites (HPs)
  – Identifies matches in fractions of a second
• Algorithmic efficiency enables detailed statistical analysis
  – Explicitly calculates the H0 distribution, without dependence on approximated H0 distributions
  – Matches of cognate active sites (HPs) are statistically significant
  – The significance threshold translates into useful motif-specific LRMSD thresholds
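Turning a significance level into a motif-specific LRMSD threshold amounts to reading off a quantile of that motif's null distribution. A minimal sketch, assuming the empirical p-value (fraction of null matches with strictly lower LRMSD) and made-up null values:

```python
import math
from typing import Sequence


def lrmsd_threshold(null_lrmsds: Sequence[float], alpha: float = 0.05) -> float:
    """Largest LRMSD t whose empirical p-value, #(null < t) / n, is at most alpha."""
    values = sorted(null_lrmsds)
    k = min(math.floor(alpha * len(values)), len(values) - 1)
    # At most k null LRMSDs lie strictly below values[k]; any larger t has at
    # least k + 1 below it, which exceeds alpha * n.
    return values[k]


null = [i / 20 for i in range(1, 101)]  # hypothetical null LRMSDs: 0.05, 0.10, ..., 5.00
print(lrmsd_threshold(null, alpha=0.05))  # 0.3
```

A match whose LRMSD is at or below the returned value then has p ≤ α for this motif, so the threshold can be precomputed once per motif and reused for every target.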
Special Thanks
• Kavraki Group
  – David Schwarz
  – Amarda Shehu
  – Allison Heath
  – Hernan Stamati
  – Anne Christian
  – Drew Bryant
  – Amanda Cruess
  – Brad Dodson
  – Jessica Wu
• Lichtarge Lab
  – David Kristensen
  – Dan Morgan
  – Ivana Mihalek
  – Hui Yao
• Kimmel Group
  – Viacheslav Fofanov
• Funding
  – NSF
  – NLM 5T15LM07093
  – March of Dimes
  – Whitaker Foundation
  – Sloan Foundation
  – VIGRE
  – AMD