Background / Purpose: The problem of automatically constructing an accurate protein structure alignment remains challenging, especially when the proteins to be aligned are distantly related.
Main conclusion: We present a novel method, DeepAlign, which aligns two protein structures using not only the spatial proximity of equivalent residues (after rigid-body superposition), but also evolutionary distance and hydrogen-bonding similarity.
Protein structure alignment beyond spatial proximity
3DSIG 2012 Jul 14, Long Beach, California
Sheng WANG, Toyota Technological Institute at Chicago
Related work on pairwise structure alignment
1. Almost all structure alignment tools
2. TMalign, fr-TMalign
3. DALI, MUSTANG
4. MAMMOTH, Vorolign, YAKUSA
5. FATCAT, CE, MATT, FlexProt
Note: all proteins are aligned using only their C-alpha atoms.
Our contribution
Design a scoring function that combines:
• local sub-structure similarity
• evolutionary and functional information
• angular similarity for hydrogen bonding
Employ a fast and efficient search algorithm that:
• starts from highly similar local sub-structure pairs (SFPs)
• recruits new SFPs that satisfy spatial constraints
• finally refines the alignment within a bounded area
Scoring Function
$$\mathrm{Score}(i,j) = \underbrace{\max\big(0,\ \mathrm{BLOSUM}(i,j) + \mathrm{CLESUM}(i,j)\big)}_{\text{local similarity}} \cdot \underbrace{v(i,j)\cdot d(i,j)}_{\text{global similarity}}$$
CLESUM is the local structure substitution matrix; BLOSUM is the amino acid substitution matrix; v(i,j) measures angular similarity using three vectors; d(i,j) measures the spatial proximity of two aligned residues. Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition.
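A minimal Python sketch of the per-pair score, assuming v(i,j) and d(i,j) are already available as numbers computed after superposition. The slides do not give a formula for d(i,j); the TM-score-style form in `proximity` (1 / (1 + (dist/d0)^2)) and the parameter `d0` are hypothetical and only illustrate a distance term that decays with distance.

```python
def proximity(dist: float, d0: float) -> float:
    """Hypothetical d(i,j): TM-score-style decay with distance (assumption, not from the slides)."""
    return 1.0 / (1.0 + (dist / d0) ** 2)


def pair_score(blosum: float, clesum: float, v: float, d: float) -> float:
    """Score(i,j) = max(0, BLOSUM(i,j) + CLESUM(i,j)) * v(i,j) * d(i,j)."""
    return max(0.0, blosum + clesum) * v * d


def alignment_score(pairs):
    """Sum of per-pair scores; each item is a (blosum, clesum, v, d) tuple."""
    return sum(pair_score(*p) for p in pairs)


if __name__ == "__main__":
    d = proximity(dist=1.5, d0=3.0)
    print(pair_score(blosum=4, clesum=-1, v=0.9, d=d))
```

Because the local term is clipped at zero before multiplication, a residue pair with a negative combined substitution score contributes nothing, however well it superimposes.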
[Figure (A), (B): angles θ, θ' and τ defined on C-alpha atoms i-2, i-1, i, i+1]
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP
LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
The transformation from 3D structure to 1D CLE strings (figure legend: alpha, beta, coil)
S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters.” JBCB, 2008
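The first step of the 3D-to-1D transformation is computing, for each residue, the three angles shown in the figure from consecutive C-alpha coordinates. The sketch below assumes θ and θ' are the pseudo-bond angles at C-alpha atoms i-1 and i, and τ is the pseudo-dihedral over i-2, i-1, i, i+1; the function names are hypothetical. Mapping each angle triple to a CLE letter needs the letter cluster centers from the CLePAPS work, which are not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def _sub(a: Point, b: Point) -> Tuple[float, ...]:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Point, b: Point) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Point, b: Point) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_angle(p: Point, q: Point, r: Point) -> float:
    """Angle (degrees) at q formed by p-q-r."""
    u, v = _sub(p, q), _sub(r, q)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral (degrees) over p0-p1-p2-p3, range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def cle_angles(ca: Sequence[Point], i: int) -> Tuple[float, float, float]:
    """(theta, theta', tau) for residue i from C-alpha atoms i-2..i+1 (assumed definition)."""
    return (bond_angle(ca[i - 2], ca[i - 1], ca[i]),
            bond_angle(ca[i - 1], ca[i], ca[i + 1]),
            dihedral(ca[i - 2], ca[i - 1], ca[i], ca[i + 1]))
```

The dihedral uses the atan2 form, which keeps the sign of τ; the sign is what separates right-handed (helix-like) from left-handed local conformations.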
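CLESUM: Conformational LEtter SUbstitution Matrix
$$M_{ij} = 20 \log_2 \frac{P_{ij}}{P_i P_j}$$
where P_ij is the observed frequency with which letters i and j are aligned, and P_i, P_j are the background frequencies of the individual letters.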
Note: CLESUM is constructed using FSSP representatives.
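The formula above is a standard scaled log-odds substitution matrix. A minimal sketch of how such a matrix can be derived from counts of aligned letter pairs is shown below; the toy counts in the usage are made up for illustration and are not FSSP data, and the function name `log_odds_matrix` is hypothetical. No rounding to integers is applied.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]


def log_odds_matrix(pair_counts: Dict[Key, float], scale: float = 20.0) -> Dict[Key, float]:
    """M_ij = scale * log2(P_ij / (P_i * P_j)) from counts of aligned letter pairs."""
    # Symmetrize: an observed pair (a, b), a != b, is split evenly into (a, b) and (b, a)
    n: Dict[Key, float] = defaultdict(float)
    for (a, b), c in pair_counts.items():
        if a == b:
            n[(a, a)] += c
        else:
            n[(a, b)] += c / 2
            n[(b, a)] += c / 2
    total = sum(n.values())
    p_pair = {k: v / total for k, v in n.items()}
    # Background frequency P_i = sum over j of P_ij
    p_single: Dict[str, float] = defaultdict(float)
    for (a, _), p in p_pair.items():
        p_single[a] += p
    return {(a, b): scale * math.log2(p / (p_single[a] * p_single[b]))
            for (a, b), p in p_pair.items() if p > 0}


if __name__ == "__main__":
    toy = {("A", "A"): 40, ("B", "B"): 40, ("A", "B"): 20}  # hypothetical counts
    for key, value in sorted(log_odds_matrix(toy).items()):
        print(key, round(value, 2))
```

Pairs aligned more often than chance (P_ij > P_i P_j) get positive scores, and pairs aligned less often get negative ones.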
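Same CLESUM, different BLOSUM: combining evolutionary and geometric information
[Figure labels: typical helix, typical sheet; row labels CLE -> and AMI ->]
(A) correct:
HHHHHHH EGHILLI
DGHVLLV HHHHHHH
(B) incorrect:
HHHHHHH GHILLIQ
DGHVLLV HHHHHHH
The two alignments receive the same CLESUM score, so only the amino acid (BLOSUM) term can distinguish the correct alignment (A) from the incorrect one (B).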
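The arithmetic behind this example can be checked directly. The sketch assumes the 7-letter strings other than HHHHHHH are the amino acid rows and that the BLOSUM variant is BLOSUM62 (the slide does not say which); only the BLOSUM62 entries needed here are included.

```python
# Only the BLOSUM62 entries needed for this example (standard BLOSUM62 values)
BLOSUM62_SUBSET = {
    frozenset("DE"): 2, frozenset("G"): 6, frozenset("H"): 8,
    frozenset("IV"): 3, frozenset("L"): 4, frozenset("DG"): -1,
    frozenset("GH"): -2, frozenset("HI"): -3, frozenset("LV"): 1,
    frozenset("IL"): 2, frozenset("QV"): -2,
}


def blosum62(a: str, b: str) -> int:
    # frozenset makes the lookup symmetric; a == b collapses to a one-letter key
    return BLOSUM62_SUBSET[frozenset((a, b))]


def ungapped_blosum(s: str, t: str) -> int:
    """Sum of BLOSUM62 scores over an ungapped alignment of two equal-length strings."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(blosum62(a, b) for a, b in zip(s, t))


if __name__ == "__main__":
    print(ungapped_blosum("EGHILLI", "DGHVLLV"))  # (A) correct: 30
    print(ungapped_blosum("GHILLIQ", "DGHVLLV"))  # (B) incorrect: -1
```

Shifting the first sequence by one residue leaves an all-helix CLE string unchanged but drops the amino acid score from 30 to -1, which is why the evolutionary term is needed on top of the geometric one.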
Why max and add?
Local score: max(0, CLESUM(i,j) + BLOSUM(i,j))
[Table: sign of BLOSUM (+/-) against sign of CLESUM (+/-), with cells marked √, o or ×]
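Note: $\log\frac{C_{ij}}{C_i C_j} + \log\frac{B_{ij}}{B_i B_j} = \log\frac{C_{ij} B_{ij}}{C_i C_j B_i B_j}$, i.e., adding the two log-odds scores corresponds to multiplying their odds ratios.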
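A small numeric check of the note above, together with what max(0, ·) does in each sign combination of BLOSUM and CLESUM. The odds ratios 1.8 and 0.7 and the integer scores are hypothetical values chosen only for illustration.

```python
import math


def local_term(blosum: float, clesum: float) -> float:
    """Local part of the score: add the two log-odds scores, then clip at zero."""
    return max(0.0, blosum + clesum)


# Hypothetical odds ratios C_ij/(C_i C_j) and B_ij/(B_i B_j)
c_ratio, b_ratio = 1.8, 0.7
sum_of_logs = math.log(c_ratio) + math.log(b_ratio)
log_of_product = math.log(c_ratio * b_ratio)

if __name__ == "__main__":
    print(math.isclose(sum_of_logs, log_of_product))  # True
    # The four sign combinations of (BLOSUM, CLESUM)
    for b, c in [(3, 2), (4, -1), (-4, 1), (-3, -2)]:
        print(b, c, local_term(b, c))
```

When both scores are positive the pair is always kept, when both are negative it is always zeroed, and in the mixed cases the larger magnitude decides.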
[Figure: (A) incorrect alignment with smaller RMSD; (B) correct alignment with larger RMSD]
Why use angular similarity?
[Figure: the three vectors used in the vect-score v(i,j)]
The deviation between the three vectors is used to measure angular similarity.
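The slides do not give the formula for v(i,j), so the following is only a hypothetical sketch of "angular similarity from the deviation of three vectors": it takes the mean cosine between the three corresponding vectors of residue i and residue j (already in the superimposed frame) and clips it at zero. The function names and the mean-cosine form are assumptions, not DeepAlign's definition.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _cosine(u: Vec, w: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, w))
    nu = math.sqrt(sum(a * a for a in u))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nu * nw)


def vect_score(vecs_i: Sequence[Vec], vecs_j: Sequence[Vec]) -> float:
    """Hypothetical v(i,j): mean cosine between three corresponding vectors, clipped at 0.

    vecs_i and vecs_j hold the three vectors of residue i (protein 1) and
    residue j (protein 2), assumed to be expressed after superposition.
    """
    if len(vecs_i) != 3 or len(vecs_j) != 3:
        raise ValueError("expected exactly three vectors per residue")
    mean_cos = sum(_cosine(u, w) for u, w in zip(vecs_i, vecs_j)) / 3.0
    return max(0.0, mean_cos)
```

A score of this kind equals 1 when the local orientations coincide and falls to 0 as they diverge, so two residues that are close in space but point in different directions (the smaller-RMSD, incorrect case) are penalized.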
Search Algorithm
[Flowchart labels: SFP_long, SFP_short, DeepAlign-score]
Notes:
[1] TopK > TopJ > M
[2] SFP stands for Similar Fragment Pair, scored by ∑ max(0, CLESUM(i,j) + BLOSUM(i,j)).
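A brute-force sketch of SFP detection under the definition in note [2]: every ungapped fragment pair of a fixed length is scored by summing max(0, CLESUM + BLOSUM) along its diagonal, and the TopK are kept. The callback `local`, the function name, and the idea that SFP_long and SFP_short differ only in fragment length are assumptions for illustration. The cost is O(n·m·L), which is acceptable for a sketch but not tuned.

```python
from typing import Callable, List, Tuple

# (score, start in protein 1, start in protein 2)
SFP = Tuple[float, int, int]


def top_k_sfps(n: int, m: int, local: Callable[[int, int], float],
               length: int, top_k: int) -> List[SFP]:
    """Score every ungapped fragment pair of the given length and keep the TopK.

    local(i, j) is assumed to return CLESUM(i,j) + BLOSUM(i,j) for residue i of
    protein 1 and residue j of protein 2; each position contributes max(0, local).
    """
    sfps: List[SFP] = []
    for i in range(n - length + 1):
        for j in range(m - length + 1):
            s = sum(max(0.0, local(i + k, j + k)) for k in range(length))
            sfps.append((s, i, j))
    # Highest score first; ties keep enumeration order (Python's sort is stable)
    sfps.sort(key=lambda t: t[0], reverse=True)
    return sfps[:top_k]
```

Calling it twice with a long and a short fragment length would give the two sorted lists, SFP_long and SFP_short.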
Step 1: Sort both SFP lists (SFP_long and SFP_short) by score, from high to low.
Step 2: Go from the TopK coarse-grained SFPs to TopJ fine-grained initial alignments.
Example: TopK = 5, TopJ = 1 (SFP_long score ranks shown in the figure: 5, 2, 4, 1). The Top2 SFP is globally supported by three other SFPs (4 consistent SFPs), while the Top1 SFP is supported only by itself (1 consistent SFP), so Top2 is selected.
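The TopK-to-TopJ step can be sketched as a re-ranking by global support: count how many of the TopK SFPs are consistent with each candidate (itself included) and keep the TopJ best-supported ones. The consistency test is left as a hypothetical callback because the slides do not define it (a geometric check after superposing on the candidate SFP would be one plausible choice); `select_top_j` is an illustrative name.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_j(top_k: Sequence[T], consistent: Callable[[T, T], bool],
                 top_j: int) -> List[T]:
    """Re-rank the TopK SFPs by global support and keep the TopJ.

    Support of an SFP = number of SFPs in the TopK list consistent with it,
    counting itself (consistent(x, x) is expected to be True).
    """
    support = [sum(1 for other in top_k if consistent(sfp, other)) for sfp in top_k]
    # Stable sort: among equally supported SFPs the original score order is kept
    order = sorted(range(len(top_k)), key=lambda idx: support[idx], reverse=True)
    return [top_k[idx] for idx in order[:top_j]]
```

With a toy consistency relation in which Top2 through Top5 agree with each other and Top1 agrees only with itself, TopK = 5 and TopJ = 1 select Top2, mirroring the example above.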
Step 3: Refine each fine-grained initial alignment through three iterations (FirstUpdate, SecondUpdate, ThirdUpdate), then output the alignment (OutputAlignment).
[Figure: distance thresholds d1, d2, d3, with d1 > d2 > d3]
Final refinement: optimize the DeepAlign-score only within a bounded area
(1) take the refined fine-grained alignment;
(2) define a bounded area around the alignment;
(3) run dynamic programming to find the path with maximal DeepAlign-score within the bounded area.
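A minimal sketch of the bounded dynamic programming in step (3). Two details are assumptions, not taken from the slides: the bounded area is taken to be every cell within `width` rows and columns of some pair in the refined alignment, and gaps carry no penalty (reasonable only because the per-pair DeepAlign-score is non-negative). `score[i][j]` stands for a precomputed DeepAlign-score matrix.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def bounded_dp(score: List[List[float]], init_aln: List[Pair],
               width: int) -> Tuple[float, List[Pair]]:
    """Max-score monotone alignment restricted to a band around an initial alignment.

    Assumptions: the bounded area is every cell within `width` rows/columns of
    some pair in init_aln, and gaps cost nothing.
    """
    n, m = len(score), len(score[0])
    allowed = [[False] * m for _ in range(n)]
    for a, b in init_aln:
        for i in range(max(0, a - width), min(n, a + width + 1)):
            for j in range(max(0, b - width), min(m, b + width + 1)):
                allowed[i][j] = True

    # f[i][j]: best total score for prefixes of length i (protein 1) and j (protein 2)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(f[i - 1][j], f[i][j - 1])
            if allowed[i - 1][j - 1]:
                best = max(best, f[i - 1][j - 1] + score[i - 1][j - 1])
            f[i][j] = best

    # Trace back, taking a match only where it produced the optimum and is positive
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if allowed[i - 1][j - 1] and s > 0 and f[i][j] == f[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return f[n][m], pairs
```

Restricting the search to the band keeps the refinement close to the already refined alignment: a high-scoring cell far from it is ignored unless `width` is large enough to reach it.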
Results on manually curated data
• CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments.
• MALIDUP: contains 241 alignments of homologous domains originating from internal duplication.
• MALISAM: contains 130 alignments of structurally analogous motifs in proteins.
Results on discrimination data
• We use SABmark to test the ability to identify distant homologs (superfamily set) and structural analogs (fold set) among negative data (pairs with no structural similarity).
[Figure: DeepAlign results on the SABmark superfamily and fold sets]
One example
Superposition of domains d1pqsa_ and d1poh__ from MALISAM: (A) TMalign, (B) DeepAlign optimizing TM-score, and (C) DeepAlign.
TM-scores shown in the figure: 0.288, 0.514, 0.473.
Thank you!
The DeepAlign executable is available at: http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz