Post on 12-Jun-2015
Protein structure alignment beyond spatial proximity
3DSIG 2012 Jul 14, Long Beach, California
Sheng WANG, Toyota Technological Institute at Chicago
Related works on Pairwise Structure Alignment
1 Almost all the structure alignment tools
2 TMalign, fr-TMalign
3 DALI, MUSTANG
4 MAMMOTH, Vorolign, YAKUSA
5 FATCAT, CE, MATT, FlexProt
Note: for every protein we align, only the C-alpha atoms are considered.
Our contribution
Design a scoring function
• local sub-structure similarity
• evolutionary and functional information
• angular similarity for hydrogen bonding
Employ a fast and efficient search algorithm
• start from highly similar local sub-structure pairs (SFPs)
• recruit new SFPs that satisfy spatial constraints
• finally, refine the alignment within a bound
Scoring Function
[Formula labels: local similarity, global similarity]
• CLESUM is the local structure substitution matrix
• BLOSUM is the amino acid substitution matrix
• v(i,j) measures the angular similarity using three vectors
• d(i,j) measures the spatial proximity of two aligned residues
Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition.
Score(i,j) = ( max(0, BLOSUM(i,j)) + CLESUM(i,j) ) * v(i,j) * d(i,j)
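The per-residue-pair score above can be sketched in a few lines. This is only an illustration of the formula as written on the slide: the function name and the input numbers are hypothetical, and the value ranges of v(i,j) and d(i,j) are not given in the slides.

```python
def deepalign_score(blosum, clesum, v, d):
    """Score(i,j) = (max(0, BLOSUM) + CLESUM) * v * d, as written on the slide.

    blosum, clesum: substitution scores for the residue pair (i, j).
    v, d: angular and spatial terms, computed after superposition
          (their exact definitions are not given here; treated as inputs).
    """
    local = max(0, blosum) + clesum   # negative BLOSUM scores are clipped to 0
    return local * v * d


# Hypothetical numbers, only to show how the terms combine.
example = deepalign_score(blosum=-2, clesum=30, v=0.5, d=0.5)
```

With these hypothetical inputs the negative BLOSUM score is clipped to zero, so only the CLESUM term contributes before the two global factors are applied.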
[Figure, panels (A) and (B): angles θ, θ’, τ defined over residues i-2, i-1, i, i+1]
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP
LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
The transformation from 3D structure to 1D CLE strings
Legend: alpha, beta, coil
S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters.” JBCB, 2008
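As a rough sketch of the first step of this transformation, the code below computes two bending angles and one torsion angle for every window of four consecutive C-alpha atoms (i-2, i-1, i, i+1). It assumes that θ and θ’ are the bending angles at the two middle atoms and τ is the torsion around the middle bond; the function names are hypothetical. Mapping the angle triple to a conformational letter is not shown, since the slides do not give the letter definitions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bend_angle(p, q, r):
    """Angle in degrees at atom q between the directions q->p and q->r."""
    u, w = _sub(p, q), _sub(r, q)
    c = _dot(u, w) / (_norm(u) * _norm(w))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def torsion(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = _norm(b2)
    m1 = _cross(n1, tuple(x / b2_len for x in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def local_angles(ca):
    """(theta, tau, theta') for each window i-2, i-1, i, i+1 of C-alpha coordinates."""
    out = []
    for i in range(2, len(ca) - 1):
        a, b, c, d = ca[i - 2], ca[i - 1], ca[i], ca[i + 1]
        out.append((bend_angle(a, b, c), torsion(a, b, c, d), bend_angle(b, c, d)))
    return out
```

A chain of N C-alpha atoms yields N-3 angle triples, one per window.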
CLESUM : Conformational LEtter SUbstitution Matrix
M_ij = 20 * log2( P_ij / (P_i P_j) )
Note: CLESUM is constructed using FSSP representatives.
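The log-odds formula above can be illustrated with a small sketch that turns counts of aligned letter pairs into scores. The counts below are toy values, not FSSP data, and the function name and the treatment of marginals (row and column sums of the pair frequencies) are assumptions made for the example.

```python
import math
from collections import defaultdict


def log_odds_matrix(pair_counts, scale=20):
    """M_ij = scale * log2(P_ij / (P_i * P_j)) from counts of aligned letter pairs.

    pair_counts maps (letter_a, letter_b) to a count. P_ij is the pair
    frequency; P_i and P_j are taken as its row and column marginals.
    Pairs with zero count are skipped (their log-odds would be undefined).
    """
    total = sum(pair_counts.values())
    row = defaultdict(float)
    col = defaultdict(float)
    for (a, b), n in pair_counts.items():
        row[a] += n / total
        col[b] += n / total
    matrix = {}
    for (a, b), n in pair_counts.items():
        if n > 0:
            matrix[(a, b)] = scale * math.log2((n / total) / (row[a] * col[b]))
    return matrix


# Toy counts (hypothetical): letters that only ever align with themselves.
toy = log_odds_matrix({("H", "H"): 2, ("E", "E"): 2})
```

Pairs observed more often than expected by chance get positive scores, and pairs observed exactly as often as chance predicts get zero.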
[Figure labels: typical helix, typical sheet; evolutionary + geometric]
Same CLESUM, different BLOSUM
        (A) correct    (B) incorrect
CLE ->  HHHHHHH        HHHHHHH
AMI ->  EGHILLI        GHILLIQ
AMI ->  DGHVLLV        DGHVLLV
CLE ->  HHHHHHH        HHHHHHH
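To make the point concrete, the sketch below sums amino acid substitution scores for the two candidate alignments, EGHILLI against DGHVLLV and GHILLIQ against DGHVLLV. The slide does not say which BLOSUM matrix is used; BLOSUM62 is assumed here, and only the entries needed for these two alignments are hard-coded. The function name is hypothetical.

```python
# Subset of BLOSUM62 (assumed matrix); only the pairs needed below.
BLOSUM62_SUBSET = {
    ("E", "D"): 2, ("G", "G"): 6, ("H", "H"): 8, ("I", "V"): 3, ("L", "L"): 4,
    ("G", "D"): -1, ("H", "G"): -2, ("I", "H"): -3, ("L", "V"): 1,
    ("I", "L"): 2, ("Q", "V"): -2,
}


def blosum_sum(seq_a, seq_b, matrix=BLOSUM62_SUBSET):
    """Sum of substitution scores for an ungapped alignment of equal-length strings."""
    total = 0
    for a, b in zip(seq_a, seq_b):
        # The matrix is symmetric, so look the pair up in either order.
        total += matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return total


score_a = blosum_sum("EGHILLI", "DGHVLLV")   # alignment (A)
score_b = blosum_sum("GHILLIQ", "DGHVLLV")   # alignment (B), shifted by one residue
```

Both alignments pair HHHHHHH with HHHHHHH at the conformational-letter level, so the CLESUM term cannot separate them; under the assumed matrix the amino acid term is clearly higher for (A).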
Why Max and Add ?
max(0,CLESUM(i,j)+BLOSUM(i,j) )
[Table: sign of BLOSUM (+/-) against sign of CLESUM (+/-); cells marked √, o, ×, o]
Note: log( Cij / (Ci Cj) ) + log( Bij / (Bi Bj) ) = log( Cij Bij / (Ci Cj Bi Bj) )
(A) incorrect, smaller RMSD    (B) correct, larger RMSD
Why use angular similarity ?
The three vectors used in the vect-score v(i,j).
Using the deviation of three vectors for angular similarity
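The slides do not give the formula for v(i,j) or say which three vectors are used. Purely as an illustration of measuring the deviation between corresponding vectors, the sketch below assumes the similarity is the mean cosine over the three vector pairs after superposition; both the formula and the function name are assumptions.

```python
import math


def mean_cosine(vectors_a, vectors_b):
    """Mean cosine between corresponding vectors of two residues.

    ASSUMPTION: stands in for v(i,j); the actual definition is not in the slides.
    Each argument is a list of 3-D vectors, already in the superposed frame.
    """
    total = 0.0
    for u, w in zip(vectors_a, vectors_b):
        dot = sum(x * y for x, y in zip(u, w))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_w = math.sqrt(sum(x * x for x in w))
        total += dot / (norm_u * norm_w)
    return total / len(vectors_a)
```

Under this assumption, two residues whose vectors point the same way score 1, and the score drops as the vectors deviate, even when the two C-alpha atoms are close in space.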
[Flowchart labels: DeepAlign-score, SFP_long, SFP_short]
Search Algorithm
Note:
[1] TopK > TopJ > M
[2] SFP stands for Similar Fragment Pair, scored by ∑ max(0, CLESUM(i,j) + BLOSUM(i,j))
Sort both SFP lists
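A minimal sketch of scoring and sorting SFPs, using the fragment score from note [2]. The dictionary layout of an SFP and the function names are hypothetical; the per-position substitution scores are assumed to be looked up beforehand.

```python
def sfp_score(clesum_scores, blosum_scores):
    """Sum over fragment positions of max(0, CLESUM + BLOSUM)."""
    return sum(max(0, c + b) for c, b in zip(clesum_scores, blosum_scores))


def sort_sfps(sfps):
    """Return SFPs ordered from highest to lowest score."""
    return sorted(sfps, key=lambda s: s["score"], reverse=True)


# Hypothetical SFPs: start positions in each protein plus a precomputed score.
sfps = [
    {"start_a": 3, "start_b": 10, "score": sfp_score([10, -5, 3], [2, 1, -4])},
    {"start_a": 20, "start_b": 7, "score": sfp_score([8, 8, 8], [1, 1, 1])},
]
ranked = sort_sfps(sfps)
```

The same sorting would be applied separately to the SFP_long and SFP_short lists.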
[Figure: SFP_long list by score rank; SFPs ranked 5, 2, 4, 1 are shown]
Example: TopK = 5; TopJ = 1
# of consistent SFPs = 4 (Top2 SFP); # of consistent SFPs = 1 (Top1 SFP)
From TopK coarse-grained to TopJ fine-grained initial alignment
Top2 SFP is globally supported by three other SFPs, while Top1 SFP is supported only by itself.
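The selection step can be sketched as follows: among the TopK SFPs, count how many are consistent with each one, then keep the TopJ with the most support. How consistency is decided is not spelled out in the slides, so it is passed in as a precomputed boolean matrix; the function name and the tie-breaking rule (better score rank first) are assumptions.

```python
def select_seeds(consistent, top_k, top_j):
    """Pick the top_j SFPs with the most consistent partners among the top_k.

    consistent[a][b] is True when SFP b agrees with the superposition implied
    by SFP a (an SFP is always consistent with itself). Rows follow score rank.
    Returns (chosen indices, support counts).
    """
    k = min(top_k, len(consistent))
    support = [sum(1 for b in range(k) if consistent[a][b]) for a in range(k)]
    order = sorted(range(k), key=lambda a: (-support[a], a))
    return order[:top_j], support


# Mirrors the slide's example: TopK = 5, TopJ = 1.
# Index 0 is the Top1 SFP (supported only by itself),
# index 1 is the Top2 SFP (supported by itself and three others).
T, F = True, False
consistent = [
    [T, F, F, F, F],
    [F, T, T, T, T],
    [F, T, T, F, F],
    [F, T, F, T, F],
    [F, T, F, F, T],
]
chosen, support = select_seeds(consistent, top_k=5, top_j=1)
```

In this toy matrix the Top2 SFP is selected as the initial alignment seed even though the Top1 SFP has the higher fragment score.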
[Flowchart: FirstUpdate (d1) -> SecondUpdate (d2) -> ThirdUpdate (d3) -> OutputAlignment, with d1 > d2 > d3]
Refine each fine-grained initial alignment by three iterations
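A sketch of a three-pass refinement with shrinking cutoffs d1 > d2 > d3. The slides do not say what each update does; this sketch assumes every pass re-superposes on the current alignment and then keeps only pairs within the current cutoff. The toy usage replaces rigid-body superposition with a 1-D shift, so it shows the control flow only, and all names are hypothetical.

```python
def refine(alignment, cutoffs, superpose, realign):
    """Run one update per cutoff, from the loosest to the tightest."""
    for cutoff in cutoffs:
        transform = superpose(alignment)         # fit on the current pairs
        alignment = realign(transform, cutoff)   # rebuild pairs under the cutoff
    return alignment


# Toy 1-D stand-in: the "superposition" is a shift, the last pair is an outlier.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [10.0, 11.0, 12.0, 20.0]


def superpose(pairs):
    return sum(ys[j] - xs[i] for i, j in pairs) / len(pairs)


def realign(shift, cutoff):
    return [(i, i) for i in range(len(xs)) if abs(xs[i] + shift - ys[i]) <= cutoff]


start = [(0, 0), (1, 1), (2, 2), (3, 3)]
refined = refine(start, cutoffs=[4.0, 2.0, 1.0], superpose=superpose, realign=realign)
```

The loose first cutoff tolerates the poor initial fit; once the outlier is dropped, the fit improves and the tighter cutoffs can be applied safely.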
Final refinement
SFP_short score rank (high -> low)
Final refinement on DeepAlign-score only in bounded area
(1) refined fine-grained alignment
(2) bounded area around the alignment
(3) dynamic programming to find a path with maximal DeepAlign-score within the bounded area
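Steps (1) to (3) can be sketched as a dynamic program in which only cells near the refined alignment may be matched. The shape of the bounded area (a square window of half-width `width` around each aligned pair), the gap penalty (default 0), and the function name are assumptions; the sketch returns the best score only and omits the traceback.

```python
def bounded_dp(score, seed_pairs, width, gap=0.0):
    """Best total score of a monotone path whose matches lie in the bounded area.

    score[i][j]: per-pair score (for example the DeepAlign-score of i and j).
    seed_pairs: aligned (i, j) pairs of the refined alignment.
    width: half-width of the window around each seed pair (assumed shape).
    """
    n, m = len(score), len(score[0])

    def in_bound(i, j):
        return any(abs(i - a) <= width and abs(j - b) <= width for a, b in seed_pairs)

    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] - gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cand = max(best[i - 1][j] - gap, best[i][j - 1] - gap)
            if in_bound(i - 1, j - 1):           # matches allowed only inside the bound
                cand = max(cand, best[i - 1][j - 1] + score[i - 1][j - 1])
            best[i][j] = cand
    return best[n][m]


# Toy score matrix: the high off-diagonal cell lies outside a width-0 bound.
toy_score = [[1, 9],
             [1, 1]]
tight = bounded_dp(toy_score, seed_pairs=[(0, 0), (1, 1)], width=0)
loose = bounded_dp(toy_score, seed_pairs=[(0, 0), (1, 1)], width=1)
```

Restricting matches to the bounded area keeps the final alignment close to the refined one and reduces the number of cells that need to be evaluated.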
• CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments.
• MALIDUP: contains 241 alignments for homologous domains originating from internal duplication.
• MALISAM: contains 130 alignments for structurally analogous motifs in proteins.
Result on manually-curated data
Result on discrimination data
• We use SABmark to test the ability to identify distant homologs (super-family) and structural analogs (fold) among negative data (pairs with no structural similarity).
[Figure: two panels, super-family and fold; DeepAlign is labelled in each]
One example
Superimposition of domain d1pqsa_ and d1poh__ from MALISAM. (A) TMalign, (B) DeepAlign optimizing TM-score and (C) DeepAlign.
TMscore 0.288
TMscore 0.514
TMscore 0.473
Thank you !!
Please find the executable program of DeepAlign at: http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz