Protein Structure Alignment Beyond Spatial Proximity (3DSIG 2012)

Posted on 12-Jun-2015


DESCRIPTION

Background / Purpose: Automatically constructing an accurate protein structure alignment remains challenging, especially when the proteins to be aligned are distantly related. Main conclusion: We present a novel method, DeepAlign, which aligns two protein structures using not only the spatial proximity of equivalent residues (after rigid-body superposition) but also evolutionary distance and hydrogen-bonding similarity.

TRANSCRIPT

Protein structure alignment beyond spatial proximity

3DSIG 2012 Jul 14, Long Beach, California

Sheng WANG, Toyota Technological Institute at Chicago

Related works on Pairwise Structure Alignment

1. Almost all the structure alignment tools
2. TMalign, fr-TMalign
3. DALI, MUSTANG
4. MAMMOTH, Vorolign, YAKUSA
5. FATCAT, CE, MATT, FlexProt

Note: for every protein we align, only the C-alpha atoms are considered.

Our contribution

Design a scoring function using:
• local sub-structure similarity
• evolutionary and functional information
• angular similarity for hydrogen bonding

Employ a fast and efficient search algorithm:
• start from highly similar local sub-structure pairs (SFPs)
• recruit new SFPs that satisfy spatial constraints
• finally, refine the alignment within a bounded area

Scoring Function

The score combines local similarity (BLOSUM, CLESUM) with global similarity (v, d).

CLESUM is the local-structure substitution matrix; BLOSUM is the amino acid substitution matrix; v(i,j) measures angular similarity using three vectors; d(i,j) measures the spatial proximity of two aligned residues. Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition.

Score(i,j) = max(0, BLOSUM(i,j) + CLESUM(i,j)) × v(i,j) × d(i,j)
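To make the structure of the per-pair score concrete, here is a minimal Python sketch written in the max(0, BLOSUM + CLESUM) form used on the "Why Max and Add?" slide. The slides do not give numeric definitions of v(i,j) or d(i,j), so both are plain inputs here. The TM-score-style distance term `tm_like_d` and its `d0` parameter are hypothetical stand-ins, not DeepAlign's actual d(i,j).

```python
from typing import Iterable, Tuple


def tm_like_d(distance: float, d0: float) -> float:
    """Hypothetical spatial-proximity term 1 / (1 + (distance / d0)^2).

    Equals 1.0 at zero distance and 0.5 at distance d0. This form is an
    assumption for illustration, not the definition used by DeepAlign.
    """
    return 1.0 / (1.0 + (distance / d0) ** 2)


def pair_score(blosum: float, clesum: float, v: float, d: float) -> float:
    """Score of one aligned pair: max(0, BLOSUM + CLESUM) * v * d."""
    return max(0.0, blosum + clesum) * v * d


def alignment_score(pairs: Iterable[Tuple[float, float, float, float]]) -> float:
    """Sum of pair scores over (blosum, clesum, v, d) tuples of one alignment."""
    return sum(pair_score(b, c, v, d) for b, c, v, d in pairs)


if __name__ == "__main__":
    example = [
        (4, 20, 1.0, tm_like_d(0.0, 3.0)),    # similar residues, perfectly superposed
        (-3, -10, 0.9, tm_like_d(1.0, 3.0)),  # both matrices negative: contributes 0
        (2, 3, 1.0, tm_like_d(3.0, 3.0)),     # d = 0.5 at distance d0
    ]
    print(alignment_score(example))  # 26.5
```

Because the evolutionary/structural term is multiplied by v and d, a pair only scores well when it is similar in sequence and local conformation and also sits close, in a compatible orientation, after superposition.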

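[Figure: (A) the angles θ, θ′ and the dihedral τ defined on four consecutive C-alpha atoms i-2, i-1, i, i+1; (B) example CLE strings, with regions labeled alpha, beta and coil.]

RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP

LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR

The transformation from 3D structure to 1D CLE strings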

S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters.” JBCB, 2008
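The geometric step behind the CLE transformation can be sketched with NumPy. This is a minimal sketch that assumes θ is the C-alpha pseudo-bond angle at i-1, θ′ the one at i, and τ the dihedral over atoms i-2..i+1 (the figure only labels the angles). Assigning each (θ, τ, θ′) triple to a conformational letter requires the letter cluster centers from the CLePAPS work, which are not reproduced here, so the sketch stops at the angles. The function names are illustrative.

```python
import numpy as np


def bond_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle a-b-c in degrees (a pseudo-bond angle when a, b, c are C-alpha atoms)."""
    u, v = a - b, c - b
    cos_ang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))


def dihedral(p0: np.ndarray, p1: np.ndarray, p2: np.ndarray, p3: np.ndarray) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def cle_angles(ca: np.ndarray, i: int):
    """(theta, tau, theta_prime) for residue i from C-alpha atoms i-2..i+1 (assumed layout)."""
    theta = bond_angle(ca[i - 2], ca[i - 1], ca[i])
    theta_prime = bond_angle(ca[i - 1], ca[i], ca[i + 1])
    tau = dihedral(ca[i - 2], ca[i - 1], ca[i], ca[i + 1])
    return theta, tau, theta_prime
```

Only C-alpha coordinates are needed, which matches the note that all aligned proteins are reduced to their C-alpha atoms.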

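CLESUM: Conformational LEtter SUbstitution Matrix

Mij = 20 × log2( Pij / (Pi × Pj) ), where Pij is the frequency with which letters i and j are aligned, and Pi, Pj are the background frequencies of the individual letters.

Note: CLESUM is constructed using FSSP representatives.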
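The log-odds construction Mij = 20 × log2(Pij / (Pi × Pj)) can be illustrated on toy data. This is a minimal sketch, assuming the input is a list of aligned letter pairs (for CLESUM these would come from structure alignments of FSSP representatives). Each pair is counted in both orders so the matrix is symmetric, and no pseudocounts or sequence weighting are applied; the real CLESUM construction may differ on both points. `substitution_matrix` is an illustrative name.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def substitution_matrix(
    aligned_pairs: Iterable[Tuple[str, str]], scale: float = 20.0
) -> Dict[Tuple[str, str], int]:
    """Log-odds matrix M[i, j] = round(scale * log2(P_ij / (P_i * P_j))).

    Only letter pairs that were actually observed get an entry (no pseudocounts).
    """
    pair_counts: Counter = Counter()
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1  # count each aligned pair in both orders
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Background frequency of each letter = its share of all pair slots.
    letter_counts: Counter = Counter()
    for (a, _), n in pair_counts.items():
        letter_counts[a] += n

    matrix = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        p_a, p_b = letter_counts[a] / total, letter_counts[b] / total
        matrix[(a, b)] = round(scale * math.log2(p_ab / (p_a * p_b)))
    return matrix


print(substitution_matrix([("A", "A"), ("A", "B")]))
```

The scale factor 20 only changes the units. A positive entry means the two letters are aligned more often than chance would predict.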

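[Figure: evolutionary + geometric information; panels labeled "typical helix" and "typical sheet".]

(A) correct alignment
CLE -> HHHHHHH
AMI -> EGHILLI
AMI -> DGHVLLV
CLE -> HHHHHHH

(B) incorrect alignment
CLE -> HHHHHHH
AMI -> GHILLIQ
AMI -> DGHVLLV
CLE -> HHHHHHH

Same CLESUM, different BLOSUM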
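In this example both alignments place helix on helix (CLE string HHHHHHH on both sides), so CLESUM cannot separate them, while the amino acids can. The short worked example below reads the figure as pairing EGHILLI (correct) or GHILLIQ (shifted by one residue) against DGHVLLV. It assumes BLOSUM62, since the slides only say "BLOSUM", and hard-codes just the BLOSUM62 entries it needs.

```python
# Only the BLOSUM62 entries needed for this example, keyed by sorted residue pair.
BLOSUM62_SUBSET = {
    ("D", "E"): 2, ("G", "G"): 6, ("H", "H"): 8, ("I", "V"): 3, ("L", "L"): 4,
    ("D", "G"): -1, ("G", "H"): -2, ("H", "I"): -3, ("L", "V"): 1,
    ("I", "L"): 2, ("Q", "V"): -2,
}


def blosum(a: str, b: str) -> int:
    """Look up a residue pair in either order."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def gapless_blosum(seq_a: str, seq_b: str) -> int:
    """Sum of BLOSUM62 scores for a gapless alignment of two equal-length fragments."""
    assert len(seq_a) == len(seq_b)
    return sum(blosum(a, b) for a, b in zip(seq_a, seq_b))


print(gapless_blosum("EGHILLI", "DGHVLLV"))  # alignment (A): 30
print(gapless_blosum("GHILLIQ", "DGHVLLV"))  # alignment (B): -1
```

The one-residue shift costs 31 BLOSUM62 units while leaving the CLESUM contribution unchanged. This is the kind of register error that the evolutionary term is meant to catch.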

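Why Max and Add?

max(0, CLESUM(i,j) + BLOSUM(i,j))

              CLESUM +   CLESUM -
BLOSUM +      √          o
BLOSUM -      o          ×

Note: log(Cij / (Ci Cj)) + log(Bij / (Bi Bj)) = log(Cij Bij / (Ci Cj Bi Bj)). Adding the two log-odds scores is therefore equivalent to a single log-odds score for observing both events, assuming the structural and evolutionary signals are independent.

[Figure: (A) incorrect alignment with smaller RMSD; (B) correct alignment with larger RMSD.]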
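The two claims on this slide can be checked numerically with the small sketch below: how max(0, CLESUM + BLOSUM) behaves for each sign combination, and that adding two log-odds scores equals the log-odds of the joint event. The probabilities in the identity check are arbitrary illustrative values, and `local_score` and `log_odds` are illustrative names.

```python
import math


def local_score(clesum: float, blosum: float) -> float:
    """Local similarity term: max(0, CLESUM + BLOSUM)."""
    return max(0, clesum + blosum)


def log_odds(p_pair: float, p_a: float, p_b: float) -> float:
    """Natural-log odds of a pair against independent background frequencies."""
    return math.log(p_pair / (p_a * p_b))


# Sign cases: both positive, mixed (either order), both negative.
for c, b in [(5, 3), (6, -2), (-6, 2), (-4, -2)]:
    print(c, b, local_score(c, b))

# Adding two log-odds scores equals the log-odds of the joint event.
lhs = log_odds(0.10, 0.20, 0.30) + log_odds(0.05, 0.10, 0.40)
rhs = math.log((0.10 * 0.05) / (0.20 * 0.30 * 0.10 * 0.40))
print(math.isclose(lhs, rhs))
```

Both-positive pairs always score, both-negative pairs are clipped to zero rather than penalized, and mixed pairs score only when the positive signal outweighs the negative one.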

Why use angular similarity?

[Figure: the three vectors used in the vect-score v(i,j).]

Angular similarity is measured by the deviations of three vectors.
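The slides do not say which three vectors enter v(i,j) or how their deviations are combined. The following is a minimal NumPy sketch under hypothetical choices. Each residue contributes its two C-alpha bond vectors (toward i+1 and toward i-1) plus their cross product. v(i,j) is taken as the mean of the cosines, clipped at zero, between corresponding unit vectors of already-superposed structures. Both the vector choice and the averaging are assumptions, not DeepAlign's definition.

```python
import numpy as np


def residue_vectors(ca: np.ndarray, i: int):
    """Hypothetical three unit vectors at residue i (requires residues i-1 and i+1)."""
    fwd = ca[i + 1] - ca[i]
    bwd = ca[i - 1] - ca[i]
    normal = np.cross(fwd, bwd)  # perpendicular to the local C-alpha plane
    return [x / np.linalg.norm(x) for x in (fwd, bwd, normal)]


def angular_similarity(ca_a: np.ndarray, i: int, ca_b: np.ndarray, j: int) -> float:
    """Mean of max(0, cos) over the three vector pairs; 1.0 means identical orientation.

    Assumes ca_a and ca_b are already rigid-body superposed, as the slides state for v(i,j).
    """
    cosines = [
        float(np.dot(u, w))
        for u, w in zip(residue_vectors(ca_a, i), residue_vectors(ca_b, j))
    ]
    return sum(max(0.0, c) for c in cosines) / 3.0
```

Unlike a pure distance term, this score distinguishes two residues that sit at the same position but point their backbone in different directions.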

Search Algorithm

[Figure labels: DeepAlign-score, SFP_long, SFP_short.]

Sort both SFP lists (SFP_long and SFP_short) by score.

From TopK coarse-grained to TopJ fine-grained initial alignments.

Example: TopK = 5; TopJ = 1. [Figure: SFP_long score ranks 5, 2, 4, 1.] Number of consistent SFPs: 4 for the Top2 SFP, 1 for the Top1 SFP.

The Top2 SFP is globally supported by three other SFPs, while the Top1 SFP is supported only by itself.

Note:
[1] TopK > TopJ > M
[2] SFP stands for Similar Fragment Pair, scored by ∑max(0, CLESUM(i,j) + BLOSUM(i,j)).
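The TopK -> TopJ step is described only through the example, so the code below is a minimal NumPy sketch under stated assumptions. An SFP is a hypothetical (start_a, start_b, length) triple. Two SFPs count as consistent when superposing on one (Kabsch algorithm) brings the other's residue pairs within a hypothetical mean-distance cutoff of 3 Å. Candidates are re-ranked by how many SFPs support them, with ties keeping the original score rank.

```python
from typing import List, Tuple

import numpy as np

SFP = Tuple[int, int, int]  # hypothetical layout: (start in A, start in B, length)


def kabsch(p: np.ndarray, q: np.ndarray):
    """Rotation r and translation t minimising ||p @ r.T + t - q|| (Kabsch algorithm)."""
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # flip one axis if needed to avoid a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, q_mean - p_mean @ r.T


def sfp_indices(sfp: SFP):
    a, b, length = sfp
    return np.arange(a, a + length), np.arange(b, b + length)


def support_counts(ca_a, ca_b, sfps: List[SFP], cutoff: float = 3.0) -> List[int]:
    """For each SFP, superpose on it and count the SFPs (itself included)
    whose residue pairs then lie within `cutoff` on average."""
    counts = []
    for seed in sfps:
        ia, ib = sfp_indices(seed)
        r, t = kabsch(ca_a[ia], ca_b[ib])
        moved = ca_a @ r.T + t  # all of protein A under the seed's superposition
        n = 0
        for other in sfps:
            oa, ob = sfp_indices(other)
            if np.linalg.norm(moved[oa] - ca_b[ob], axis=1).mean() < cutoff:
                n += 1
        counts.append(n)
    return counts


def select_initial(ca_a, ca_b, ranked: List[SFP], top_k: int, top_j: int,
                   cutoff: float = 3.0) -> List[SFP]:
    """Keep the TopJ of the TopK-ranked SFPs with the most support (stable on ties)."""
    candidates = ranked[:top_k]
    counts = support_counts(ca_a, ca_b, candidates, cutoff)
    order = sorted(range(len(candidates)), key=lambda k: -counts[k])
    return [candidates[k] for k in order[:top_j]]
```

As in the slide's example, a top-scoring SFP that no other fragment agrees with loses to a lower-ranked one that many fragments support.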

Refine each fine-grained initial alignment by three iterations

[Figure: FirstUpdate -> SecondUpdate -> ThirdUpdate -> OutputAlignment, with distances d1 > d2 > d3.]

Final refinement

[Figure label: SFP_short score rank (high -> low).]

The final refinement optimizes the DeepAlign-score only within a bounded area:
(1) take the refined fine-grained alignment;
(2) define a bounded area around the alignment;
(3) use dynamic programming to find the path with maximal DeepAlign-score within the bounded area.
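Step (3) can be sketched as a banded dynamic program. This is a minimal Python sketch with three hypothetical choices. The bounded area is every cell within `w` rows and columns of a pair in the refined alignment. Gaps cost nothing, since the slides give no gap penalty. `score[i][j]` already holds the per-pair DeepAlign scores.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def band_mask(n: int, m: int, alignment: Sequence[Pair], w: int) -> List[List[bool]]:
    """Cells within w rows/columns of some aligned pair (hypothetical 'bounded area')."""
    allowed = [[False] * m for _ in range(n)]
    for a, b in alignment:
        for i in range(max(0, a - w), min(n, a + w + 1)):
            for j in range(max(0, b - w), min(m, b + w + 1)):
                allowed[i][j] = True
    return allowed


def banded_refine(score: Sequence[Sequence[float]], alignment: Sequence[Pair], w: int):
    """Max-score monotone alignment using only cells inside the band; gaps are free."""
    n, m = len(score), len(score[0])
    allowed = band_mask(n, m, alignment, w)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(f[i - 1][j], f[i][j - 1])  # skip a residue in A or in B
            if allowed[i - 1][j - 1]:
                best = max(best, f[i - 1][j - 1] + score[i - 1][j - 1])  # align i-1 with j-1
            f[i][j] = best
    # Trace back, taking a match only when it has positive score and explains f[i][j].
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if allowed[i - 1][j - 1] and s > 0 and f[i][j] == f[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return f[n][m], pairs


diag = [(0, 0), (1, 1), (2, 2)]
scores = [[1, 0, 9], [0, 1, 0], [0, 0, 1]]
print(banded_refine(scores, diag, 0))  # band excludes the high-scoring (0, 2) cell
print(banded_refine(scores, diag, 2))  # wider band lets the DP jump to (0, 2)
```

Restricting the DP to the band keeps the refinement close to the superposition-based alignment, and it also cuts the work from the full n × m matrix to the cells near that alignment.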

Result on manually-curated data

• CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments.

• MALUDUP: contains 241 alignments of homologous domains originating from internal duplication.

• MALISAM: contains 130 alignments of structurally analogous motifs in proteins.

Result on discrimination data

• We use SABmark to test the ability to identify distant homologs (super-family) and structural analogs (fold) among negative data with no structural similarity.

[Figure: DeepAlign results on the super-family and fold sets.]

One example

Superimposition of domains d1pqsa_ and d1poh__ from MALISAM: (A) TMalign, TM-score 0.288; (B) DeepAlign optimizing TM-score, TM-score 0.514; (C) DeepAlign, TM-score 0.473.
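The example compares alignments by TM-score. For reference, here is a minimal sketch of the standard TM-score formula (Zhang and Skolnick) evaluated for one fixed superposition. The function names are illustrative, and the d0 floor of 0.5 Å for short chains is a common convention. The TM-score and TMalign programs additionally search for the superposition that maximizes the score, which this sketch does not do.

```python
from typing import Sequence


def tm_d0(length: int) -> float:
    """Distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 Å."""
    if length <= 15:
        return 0.5  # the cube root of a negative number is avoided for short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score of one fixed superposition: (1/L) * sum of 1 / (1 + (d_i / d0)^2).

    `distances` holds the C-alpha distances of aligned pairs after superposition;
    unaligned residues contribute nothing, and L is the length used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the score is normalized by L and saturates for close pairs, a longer alignment with moderately close residues can outscore a short, very tight one. This is why an alignment with a larger RMSD can still have the higher TM-score.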

Thank you!

The DeepAlign executable is available at: http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz
