Protein structure alignment beyond spatial proximity 3DSIG 2012 Jul 14, Long Beach, California Sheng WANG Toyota Technological Institute at Chicago

Upload: sheng-wang

Post on 12-Jun-2015

DESCRIPTION

Background / Purpose: Automatically constructing an accurate protein structure alignment remains challenging, especially when the proteins to be aligned are distantly related.

Main conclusion: We present a novel method, DeepAlign, which aligns two protein structures using not only the spatial proximity of equivalent residues (after rigid-body superposition), but also evolutionary distance and hydrogen-bonding similarity.

TRANSCRIPT

Page 1: Protein structure alignment beyond spatial proximity 3 dsig_2012

Protein structure alignment beyond spatial proximity

3DSIG 2012 Jul 14, Long Beach, California

Sheng WANG, Toyota Technological Institute at Chicago

Page 2

Related works on Pairwise Structure Alignment

1. Almost all the structure alignment tools
2. TMalign, fr-TMalign
3. DALI, MUSTANG
4. MAMMOTH, Vorolign, YAKUSA
5. FATCAT, CE, MATT, FlexProt

Note: for every protein we align, only the C-alpha atoms are considered.

Page 3

Our contribution

Design a scoring function
• local sub-structure similarity
• evolutionary and functional information
• angular similarity for hydrogen bonding

Employ a fast and efficient search algorithm
• start from highly similar local sub-structure pairs (SFPs)
• recruit new SFPs that satisfy spatial constraints
• finally, refine the alignment within a bound

Page 4

Scoring Function

Score(i,j) = ( max(0, BLOSUM(i,j)) + CLESUM(i,j) ) * v(i,j) * d(i,j)

The slide labels the first factor "local similarity" and the product v(i,j) * d(i,j) "global similarity".

• CLESUM is the local structure substitution matrix;
• BLOSUM is the amino acid substitution matrix;
• v(i,j) measures the angular similarity using three vectors;
• d(i,j) measures the spatial proximity of two aligned residues.

Note: both v(i,j) and d(i,j) are calculated after rigid-body superposition.
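As a minimal sketch, the formula above can be written as a small Python function. The function name and the numeric inputs are illustrative assumptions; the slides do not give the formulas for v(i,j) and d(i,j), so they are passed in as already-computed values.

```python
def deepalign_pair_score(blosum_ij, clesum_ij, v_ij, d_ij):
    """Score(i,j) = (max(0, BLOSUM) + CLESUM) * v * d, as written on the slide."""
    # Local part: negative BLOSUM values are clipped to 0 before adding CLESUM.
    local = max(0, blosum_ij) + clesum_ij
    # Global part: both factors depend on the rigid-body superposition.
    return local * v_ij * d_ij


# Made-up inputs, for illustration only.
print(deepalign_pair_score(blosum_ij=-2, clesum_ij=30, v_ij=0.5, d_ij=0.5))  # 7.5
print(deepalign_pair_score(blosum_ij=4, clesum_ij=30, v_ij=1.0, d_ij=1.0))   # 34.0
```

Because the local part is multiplied by v and d, a residue pair with good sequence and local-structure agreement still scores low if it is poorly superposed.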

Page 5

The transformation from 3D structure to 1D CLE strings

[Figure: panels (A) and (B); labels θ, θ’ and τ on residues i-2, i-1, i, i+1; legend: alpha, beta, coil.]

Example CLE strings shown on the slide:

RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCP

LDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR

S Wang, WM Zheng, “CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters.” JBCB, 2008
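The slide labels the local geometry with θ, θ’ and τ over residues i-2 to i+1 but does not define them. The sketch below assumes they are the two pseudo-bond bend angles and the torsion angle of four consecutive C-alpha atoms, which is the usual reading of such a diagram. The function names are hypothetical, and the mapping from these angles to conformational letters is not shown on the slide, so it is not attempted here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bend_angle(p_prev, p_mid, p_next):
    """Angle (degrees) at p_mid between the two adjacent pseudo-bonds."""
    u, w = _sub(p_prev, p_mid), _sub(p_next, p_mid)
    cos_t = _dot(u, w) / (_norm(u) * _norm(w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion_angle(p0, p1, p2, p3):
    """Torsion (degrees) about the p1-p2 axis; sign conventions differ between tools."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_hat = tuple(x / _norm(b2) for x in b2)
    m1 = _cross(n1, b2_hat)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


def local_angles(ca4):
    """(theta, theta_prime, tau) for four consecutive C-alpha coordinates."""
    p0, p1, p2, p3 = ca4
    return bend_angle(p0, p1, p2), bend_angle(p1, p2, p3), torsion_angle(p0, p1, p2, p3)


# Toy coordinates (not a real protein): two right-angle bends and a 90-degree twist.
theta, theta_prime, tau = local_angles([(0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)])
```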

Page 6

CLESUM: Conformational LEtter SUbstitution Matrix

M_ij = 20 * log2( P_ij / (P_i * P_j) )

Note: CLESUM is constructed using FSSP representatives.
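A minimal sketch of the log-odds construction, assuming the pair frequencies P_ij come from a symmetric table of aligned-letter counts and that P_i is the marginal of that table. The counts below are invented for illustration and are not FSSP data; the function name is hypothetical.

```python
import math
from collections import defaultdict


def log_odds_matrix(pair_counts, scale=20.0):
    """M_ij = scale * log2(P_ij / (P_i * P_j)) from a symmetric count table.

    pair_counts maps ordered pairs (a, b) to counts and must list both (a, b)
    and (b, a). Pairs with zero count would need pseudocounts (not handled).
    """
    total = sum(pair_counts.values())
    p_pair = {pair: n / total for pair, n in pair_counts.items()}
    p_single = defaultdict(float)
    for (a, _b), p in p_pair.items():
        p_single[a] += p  # marginal frequency of letter a
    return {
        (a, b): scale * math.log2(p / (p_single[a] * p_single[b]))
        for (a, b), p in p_pair.items()
    }


# Toy counts over two letters, for illustration only.
toy_counts = {("H", "H"): 4, ("H", "E"): 1, ("E", "H"): 1, ("E", "E"): 4}
toy_matrix = log_odds_matrix(toy_counts)
```

In this toy table identical letters pair more often than chance, so the diagonal entries come out positive and the off-diagonal entries negative.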

[Figure: the CLESUM matrix, annotated with "typical helix", "typical sheet" and "evolutionary + geometric".]

Page 7

Same CLESUM, different BLOSUM

(A) correct
CLE -> HHHHHHH
AMI -> EGHILLI
AMI -> DGHVLLV
CLE -> HHHHHHH

(B) incorrect
CLE -> HHHHHHH
AMI -> GHILLIQ
AMI -> DGHVLLV
CLE -> HHHHHHH
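To make the contrast concrete, the sketch below scores the two amino-acid pairings from the slide. It assumes BLOSUM62 (the slides do not say which BLOSUM matrix is used) and hard-codes only the entries this example needs.

```python
# Subset of BLOSUM62 needed for this example (symmetric; each pair stored once,
# with the two letters in alphabetical order).
BLOSUM62_SUBSET = {
    ("D", "E"): 2, ("G", "G"): 6, ("H", "H"): 8, ("I", "V"): 3, ("L", "L"): 4,
    ("D", "G"): -1, ("G", "H"): -2, ("H", "I"): -3, ("L", "V"): 1,
    ("I", "L"): 2, ("Q", "V"): -2,
}


def blosum(a, b):
    """Look up a pair regardless of argument order."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def ungapped_score(seq1, seq2):
    """Sum of substitution scores over an ungapped, equal-length pairing."""
    return sum(blosum(a, b) for a, b in zip(seq1, seq2))


print(ungapped_score("EGHILLI", "DGHVLLV"))  # 30  (panel A, correct)
print(ungapped_score("GHILLIQ", "DGHVLLV"))  # -1  (panel B, incorrect)
```

Both pairings align helix letters to helix letters, so CLESUM alone cannot tell them apart; the amino-acid term separates the in-register pairing from the one shifted by a residue.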

Page 8

Why Max and Add ?

max(0, CLESUM(i,j) + BLOSUM(i,j))

            BLOSUM +   BLOSUM -
CLESUM +       √          o
CLESUM -       o          ×

Note: log(Cij / (Ci Cj)) + log(Bij / (Bi Bj)) = log(Cij Bij / (Ci Cj Bi Bj))
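A short sketch of the two points on this slide: the clipped sum over the four sign combinations, and a numerical check of the log identity in the note. All numbers are made up for illustration and the function name is hypothetical.

```python
import math


def local_term(clesum_ij, blosum_ij):
    """max(0, CLESUM + BLOSUM): add the two log-odds scores, then clip at zero."""
    return max(0, clesum_ij + blosum_ij)


# (CLESUM, BLOSUM) sign combinations: both positive, mixed, mixed, both negative.
cases = [(3, 2), (3, -2), (-3, 2), (-3, -2)]
print([local_term(c, b) for c, b in cases])  # [5, 1, 0, 0]

# Adding two log-odds scores equals the log-odds of the product of the ratios.
c_ij, c_i, c_j = 0.02, 0.1, 0.1
b_ij, b_i, b_j = 0.01, 0.05, 0.1
lhs = math.log(c_ij / (c_i * c_j)) + math.log(b_ij / (b_i * b_j))
rhs = math.log((c_ij * b_ij) / (c_i * c_j * b_i * b_j))
print(math.isclose(lhs, rhs))  # True
```

In the mixed-sign cases the result depends on which score has the larger magnitude, which matches the "o" cells of the table.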

Page 9

Why use angular similarity ?

[Figure: (A) incorrect alignment, smaller RMSD; (B) correct alignment, larger RMSD.]

Page 10

Using the deviation of three vectors for angular similarity

[Figure: The three vectors used in the vect-score v(i,j).]

Page 11

Search Algorithm

[Flowchart labels: Sort both SFP lists; SFP_long; SFP_short; DeepAlign-score]

Note:
[1] TopK > TopJ > M
[2] SFP stands for Similar Fragment Pair, scored using ∑ max(0, CLESUM(i,j) + BLOSUM(i,j))
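A minimal sketch of building and sorting one SFP list, assuming each SFP is an ungapped fragment pair of fixed length scored by the sum in note [2]. The fragment length, the function name and the toy scorers are assumptions; the slides do not give the lengths used for SFP_long and SFP_short.

```python
def build_sfp_list(cle1, ami1, cle2, ami2, length, clesum, blosum):
    """Score every ungapped fragment pair of the given length, best first."""
    sfps = []
    for i in range(len(cle1) - length + 1):
        for j in range(len(cle2) - length + 1):
            score = sum(
                max(0, clesum(cle1[i + k], cle2[j + k]) + blosum(ami1[i + k], ami2[j + k]))
                for k in range(length)
            )
            sfps.append((score, i, j))  # (score, start in protein 1, start in protein 2)
    sfps.sort(key=lambda t: t[0], reverse=True)
    return sfps


# Toy scorers standing in for the real matrices (hypothetical values).
toy_clesum = lambda a, b: 1 if a == b else -1
toy_blosum = lambda a, b: 2 if a == b else -1

sfps = build_sfp_list("HHHHHH", "ACDEFG", "HHHHHH", "WWCDEW", 3, toy_clesum, toy_blosum)
print(sfps[0])  # (9, 1, 2)
```

Calling the same function with two different lengths would give the two lists that the flowchart sorts.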

Page 12

From TopK coarse-grained to TopJ fine-grained initial alignment

Example: TopK = 5; TopJ = 1

[Figure: SFP_long score rank, with SFPs labeled 1 to 5; one panel is marked "# of consistent SFPs = 4", the other "# of consistent SFPs = 1".]

The Top2 SFP is globally supported by three other SFPs, while the Top1 SFP is supported only by itself.
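The selection step in this example can be sketched as follows: count, for each of the TopK SFPs, how many of the TopK are consistent with it, and keep the TopJ best-supported ones. The consistency test itself is not defined on the slide, so `is_consistent` is a hypothetical predicate supplied by the caller; the toy version below simply compares a label.

```python
def pick_seeds(ranked_sfps, top_k, top_j, is_consistent):
    """Return the top_j SFPs among the first top_k, ordered by support count."""
    candidates = ranked_sfps[:top_k]
    support = []
    for rank, pivot in enumerate(candidates):
        # Each SFP counts itself, so an isolated SFP has support 1.
        n = sum(1 for other in candidates if is_consistent(pivot, other))
        support.append((n, rank, pivot))
    # More support first; ties broken by the original score rank.
    support.sort(key=lambda t: (-t[0], t[1]))
    return [(pivot, n) for n, _rank, pivot in support[:top_j]]


# Toy data mirroring the slide: SFP 1 stands alone, SFPs 2 to 5 agree with each other.
ranked = [("sfp1", "X"), ("sfp2", "Y"), ("sfp3", "Y"), ("sfp4", "Y"), ("sfp5", "Y")]
same_frame = lambda a, b: a[1] == b[1]  # hypothetical stand-in for the spatial test
print(pick_seeds(ranked, top_k=5, top_j=1, is_consistent=same_frame))
# [(('sfp2', 'Y'), 4)]
```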

Page 13

Refine each fine-grained initial alignment by three iterations

[Flowchart labels: SFP_short score rank (high -> low); FirstUpdate; SecondUpdate; ThirdUpdate; d1 > d2 > d3; Final refinement; OutputAlignment]

Page 14

Final refinement on DeepAlign-score only in bounded area

(1) refined fine-grained alignment
(2) bounded area around the alignment
(3) dynamic programming to find a path with maximal DeepAlign-score within the bounded area
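A minimal sketch of step (3), assuming a precomputed score matrix, no gap penalty, and a bounded area defined as all cells within `band` rows and columns of an aligned pair from step (1). The slides do not state the gap treatment or the band shape, so both are assumptions, as is the function name.

```python
def bounded_dp(score, seed_pairs, band):
    """Best monotone set of matched pairs, with matches restricted to the band."""
    n, m = len(score), len(score[0])
    allowed = {
        (i, j)
        for (pi, pj) in seed_pairs
        for i in range(max(0, pi - band), min(n, pi + band + 1))
        for j in range(max(0, pj - band), min(m, pj + band + 1))
    }
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            value = max(best[i - 1][j], best[i][j - 1])  # skip a residue (no penalty)
            if (i - 1, j - 1) in allowed:
                value = max(value, best[i - 1][j - 1] + score[i - 1][j - 1])
            best[i][j] = value
    # Trace back from the corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if (i - 1, j - 1) in allowed and best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


toy_score = [[1, 9], [1, 1]]
print(bounded_dp(toy_score, seed_pairs=[(0, 0), (1, 1)], band=0))  # (2.0, [(0, 0), (1, 1)])
print(bounded_dp(toy_score, seed_pairs=[(0, 0), (1, 1)], band=1))  # (9.0, [(0, 1)])
```

Restricting the search to the band keeps the refinement close to the initial alignment and reduces the number of cells the dynamic programming has to consider.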

Page 15

Results on manually curated data

• CDD (Conserved Domain Database): contains 3591 conserved domain structure alignments.
• MALIDUP: contains 241 alignments for homologous domains that originated from internal duplication.
• MALISAM: contains 130 alignments for structurally analogous motifs in proteins.

Page 16

Results on discrimination data

• We use SABmark to test the ability to identify distant homologs (super-family) and structural analogs (fold) among negative data (pairs with no structural similarity).

[Figure: two panels, super-family and fold, each with DeepAlign labeled.]

Page 17

One example

Superimposition of domain d1pqsa_ and d1poh__ from MALISAM. (A) TMalign, (B) DeepAlign optimizing TM-score and (C) DeepAlign.

TMscore 0.288

TMscore 0.514

TMscore 0.473
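For reference, a sketch of how a TM-score is computed from the distances between aligned residue pairs after superposition, using the standard definition with d0 = 1.24 * (L - 15)^(1/3) - 1.8. Which length L normalizes the values quoted on this slide is not stated, so `norm_length` is left as a parameter; the floor of 0.5 on d0 for short chains is a safeguard added here, and the function names are hypothetical.

```python
def tm_d0(norm_length):
    """Length-dependent distance scale; floored at 0.5 for short chains (assumption)."""
    if norm_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (norm_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, norm_length):
    """Sum of 1 / (1 + (d / d0)^2) over aligned pairs, divided by norm_length."""
    d0 = tm_d0(norm_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / norm_length


# Toy distances, not the d1pqsa_ / d1poh__ alignment from the slide.
perfect = tm_score([0.0] * 100, norm_length=100)      # every residue aligned at 0 A
half_covered = tm_score([0.0] * 50, norm_length=100)  # only half the residues aligned
```

Unaligned residues contribute nothing to the sum, so the score rewards both coverage and closeness.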

Page 18

Thank you !!

Please find the executable program of DeepAlign at: http://ttic.uchicago.edu/~jinbo/DeepAlign/DeepAlign_exe_V1.00.tar.gz