Post on 03-Jan-2016


Inferring Functional Information from Domain Co-evolution

Yohan Kim, Mehmet Koyuturk, Umut Topkara, Ananth Grama and Shankar Subramaniam

Gaurav Chadha

Deepak Desore

Layout

Motivation
Computational Methods and Algorithms
Results
Conclusion
Questions

Motivation (1 of 2)

Prior work
Focused on understanding protein function at the level of entire protein sequences.
Assumption: a complete sequence follows a single evolutionary trajectory.

It is well known that a domain can occur in various contexts, which invalidates this assumption for multi-domain protein sequences.

Motivation (2 of 2)

Our approach
Improves on the multiple-profile method.
Constructs a co-evolutionary matrix to assign a phylogenetic similarity score to each protein pair.
Identifies co-evolving regions using residue-level conservation.

Computational Methods & Algorithms

Constructing phylogenetic profiles
  Protein (single) phylogenetic profiles
  Segment (multiple) phylogenetic profiles
  Residue phylogenetic profiles

Computing co-evolutionary matrices

Deriving phylogenetic similarity scores

Protein phylogenetic profiles

A phylogenetic profile is a vector that records whether a protein is present in each genome.

Let $P = \{P_1, P_2, \dots, P_n\}$ be the set of proteins and $G = \{G_1, G_2, \dots, G_m\}$ be the set of genomes.

Every row represents the binary phylogenetic profile of one protein.

Protein phylogenetic profiles (contd.)

The single phylogenetic profile $\psi_i$ for protein $P_i$ is

$$\psi_i(j) = -\frac{1}{\log(E_{ij})}, \quad 1 \le j \le m,$$

where $E_{ij}$ is the minimum BLAST E-value of a local alignment between $P_i$ and $G_j$.

Advantage: gives the degree of sequence divergence.
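
As a minimal sketch of the profile definition above, the snippet below turns a list of minimum E-values (one per genome) into a real-valued profile. Several details are assumptions, not taken from the slides: the logarithm is base 10, values are capped at 1.0 so that E-values close to or above 1 (no meaningful hit) all map to the same entry, and an E-value reported as 0.0 maps to 0.0. The function names are hypothetical.

```python
import math

def profile_entry(e_value, cap=1.0):
    """Return -1/log10(E) for one protein/genome pair, capped at `cap`.

    Assumptions (not from the slides): base-10 logarithm, cap of 1.0,
    and E = 0.0 (an extremely strong hit) mapped to 0.0.
    """
    if e_value <= 0.0:
        return 0.0
    if e_value >= 1.0:
        return cap
    return min(cap, -1.0 / math.log10(e_value))

def protein_profile(min_e_values):
    """Build the profile of one protein from its minimum E-value per genome."""
    return [profile_entry(e) for e in min_e_values]

# Example: strong hit, very strong hit, no hit, weak hit.
profile = protein_profile([1e-10, 1e-100, 5.0, 0.5])
```

Under these assumptions a smaller entry means a more significant alignment, so the profile carries more information than a 0/1 presence flag.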

Protein phylogenetic profiles (contd.)

The mutual information $I(X,Y)$ is defined as

$$I(X,Y) = H(X) + H(Y) - H(X,Y),$$

where $H(X)$, the Shannon entropy of $X$, is defined as

$$H(X) = -\sum_{x \in X} p_x \log(p_x), \quad p_x = P[X = x].$$

The phylogenetic similarity between proteins $P_i$ and $P_j$, with profiles $\psi_i$ and $\psi_j$, is

$$\mu_S(P_i,P_j) = I(\psi_i, \psi_j).$$
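
The mutual information between two profiles can be sketched as follows. Real-valued profile entries have to be discretized before entropies can be estimated; the equal-width binning into 10 bins over [0, 1] and the base-2 logarithm used here are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def discretize(profile, bins=10):
    """Map values in [0, 1] to equal-width bin indices (assumed binning)."""
    return [min(int(v * bins), bins - 1) for v in profile]

def mutual_information(profile_a, profile_b, bins=10):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) over the genomes of two profiles."""
    a = discretize(profile_a, bins)
    b = discretize(profile_b, bins)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

# Identical profiles share all their information; independent ones share none.
same = mutual_information([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])
indep = mutual_information([0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0])
```

With only a handful of genomes these estimates are crude; the example is meant to show the formula, not a realistic sample size.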

Segment phylogenetic profiles

Single-profile methods can miss significant interactions.

Domain $D_2^1$ of $P_2$ follows an evolutionary trajectory similar to $P_1$ and $P_3$, which the single-profile method does not capture.

Segment phylogenetic profiles (contd.)

Each protein $P_i$ is divided into fixed-size segments $S_i^1, S_i^2, \dots, S_i^k$.

The phylogenetic similarity between two proteins is

$$\mu_M(P_i,P_j) = \max_{s,t} I(\psi_i^s, \psi_j^t),$$

where $\psi_i^s$ is the phylogenetic profile of segment $S_i^s$ of protein $P_i$.
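
The multiple-profile score above is a maximum over all segment pairs, which the sketch below makes explicit. The pairwise measure is passed in as a function so that any mutual information estimator can be plugged in; the `agreement` function in the example is only a toy stand-in for it. Non-overlapping segments and all names here are assumptions for illustration.

```python
def fixed_size_segments(length, size):
    """Split residues 0..length-1 into (start, end) ranges, end exclusive.

    Assumes non-overlapping segments; the last one may be shorter.
    """
    return [(start, min(start + size, length)) for start in range(0, length, size)]

def multiple_profile_score(segment_profiles_i, segment_profiles_j, similarity):
    """Maximum of `similarity` over every pair of segment profiles."""
    return max(
        similarity(a, b)
        for a in segment_profiles_i
        for b in segment_profiles_j
    )

def agreement(a, b):
    """Toy stand-in for mutual information: number of matching entries."""
    return sum(x == y for x, y in zip(a, b))

segments = fixed_size_segments(10, 4)
score = multiple_profile_score([[1, 0], [1, 1]], [[0, 0], [1, 1]], agreement)
```

Because the maximum is taken over segment pairs, one well-matched pair of segments is enough to give the protein pair a high score, even if the other segments disagree.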

Residue phylogenetic profiles

Problem with multiple phylogenetic profiles:

Both domains are covered together by the segment $S_2^2$, which overrides their individual phylogenetic profiles.

A significant local alignment between two proteins corresponds to the residues covered by the alignment rather than to the whole sequences.

Residue phylogenetic profiles (contd.)

$\mathcal{A}(P_i,G_j)$: the set of significant local alignments between protein $P_i$ and genome $G_j$.

$T(A) = [r_b, r_e]$: the interval of residues on $P_i$ corresponding to each alignment $A \in \mathcal{A}(P_i,G_j)$.

For each residue $r$ on $P_i$, the phylogenetic profile is

$$\psi_i^r(j) = \min_{A \in \mathcal{A}_r} \left( -\frac{1}{\log(E(A))} \right), \quad 1 \le j \le m,$$

where $\mathcal{A}_r = \{A \in \mathcal{A}(P_i,G_j) : r \in T(A)\}$ is the set of local alignments that contain $r$.
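
A rough sketch of the residue-level profile: every alignment contributes its score to the residues it covers, and each residue keeps the minimum. The input layout (one list of `(r_begin, r_end, e_value)` tuples per genome, 1-based inclusive intervals), the base-10 logarithm, and the default entry of 1.0 for residues that no alignment covers are all assumptions for illustration; the slides do not specify them.

```python
import math

def residue_profiles(protein_length, alignments_per_genome, default=1.0):
    """Return profiles[r - 1][j] for residue r (1-based) and genome index j.

    alignments_per_genome[j] is a list of (r_begin, r_end, e_value) tuples
    with 1-based inclusive residue intervals on the protein (assumed layout).
    """
    n_genomes = len(alignments_per_genome)
    profiles = [[default] * n_genomes for _ in range(protein_length)]
    for j, alignments in enumerate(alignments_per_genome):
        for r_begin, r_end, e_value in alignments:
            if e_value >= 1.0:
                continue  # not significant; leave the default entry
            if e_value <= 0.0:
                score = 0.0  # assumed value for an E-value reported as 0
            else:
                score = min(default, -1.0 / math.log10(e_value))
            # Keep the minimum over all alignments covering each residue.
            for r in range(r_begin, r_end + 1):
                profiles[r - 1][j] = min(profiles[r - 1][j], score)
    return profiles

# Five residues, two genomes; the second genome has no significant alignment.
example = residue_profiles(5, [[(1, 3, 1e-10), (2, 4, 1e-20)], []])
```

In the example, residue 1 is covered only by the first alignment, residues 2 to 4 take the score of the stronger second alignment, and residue 5 keeps the default.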

Computing co-evolutionary matrices

For each protein pair $P_i$ and $P_j$ with lengths $l_i$ and $l_j$, the co-evolutionary matrix entry $M_{ij}(r,s)$ is

$$M_{ij}(r,s) = I(\psi_i^r, \psi_j^s), \quad 1 \le r \le l_i, \quad 1 \le s \le l_j.$$

The co-evolutionary matrix contains information about which regions of the two proteins co-evolved.

The co-evolved domain(s) appear as a block of high mutual information scores in the matrix.

Deriving phylogenetic similarity scores

The phylogenetic similarity score between two proteins $P_i$ and $P_j$ is

$$\mu_C(P_i,P_j) = \max_{\substack{1 \le r \le l_i \\ 1 \le s \le l_j}} \; \min_{\substack{r \le a \le r+W \\ s \le b \le s+W}} M_{ij}(a,b),$$

where $W$ is the window parameter that sets the minimum size of a region on a protein for it to be considered a conserved domain.
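
The max-min score can be sketched directly from the formula: slide a square window over the matrix, take the minimum inside each window, and keep the best minimum. The window spans indices r to r+W inclusive, as in the definition above. Restricting the scan to windows that lie fully inside the matrix is an assumption, and the function name is hypothetical.

```python
def coevolution_score(matrix, window):
    """Max over window positions of the min entry in each window.

    A window at (r, s) covers rows r..r+window and columns s..s+window
    (inclusive). Only windows fully inside the matrix are scanned, which
    is an assumption about how the borders are handled.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows <= window or cols <= window:
        raise ValueError("matrix is smaller than the window")
    best = float("-inf")
    for r in range(rows - window):
        for s in range(cols - window):
            block_min = min(
                matrix[a][b]
                for a in range(r, r + window + 1)
                for b in range(s, s + window + 1)
            )
            best = max(best, block_min)
    return best

# A 2x2 block of high values in the middle stands for a co-evolved region.
toy_matrix = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.8, 0.1],
    [0.2, 0.7, 0.9, 0.0],
    [0.1, 0.0, 0.1, 0.3],
]
toy_score = coevolution_score(toy_matrix, window=1)
```

Taking the minimum inside the window means a single high entry is not enough: the score is high only when a whole block is high, which matches the idea that a co-evolved domain shows up as a block in the matrix.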

Results

Implemented and tested on 4311 E. coli proteins
152 genomes (131 Bacteria, 17 Archaea, 4 Eukaryota)
Down-sampling factor f = 30, W = 2
These values translate into overlapping segments 60 residues long
Homologous proteins excluded from the analysis
Define p-value as fraction of non-homologous protein pairs (N)

Results (contd.)

MIS: mutual information score
PP: number of predicted protein pairs
PPV = TP / (TP + FP)
For all $\mu_*$, coverage = TP + FP
TN and FN are the numbers of protein pairs that do not meet the threshold
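
To make the evaluation quantities concrete, here is a small sketch that thresholds a score table and counts the outcomes. PPV and coverage follow the definitions above; sensitivity uses the standard TP / (TP + FN). The data layout (a dict from protein pairs to scores and a set of known interacting pairs), the example values, and the function name are hypothetical.

```python
def evaluate(scores, known_pairs, threshold):
    """Return (ppv, coverage, sensitivity) at a given score threshold.

    scores: dict mapping a protein pair to its similarity score.
    known_pairs: set of pairs treated as true interactions.
    """
    predicted = {pair for pair, score in scores.items() if score >= threshold}
    rejected = set(scores) - predicted
    tp = len(predicted & known_pairs)
    fp = len(predicted - known_pairs)
    fn = len(rejected & known_pairs)
    coverage = tp + fp
    ppv = tp / coverage if coverage else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, coverage, sensitivity

# Made-up scores for four pairs, two of which are known interactions.
toy_scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.2, ("C", "D"): 0.1}
toy_known = {("A", "B"), ("C", "D")}
ppv, coverage, sensitivity = evaluate(toy_scores, toy_known, threshold=0.5)
```

Raising the threshold typically trades coverage for PPV, which is why the methods are compared at a fixed PPV or at a fixed number of predicted pairs.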

Results (contd.)

The co-evolutionary matrix method has 1.5 times greater coverage at PPV = 0.7 than the single-profile method.

At the same number of predicted pairs (PP), the co-evolutionary matrix method has better PPV and sensitivity than the single-profile method.

Results (contd.)

Mutual information score distribution for interacting and non-interacting protein pairs

At 0 MIS, SP (single profile) shows a peak while CM (co-evolutionary matrix) does not. In other words, at low MIS, SP lies above CM.

Results (contd.)

Shows p-values of the single-profile method versus the co-evolutionary matrix method

The scattered circles show that the two methods can give very different predictions.

Results (contd.) – Phosphotransferase system

Domain IIA (residues 1-170) and domain IIB (residues 170-320)

The darker region shows where the domains have co-evolved, so we can conclude that IIB evolved with IIC rather than IIA.

Top-20 predicted interacting partners of protein IIAB for both methods

Results (contd.) - Chemotaxis

N-terminus of CheA (residues 1-200) and C-terminus of CheA (residues 540-670) co-evolved with the C-terminal region of CheB (residues 170-340)

Top-20 predicted interacting partners of protein CheA using both methods

Results (contd.) – Kdp System

N-terminal domain of KdpD (residues 1-395) co-evolved with KdpC

Top-10 predicted interacting partners of protein KdpD using both methods

Conclusion

Results in this paper strongly suggest that co-evolution of proteins should be captured at the domain level, because domains with conflicting evolutionary histories can co-exist in a single protein sequence.

Regions that are important for supporting both functional and physical interactions between proteins can be detected.

Questions

Thank You !!