Inferring strengths of protein-protein interactions from experimental data using linear programming
Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu
Bioinformatics Center, Kyoto University
Overview:
  Background
  Probabilistic model
  Related work
  Biological experimental data
  Proposed methods (for binary data; for numerical data)
  Results of computational experiments
  Conclusion
Background (1/3)
Understanding protein-protein interactions is useful for understanding protein functions. Examples:
Transcription factors: proteins interact with a factor to regulate a gene.
Receptors, etc.
Background (2/3)
Various methods have been developed for inferring protein-protein interactions:
Gene fusion / Rosetta stone (Enright et al. and Marcotte et al., 1999): the number of genes to which it can be applied is limited.
Molecular dynamics: long CPU time; difficult to predict precisely.
Background (3/3)
A model based on domain-domain interactions has been proposed. It uses domains defined by databases such as InterPro or Pfam.
[Figure: two labeled domains]
Probabilistic model of interaction (1/2)
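Model (Deng et al., 2002)
Two proteins interact if and only if at least one pair of their domains interacts.
Interactions between domains are independent events.
[Figure: proteins P1 and P2 with domains D1, D2, D3 and D2, D4]
$P_{ij}=1$: proteins $P_i$ and $P_j$ interact
$D_{mn}=1$: domains $D_m$ and $D_n$ interact
$(D_m, D_n) \in P_i \times P_j$: domain pair $(D_m, D_n)$ is included in protein pair $P_i \times P_j$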
Probabilistic model of interaction (2/2)
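Combining the two assumptions of the model (a protein pair interacts if and only if at least one of its domain pairs interacts, and domain-domain interactions are independent) gives the interaction probability of a protein pair:

$$\Pr(P_{ij}=1) = 1 - \prod_{(D_m,D_n)\in P_i\times P_j}\bigl(1-\Pr(D_{mn}=1)\bigr)$$

A minimal Python sketch of this formula follows. The data representation is an assumption for illustration: proteins are lists of domain names, each distinct unordered domain pair is counted once, and the hypothetical dict `lam` maps a domain pair to Pr(Dmn=1), with missing pairs treated as 0.

```python
from itertools import product


def domain_pairs(domains_i, domains_j):
    """Distinct unordered domain pairs (Dm, Dn) in Pi x Pj."""
    return {tuple(sorted((dm, dn))) for dm, dn in product(domains_i, domains_j)}


def pr_interact(domains_i, domains_j, lam):
    """Pr(Pij=1) = 1 - prod over (Dm, Dn) in Pi x Pj of (1 - Pr(Dmn=1)).

    lam maps an unordered domain pair to Pr(Dmn=1); missing pairs count as 0.
    """
    p_none = 1.0  # probability that no domain pair of (Pi, Pj) interacts
    for pair in domain_pairs(domains_i, domains_j):
        p_none *= 1.0 - lam.get(pair, 0.0)
    return 1.0 - p_none


# Hypothetical example; the assignment of domains to P1 and P2 is illustrative.
p1 = ["D1", "D2", "D3"]
p2 = ["D2", "D4"]
lam = {("D1", "D4"): 0.5, ("D2", "D2"): 0.2}
print(pr_interact(p1, p2, lam))  # 1 - (1 - 0.5) * (1 - 0.2) = 0.6
```

Because the probability of "no interaction" is a product of per-pair factors, adding a domain pair with nonzero Pr(Dmn=1) can only increase Pr(Pij=1).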
Related work
INPUT: interacting protein pairs (positive examples) and non-interacting protein pairs (negative examples)
OUTPUT: Pr(Dmn=1) for all domain pairs
Association method (Sprinzak et al., 2001)
Infers the probabilities of domain-domain interactions from ratios of frequencies:
$\Pr(D_{mn}=1) = I_{mn} / N_{mn}$
$I_{mn}$: number of interacting protein pairs that include $(D_m, D_n)$
$N_{mn}$: number of protein pairs that include $(D_m, D_n)$
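The association estimate of Pr(Dmn=1) is the number of interacting protein pairs that include (Dm, Dn) divided by the number of protein pairs that include (Dm, Dn). A minimal sketch of that counting, assuming proteins are given as lists of domain names and each distinct unordered domain pair is counted once per protein pair; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def domain_pairs(domains_i, domains_j):
    """Distinct unordered domain pairs (Dm, Dn) in Pi x Pj."""
    return {tuple(sorted((dm, dn))) for dm, dn in product(domains_i, domains_j)}


def association(protein_pairs):
    """Estimate Pr(Dmn=1) as (interacting pairs with Dm,Dn) / (all pairs with Dm,Dn).

    protein_pairs: list of (domains_i, domains_j, interacts) tuples, where
    interacts is True for positive and False for negative examples.
    """
    total = Counter()  # protein pairs that include (Dm, Dn)
    inter = Counter()  # interacting protein pairs that include (Dm, Dn)
    for domains_i, domains_j, interacts in protein_pairs:
        for pair in domain_pairs(domains_i, domains_j):
            total[pair] += 1
            if interacts:
                inter[pair] += 1
    return {pair: inter[pair] / total[pair] for pair in total}


data = [
    (["A"], ["B"], True),
    (["A", "C"], ["B"], True),
    (["A"], ["B", "C"], False),
]
print(association(data))
```

Here (A, B) occurs in three protein pairs, two of them interacting, so its estimate is 2/3.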
EM method (Deng et al.,2002)
L is the probability (likelihood) that the experimental data {Oij ∈ {0,1}} are observed.
The EM algorithm is used to (locally) maximize L.
This yields estimates of Pr(Dmn=1).
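A minimal sketch of one EM scheme for this model, assuming an observation-noise model that these slides do not spell out: a true interaction is missed with false-negative rate fn, and a non-interacting pair is reported as interacting with false-positive rate fp (both are placeholder values here, with 0 < fp, fn < 1). The E-step computes Pr(Dmn=1 | Oij) for each domain pair in each protein pair by Bayes' rule; the M-step sets Pr(Dmn=1) to the average of these values over the protein pairs that contain (Dm, Dn). Function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def domain_pairs(domains_i, domains_j):
    """Distinct unordered domain pairs (Dm, Dn) in Pi x Pj."""
    return {tuple(sorted((dm, dn))) for dm, dn in product(domains_i, domains_j)}


def em(protein_pairs, fp=0.001, fn=0.2, iters=100, init=0.5):
    """Estimate lam[(Dm, Dn)] = Pr(Dmn=1) from binary observations Oij.

    protein_pairs: list of (domains_i, domains_j, o) with o in {0, 1}.
    fp, fn: assumed false-positive and false-negative rates (placeholders).
    """
    data = [(domain_pairs(di, dj), o) for di, dj, o in protein_pairs]
    lam = {dp: init for dps, _ in data for dp in dps}
    for _ in range(iters):
        total = defaultdict(float)  # sum of Pr(Dmn=1 | Oij) over protein pairs
        count = defaultdict(int)    # number of protein pairs containing (Dm, Dn)
        for dps, o in data:
            p_none = 1.0
            for dp in dps:
                p_none *= 1.0 - lam[dp]
            p_int = 1.0 - p_none                        # Pr(Pij = 1)
            p_o1 = p_int * (1 - fn) + (1 - p_int) * fp  # Pr(Oij = 1)
            for dp in dps:
                # E-step: Pr(Dmn = 1 | Oij) by Bayes' rule
                if o == 1:
                    e = lam[dp] * (1 - fn) / p_o1
                else:
                    e = lam[dp] * fn / (1 - p_o1)
                total[dp] += e
                count[dp] += 1
        # M-step: average over the protein pairs that contain (Dm, Dn)
        lam = {dp: total[dp] / count[dp] for dp in lam}
    return lam


data = [(["A"], ["B"], 1)] * 3 + [(["A"], ["C"], 0)] * 3
lam = em(data)
print(lam)
```

Since EM only finds a local maximum of L, the estimates can depend on the starting value `init`.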
Biological experimental data
Related methods (Association and EM) use only binary data (interact or not).
Experimental data from yeast two-hybrid screens: Ito et al. (2000, 2001); Uetz et al. (2001)
For many protein pairs, different results (Oij ∈ {0,1}) were observed in different experiments.
We developed new methods that use the raw numerical data.
Numerical data
Ito et al. (2000, 2001): for each protein pair, experiments were performed multiple times.
IST (Interaction Sequence Tag): number of observed interactions
Applying a threshold to this number yields binary data.
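A minimal sketch of the thresholding step that turns IST counts into binary observations Oij. The cutoff value, the `>=` comparison, and the protein identifiers are hypothetical, since the slides do not state the threshold used.

```python
def binarize(ist_counts, threshold=3):
    """Convert IST counts into binary observations Oij.

    threshold is a hypothetical cutoff: a pair with at least `threshold`
    observed interactions gets Oij = 1, otherwise Oij = 0.
    """
    return {pair: int(k >= threshold) for pair, k in ist_counts.items()}


ist = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P2", "P3"): 3}
print(binarize(ist))
```

This step discards how often each pair was observed, which is the information the numerical-data methods are meant to keep.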
Proposed methods
It seems difficult to modify the EM method for numerical data. Instead, we use linear programming (LP).
For binary data: LPBN; combined methods LPEM and EMLP; SVM-based method
For numerical data: ASNM, LPNM
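One way to see how linear programming can fit this model: under the independence assumption, taking logarithms turns the product over domain pairs into a sum. The derivation below is a sketch; the variables x_mn and the constraint form described after it are illustrative assumptions, not the exact LPBN or LPNM formulations, which are not reproduced in these slides.

```latex
\begin{align*}
1-\Pr(P_{ij}=1) &= \prod_{(D_m,D_n)\in P_i\times P_j}\bigl(1-\Pr(D_{mn}=1)\bigr)\\
-\log\bigl(1-\Pr(P_{ij}=1)\bigr) &= \sum_{(D_m,D_n)\in P_i\times P_j} x_{mn},
\qquad x_{mn} = -\log\bigl(1-\Pr(D_{mn}=1)\bigr) \ge 0
\end{align*}
```

Since the right-hand side is linear in the x_mn, requirements such as "the sum is at least a positive constant for interacting pairs and close to 0 for non-interacting pairs", relaxed with nonnegative slack variables whose total is minimized, form a linear program; Pr(Dmn=1) is then recovered as 1 − exp(−x_mn).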
Combination of EM and LPBN
LPEM method: uses the results of LPBN as the initial parameter values for EM.
EMLP method: adds inequality constraints to LPBN so that the LP solutions stay close to the EM solutions.
Simple SVM-based method
Feature vector
A simple linear kernel is used, with interacting pairs as positive examples and non-interacting pairs as negative examples.
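The slide's definition of the feature vector did not survive, so this sketch assumes one binary feature per domain pair, set to 1 when (Dm, Dn) occurs in Pi × Pj. The training procedure is also a stand-in: Pegasos-style subgradient updates for a linear SVM without a bias term, applied in a fixed order, not necessarily the solver the authors used. All names and the toy data are hypothetical.

```python
from itertools import product


def domain_pairs(domains_i, domains_j):
    """Distinct unordered domain pairs (Dm, Dn) in Pi x Pj."""
    return {tuple(sorted((dm, dn))) for dm, dn in product(domains_i, domains_j)}


def score(w, domains_i, domains_j):
    """Linear-kernel decision value w . x for the pair's (assumed) feature vector."""
    return sum(w.get(f, 0.0) for f in domain_pairs(domains_i, domains_j))


def train_linear_svm(protein_pairs, lam=0.01, epochs=200):
    """Pegasos-style subgradient training of a linear SVM without bias.

    protein_pairs: list of (domains_i, domains_j, y), y = +1 (interacting)
    or -1 (non-interacting). Assumed feature vector: x[(Dm, Dn)] = 1 if the
    domain pair is in Pi x Pj, else 0 (stored sparsely as a set).
    """
    data = [(domain_pairs(di, dj), y) for di, dj, y in protein_pairs]
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                  # step size 1 / (lambda * t)
            margin = y * sum(w.get(f, 0.0) for f in x)
            for f in w:                            # L2 regularization: shrink w
                w[f] *= 1.0 - eta * lam
            if margin < 1:                         # hinge loss is active
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


train = [
    (["A"], ["B"], +1),
    (["A", "C"], ["B"], +1),
    (["C"], ["D"], -1),
    (["A"], ["D"], -1),
]
w = train_linear_svm(train)
print([score(w, di, dj) > 0 for di, dj, _ in train])
```

With this linear kernel, each learned weight w[(Dm, Dn)] can be read roughly as how strongly that domain pair is associated with interacting protein pairs.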
Strength of protein-protein interaction
For each protein pair, experiments were performed multiple times.
The ratio Kij / Mij can be considered as the strength of the interaction.
Kij : Number of observed interactions for a protein pair (Pi,Pj)
Mij : Number of experiments for (Pi,Pj)
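A minimal sketch computing the strength Kij / Mij for each protein pair. The comparison function is an assumption: the mean absolute difference between a predicted Pr(Pij=1) and the observed strength is one plausible way to score predictions against these values, but the slides do not define the "Ave. Error" used in the results. Names and numbers are hypothetical.

```python
def strengths(counts):
    """Strength of each protein pair: Kij / Mij.

    counts maps (Pi, Pj) -> (Kij, Mij), where Kij is the number of observed
    interactions and Mij the number of experiments (Mij > 0).
    """
    return {pair: k / m for pair, (k, m) in counts.items()}


def average_error(predicted, observed):
    """Mean absolute difference between predicted Pr(Pij=1) and observed
    strength (an assumed error measure, not taken from the slides)."""
    return sum(abs(predicted[p] - observed[p]) for p in observed) / len(observed)


counts = {("P1", "P2"): (3, 4), ("P1", "P3"): (0, 2)}
rho = strengths(counts)  # {("P1", "P2"): 0.75, ("P1", "P3"): 0.0}
pred = {("P1", "P2"): 0.5, ("P1", "P3"): 0.1}
print(average_error(pred, rho))
```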
Computational experiments for binary data
Data: DIP database (Xenarios et al., 2002); 1767 protein pairs as positive examples; 2/3 of the pairs for training, 1/3 for testing
Computational environment: Xeon processor, 2.8 GHz; LP solver: loqo
Computational experiments for numerical data
Data: YIP database (Ito et al., 2001, 2002); IST (Interaction Sequence Tag) counts; 1586 protein pairs; 4/5 for training, 1/5 for testing
Computational environment: Xeon processor, 2.8 GHz; LP solver: lp_solve
Results on test data (numerical data)
LPNM achieves the lowest average error. The EM and association methods classify Pr(Pij=1) as either 0 or 1.

Method   Ave. Error   CPU (sec.)
LPNM     0.0308       1.20
ASNM     0.0405       0.0077
EM       0.295        1.62
ASSOC    0.277        0.0088
Conclusion
We have defined a new problem: inferring the strengths of protein-protein interactions.
We have proposed LP-based methods:
For binary data: LPBN, LPEM, EMLP, and an SVM-based method
For numerical data: ASNM, LPNM
LPNM outperformed the other methods in average error on the test data.