Drug-Target Interaction Prediction Using Semantic Similarity and Edge Partitioning
Guillermo Palma¹, Maria-Esther Vidal¹, and Louiqa Raschid²
¹ Universidad Simón Bolívar, Venezuela
² University of Maryland, USA
ISWC 2014, Riva del Garda, Italy. October 2014
News - September 28th 2014. CLEOPATRA PROJECT*
Effectiveness of the combination of two monoclonal drugs in a chemotherapy treatment.
Docetaxel, Trastuzumab, Pertuzumab
Increases median survival to nearly five years for HER2-positive metastatic breast cancer.
* http://www.drugdevelopment-technology.com/projects/pertuzumab/ http://www.nasdaq.com/article
Can computational tools be used to predict combinations of drugs?
Drugs of the same family: monoclonal drugs.
Proteins of the same family: HER family.
[Figure: drug-target edges labeled "Inhibits the ability of HER2 to interact with"; drugs d1 and d2 are similar, and targets t1 and t2 are similar.]
Drugs and Targets are linked by Drug-Target Interactions.
Drug Semantic Similarity Measure: Chemical Space Similarity.
Target Semantic Similarity Measure: Genomic Space Similarity.
[Figure: drugs, targets, known Drug-Target Interactions, and the two similarity measures lead to Drug-Target Predictions.]
Main Contributions
[Figure: three numbered contributions (1, 2, 3) shown as images; the table below accompanies them.]
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845
Agenda
1. Semantics Based Edge Partitioning Problem (semEP)
2. Empirical Evaluation
1. SEMANTICS BASED EDGE PARTITIONING PROBLEM (SEMEP)
semEP
semEP: Semantic Based Edge Partitioning
Similarity Measures
Semantics Based Edge Partitioning Problem (semEP)
[Figure: bipartite graph with drugs d1-d5, targets t1-t5, and interactions (edges) i1-i9.]
semEP is the problem of partitioning the edges of the bipartite graph into the minimal set of clusters that maximize the cluster density.
• Minimize the number of clusters.
• Density of the clusters is maximized.
• Semantics encoded in the ontologies is used to compute similarity between entities.
Mapping to Vertex Coloring Graph (VCG)
[Figure: semEP bipartite graph with drugs d1-d3, targets t1-t3, and interactions i1-i4, next to its VCG with nodes i1-i4. Annotations: d2 is not similar to d1; t2 is not similar to t3.]
Edges in the bipartite graph are mapped to nodes in VCG. There is an edge e between two nodes i1=(t1,d1) and i2=(t2,d2) in VCG iff:
• sim(t1,t2) < θt, where θt is the threshold on the similarity of t1 and t2, OR
• sim(d1,d2) < θd, where θd is the threshold on the similarity of d1 and d2.
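The edge rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_vcg` and `make_sim`, the three toy interactions, and the thresholds of 0.3 are hypothetical; the similarity values 0.4 and 0.1 are the ones shown on the cDensity slide.

```python
from itertools import combinations

def build_vcg(interactions, sim_d, sim_t, theta_d, theta_t):
    """Return the VCG as an adjacency dict: interaction id -> set of adjacent ids."""
    vcg = {name: set() for name in interactions}
    for a, b in combinations(interactions, 2):
        (da, ta), (db, tb) = interactions[a], interactions[b]
        # Two interactions are adjacent when their drugs OR their targets
        # fall below the corresponding similarity threshold.
        if sim_d(da, db) < theta_d or sim_t(ta, tb) < theta_t:
            vcg[a].add(b)
            vcg[b].add(a)
    return vcg

def make_sim(table):
    """Symmetric lookup; an entity is assumed fully similar (1.0) to itself."""
    def sim(x, y):
        if x == y:
            return 1.0
        return table.get((x, y), table.get((y, x), 0.0))
    return sim

# Hypothetical toy interactions, written as (drug, target).
interactions = {"i1": ("d1", "t1"), "i2": ("d2", "t2"), "i3": ("d3", "t3")}
sim_d = make_sim({("d1", "d2"): 0.4, ("d1", "d3"): 0.1, ("d2", "d3"): 0.1})
sim_t = make_sim({("t1", "t2"): 0.4, ("t1", "t3"): 0.1, ("t2", "t3"): 0.1})

vcg = build_vcg(interactions, sim_d, sim_t, theta_d=0.3, theta_t=0.3)
print(vcg)  # i3 is adjacent to i1 and i2; i1 and i2 are not adjacent
```

With these assumed thresholds, i1 and i2 may share a cluster, while i3 must be placed in a different one.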
The Vertex Coloring Problem
• Coloring the vertices of a graph such that no two adjacent vertices share the same color.
• The Vertex Coloring Problem seeks to minimize the number of colors for a given graph.
• Vertex Coloring is an NP-hard problem [Garey 79].
[Figure: VCG with nodes i1-i4.]
cDensity
[Figure: bipartite graph with drugs d1-d3 and targets t1-t3.]
sim(d1,d3) = sim(d2,d3) = sim(t1,t3) = sim(t2,t3) = 0.1
sim(d1,d2) = sim(t1,t2) = 0.4
The Vertex Coloring Problem
semEP implements the well-known approximate DSATUR algorithm to solve the Vertex Coloring Problem and to partition the bipartite graph edges.
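A minimal sketch of plain DSATUR follows, to show how a coloring of the VCG yields a partition of the interactions. It is not the semEP implementation: the slides do not show how cluster density enters the coloring, so this version ignores density. The names `dsatur` and `partition` and the toy graph are hypothetical.

```python
def dsatur(graph):
    """Greedy DSATUR coloring. graph: node -> set of adjacent nodes.
    Returns a dict node -> color index."""
    colors = {}

    def saturation(node):
        # Number of distinct colors already used by the node's neighbors.
        return len({colors[n] for n in graph[node] if n in colors})

    while len(colors) < len(graph):
        uncolored = [n for n in sorted(graph) if n not in colors]
        # Highest saturation first; ties broken by degree, then by name order.
        node = max(uncolored, key=lambda n: (saturation(n), len(graph[n])))
        used = {colors[n] for n in graph[node] if n in colors}
        # Smallest color not used by any neighbor.
        colors[node] = next(c for c in range(len(graph)) if c not in used)
    return colors

def partition(colors):
    """Group nodes (interactions) by color; each color is one cluster."""
    clusters = {}
    for node, color in colors.items():
        clusters.setdefault(color, []).append(node)
    return [sorted(members) for _, members in sorted(clusters.items())]

# Hypothetical VCG: i3 conflicts with both i1 and i2.
vcg = {"i1": {"i3"}, "i2": {"i3"}, "i3": {"i1", "i2"}}
colors = dsatur(vcg)
clusters = partition(colors)
print(clusters)  # [['i3'], ['i1', 'i2']]
```

Each color class is a set of interactions with no VCG edge between them, so every pair inside a cluster passes both similarity thresholds.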
2. EMPIRICAL EVALUATION
Evaluation on Drug-Target Interactions
• 900 Drugs, 1,000 Targets and 5,000 Interactions: Nuclear receptors, G protein-coupled receptors (GPCRs), Ion channels, and Enzymes.
• Drug Similarity: drug-drug chemical similarity score based on the hashed fingerprints from SMILES.
• Target Similarity: target-target similarity score based on the normalized Smith-Waterman sequence similarity score.
• Data from DrugBank.
K. Bleakley and Y. Yamanishi. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2009.
semEP Predictions
Prediction probability: p1
Evaluation Protocol
A 10-fold cross validation:
• Training data: randomly selected 90% of positive and negative interactions.
• Test data: remaining 10% of the interactions.
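The protocol above can be sketched as follows. This is a generic 10-fold split, not the authors' evaluation code; the function name `ten_fold_splits`, the seed, and the toy labeled pairs are assumptions made for illustration.

```python
import random

def ten_fold_splits(pairs, k=10, seed=0):
    """Yield (train, test) lists; each fold holds out about 1/k of the pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

# Hypothetical labeled pairs: (drug, target, label), label 1 = interaction.
pairs = [(f"d{i}", f"t{j}", int((i + j) % 3 == 0))
         for i in range(10) for j in range(10)]

splits = list(ten_fold_splits(pairs))
for train, test in splits:
    # With 100 pairs, each fold trains on 90 and tests on 10.
    print(len(train), len(test))
```

Every pair appears in exactly one test fold, so each interaction is predicted once without having been seen in training.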
State-of-the-art Machine Learning Methods
• BLM: Bipartite Local Models [Bleakley and Yamanishi]
• LapRLS: Laplacian Regularized Least Squares [Xia et al]
• GIP: Gaussian Interaction Profile [Van Laarhoven et al]
• KBMF2K: Kernelized Bayesian Matrix Factorization with Twin Kernels [Gönen]
• NBI: Network-Based Inference [Cheng et al]
Experiment I
Research Question: Can semEP predictions improve the performance of state-of-the-art prediction methods?
Evaluation Protocol:
• Y: set of interactions of the benchmark (positive and negative predictions).
• YsemEP: best semEP predictions (probability > 0.5) are added to the initial positive predictions of Y. No more than 30% of positive predictions are added.
• Ycntrl: the same number of random predictions are added to the predictions of Y.
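One way to read this protocol is sketched below. It assumes that "no more than 30%" caps the added pairs at 30% of the number of initial positives, and that the random control pairs are drawn from pairs that are not already positive; both readings are assumptions, as are the function name `augment` and the toy data.

```python
import random

def augment(positives, semep_scored, candidates, max_percent=30, seed=0):
    """Return (y_semep, y_cntrl) as sets of (drug, target) pairs."""
    budget = len(positives) * max_percent // 100
    # semEP predictions with probability > 0.5 that are not already positive,
    # best first, truncated to the budget.
    best = sorted(
        (pair for pair, prob in semep_scored.items()
         if prob > 0.5 and pair not in positives),
        key=lambda pair: (-semep_scored[pair], pair),
    )[:budget]
    # Control: the same number of pairs drawn at random from non-positive pairs.
    pool = sorted(pair for pair in candidates if pair not in positives)
    control = random.Random(seed).sample(pool, len(best))
    return positives | set(best), positives | set(control)

# Hypothetical benchmark: 5 drugs x 5 targets, 10 known interactions.
candidates = [(f"d{i}", f"t{j}") for i in range(5) for j in range(5)]
positives = ({(f"d{i}", f"t{i}") for i in range(5)}
             | {(f"d{i}", f"t{(i + 1) % 5}") for i in range(5)})
semep_scored = {("d0", "t2"): 0.9, ("d1", "t3"): 0.8, ("d2", "t4"): 0.7,
                ("d3", "t0"): 0.6, ("d4", "t1"): 0.4}

y_semep, y_cntrl = augment(positives, semep_scored, candidates)
print(len(y_semep), len(y_cntrl))  # 13 13
```

Because the control set adds the same number of pairs, any difference between YsemEP and Ycntrl can be attributed to which pairs were added rather than how many.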
Evaluation of semEP and State-of-the-art Machine Learning Methods

AUC for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845

AUPR for the GPCR dataset
ML Method   Y       YsemEP  Ycntrl
BLM         0.472   0.481   0.327
NBI         0.615   0.719   0.467
GIP         0.705   0.764   0.563
LapRLS      0.630   0.704   0.517
KBMF2K      0.673   0.760   0.544

• semEP is able to improve the performance of all the methods.
• Performance of the methods degrades for Ycntrl.
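For readers unfamiliar with the metric, AUC can be computed as the fraction of (positive, negative) pairs that a method ranks correctly. The sketch below uses that pairwise definition on hypothetical labels and scores; it is not the evaluation code behind the tables, and the function name `auc` is an assumption.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive is scored
    higher; ties count as half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for five drug-target pairs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
value = auc(labels, scores)
print(round(value, 3))  # 0.833
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the tables.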
Overlap of Top10 positive predictions of semEP
The overlap is remarkably low. Results suggest that semEP predictions are accurate and diverse. All these techniques explore different spaces.

ML Method   Nuclear Receptor    GPCR                Ion channel         Enzyme
            Equal  Different    Equal  Different    Equal  Different    Equal  Different
BLM         1      9            0      10           0      10           0      10
NBI         0      10           1      9            0      10           0      10
GIP         2      8            1      9            0      10           3      7
LapRLS      4      6            1      9            0      10           2      8
KBMF2K      4      6            0      10           0      10           0      10
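The Equal and Different counts in the table can be obtained by comparing two top-10 lists, as in this small sketch. The function name `overlap` and both lists are hypothetical; they are not the predictions behind the table.

```python
def overlap(top_a, top_b):
    """Return (equal, different): predictions of top_a also found in top_b,
    and predictions of top_a that top_b does not contain."""
    equal = len(set(top_a) & set(top_b))
    return equal, len(set(top_a)) - equal

# Hypothetical top-10 (drug, target) predictions of semEP and another method.
semep_top = [(f"d{i}", f"t{i}") for i in range(10)]
other_top = [(f"d{i}", f"t{i}") for i in range(8, 18)]

counts = overlap(semep_top, other_top)
print(counts)  # (2, 8)
```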
Experiment II
Research Question: Can semEP novel predictions be validated?
Evaluation Protocol:
• Top5 novel predicted interactions are validated in the STITCH drug-target interaction website.
• Novel predicted interactions are interactions that do not appear in the dataset.
STITCH: http://stitch.embl.de/
[Figure: screenshots of STITCH, http://stitch.embl.de/]
Validation of Top 5 Drug-Target Interactions (Novel predictions)
Top 5 were validated in STITCH http://stitch.embl.de/
Novel predicted interactions are interactions that do not appear in the dataset.
semEP novel predicted interactions can be validated across all target groups.

ML Method   Nuclear Receptor   GPCR   Ion Channel   Enzyme
semEP       4                  5      1             4
BLM         2                  1      0             0
NBI         1                  1      1             2
GIP         3                  3      1             1
LapRLS      5                  3      2             2
KBMF2K      3                  4      2             2
Analyzing Top Drug-Target Interactions (Novel predictions for GPCRs)
Top 2, 3, and 4:
• D02076 hsa:146
• D02076 hsa:147
• D00604 hsa:147
Probability: 0.8
Conclusions
[Figure: three numbered conclusions (1, 2, 3) shown as images; the table below accompanies them.]
ML Method   Y       YsemEP  Ycntrl
BLM         0.888   0.911   0.798
NBI         0.833   0.900   0.769
GIP         0.943   0.958   0.843
LapRLS      0.941   0.956   0.844
KBMF2K      0.939   0.960   0.845
Future Directions
Apply semEP to other domains, e.g., to predict drug-drug interactions, adverse drug events, or gene GO annotations.
http://informatics.mayo.edu/adepedia/index.php/Main_Page
http://omictools.com/metaadedb-s5660.html
[Figure: bipartite graph of Drugs and Adverse Drug Events.]
MANY THANKS! QUESTIONS
Code available at: https://code.google.com/p/semep/
Solutions to semEP
Partition with greater density given that d1,d2, and d3 are similar, as well as t1, t2, and t3.