12/16/2014 Collective Prediction of Multiple Types of Links in Heterogeneous Information Networks Bokai Cao 1 , Xiangnan Kong 2 , Philip S. Yu 1 1 University of Illinois at Chicago 2 Worcester Polytechnic Institute


Page 1: Collective Prediction of Multiple Types of Links in Heterogeneous …bcao1/doc/icdm14a_slides.pdf · 2014-12-18 · HCLP algorithm can effectively leverage complementary information

Collective Prediction of Multiple Types of Links in Heterogeneous Information Networks

Bokai Cao (1), Xiangnan Kong (2), Philip S. Yu (1)

(1) University of Illinois at Chicago
(2) Worcester Polytechnic Institute

12/16/2014

Page 2: Heterogeneous Information Networks (HINs)

✤  Twitter social network
    ✤  Nodes: users, tweets, locations
    ✤  Links: follow, post, forward, check-in

✤  DBLP academic network
    ✤  Nodes: authors, papers, journals
    ✤  Links: authored by, cite, publish in

Page 3: A Bioinformatics Heterogeneous Information Network Schema

[Figure: network schema; not preserved in the transcript]

Page 4: Applications of Link Prediction in Bioinformatics

Predicted Link Type                    Application
Gene → Gene                            Protein-Protein Interaction Prediction
Gene → Disease                         Prioritization of Candidate Disease Genes
Gene → Gene Ontology                   Automated Gene Ontology Annotation
Chemical Compound → Gene               Drug-Target Interaction Prediction
Chemical Compound → Disease            Drug Discovery
Chemical Compound → Side Effect        Drug Side Effect Profiling
Chemical Compound → Chemical Ontology  Automatic Annotation of Chemical Ontology

Page 5: A Bioinformatics Heterogeneous Information Network

[Figure: example network with nodes Gene1 to Gene3, Drug1 to Drug3 and Disease1 to Disease3, connected by links of type PPI, bind, causeDisease and treatDisease]
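A network like the one on this slide can be held in a small typed-graph structure. The Python sketch below is illustrative only. The `HIN` class, its method names, and every concrete edge are assumptions made for the example, since the figure itself did not survive extraction. The node and link type names come from the slides, and the link directions follow the "Predicted Link Type" table, assuming a drug is a chemical compound.

```python
from collections import defaultdict

# Link types from the slides, each with a (source type, target type) signature.
LINK_SCHEMA = {
    "PPI": ("Gene", "Gene"),
    "bind": ("Drug", "Gene"),
    "causeDisease": ("Gene", "Disease"),
    "treatDisease": ("Drug", "Disease"),
}


class HIN:
    """Hypothetical container for a heterogeneous information network."""

    def __init__(self, link_schema):
        self.link_schema = dict(link_schema)
        self.node_types = {}            # node name -> node type
        self.links = defaultdict(set)   # link type -> {(source, target)}

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_link(self, link_type, source, target):
        # Reject links whose endpoint types do not match the schema.
        expected = self.link_schema[link_type]
        actual = (self.node_types[source], self.node_types[target])
        if actual != expected:
            raise ValueError(f"{link_type} expects {expected}, got {actual}")
        self.links[link_type].add((source, target))

    def neighbors(self, node, link_type, inverse=False):
        # inverse=True follows the link backwards (e.g. bind^-1: Gene -> Drug).
        if inverse:
            return {s for s, t in self.links[link_type] if t == node}
        return {t for s, t in self.links[link_type] if s == node}


hin = HIN(LINK_SCHEMA)
for kind in ("Gene", "Drug", "Disease"):
    for i in (1, 2, 3):
        hin.add_node(f"{kind}{i}", kind)

# Hypothetical edges, for illustration only.
hin.add_link("PPI", "Gene1", "Gene2")
hin.add_link("bind", "Drug1", "Gene1")
hin.add_link("causeDisease", "Gene1", "Disease1")
hin.add_link("treatDisease", "Drug1", "Disease1")

print(hin.neighbors("Gene1", "bind", inverse=True))  # drugs that bind Gene1
```

Storing one edge set per link type keeps each type separately addressable, which is what a per-type predictor needs.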

Page 6: Linkage Homophily Principle

✤  Conventional proximity measures designed for homogeneous links cannot be applied directly to heterogeneous link prediction, where the two endpoints are of different types.

✤  Linkage Homophily Principle: two nodes are more likely to be directly linked if most of their respective similar nodes are linked.

[Figure: Gene -[causeDisease]-> Disease]

Page 7: Meta-path based Source Node Similarities

Generalized neighbors of source nodes

[Figure: meta-paths from a gene to its generalized neighbors]

✤  Gene -[PPI]-> Gene

✤  Gene -[bind^-1]-> Drug -[bind]-> Gene

✤  Gene -[causeDisease]-> Disease -[causeDisease^-1]-> Gene

Page 8: Meta-path based Target Node Similarities

Generalized neighbors of target nodes

[Figure: meta-paths from a disease to its generalized neighbors]

✤  Disease -[causeDisease^-1]-> Gene -[causeDisease]-> Disease

✤  Disease -[treatDisease^-1]-> Drug -[treatDisease]-> Disease
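Meta-path based similarity can be illustrated by counting path instances along a meta-path. The Python sketch below is a minimal example under stated assumptions: the edge set in `LINKS` is hypothetical, the helper names (`step`, `path_counts`, `pathsim`) are invented for the example, and the normalization is the PathSim-style ratio 2 * paths(a, b) / (paths(a, a) + paths(b, b)). The slides list "normalized path count" among the compared measures without defining it, so this normalization is a stand-in, not necessarily the one the authors use.

```python
from collections import Counter

# Hypothetical edges: link type -> set of (source, target) pairs.
LINKS = {
    "bind": {("Drug1", "Gene1"), ("Drug1", "Gene2"), ("Drug2", "Gene2")},
    "causeDisease": {("Gene1", "Disease1"), ("Gene2", "Disease1"),
                     ("Gene3", "Disease2")},
}

# A meta-path is a list of (link type, follow-inverse?) steps.
GENE_DRUG_GENE = [("bind", True), ("bind", False)]
GENE_DISEASE_GENE = [("causeDisease", False), ("causeDisease", True)]


def step(links, link_type, node, inverse):
    """Nodes reachable from `node` over one link of the given type."""
    if inverse:
        return [s for s, t in links[link_type] if t == node]
    return [t for s, t in links[link_type] if s == node]


def path_counts(links, start, meta_path):
    """Number of path instances from `start` to each end node."""
    counts = Counter({start: 1})
    for link_type, inverse in meta_path:
        nxt = Counter()
        for node, n in counts.items():
            for neighbor in step(links, link_type, node, inverse):
                nxt[neighbor] += n
        counts = nxt
    return counts


def pathsim(links, a, b, meta_path):
    """PathSim-style normalized path count (an assumed normalization)."""
    a_to_b = path_counts(links, a, meta_path)[b]
    a_to_a = path_counts(links, a, meta_path)[a]
    b_to_b = path_counts(links, b, meta_path)[b]
    total = a_to_a + b_to_b
    return 2 * a_to_b / total if total else 0.0


# Generalized neighbors of Gene1 are the genes with non-zero similarity.
for other in ("Gene2", "Gene3"):
    print(other,
          pathsim(LINKS, "Gene1", other, GENE_DRUG_GENE),
          pathsim(LINKS, "Gene1", other, GENE_DISEASE_GENE))
```

The same functions apply to target nodes by swapping in a disease-to-disease meta-path such as `[("causeDisease", True), ("causeDisease", False)]`.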

Page 9: Node Similarity Extraction in HINs

[Figure: not preserved in the transcript]

Page 10: Relatedness Measure

Formulated based on the Linkage Homophily Principle:

Relatedness measure = (#existing links between generalized neighbors of s_i and t_p) / (#maximum potential links between generalized neighbors of s_i and t_p)

Each counted link is weighted by the product of the similarities between the generalized neighbors and the node pair (s_i, t_p).
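One way to read the relatedness measure is as a similarity-weighted ratio of existing links to potential links among generalized neighbors. The Python sketch below follows that reading. The function name, argument layout, and all numbers are assumptions for illustration, and the exact formula in the paper may differ (for example in which neighbor pairs are included).

```python
def relatedness(source_sim, target_sim, existing_links):
    """Similarity-weighted fraction of neighbor pairs that are linked.

    source_sim:     {s_j: similarity between s_i and s_j}
    target_sim:     {t_q: similarity between t_p and t_q}
    existing_links: set of (s_j, t_q) pairs known to be linked
    """
    numerator = 0.0    # weighted count of existing links
    denominator = 0.0  # weighted count of all potential links
    for s_j, w_source in source_sim.items():
        for t_q, w_target in target_sim.items():
            weight = w_source * w_target
            denominator += weight
            if (s_j, t_q) in existing_links:
                numerator += weight
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical similarities for a candidate link (Gene1, Disease1).
gene_sim = {"Gene2": 1.0, "Gene3": 0.5}           # neighbors of Gene1
disease_sim = {"Disease2": 0.5, "Disease3": 1.0}  # neighbors of Disease1
cause_disease = {("Gene2", "Disease3"), ("Gene3", "Disease2")}

score = relatedness(gene_sim, disease_sim, cause_disease)
print(round(score, 3))
```

In this toy input two of the four neighbor pairs are linked, but the score is above one half because the linked pair (Gene2, Disease3) carries the largest weight.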

Page 11: Multiple Types of Links in HINs

[Figure: Gene A, Drug B and Disease C connected by bind, causeDisease and treatDisease links; links marked "?" are unknown, and the presence of some of these links can imply others]

Page 12: Link Prediction in HINs

✤  Conventional link prediction approaches assume that links are independent and identically distributed (i.i.d.).

✤  In fact, links are interconnected via adjacent nodes in a network and are therefore correlated with each other.

[Figure: candidate links (u, r), (u, s) and (u, t) with labels y1, y2 and y3; the labels are treated as independent under the i.i.d. assumption, but they are related because the links share node u]

Page 13: Heterogeneous Collective Link Prediction (HCLP)

✤  Given:
    ✤  L_i: a set of existing links of type i, i = 1, …, m
    ✤  U_i: a set of unlabeled links of type i, i = 1, …, m

✤  Loop for k iterations:
    ✤  Loop over the m link types i:
        ✤  Compute the relatedness measure from the HIN for L_i
        ✤  Use L_i to train a classifier h_i
        ✤  Let h_i label examples from U_i
        ✤  Add the self-labeled examples with the highest confidence into the HIN
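The iterative loop above can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: `toy_feature` (a common-neighbor count over the whole network) stands in for the relatedness measure, `toy_train` stands in for a real classifier, and `top_n`, `min_confidence`, and all edges are assumptions introduced for the example.

```python
def neighbors(network, node):
    """All nodes linked to `node` by any link type, ignoring direction."""
    found = set()
    for links in network.values():
        for a, b in links:
            if a == node:
                found.add(b)
            elif b == node:
                found.add(a)
    return found


def toy_feature(network, pair):
    """Stand-in for the relatedness measure: number of common neighbors."""
    a, b = pair
    return len(neighbors(network, a) & neighbors(network, b))


def toy_train(positive_features):
    """Stand-in classifier: scale by the largest feature among known links."""
    scale = max(positive_features, default=0)

    def confidence(x):
        return min(x / scale, 1.0) if scale > 0 else 0.0

    return confidence


def hclp(labeled, unlabeled, k, top_n=1, min_confidence=0.5):
    network = {t: set(links) for t, links in labeled.items()}   # L_i
    pending = {t: set(unlabeled.get(t, ())) for t in labeled}   # U_i
    for _ in range(k):
        for link_type in network:
            # Features come from the current network, which already holds
            # links self-labeled for other types earlier in the loop.
            h = toy_train([toy_feature(network, p) for p in network[link_type]])
            scored = sorted(((h(toy_feature(network, p)), p)
                             for p in pending[link_type]), reverse=True)
            # Add only the most confident self-labeled links.
            for conf, pair in scored[:top_n]:
                if conf >= min_confidence:
                    network[link_type].add(pair)
                    pending[link_type].discard(pair)
    return network


# Hypothetical data, for illustration only.
LABELED = {
    "bind": {("Drug1", "Gene1")},
    "causeDisease": {("Gene1", "Disease1"), ("Gene1", "Disease2")},
    "treatDisease": {("Drug1", "Disease1"), ("Drug2", "Disease1")},
}
UNLABELED = {
    "bind": {("Drug2", "Gene1")},
    "treatDisease": {("Drug2", "Disease2")},
}

result = hclp(LABELED, UNLABELED, k=1)
print(sorted(result["treatDisease"]))
```

In this toy data the candidate treatDisease link (Drug2, Disease2) has no support in the original network; it only becomes predictable after the bind link (Drug2, Gene1) is added in the same iteration, which is the complementary-information effect the slides describe.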

Page 14: Analogy to Co-training

✤  Different views share the same set of labels
    →  HCLP: different types of links coexist in a HIN

✤  Exchange the most confident predictions on unlabeled data
    →  HCLP: add the most confident predictions on each type of unlabeled links into the network

Page 15: Experiments: SLAP Network Schema

✤  #node types = 10

✤  #link types = 11

✤  #nodes > 290k

✤  #links > 720k

Page 16: Compared Methods

Measure                  Independent link prediction  Collective link prediction
Normalized path count    ILP(NPC)                     HCLP(NPC)
Random walk              ILP(RW)                      HCLP(RW)
Symmetric random walk    ILP(SRW)                     HCLP(SRW)
Relatedness measure      ILP(RM)                      HCLP(RM)

Page 17: Experimental Results

[Figure: not preserved in the transcript]

Page 18: Robustness Analysis

✤  By predicting link types collectively, the HCLP algorithm can effectively leverage complementary information from other link types and improve the prediction performance for a particular link type, even when that type has limited training data.

Page 19: Conclusions

✤  Studied the problem of collective link prediction in HINs:
    ✤  designed a relatedness measure between different types of objects based on the linkage homophily principle;
    ✤  proposed an iterative framework to predict multiple types of links collectively.

✤  Link prediction performance can be improved by collectively predicting multiple types of links in HINs.