meta paths and meta structures: analysing large ...odefinition [sun et al. vldb 2011] 20 yizhou sun,...

102
Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks Reynold Cheng Zhipeng Huang Yudian Zheng Jing Yan Ka Yu Wong Eddie Ng Database Group:

Upload: others

Post on 05-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Meta Paths and Meta Structures:

Analysing Large Heterogeneous

Information Networks

Reynold

Cheng

Zhipeng

Huang

Yudian

Zheng

Jing

Yan

Ka Yu

Wong

Eddie

Ng

Database

Group:

Page 2: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Information is Everywhere !oSocial Networking Websites

2https://makeawebsitehub.com/social-media-sites/

Page 3: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Information is Everywhere !oBiological Network

3http://serious-science.org/controlling-noisy-dynamics-in-biological-networks-to-fight-cancer-5376

Page 4: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Information is Everywhere !

oResearch Collaboration Network

4https://scholarlykitchen.sspnet.org/2017/04/07/updated-figures-scale-nature-researchers-use-

scholarly-collaboration-networks/

Page 5: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Information is Everywhere !

oProduct Recommendation Network

5

http://www.sciencedirect.com/science/article/pii/S0957417413006921

Byunghak Leem. Heuiju Chun. An impact of online recommendation network on demand

Page 6: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

The Real WorldoHeterogeneous Information Network(s),

i.e. HIN(s).

oNetworks: Nodes & Links

– Nodes: Various Types

– Links: Various Types

6Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs

Page 7: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Example HINs

oDBLP Bibliographic

Network

oNetworks: Nodes & Links

– Node (Type):

• KDD (Venue)

• Jiawei Han (Author)

– Link (Type):

• Write (Author Paper)

• Publish (Paper Venue)

7Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Page 8: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Example HINs

oThe IMDB Movie

Network

oNetworks: Nodes & Links

– Node (Type):

• Forrest Gump (Movie)

• Tom Cruise (Actor)

– Link (Type):

• Make (Producer Movie)

• Act (Author Movie)

8Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Page 9: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Example HINs

oThe Facebook Network

oNetworks:

– Node (Type):

• Jimmy (User)

• Coca Cola (Product)

– Link (Type):

• Like (User Product)

• Follow (User User)

9Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Page 10: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

HINs are Ubiquitous !oHealthcare

– Doctor, Patient, Disease

oSource Code Repository

– Project, Developer, Repository

oE-Commerce

– Seller, Buyer, Product

oNews

– Author, Organization

10Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Page 11: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Knowledge Graph (KG)oTurn Web Knowledge into KG

11Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Page 12: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Knowledge Graph (KG)oExample KGs

12Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Page 13: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Knowledge Graph (KG)oStatistics in Existing KGs

13Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Page 14: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Problems in HIN

oLink Prediction

oEntity Profiling

oData Integration

14

Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs

Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. COSNET: Connecting Heterogeneous Social

Networks with Local and Global Consistency, KDD 2015.

Page 15: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

15

Overview of the Tutorial

oRelevance Search

Find Similar/Relevant Objects in Networks

oExamples

DBLP1

▪ Who are most similar to Jiawei Han ?

▪ Whose recent publication is relevant with Jiawei Han’s research ?

1 http://dblp.uni-trier.de/

Page 16: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

oWhere do relations (meta-path)

come from?– Provided by experts [Sun VLDB’11]

• Not easy for a complex schema!

16

Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. “Discovering Meta-

Paths in Large Heterogeneous Information Networks”, in WWW 2015.

Overview of the Tutorial

Page 17: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

o Query Recommendation: to suggest

alternate relevant queries to a search

engine user

o How will HIN benefit query

recommendation ?

17

Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for

Long-Tail Queries. CIKM 2016.

Overview of the Tutorial

Page 18: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

o How can we express using more complexstructure?

o More Expressive (i.e., contain more information)than a meta path.

18

Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. Meta Structure:

Computing Relevance in Large Heterogeneous Information Networks. KDD 2016.

Overview of the Tutorial

Page 19: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Similarity Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work19

Page 20: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Definition of Meta-PathoDefinition [Sun et al. VLDB 2011]

20Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Similarity Search in Heterogeneous Information Networks. VLDB 2011.

oExample

Page 21: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work21

Page 22: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

22

Relevance Search

oMotivation

Find Similar/Relevant Objects in Networks

oExamples

DBLP1

▪ Who are most similar to Jiawei Han ?

▪ Whose recent publication is relevant with Jiawei Han’s research ?

IMDb2

▪ Who are most similar to Tom Cruise ?

▪ Which movie is most relevant to Tom Cruise?

1 http://dblp.uni-trier.de/2 http://www.imdb.com/

Page 23: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

23

Relevance Search

oTarget

To answer these questions systematically

oSolutions

How to measure the similarity?▪ Define a Effective Similarity Function like Cosine, Euclidean

distance, Jaccard coefficient.

Structure similarity or Semantic similarity?▪ Structure Similarity: Based on structural similarity of sub‐network

structures. (like SimRank and PPR)

▪ Semantic Similarity: influenced by similar network structures. This

matters more for HIN! Semantic->edge relations

Page 24: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

24

SimRank

oModel

Idea: Two objects are similar if they are referenced

by similar objects

oDefinition▪ S(a,b) = Average similarity between in-neighbors of object a I(a)

and in-neighbors of object b I(b). Between [0, 1].

▪ S(a,b) = 1, if a=b

= , if a≠b

where c is the constant and 0<c<1

[Jeh, Glen, and Jennifer Widom. KDD’02] Jeh, Glen "SimRank: a measure of structural-context similarity."

Page 25: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

25

SimRank

oExample

S(a,a) = 1

S(a,b) =𝑐

1×1× 1 = c

▪ S(a,b) ideally should be 1.

▪ But, in reality the graph does not describe everything about

them, so by using the C to make s(a,b)<1. Adding C is to

expresses limited confidence or decay with distance.

x

b

a

[Jeh, Glen, and Jennifer Widom. KDD’03] Jeh, Glen "SimRank: a measure of structural-context similarity."

Page 26: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

26

Personalized PageRank (PPR)

oModel

Idea: Originally defined by Google as

a measure of importance for web-pages.

oDefinition▪ Given a graph G, a starting source node s, a target node t, and

a teleport probability 𝛼. Perform random walk from s. At each

step stop with the probability 𝛼, otherwise continue

performing random walk.

▪ Then the Personalized PageRank from s to t is

PPR𝑠~𝑡 = P(𝒔 → 𝒕)

[Jeh, Glen, and Jennifer Widom. WWW’02] Jeh, Glen "Scaling personalized web search."

Page 27: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

27

Personalized PageRank (PPR)

oExample

Starting from A, and 𝛼 = 0.2

For each target A, B, C, D

oCalculation

Iterative computation (Power Method);

Monte-carlo simulation (Approximation);

Bookmark Coloring Algorithm, and etc…

Page 28: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

28

Path Constrained Random Walk

oModel

Random walk on given paths.

oDefinition

▪ Performing random walks on given meta-paths with

the fixed starting point and target point.

▪ PCRW: Transition probability of the random walk

following a given meta-path.

▪ Between [0, 1].

PCRW(𝑠, 𝑡|𝚷) = P(𝒔 → 𝒕|𝚷)

[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”

Page 29: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

29

Path Constrained Random Walk

oExample

= P1 -> P2 -> P3

1. Pro(B.Obama | P1)=1

2. Pro(M.A. Obama | P2) = Pro(B.Obama | P1) / 2 = 0.5

Pro(N.Obama | P2) = Pro(B.Obama | P1) / 2= 0.5

3. Pro(M.Obama | P3) = Pro(M.A. Obama | P2) /2 + Pro(N.Obama | P2) /2 = 0.5

Pro(B.Obama | P3) = Pro(M.A. Obama | P2) 2 + Pro(N.Obama | P2) /2 = 0.5

[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”

Person Person Person

hasChild hasChild-1

m1

m1

Page 30: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

30

PathSim

oModel

Path Counts (PC):

#paths following a given meta-path

oDefinition

▪ Can only be applied on symmetric meta paths

(consider the node type and link type)

▪ Normalized version of PC. Between [0, 1].

▪ PathSim s, t | m =2×PC(s,t|m)

PC s,s +PC(t,t)

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

Page 31: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

31

PathSim

oExample

PC(B.Obama, M.Obama)=2

PC(B.Obama, B.Obama)=2

PC(M.Obama, M.Obama)=2

PS(B.Obama,M.Obama)=2*2/(2+2) =1

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

Person Person Person

hasChild hasChild-1

m1

Page 32: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

32

HeteSim

oModel

Improvement of SimRank for

Heterogeneous Information Network

oDefinition

▪ Any arbitrary meta paths.

▪ Given relations

[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.

Page 33: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

33

HeteSim

oExample

𝒎𝟏= P1 -> P2 -> P3

HeteSim (B.Obama, M.Obama|𝑚1)=1

|𝑂𝐵.𝑂𝑏𝑎𝑚𝑎|+|𝐼𝑀.𝑂𝑏𝑎𝑚𝑎|(𝐻𝑒𝑡𝑒𝑆𝑖𝑚(M.A.Obama, M.A.Obama)+Hetesim(N.Obama, N.Obama))

=𝟏

(𝟐×𝟐)𝟏 + 𝟏 = 𝟎. 𝟓

Person Person Person

hasChild hasChild-1

m1

[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.

Page 34: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Comparison

oFor PathSim, HeteSim and PCRW, even for the same

example they have different values.

oThese metrics are designed for different applications

or measurement scenarios.

oNo dominating similarity measurements so far.

34

Page 35: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

35

Other Measurements

oKnowSim (APWeb’14)

Measure similarity between nodes by RWs on given

meta-path and the reverse meta-path respectively.

oAvgSim (ICDM’16)

Measure the similarity of Documents by modeling them

into heterogeneous information networks.

oRelSim (SDM’16)

Measure the similarity relations in heterogeneous

information network.

Page 36: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

36

Summary

Structure

-based

Semantic-

basedSymmetric?

SimRank √ Yes

PPR √ Yes

PCRW √ No

PathSim √ Yes

HeteSim √ Yes

Page 37: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work37

Page 38: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Questions

oWhere do meta paths come from?– Provided by experts [Sun VLDB’11]

• Not easy for a complex schema!

– Enumeration within a given length of

meta paths [Cohen ECML’11]

• No clue about the length!

–How do I know the weights ?

38

Page 39: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Our Contributions (WWW’15)

oDesign a solution that:

– (1) Discovers the best meta paths

– (2) Learns the weights, without

maximum weight specified.

[Meng WWW’15] Changping Meng, Reynold Cheng, Silviu Maniu,

Pierre Senellart, and Wangda Zhang. “Discovering Meta-Paths in Large

Heterogeneous Information Networks”, in WWW 2015.

39

Page 40: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Meta-Path Framework

oFramework

Meta Path

Generation

Example

node pairs

Meta-paths

Relevance

Function

Knowledge

Graph

(Yago)

(B. Obama, M. Obama)

(B. Clinton, H. Clinton)

(Linear

Function)

G

F

Challenge: Each node and edge can have many

class labels. The number of candidate meta paths

grows exponentially with their path lengths.

40

Page 41: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Generating Meta-Paths

o In Two Phases

Example

node pairs

Meta-paths

Relevance

Function Knowledge

Graph

G

F

Link

Type

Node

Type

41

Page 42: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Phase 1: Link-Only Path Generation

oForward Stage-wise Path Generation (FSPG)

– iteratively generate the most related meta-paths and update

the model

Example

Pairs

Get one most

related meta

path m

Model

TrainingMeta path

m

Updated

model

FINISH

Based on the Least-Angle

Regression (LARS) model

[Efron, Ann.Stat’04]

Y

N

GreedyTree

Converg

e

To train the

weights on

meta paths.

42

Page 43: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

oGreedyTree

– A tree that greedily expands the node which has the largest

priority score

– Priority Score : related to the correlation between m and r

• m is the vector expression of a meta path, r is the residual vector

which evaluates the gap between the truth and current model

Phase 1: Link-Only Path Generation

43

Page 44: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

GreedyTree

Phase 1: Link-Only Path Generation

44

Page 45: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Phase 2: Node Class Generation

oWhy node classes?

– A link only meta path may introduce some unrelated result

pairs

– It is less specific

– Solution : Lowest Common Ancestor (LCA)

• Record the LCA in the node of GreedyTree

45

Page 46: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

ExperimentsoDatasets

– DBLP (4 areas: DB, DM, AI, IR)

• 14K papers, 14K authors, 9K topics, 20 venues.

– Yago

• A KG derived from Wikipedia, WordNet and

GeoNames.

• CORE Facts: 2.1 million nodes, 8 million edges,

125 edge types, 0.36 million node types

oLink-prediction evaluation

– Select n pairs of certain relationships as

example pairs

– Randomly select another m pairs to predict

the links46

Page 47: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiment 1: Effectiveness

oBaseline: enumerate all meta paths within a given

max length L = 1, 2, 3, 4

– L is small low recall.

– L is large low precision.

ROC for link prediction47

Page 48: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiment 2

oCase study: Yago citizenOf

– Better than direct link (PCRW 1)

– Better than best PCRW 2

– Better than PCRW 3,4

5 most relevant meta paths

for “citizenOf”

48

Page 49: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiment 3: Efficiency

oFindings:

– In Yago, 2 orders of magnitude better than paths with

lengths more than 2.

– In DBLP, the running time is comparable to PCRW 5, but

the accuracy is much better.

Running time of FSPG

49

Page 50: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work50

Page 51: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

One Application

oQuery Recommendation: to suggest alternate relevant

queries to a search engine user

– 1) As you type;

– 2) Related queries

51

Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for

Long-Tail Queries. CIKM 2016.

Page 52: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Long Tail Distribution

oLong-tail queries: queries that are not commonly

requested by users

– “akira kurosawa influence george lucas”

52

Page 53: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Motivation

oUbiquitous:

– 84% of 10M queries appear no more than 3 times.

oNecessary:

– Existing works that only rely on query log alone can no

longer handle well these queries.

53

Page 54: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Query Log

oA set of user log <q, u, t, C>

– q: the query

– u: user id

– t: time stamp

– C: the clicked URLs

oSession: a time window, a mission.

oExisting methods rely on query logs to analyze the

flow among queries.

54

Boldi, Paolo, et al. "The query-flow graph: model and applications." Proceedings of the 17th ACM conference

on Information and knowledge management. ACM, 2008.

Bonchi, Francesco, et al. "Efficient query recommendations in the long tail via center-piece subgraphs."

Proceedings of the 35th international ACM SIGIR conference on Research and development in information

retrieval. ACM, 2012.

Page 55: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Knowledge Graph

55

Hoffart, Johannes, et al. "Yago2: a spatially and temporally enhanced Knowledge Graph from wikipedia."

(2012).

Page 56: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relationship in the KG

oMeta path representation:

– P: city nextTo city

oQ: “weather Los Angeles”

– Rec:

• “weather Las Vegas”

• “weather San Diego”

56

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

Page 57: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

57

System Overview

oG = (Gqf, K, teq, P)

– Gqf is a query-flow graph

– K is a Knowledge Graph

– tEQ is a set of entity-query links

– P is a set of meta path to be extracted from

query log

Page 58: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Offline

58

oGqf is built as described in [1].

o teq is built from entity linking and

normalizing the weights.

oP:

– Get the set of entity pairs within the same

session: 𝒆𝒊, 𝒆𝒋 𝒆𝒊, 𝒆𝒋 ∈ 𝒔𝒌}

– Get the meta path between ei and ej (we

use the shortest path for simplicity)

– Stored by the type of ei

Page 59: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Online

Input query q:

LA weather

Step 1. Entity Linking Step 2. Entity Expansion

0.25

0.25

e1,1

e1,2

e1

meta path P1: city citynextTo

P1

P1

Entity Linking Tool

(e.g., Dexter2)

<San_Diego>

<Las_Vegas>

Recommendation:

q1 = San Diego weather

q2 = Las Vegas weather

Step 3. Query Searching

e1,1

e1,2

q1

q2

0.25

0.25

q1 = San Diego weather

q2 = Las Vegas weather

<Los_Angeles>

San Diego weather

Las Vegas weather

<San_Diego>

<Las_Vegas>e1 = <Los_Angeles>

e2 = <weather>

e2 P2

<weather>

meta path P2:property propertyisA

e20.5

0.5

<weather>

0.5

o Three Steps:

– Entity Linking (use existing tool)

– Entity Expansion (use P)

– Query Searching (PPR)

59

Page 60: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Step 1: Entity Linking

oGiven

– q = “weather Los Angeles”

oReturn:

– e1 = Los_Angeles

60

Ceccarelli, Diego, et al. "Dexter: an open source framework for entity linking." Proceedings of the sixth

international workshop on Exploiting semantic annotations in information retrieval. ACM, 2013.

Page 61: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Step 2. Entity Expansion

oGiven

– e1 = Los_Angeles

oUsing P:

– city NextTo city

oReturn

– e2 = Las_Vegas

– e3 = San_Diego

61

Page 62: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Step 3. Query Searching

oGiven:

– e2 = Las_Vegas

– e3 = San_Diego

oReturn:

– q1 = “weather las vegas”

– q2 = “weather san diego”

62

Page 63: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiments

oDataset: AOL. 20M query instances from 9M distinct

queries.

oUse 10%, 50%, 90% for building the query log, and

10% for testing.

oTesting sets: We use 3, 5, 10 as the threshold for

long-tail queries. We name them L’3, L’5 and L’10.

oMeasures:

– Coverage

– Precision@5

63

Page 64: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experimental Results

64

Page 65: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Efficiency

oTime for offline:

oTime for entity linking:

– 60ms for Dexter2, and can reduce to 0.4ms if we use FEL

method.

65

Page 66: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work66

Page 67: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

67

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Page 68: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

68

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Page 69: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

69

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Page 70: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Meta Structure

o A meta structure is a directed acyclic graph (DAG) with a single source and sink (target) node

o More Expressive (i.e., contain moreinformation) than a meta path.

70

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Page 71: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work71

Page 72: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relevance Measure 1: StructCount

oStructCount: extension of PathCount

oStructCount biases towards popular

objects with a large number of links.

StructCount(x0, y0 | S) = GraphIns(x0, y0 | S)

72

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Page 73: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Layers of Meta Structure

oThe layer of meta structure is a topological ordering

of a DAG

1 2 3 4 5

73

Page 74: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relevance Measure 2: SCSE

oStructure Constrained Random Walk (SCSE):

extension of PCRW.

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

1.0

0.5

0.25

0.5

74

Page 75: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relevance Measure 2: SCSE

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

75

Page 76: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relevance Measure 3: BSCSE

oBiased Structure Constrained Random Walk (BSCSE):

extension of BPCRW.

– A combination of SC and SCSE

– SC 0 1 SCSE

77

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Page 77: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Relevance Measures: Summary

Meta Path Meta Structure Meaning

PathCount StructCount # of meta-path/structure instances

PCRW SCSERandom walk probability on meta-

path/structure

BPCRW BSCSE Combination of count and probability

78

Page 78: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

i-LTable

o Index the probability distribution starting from the i-th

layer of a meta structure.

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Key / layer 3 Value

<ICDM,

social>

<Pei, 1.0>

<KDD,

mining>

<Pei, 0.5>

<Han, 0.5>

<VLDB,

efficient>

<Han, 1.0>

<VLDB,

privacy>

<Yang, 1.0>

<AAAI,

efficient>

<Yang, 1.0>

79

Page 79: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiment: Entity Resolution

oOn YAGO, we have duplicated

entities, e.g., Barack_Obama and

Presidency_Of_Barack_Obama

o We retrieve the top-k pairs; the high

relevance of the node pairs indicates

that the nodes are duplicated

oArea under PR-Curve (AUC)

80

Page 80: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Experiment: Entity Resolution

P1 P2

Measure PathCou

nt

PCRW PathSim PathCou

nt

PCRW PathSim

AUC 0.1324 0.0120 0.0097 0.0003 0.0014 0.0002

Linear Combination(optimal ) Meta Structure S

Measure PathCou

nt

PCRW PathSim SC SCSE BSCSE*

AUC 0.2898 0.2606 0.2920 0.5556 0.5640 0.564081

Page 81: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work82

Page 82: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Meta-Paths Demo

Welcome to Meta-Paths

Interactive toolbox for querying

Heterogeneous Information Networks

by Examples

Start Learn More

83

Page 83: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

New Query

84

Page 84: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

FSPG Execution

The FSPG

algorithm will be

triggered on the

server, returning

the results upon

completion.

85

Page 85: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Generated Meta-Paths

86

Page 86: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Node Pair Generation

Results of the

Meta-Paths

algorithm are

shown. Upon

clicking

“Proceed”, the

node pair

generation

service will be

triggered on the

server.

87

Page 87: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Suggested Node Pairs

The user can

remove some of

the suggested

node pairs, and

use the remaining

pairs to refine the

Meta-Paths in an

iterative manner.

88

Page 88: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Fine-tuning Node Pairs

Click “Proceed” to

start the next

iteration, or

“Finish” to view

the final query

results.

89

Page 89: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Next Iteration

90

Page 90: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Final Results

Click “Save” to

keep a copy of

the query results.

Alternatively, click

“New” to start a

new query.

91

Page 91: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Final Results

92

Page 92: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work93

Page 93: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

94

Conclusions

oHeterogeneous Information Networks

are more powerful than Homogeneous

Information Networks

oMeta-path can capture the relevances

(similarities) between two nodes

oMeta-structure captures more complex

relationships in structures

Page 94: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

95

Future Work

Dynamic Similarity Search on

Meta-Paths

oSometimes the direct relevance search

can not reveal the true relationship

among entities.

oSolutions: Dynamic Network Search

oProblems: 1. No efficient top-k query

algorithms. 2. No predicates or posterior

knowledge of the network

oML methods could help!

Page 95: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

96

Future Work

Ming HINs with Meta Structure

oUse Meta Structure to perform various

data mining tasks on HINs, e.g.,

recommendation, classification and

clustering.

oDesign effective and efficient

techniques to discover meta structure

to express the relationship between

entities.

Page 96: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

97

Future Work

Knowledge Graph exploration

oQ1: Given an entity of interest in a KG,

use different meta paths and meta

structures to find related entities,and

sort them according to relevance.

oQ2: Given some entity pairs, find some

meta structures to account for their

relationships (meta path version has

been solved).

Page 97: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

98

Future Work

Personalized Knowledge Graph

oPersonalized Recommendation is

popular and useful in recommendation.

oRich information from query logs.

oQuestions: How to build a Personalized

KG for each user?

oStorage and efficiency

oPrivacy issues

Page 98: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

99

Future Work

Knowledge Graph maintenance

oQ1: Build a domain-specific KG from

some given entity samples and a

document corpus.

oQ2: Expand a KG by crawling info from

internet.

oQ3: Error detection within a KG using

meta path and meta structures.

oQ4: Error correction automatically.

Page 99: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

100

Future Work

Knowledge Graph cleaning

oRelations / Nodes in KG are inherently

“dirty” (many are curated based on

automatic tools / scripts, which lead to

duplications or error data)

oHow to clean the Knowledge Graph by

removing dirty relations / nodes ?

Page 100: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

101

Future Work

Machine Learning

oMachine learning / deep learning is so

hot nowadays !

oHow to leverage the techniques in

machine learning / deep learning to

better enhance the heterogeneous

information networks (or knowledge

graphs) ?

Page 101: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

102

Future Work

Bioinformatics

oThe network is also very common in the

biology. This can help interpret the

network more accurately.

oMulti-discipline is very popular now.

oCan we find some typical examples in

biological information networks and

use meta-path or meta-structure to

analyze them?

Page 102: Meta Paths and Meta Structures: Analysing Large ...oDefinition [Sun et al. VLDB 2011] 20 Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Thanks ! Q & A

Reynold

Cheng

Zhipeng

Huang

Yudian

Zheng

Jing

Yan

Ka Yu

Wong

Eddie

Ng

Database

Group: