meta paths and meta structures: analysing large ...odefinition [sun et al. vldb 2011] 20 yizhou sun,...

Post on 05-Jul-2020

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Meta Paths and Meta Structures:

Analysing Large Heterogeneous

Information Networks

Reynold

Cheng

Zhipeng

Huang

Yudian

Zheng

Jing

Yan

Ka Yu

Wong

Eddie

Ng

Database

Group:

Information is Everywhere !oSocial Networking Websites

2https://makeawebsitehub.com/social-media-sites/

Information is Everywhere !oBiological Network

3http://serious-science.org/controlling-noisy-dynamics-in-biological-networks-to-fight-cancer-5376

Information is Everywhere !

oResearch Collaboration Network

4https://scholarlykitchen.sspnet.org/2017/04/07/updated-figures-scale-nature-researchers-use-

scholarly-collaboration-networks/

Information is Everywhere !

oProduct Recommendation Network

5

http://www.sciencedirect.com/science/article/pii/S0957417413006921

Byunghak Leem. Heuiju Chun. An impact of online recommendation network on demand

The Real WorldoHeterogeneous Information Network(s),

i.e. HIN(s).

oNetworks: Nodes & Links

– Nodes: Various Types

– Links: Various Types

6Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs

Example HINs

oDBLP Bibliographic

Network

oNetworks: Nodes & Links

– Node (Type):

• KDD (Venue)

• Jiawei Han (Author)

– Link (Type):

• Write (Author Paper)

• Publish (Paper Venue)

7Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Example HINs

oThe IMDB Movie

Network

oNetworks: Nodes & Links

– Node (Type):

• Forrest Gump (Movie)

• Tom Cruise (Actor)

– Link (Type):

• Make (Producer Movie)

• Act (Author Movie)

8Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Example HINs

oThe Facebook Network

oNetworks:

– Node (Type):

• Jimmy (User)

• Coca Cola (Product)

– Link (Type):

• Like (User Product)

• Follow (User User)

9Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

HINs are Ubiquitous !oHealthcare

– Doctor, Patient, Disease

oSource Code Repository

– Project, Developer, Repository

oE-Commerce

– Seller, Buyer, Product

oNews

– Author, Organization

10Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.

Knowledge Graph (KG)oTurn Web Knowledge into KG

11Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Knowledge Graph (KG)oExample KGs

12Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Knowledge Graph (KG)oStatistics in Existing KGs

13Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.

Problems in HIN

oLink Prediction

oEntity Profiling

oData Integration

14

Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs

Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. COSNET: Connecting Heterogeneous Social

Networks with Local and Global Consistency, KDD 2015.

15

Overview of the Tutorial

oRelevance Search

Find Similar/Relevant Objects in Networks

oExamples

DBLP1

▪ Who are most similar to Jiawei Han ?

▪ Whose recent publication is relevant with Jiawei Han’s research ?

1 http://dblp.uni-trier.de/

oWhere do relations (meta-path)

come from?– Provided by experts [Sun VLDB’11]

• Not easy for a complex schema!

16

Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. “Discovering Meta-

Paths in Large Heterogeneous Information Networks”, in WWW 2015.

Overview of the Tutorial

o Query Recommendation: to suggest

alternate relevant queries to a search

engine user

o How will HIN benefit query

recommendation ?

17

Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for

Long-Tail Queries. CIKM 2016.

Overview of the Tutorial

o How can we express using more complexstructure?

o More Expressive (i.e., contain more information)than a meta path.

18

Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. Meta Structure:

Computing Relevance in Large Heterogeneous Information Networks. KDD 2016.

Overview of the Tutorial

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Similarity Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work19

Definition of Meta-PathoDefinition [Sun et al. VLDB 2011]

20Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K

Similarity Search in Heterogeneous Information Networks. VLDB 2011.

oExample

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work21

22

Relevance Search

oMotivation

Find Similar/Relevant Objects in Networks

oExamples

DBLP1

▪ Who are most similar to Jiawei Han ?

▪ Whose recent publication is relevant with Jiawei Han’s research ?

IMDb2

▪ Who are most similar to Tom Cruise ?

▪ Which movie is most relevant to Tom Cruise?

1 http://dblp.uni-trier.de/2 http://www.imdb.com/

23

Relevance Search

oTarget

To answer these questions systematically

oSolutions

How to measure the similarity?▪ Define a Effective Similarity Function like Cosine, Euclidean

distance, Jaccard coefficient.

Structure similarity or Semantic similarity?▪ Structure Similarity: Based on structural similarity of sub‐network

structures. (like SimRank and PPR)

▪ Semantic Similarity: influenced by similar network structures. This

matters more for HIN! Semantic->edge relations

24

SimRank

oModel

Idea: Two objects are similar if they are referenced

by similar objects

oDefinition▪ S(a,b) = Average similarity between in-neighbors of object a I(a)

and in-neighbors of object b I(b). Between [0, 1].

▪ S(a,b) = 1, if a=b

= , if a≠b

where c is the constant and 0<c<1

[Jeh, Glen, and Jennifer Widom. KDD’02] Jeh, Glen "SimRank: a measure of structural-context similarity."

25

SimRank

oExample

S(a,a) = 1

S(a,b) =𝑐

1×1× 1 = c

▪ S(a,b) ideally should be 1.

▪ But, in reality the graph does not describe everything about

them, so by using the C to make s(a,b)<1. Adding C is to

expresses limited confidence or decay with distance.

x

b

a

[Jeh, Glen, and Jennifer Widom. KDD’03] Jeh, Glen "SimRank: a measure of structural-context similarity."

26

Personalized PageRank (PPR)

oModel

Idea: Originally defined by Google as

a measure of importance for web-pages.

oDefinition▪ Given a graph G, a starting source node s, a target node t, and

a teleport probability 𝛼. Perform random walk from s. At each

step stop with the probability 𝛼, otherwise continue

performing random walk.

▪ Then the Personalized PageRank from s to t is

PPR𝑠~𝑡 = P(𝒔 → 𝒕)

[Jeh, Glen, and Jennifer Widom. WWW’02] Jeh, Glen "Scaling personalized web search."

27

Personalized PageRank (PPR)

oExample

Starting from A, and 𝛼 = 0.2

For each target A, B, C, D

oCalculation

Iterative computation (Power Method);

Monte-carlo simulation (Approximation);

Bookmark Coloring Algorithm, and etc…

28

Path Constrained Random Walk

oModel

Random walk on given paths.

oDefinition

▪ Performing random walks on given meta-paths with

the fixed starting point and target point.

▪ PCRW: Transition probability of the random walk

following a given meta-path.

▪ Between [0, 1].

PCRW(𝑠, 𝑡|𝚷) = P(𝒔 → 𝒕|𝚷)

[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”

29

Path Constrained Random Walk

oExample

= P1 -> P2 -> P3

1. Pro(B.Obama | P1)=1

2. Pro(M.A. Obama | P2) = Pro(B.Obama | P1) / 2 = 0.5

Pro(N.Obama | P2) = Pro(B.Obama | P1) / 2= 0.5

3. Pro(M.Obama | P3) = Pro(M.A. Obama | P2) /2 + Pro(N.Obama | P2) /2 = 0.5

Pro(B.Obama | P3) = Pro(M.A. Obama | P2) 2 + Pro(N.Obama | P2) /2 = 0.5

[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”

Person Person Person

hasChild hasChild-1

m1

m1

30

PathSim

oModel

Path Counts (PC):

#paths following a given meta-path

oDefinition

▪ Can only be applied on symmetric meta paths

(consider the node type and link type)

▪ Normalized version of PC. Between [0, 1].

▪ PathSim s, t | m =2×PC(s,t|m)

PC s,s +PC(t,t)

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

31

PathSim

oExample

PC(B.Obama, M.Obama)=2

PC(B.Obama, B.Obama)=2

PC(M.Obama, M.Obama)=2

PS(B.Obama,M.Obama)=2*2/(2+2) =1

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

Person Person Person

hasChild hasChild-1

m1

32

HeteSim

oModel

Improvement of SimRank for

Heterogeneous Information Network

oDefinition

▪ Any arbitrary meta paths.

▪ Given relations

[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.

33

HeteSim

oExample

𝒎𝟏= P1 -> P2 -> P3

HeteSim (B.Obama, M.Obama|𝑚1)=1

|𝑂𝐵.𝑂𝑏𝑎𝑚𝑎|+|𝐼𝑀.𝑂𝑏𝑎𝑚𝑎|(𝐻𝑒𝑡𝑒𝑆𝑖𝑚(M.A.Obama, M.A.Obama)+Hetesim(N.Obama, N.Obama))

=𝟏

(𝟐×𝟐)𝟏 + 𝟏 = 𝟎. 𝟓

Person Person Person

hasChild hasChild-1

m1

[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.

Comparison

oFor PathSim, HeteSim and PCRW, even for the same

example they have different values.

oThese metrics are designed for different applications

or measurement scenarios.

oNo dominating similarity measurements so far.

34

35

Other Measurements

oKnowSim (APWeb’14)

Measure similarity between nodes by RWs on given

meta-path and the reverse meta-path respectively.

oAvgSim (ICDM’16)

Measure the similarity of Documents by modeling them

into heterogeneous information networks.

oRelSim (SDM’16)

Measure the similarity relations in heterogeneous

information network.

36

Summary

Structure

-based

Semantic-

basedSymmetric?

SimRank √ Yes

PPR √ Yes

PCRW √ No

PathSim √ Yes

HeteSim √ Yes

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work37

Questions

oWhere do meta paths come from?– Provided by experts [Sun VLDB’11]

• Not easy for a complex schema!

– Enumeration within a given length of

meta paths [Cohen ECML’11]

• No clue about the length!

–How do I know the weights ?

38

Our Contributions (WWW’15)

oDesign a solution that:

– (1) Discovers the best meta paths

– (2) Learns the weights, without

maximum weight specified.

[Meng WWW’15] Changping Meng, Reynold Cheng, Silviu Maniu,

Pierre Senellart, and Wangda Zhang. “Discovering Meta-Paths in Large

Heterogeneous Information Networks”, in WWW 2015.

39

Meta-Path Framework

oFramework

Meta Path

Generation

Example

node pairs

Meta-paths

Relevance

Function

Knowledge

Graph

(Yago)

(B. Obama, M. Obama)

(B. Clinton, H. Clinton)

(Linear

Function)

G

F

Challenge: Each node and edge can have many

class labels. The number of candidate meta paths

grows exponentially with their path lengths.

40

Generating Meta-Paths

o In Two Phases

Example

node pairs

Meta-paths

Relevance

Function Knowledge

Graph

G

F

Link

Type

Node

Type

41

Phase 1: Link-Only Path Generation

oForward Stage-wise Path Generation (FSPG)

– iteratively generate the most related meta-paths and update

the model

Example

Pairs

Get one most

related meta

path m

Model

TrainingMeta path

m

Updated

model

FINISH

Based on the Least-Angle

Regression (LARS) model

[Efron, Ann.Stat’04]

Y

N

GreedyTree

Converg

e

To train the

weights on

meta paths.

42

oGreedyTree

– A tree that greedily expands the node which has the largest

priority score

– Priority Score : related to the correlation between m and r

• m is the vector expression of a meta path, r is the residual vector

which evaluates the gap between the truth and current model

Phase 1: Link-Only Path Generation

43

GreedyTree

Phase 1: Link-Only Path Generation

44

Phase 2: Node Class Generation

oWhy node classes?

– A link only meta path may introduce some unrelated result

pairs

– It is less specific

– Solution : Lowest Common Ancestor (LCA)

• Record the LCA in the node of GreedyTree

45

ExperimentsoDatasets

– DBLP (4 areas: DB, DM, AI, IR)

• 14K papers, 14K authors, 9K topics, 20 venues.

– Yago

• A KG derived from Wikipedia, WordNet and

GeoNames.

• CORE Facts: 2.1 million nodes, 8 million edges,

125 edge types, 0.36 million node types

oLink-prediction evaluation

– Select n pairs of certain relationships as

example pairs

– Randomly select another m pairs to predict

the links46

Experiment 1: Effectiveness

oBaseline: enumerate all meta paths within a given

max length L = 1, 2, 3, 4

– L is small low recall.

– L is large low precision.

ROC for link prediction47

Experiment 2

oCase study: Yago citizenOf

– Better than direct link (PCRW 1)

– Better than best PCRW 2

– Better than PCRW 3,4

5 most relevant meta paths

for “citizenOf”

48

Experiment 3: Efficiency

oFindings:

– In Yago, 2 orders of magnitude better than paths with

lengths more than 2.

– In DBLP, the running time is comparable to PCRW 5, but

the accuracy is much better.

Running time of FSPG

49

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work50

One Application

oQuery Recommendation: to suggest alternate relevant

queries to a search engine user

– 1) As you type;

– 2) Related queries

51

Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for

Long-Tail Queries. CIKM 2016.

Long Tail Distribution

oLong-tail queries: queries that are not commonly

requested by users

– “akira kurosawa influence george lucas”

52

Motivation

oUbiquitous:

– 84% of 10M queries appear no more than 3 times.

oNecessary:

– Existing works that only rely on query log alone can no

longer handle well these queries.

53

Query Log

oA set of user log <q, u, t, C>

– q: the query

– u: user id

– t: time stamp

– C: the clicked URLs

oSession: a time window, a mission.

oExisting methods rely on query logs to analyze the

flow among queries.

54

Boldi, Paolo, et al. "The query-flow graph: model and applications." Proceedings of the 17th ACM conference

on Information and knowledge management. ACM, 2008.

Bonchi, Francesco, et al. "Efficient query recommendations in the long tail via center-piece subgraphs."

Proceedings of the 35th international ACM SIGIR conference on Research and development in information

retrieval. ACM, 2012.

Knowledge Graph

55

Hoffart, Johannes, et al. "Yago2: a spatially and temporally enhanced Knowledge Graph from wikipedia."

(2012).

Relationship in the KG

oMeta path representation:

– P: city nextTo city

oQ: “weather Los Angeles”

– Rec:

• “weather Las Vegas”

• “weather San Diego”

56

[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous

Information Networks

57

System Overview

oG = (Gqf, K, teq, P)

– Gqf is a query-flow graph

– K is a Knowledge Graph

– tEQ is a set of entity-query links

– P is a set of meta path to be extracted from

query log

Offline

58

oGqf is built as described in [1].

o teq is built from entity linking and

normalizing the weights.

oP:

– Get the set of entity pairs within the same

session: 𝒆𝒊, 𝒆𝒋 𝒆𝒊, 𝒆𝒋 ∈ 𝒔𝒌}

– Get the meta path between ei and ej (we

use the shortest path for simplicity)

– Stored by the type of ei

Online

Input query q:

LA weather

Step 1. Entity Linking Step 2. Entity Expansion

0.25

0.25

e1,1

e1,2

e1

meta path P1: city citynextTo

P1

P1

Entity Linking Tool

(e.g., Dexter2)

<San_Diego>

<Las_Vegas>

Recommendation:

q1 = San Diego weather

q2 = Las Vegas weather

Step 3. Query Searching

e1,1

e1,2

q1

q2

0.25

0.25

q1 = San Diego weather

q2 = Las Vegas weather

<Los_Angeles>

San Diego weather

Las Vegas weather

<San_Diego>

<Las_Vegas>e1 = <Los_Angeles>

e2 = <weather>

e2 P2

<weather>

meta path P2:property propertyisA

e20.5

0.5

<weather>

0.5

o Three Steps:

– Entity Linking (use existing tool)

– Entity Expansion (use P)

– Query Searching (PPR)

59

Step 1: Entity Linking

oGiven

– q = “weather Los Angeles”

oReturn:

– e1 = Los_Angeles

60

Ceccarelli, Diego, et al. "Dexter: an open source framework for entity linking." Proceedings of the sixth

international workshop on Exploiting semantic annotations in information retrieval. ACM, 2013.

Step 2. Entity Expansion

oGiven

– e1 = Los_Angeles

oUsing P:

– city NextTo city

oReturn

– e2 = Las_Vegas

– e3 = San_Diego

61

Step 3. Query Searching

oGiven:

– e2 = Las_Vegas

– e3 = San_Diego

oReturn:

– q1 = “weather las vegas”

– q2 = “weather san diego”

62

Experiments

oDataset: AOL. 20M query instances from 9M distinct

queries.

oUse 10%, 50%, 90% for building the query log, and

10% for testing.

oTesting sets: We use 3, 5, 10 as the threshold for

long-tail queries. We name them L’3, L’5 and L’10.

oMeasures:

– Coverage

– Precision@5

63

Experimental Results

64

Efficiency

oTime for offline:

oTime for entity linking:

– 60ms for Dexter2, and can reduce to 0.4ms if we use FEL

method.

65

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work66

67

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

68

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

69

Limitations of Meta Paths

oFail to discover common nodes in

different meta paths!

– E.g., a researcher wants to search for some

authors who have published papers in the

same venue and in the same topic with his

papers.a

1a

2a

3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Meta Structure

o A meta structure is a directed acyclic graph (DAG) with a single source and sink (target) node

o More Expressive (i.e., contain moreinformation) than a meta path.

70

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work71

Relevance Measure 1: StructCount

oStructCount: extension of PathCount

oStructCount biases towards popular

objects with a large number of links.

StructCount(x0, y0 | S) = GraphIns(x0, y0 | S)

72

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Layers of Meta Structure

oThe layer of meta structure is a topological ordering

of a DAG

1 2 3 4 5

73

Relevance Measure 2: SCSE

oStructure Constrained Random Walk (SCSE):

extension of PCRW.

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

1.0

0.5

0.25

0.5

74

Relevance Measure 2: SCSE

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

75

Relevance Measure 3: BSCSE

oBiased Structure Constrained Random Walk (BSCSE):

extension of BPCRW.

– A combination of SC and SCSE

– SC 0 1 SCSE

77

[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information

Networks” KDD 2016

Relevance Measures: Summary

Meta Path Meta Structure Meaning

PathCount StructCount # of meta-path/structure instances

PCRW SCSERandom walk probability on meta-

path/structure

BPCRW BSCSE Combination of count and probability

78

i-LTable

o Index the probability distribution starting from the i-th

layer of a meta structure.

a1

a2

a3

p1,2

p1,1

p2,1

p2,2

p3,2

p3,1

v1

v2

v3

v4

t1 t

2t3

t4

K D D “m ining” AAAIVLD B “efficient” “privacy”

AAAI’15 VLD B’15K D D’15K D D’07

IC D M “social”

IC D M ’12

w rite publishm ention

VLD B’06

author paper venue topicobject types:

edge types:

Key / layer 3 Value

<ICDM,

social>

<Pei, 1.0>

<KDD,

mining>

<Pei, 0.5>

<Han, 0.5>

<VLDB,

efficient>

<Han, 1.0>

<VLDB,

privacy>

<Yang, 1.0>

<AAAI,

efficient>

<Yang, 1.0>

79

Experiment: Entity Resolution

oOn YAGO, we have duplicated

entities, e.g., Barack_Obama and

Presidency_Of_Barack_Obama

o We retrieve the top-k pairs; the high

relevance of the node pairs indicates

that the nodes are duplicated

oArea under PR-Curve (AUC)

80

Experiment: Entity Resolution

P1 P2

Measure PathCou

nt

PCRW PathSim PathCou

nt

PCRW PathSim

AUC 0.1324 0.0120 0.0097 0.0003 0.0014 0.0002

Linear Combination(optimal ) Meta Structure S

Measure PathCou

nt

PCRW PathSim SC SCSE BSCSE*

AUC 0.2898 0.2606 0.2920 0.5556 0.5640 0.564081

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work82

Meta-Paths Demo

Welcome to Meta-Paths

Interactive toolbox for querying

Heterogeneous Information Networks

by Examples

Start Learn More

83

New Query

84

FSPG Execution

The FSPG

algorithm will be

triggered on the

server, returning

the results upon

completion.

85

Generated Meta-Paths

86

Node Pair Generation

Results of the

Meta-Paths

algorithm are

shown. Upon

clicking

“Proceed”, the

node pair

generation

service will be

triggered on the

server.

87

Suggested Node Pairs

The user can

remove some of

the suggested

node pairs, and

use the remaining

pairs to refine the

Meta-Paths in an

iterative manner.

88

Fine-tuning Node Pairs

Click “Proceed” to

start the next

iteration, or

“Finish” to view

the final query

results.

89

Next Iteration

90

Final Results

Click “Save” to

keep a copy of

the query results.

Alternatively, click

“New” to start a

new query.

91

Final Results

92

Outline◦ Introduction

– Motivation

– Heterogeneous Information Network (HIN)

– Applications

◦ Meta-Path

– Definition

– Relevance Search

– Meta-Path Discovery

– Query Recommendation

◦ Meta-Structure

– Definition

– Relevance Search

◦ Demo

◦ Conclusions & Future Work93

94

Conclusions

oHeterogeneous Information Networks

are more powerful than Homogeneous

Information Networks

oMeta-path can capture the relevances

(similarities) between two nodes

oMeta-structure captures more complex

relationships in structures

95

Future Work

Dynamic Similarity Search on

Meta-Paths

oSometimes the direct relevance search

can not reveal the true relationship

among entities.

oSolutions: Dynamic Network Search

oProblems: 1. No efficient top-k query

algorithms. 2. No predicates or posterior

knowledge of the network

oML methods could help!

96

Future Work

Ming HINs with Meta Structure

oUse Meta Structure to perform various

data mining tasks on HINs, e.g.,

recommendation, classification and

clustering.

oDesign effective and efficient

techniques to discover meta structure

to express the relationship between

entities.

97

Future Work

Knowledge Graph exploration

oQ1: Given an entity of interest in a KG,

use different meta paths and meta

structures to find related entities,and

sort them according to relevance.

oQ2: Given some entity pairs, find some

meta structures to account for their

relationships (meta path version has

been solved).

98

Future Work

Personalized Knowledge Graph

oPersonalized Recommendation is

popular and useful in recommendation.

oRich information from query logs.

oQuestions: How to build a Personalized

KG for each user?

oStorage and efficiency

oPrivacy issues

99

Future Work

Knowledge Graph maintenance

oQ1: Build a domain-specific KG from

some given entity samples and a

document corpus.

oQ2: Expand a KG by crawling info from

internet.

oQ3: Error detection within a KG using

meta path and meta structures.

oQ4: Error correction automatically.

100

Future Work

Knowledge Graph cleaning

oRelations / Nodes in KG are inherently

“dirty” (many are curated based on

automatic tools / scripts, which lead to

duplications or error data)

oHow to clean the Knowledge Graph by

removing dirty relations / nodes ?

101

Future Work

Machine Learning

oMachine learning / deep learning is so

hot nowadays !

oHow to leverage the techniques in

machine learning / deep learning to

better enhance the heterogeneous

information networks (or knowledge

graphs) ?

102

Future Work

Bioinformatics

oThe network is also very common in the

biology. This can help interpret the

network more accurately.

oMulti-discipline is very popular now.

oCan we find some typical examples in

biological information networks and

use meta-path or meta-structure to

analyze them?

Thanks ! Q & A

Reynold

Cheng

Zhipeng

Huang

Yudian

Zheng

Jing

Yan

Ka Yu

Wong

Eddie

Ng

Database

Group:

top related