meta paths and meta structures: analysing large ...odefinition [sun et al. vldb 2011] 20 yizhou sun,...
TRANSCRIPT
Meta Paths and Meta Structures:
Analysing Large Heterogeneous
Information Networks
Reynold
Cheng
Zhipeng
Huang
Yudian
Zheng
Jing
Yan
Ka Yu
Wong
Eddie
Ng
Database
Group:
Information is Everywhere !oSocial Networking Websites
2https://makeawebsitehub.com/social-media-sites/
Information is Everywhere !oBiological Network
3http://serious-science.org/controlling-noisy-dynamics-in-biological-networks-to-fight-cancer-5376
Information is Everywhere !
oResearch Collaboration Network
4https://scholarlykitchen.sspnet.org/2017/04/07/updated-figures-scale-nature-researchers-use-
scholarly-collaboration-networks/
Information is Everywhere !
oProduct Recommendation Network
5
http://www.sciencedirect.com/science/article/pii/S0957417413006921
Byunghak Leem. Heuiju Chun. An impact of online recommendation network on demand
The Real WorldoHeterogeneous Information Network(s),
i.e. HIN(s).
oNetworks: Nodes & Links
– Nodes: Various Types
– Links: Various Types
6Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs
Example HINs
oDBLP Bibliographic
Network
oNetworks: Nodes & Links
– Node (Type):
• KDD (Venue)
• Jiawei Han (Author)
– Link (Type):
• Write (Author Paper)
• Publish (Paper Venue)
7Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
Example HINs
oThe IMDB Movie
Network
oNetworks: Nodes & Links
– Node (Type):
• Forrest Gump (Movie)
• Tom Cruise (Actor)
– Link (Type):
• Make (Producer Movie)
• Act (Author Movie)
8Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
Example HINs
oThe Facebook Network
oNetworks:
– Node (Type):
• Jimmy (User)
• Coca Cola (Product)
– Link (Type):
• Like (User Product)
• Follow (User User)
9Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
HINs are Ubiquitous !oHealthcare
– Doctor, Patient, Disease
oSource Code Repository
– Project, Developer, Repository
oE-Commerce
– Seller, Buyer, Product
oNews
– Author, Organization
10Jiawei Han. A Meta Path-Based Approach for Similarity Search and Mining of Heterogeneous Information Networks.
Knowledge Graph (KG)oTurn Web Knowledge into KG
11Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
Knowledge Graph (KG)oExample KGs
12Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
Knowledge Graph (KG)oStatistics in Existing KGs
13Gerhard Weikum. Knowledge Graphs: from a Fistful of Triples to Deep Data and Deep Text.
Problems in HIN
oLink Prediction
oEntity Profiling
oData Integration
14
Yangqiu Song. Recent Development of Heterogeneous Information Networks: From Meta-paths to Meta-graphs
Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. COSNET: Connecting Heterogeneous Social
Networks with Local and Global Consistency, KDD 2015.
15
Overview of the Tutorial
oRelevance Search
Find Similar/Relevant Objects in Networks
oExamples
DBLP1
▪ Who are most similar to Jiawei Han ?
▪ Whose recent publication is relevant with Jiawei Han’s research ?
1 http://dblp.uni-trier.de/
oWhere do relations (meta-path)
come from?– Provided by experts [Sun VLDB’11]
• Not easy for a complex schema!
16
Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. “Discovering Meta-
Paths in Large Heterogeneous Information Networks”, in WWW 2015.
Overview of the Tutorial
o Query Recommendation: to suggest
alternate relevant queries to a search
engine user
o How will HIN benefit query
recommendation ?
17
Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for
Long-Tail Queries. CIKM 2016.
Overview of the Tutorial
o How can we express using more complexstructure?
o More Expressive (i.e., contain more information)than a meta path.
18
Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, Xiang Li. Meta Structure:
Computing Relevance in Large Heterogeneous Information Networks. KDD 2016.
Overview of the Tutorial
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Similarity Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work19
Definition of Meta-PathoDefinition [Sun et al. VLDB 2011]
20Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu. PathSim: Meta Path-Based Top-K
Similarity Search in Heterogeneous Information Networks. VLDB 2011.
oExample
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work21
22
Relevance Search
oMotivation
Find Similar/Relevant Objects in Networks
oExamples
DBLP1
▪ Who are most similar to Jiawei Han ?
▪ Whose recent publication is relevant with Jiawei Han’s research ?
IMDb2
▪ Who are most similar to Tom Cruise ?
▪ Which movie is most relevant to Tom Cruise?
1 http://dblp.uni-trier.de/2 http://www.imdb.com/
23
Relevance Search
oTarget
To answer these questions systematically
oSolutions
How to measure the similarity?▪ Define a Effective Similarity Function like Cosine, Euclidean
distance, Jaccard coefficient.
Structure similarity or Semantic similarity?▪ Structure Similarity: Based on structural similarity of sub‐network
structures. (like SimRank and PPR)
▪ Semantic Similarity: influenced by similar network structures. This
matters more for HIN! Semantic->edge relations
24
SimRank
oModel
Idea: Two objects are similar if they are referenced
by similar objects
oDefinition▪ S(a,b) = Average similarity between in-neighbors of object a I(a)
and in-neighbors of object b I(b). Between [0, 1].
▪ S(a,b) = 1, if a=b
= , if a≠b
where c is the constant and 0<c<1
[Jeh, Glen, and Jennifer Widom. KDD’02] Jeh, Glen "SimRank: a measure of structural-context similarity."
25
SimRank
oExample
S(a,a) = 1
S(a,b) =𝑐
1×1× 1 = c
▪ S(a,b) ideally should be 1.
▪ But, in reality the graph does not describe everything about
them, so by using the C to make s(a,b)<1. Adding C is to
expresses limited confidence or decay with distance.
x
b
a
[Jeh, Glen, and Jennifer Widom. KDD’03] Jeh, Glen "SimRank: a measure of structural-context similarity."
26
Personalized PageRank (PPR)
oModel
Idea: Originally defined by Google as
a measure of importance for web-pages.
oDefinition▪ Given a graph G, a starting source node s, a target node t, and
a teleport probability 𝛼. Perform random walk from s. At each
step stop with the probability 𝛼, otherwise continue
performing random walk.
▪ Then the Personalized PageRank from s to t is
PPR𝑠~𝑡 = P(𝒔 → 𝒕)
[Jeh, Glen, and Jennifer Widom. WWW’02] Jeh, Glen "Scaling personalized web search."
27
Personalized PageRank (PPR)
oExample
Starting from A, and 𝛼 = 0.2
For each target A, B, C, D
oCalculation
Iterative computation (Power Method);
Monte-carlo simulation (Approximation);
Bookmark Coloring Algorithm, and etc…
28
Path Constrained Random Walk
oModel
Random walk on given paths.
oDefinition
▪ Performing random walks on given meta-paths with
the fixed starting point and target point.
▪ PCRW: Transition probability of the random walk
following a given meta-path.
▪ Between [0, 1].
PCRW(𝑠, 𝑡|𝚷) = P(𝒔 → 𝒕|𝚷)
[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”
29
Path Constrained Random Walk
oExample
= P1 -> P2 -> P3
1. Pro(B.Obama | P1)=1
2. Pro(M.A. Obama | P2) = Pro(B.Obama | P1) / 2 = 0.5
Pro(N.Obama | P2) = Pro(B.Obama | P1) / 2= 0.5
3. Pro(M.Obama | P3) = Pro(M.A. Obama | P2) /2 + Pro(N.Obama | P2) /2 = 0.5
Pro(B.Obama | P3) = Pro(M.A. Obama | P2) 2 + Pro(N.Obama | P2) /2 = 0.5
[Cohen ECML’11]W. Cohen, N. Lao “Relational Retrieval Using a Combination of Path-Constrained Random Walks”
Person Person Person
hasChild hasChild-1
m1
m1
30
PathSim
oModel
Path Counts (PC):
#paths following a given meta-path
oDefinition
▪ Can only be applied on symmetric meta paths
(consider the node type and link type)
▪ Normalized version of PC. Between [0, 1].
▪ PathSim s, t | m =2×PC(s,t|m)
PC s,s +PC(t,t)
[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous
Information Networks
31
PathSim
oExample
PC(B.Obama, M.Obama)=2
PC(B.Obama, B.Obama)=2
PC(M.Obama, M.Obama)=2
PS(B.Obama,M.Obama)=2*2/(2+2) =1
[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous
Information Networks
Person Person Person
hasChild hasChild-1
m1
32
HeteSim
oModel
Improvement of SimRank for
Heterogeneous Information Network
oDefinition
▪ Any arbitrary meta paths.
▪ Given relations
[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.
33
HeteSim
oExample
𝒎𝟏= P1 -> P2 -> P3
HeteSim (B.Obama, M.Obama|𝑚1)=1
|𝑂𝐵.𝑂𝑏𝑎𝑚𝑎|+|𝐼𝑀.𝑂𝑏𝑎𝑚𝑎|(𝐻𝑒𝑡𝑒𝑆𝑖𝑚(M.A.Obama, M.A.Obama)+Hetesim(N.Obama, N.Obama))
=𝟏
(𝟐×𝟐)𝟏 + 𝟏 = 𝟎. 𝟓
Person Person Person
hasChild hasChild-1
m1
[Shi, Kong, Huang TKDE’2014] Hetesim: A general framework for relevance measure in heterogeneous networks.
Comparison
oFor PathSim, HeteSim and PCRW, even for the same
example they have different values.
oThese metrics are designed for different applications
or measurement scenarios.
oNo dominating similarity measurements so far.
34
35
Other Measurements
oKnowSim (APWeb’14)
Measure similarity between nodes by RWs on given
meta-path and the reverse meta-path respectively.
oAvgSim (ICDM’16)
Measure the similarity of Documents by modeling them
into heterogeneous information networks.
oRelSim (SDM’16)
Measure the similarity relations in heterogeneous
information network.
…
36
Summary
Structure
-based
Semantic-
basedSymmetric?
SimRank √ Yes
PPR √ Yes
PCRW √ No
PathSim √ Yes
HeteSim √ Yes
…
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work37
Questions
oWhere do meta paths come from?– Provided by experts [Sun VLDB’11]
• Not easy for a complex schema!
– Enumeration within a given length of
meta paths [Cohen ECML’11]
• No clue about the length!
–How do I know the weights ?
38
Our Contributions (WWW’15)
oDesign a solution that:
– (1) Discovers the best meta paths
– (2) Learns the weights, without
maximum weight specified.
[Meng WWW’15] Changping Meng, Reynold Cheng, Silviu Maniu,
Pierre Senellart, and Wangda Zhang. “Discovering Meta-Paths in Large
Heterogeneous Information Networks”, in WWW 2015.
39
Meta-Path Framework
oFramework
Meta Path
Generation
Example
node pairs
Meta-paths
Relevance
Function
Knowledge
Graph
(Yago)
(B. Obama, M. Obama)
(B. Clinton, H. Clinton)
(Linear
Function)
G
F
Challenge: Each node and edge can have many
class labels. The number of candidate meta paths
grows exponentially with their path lengths.
40
Generating Meta-Paths
o In Two Phases
Example
node pairs
Meta-paths
Relevance
Function Knowledge
Graph
G
F
Link
Type
Node
Type
41
Phase 1: Link-Only Path Generation
oForward Stage-wise Path Generation (FSPG)
– iteratively generate the most related meta-paths and update
the model
Example
Pairs
Get one most
related meta
path m
Model
TrainingMeta path
m
Updated
model
FINISH
Based on the Least-Angle
Regression (LARS) model
[Efron, Ann.Stat’04]
Y
N
GreedyTree
Converg
e
To train the
weights on
meta paths.
42
oGreedyTree
– A tree that greedily expands the node which has the largest
priority score
– Priority Score : related to the correlation between m and r
• m is the vector expression of a meta path, r is the residual vector
which evaluates the gap between the truth and current model
Phase 1: Link-Only Path Generation
43
GreedyTree
Phase 1: Link-Only Path Generation
44
Phase 2: Node Class Generation
oWhy node classes?
– A link only meta path may introduce some unrelated result
pairs
– It is less specific
– Solution : Lowest Common Ancestor (LCA)
• Record the LCA in the node of GreedyTree
45
ExperimentsoDatasets
– DBLP (4 areas: DB, DM, AI, IR)
• 14K papers, 14K authors, 9K topics, 20 venues.
– Yago
• A KG derived from Wikipedia, WordNet and
GeoNames.
• CORE Facts: 2.1 million nodes, 8 million edges,
125 edge types, 0.36 million node types
oLink-prediction evaluation
– Select n pairs of certain relationships as
example pairs
– Randomly select another m pairs to predict
the links46
Experiment 1: Effectiveness
oBaseline: enumerate all meta paths within a given
max length L = 1, 2, 3, 4
– L is small low recall.
– L is large low precision.
ROC for link prediction47
Experiment 2
oCase study: Yago citizenOf
– Better than direct link (PCRW 1)
– Better than best PCRW 2
– Better than PCRW 3,4
5 most relevant meta paths
for “citizenOf”
48
Experiment 3: Efficiency
oFindings:
– In Yago, 2 orders of magnitude better than paths with
lengths more than 2.
– In DBLP, the running time is comparable to PCRW 5, but
the accuracy is much better.
Running time of FSPG
49
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work50
One Application
oQuery Recommendation: to suggest alternate relevant
queries to a search engine user
– 1) As you type;
– 2) Related queries
51
Zhipeng Huang, Bogdan Cautis, Reynold Cheng, Yudian Zheng. KB-Enabled Query Recommendation for
Long-Tail Queries. CIKM 2016.
Long Tail Distribution
oLong-tail queries: queries that are not commonly
requested by users
– “akira kurosawa influence george lucas”
52
Motivation
oUbiquitous:
– 84% of 10M queries appear no more than 3 times.
oNecessary:
– Existing works that only rely on query log alone can no
longer handle well these queries.
53
Query Log
oA set of user log <q, u, t, C>
– q: the query
– u: user id
– t: time stamp
– C: the clicked URLs
oSession: a time window, a mission.
oExisting methods rely on query logs to analyze the
flow among queries.
54
Boldi, Paolo, et al. "The query-flow graph: model and applications." Proceedings of the 17th ACM conference
on Information and knowledge management. ACM, 2008.
Bonchi, Francesco, et al. "Efficient query recommendations in the long tail via center-piece subgraphs."
Proceedings of the 35th international ACM SIGIR conference on Research and development in information
retrieval. ACM, 2012.
Knowledge Graph
55
Hoffart, Johannes, et al. "Yago2: a spatially and temporally enhanced Knowledge Graph from wikipedia."
(2012).
Relationship in the KG
oMeta path representation:
– P: city nextTo city
oQ: “weather Los Angeles”
– Rec:
• “weather Las Vegas”
• “weather San Diego”
56
[Sun, Han VLDB’11] Y. Sun, J. Han, el “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous
Information Networks
57
System Overview
oG = (Gqf, K, teq, P)
– Gqf is a query-flow graph
– K is a Knowledge Graph
– tEQ is a set of entity-query links
– P is a set of meta path to be extracted from
query log
Offline
58
oGqf is built as described in [1].
o teq is built from entity linking and
normalizing the weights.
oP:
– Get the set of entity pairs within the same
session: 𝒆𝒊, 𝒆𝒋 𝒆𝒊, 𝒆𝒋 ∈ 𝒔𝒌}
– Get the meta path between ei and ej (we
use the shortest path for simplicity)
– Stored by the type of ei
Online
Input query q:
LA weather
Step 1. Entity Linking Step 2. Entity Expansion
0.25
0.25
e1,1
e1,2
e1
meta path P1: city citynextTo
P1
P1
Entity Linking Tool
(e.g., Dexter2)
<San_Diego>
<Las_Vegas>
Recommendation:
q1 = San Diego weather
q2 = Las Vegas weather
Step 3. Query Searching
e1,1
e1,2
q1
q2
0.25
0.25
q1 = San Diego weather
q2 = Las Vegas weather
<Los_Angeles>
San Diego weather
Las Vegas weather
<San_Diego>
<Las_Vegas>e1 = <Los_Angeles>
e2 = <weather>
e2 P2
<weather>
meta path P2:property propertyisA
e20.5
0.5
<weather>
0.5
o Three Steps:
– Entity Linking (use existing tool)
– Entity Expansion (use P)
– Query Searching (PPR)
59
Step 1: Entity Linking
oGiven
– q = “weather Los Angeles”
oReturn:
– e1 = Los_Angeles
60
Ceccarelli, Diego, et al. "Dexter: an open source framework for entity linking." Proceedings of the sixth
international workshop on Exploiting semantic annotations in information retrieval. ACM, 2013.
Step 2. Entity Expansion
oGiven
– e1 = Los_Angeles
oUsing P:
– city NextTo city
oReturn
– e2 = Las_Vegas
– e3 = San_Diego
61
Step 3. Query Searching
oGiven:
– e2 = Las_Vegas
– e3 = San_Diego
oReturn:
– q1 = “weather las vegas”
– q2 = “weather san diego”
62
Experiments
oDataset: AOL. 20M query instances from 9M distinct
queries.
oUse 10%, 50%, 90% for building the query log, and
10% for testing.
oTesting sets: We use 3, 5, 10 as the threshold for
long-tail queries. We name them L’3, L’5 and L’10.
oMeasures:
– Coverage
– Precision@5
63
Experimental Results
64
Efficiency
oTime for offline:
oTime for entity linking:
– 60ms for Dexter2, and can reduce to 0.4ms if we use FEL
method.
65
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work66
67
Limitations of Meta Paths
oFail to discover common nodes in
different meta paths!
– E.g., a researcher wants to search for some
authors who have published papers in the
same venue and in the same topic with his
papers.a
1a
2a
3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
68
Limitations of Meta Paths
oFail to discover common nodes in
different meta paths!
– E.g., a researcher wants to search for some
authors who have published papers in the
same venue and in the same topic with his
papers.a
1a
2a
3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
69
Limitations of Meta Paths
oFail to discover common nodes in
different meta paths!
– E.g., a researcher wants to search for some
authors who have published papers in the
same venue and in the same topic with his
papers.a
1a
2a
3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
Meta Structure
o A meta structure is a directed acyclic graph (DAG) with a single source and sink (target) node
o More Expressive (i.e., contain moreinformation) than a meta path.
70
[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information
Networks” KDD 2016
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work71
Relevance Measure 1: StructCount
oStructCount: extension of PathCount
oStructCount biases towards popular
objects with a large number of links.
StructCount(x0, y0 | S) = GraphIns(x0, y0 | S)
72
[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information
Networks” KDD 2016
Layers of Meta Structure
oThe layer of meta structure is a topological ordering
of a DAG
1 2 3 4 5
73
Relevance Measure 2: SCSE
oStructure Constrained Random Walk (SCSE):
extension of PCRW.
a1
a2
a3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
1.0
0.5
0.25
0.5
74
Relevance Measure 2: SCSE
a1
a2
a3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
75
Relevance Measure 3: BSCSE
oBiased Structure Constrained Random Walk (BSCSE):
extension of BPCRW.
– A combination of SC and SCSE
– SC 0 1 SCSE
77
[Huang KDD’16] ZP. Huang “Meta Structure: Computing Relevance on Large Heterogeneous Information
Networks” KDD 2016
Relevance Measures: Summary
Meta Path Meta Structure Meaning
PathCount StructCount # of meta-path/structure instances
PCRW SCSERandom walk probability on meta-
path/structure
BPCRW BSCSE Combination of count and probability
78
i-LTable
o Index the probability distribution starting from the i-th
layer of a meta structure.
a1
a2
a3
p1,2
p1,1
p2,1
p2,2
p3,2
p3,1
v1
v2
v3
v4
t1 t
2t3
t4
K D D “m ining” AAAIVLD B “efficient” “privacy”
AAAI’15 VLD B’15K D D’15K D D’07
IC D M “social”
IC D M ’12
w rite publishm ention
VLD B’06
author paper venue topicobject types:
edge types:
Key / layer 3 Value
<ICDM,
social>
<Pei, 1.0>
<KDD,
mining>
<Pei, 0.5>
<Han, 0.5>
<VLDB,
efficient>
<Han, 1.0>
<VLDB,
privacy>
<Yang, 1.0>
<AAAI,
efficient>
<Yang, 1.0>
79
Experiment: Entity Resolution
oOn YAGO, we have duplicated
entities, e.g., Barack_Obama and
Presidency_Of_Barack_Obama
o We retrieve the top-k pairs; the high
relevance of the node pairs indicates
that the nodes are duplicated
oArea under PR-Curve (AUC)
80
Experiment: Entity Resolution
P1 P2
Measure PathCou
nt
PCRW PathSim PathCou
nt
PCRW PathSim
AUC 0.1324 0.0120 0.0097 0.0003 0.0014 0.0002
Linear Combination(optimal ) Meta Structure S
Measure PathCou
nt
PCRW PathSim SC SCSE BSCSE*
AUC 0.2898 0.2606 0.2920 0.5556 0.5640 0.564081
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work82
Meta-Paths Demo
Welcome to Meta-Paths
Interactive toolbox for querying
Heterogeneous Information Networks
by Examples
Start Learn More
83
New Query
84
FSPG Execution
The FSPG
algorithm will be
triggered on the
server, returning
the results upon
completion.
85
Generated Meta-Paths
86
Node Pair Generation
Results of the
Meta-Paths
algorithm are
shown. Upon
clicking
“Proceed”, the
node pair
generation
service will be
triggered on the
server.
87
Suggested Node Pairs
The user can
remove some of
the suggested
node pairs, and
use the remaining
pairs to refine the
Meta-Paths in an
iterative manner.
88
Fine-tuning Node Pairs
Click “Proceed” to
start the next
iteration, or
“Finish” to view
the final query
results.
89
Next Iteration
90
Final Results
Click “Save” to
keep a copy of
the query results.
Alternatively, click
“New” to start a
new query.
91
Final Results
92
Outline◦ Introduction
– Motivation
– Heterogeneous Information Network (HIN)
– Applications
◦ Meta-Path
– Definition
– Relevance Search
– Meta-Path Discovery
– Query Recommendation
◦ Meta-Structure
– Definition
– Relevance Search
◦ Demo
◦ Conclusions & Future Work93
94
Conclusions
oHeterogeneous Information Networks
are more powerful than Homogeneous
Information Networks
oMeta-path can capture the relevances
(similarities) between two nodes
oMeta-structure captures more complex
relationships in structures
95
Future Work
Dynamic Similarity Search on
Meta-Paths
oSometimes the direct relevance search
can not reveal the true relationship
among entities.
oSolutions: Dynamic Network Search
oProblems: 1. No efficient top-k query
algorithms. 2. No predicates or posterior
knowledge of the network
oML methods could help!
96
Future Work
Ming HINs with Meta Structure
oUse Meta Structure to perform various
data mining tasks on HINs, e.g.,
recommendation, classification and
clustering.
oDesign effective and efficient
techniques to discover meta structure
to express the relationship between
entities.
97
Future Work
Knowledge Graph exploration
oQ1: Given an entity of interest in a KG,
use different meta paths and meta
structures to find related entities,and
sort them according to relevance.
oQ2: Given some entity pairs, find some
meta structures to account for their
relationships (meta path version has
been solved).
98
Future Work
Personalized Knowledge Graph
oPersonalized Recommendation is
popular and useful in recommendation.
oRich information from query logs.
oQuestions: How to build a Personalized
KG for each user?
oStorage and efficiency
oPrivacy issues
99
Future Work
Knowledge Graph maintenance
oQ1: Build a domain-specific KG from
some given entity samples and a
document corpus.
oQ2: Expand a KG by crawling info from
internet.
oQ3: Error detection within a KG using
meta path and meta structures.
oQ4: Error correction automatically.
100
Future Work
Knowledge Graph cleaning
oRelations / Nodes in KG are inherently
“dirty” (many are curated based on
automatic tools / scripts, which lead to
duplications or error data)
oHow to clean the Knowledge Graph by
removing dirty relations / nodes ?
101
Future Work
Machine Learning
oMachine learning / deep learning is so
hot nowadays !
oHow to leverage the techniques in
machine learning / deep learning to
better enhance the heterogeneous
information networks (or knowledge
graphs) ?
102
Future Work
Bioinformatics
oThe network is also very common in the
biology. This can help interpret the
network more accurately.
oMulti-discipline is very popular now.
oCan we find some typical examples in
biological information networks and
use meta-path or meta-structure to
analyze them?
Thanks ! Q & A
Reynold
Cheng
Zhipeng
Huang
Yudian
Zheng
Jing
Yan
Ka Yu
Wong
Eddie
Ng
Database
Group: