School of Computer Science Carnegie Mellon University BiG-Align: Fast Bipartite Graph Alignment Danai Koutra Hanghang Tong David Lubensky IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA


School of Computer Science, Carnegie Mellon University

BiG-Align: Fast Bipartite Graph Alignment

Danai Koutra Hanghang Tong David Lubensky

IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA


Danai Koutra (CMU) 2

Can we identify users across social networks?

Same or “similar” users?


Danai Koutra (CMU) 3

More applications?

protein-protein alignment

chemical compound comparison

IR: synonym extraction

link prediction & viral marketing

optical character recognition

Structure matching in DB

wikitranslation

Danai Koutra (CMU) 4

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions


Danai Koutra (CMU) 7

Problem Definition

INPUT: A, B (users × groups matrices)

A (5 users × 4 groups):
1 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
1 1 0 1

B (5 users × 5 groups):
1 1 0 0 0
0 1 0 1 0
0 0 1 1 1
0 0 0 1 0
0 0 0 1 0

OUTPUT: P and Q (permutation matrices)

s.t. $\min_{P,Q} \|PAQ - B\|_F^2$

P (users): permutation of the users in A
Q (groups): permutation of the groups in A
PAQ: permutation of A

Graph isomorphism: HARD (P or NP-complete?)
Subgraph isomorphism: NP-complete
And now what? Constraints / relaxations
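The objective ||PAQ - B||_F^2 can be evaluated directly on small matrices. The following is a minimal sketch in plain Python, assuming a toy 3 × 2 graph that is not taken from the slides; the helper names `matmul` and `alignment_cost` are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def alignment_cost(P, A, Q, B):
    """Squared Frobenius norm ||PAQ - B||_F^2."""
    PAQ = matmul(matmul(P, A), Q)
    return sum((p - b) ** 2
               for row_p, row_b in zip(PAQ, B)
               for p, b in zip(row_p, row_b))

# Toy bipartite graph: 3 users x 2 groups (illustrative only).
A = [[1, 0],
     [0, 1],
     [1, 1]]

# Build B by reordering the users of A and swapping the two groups.
P_true = [[0, 0, 1],
          [1, 0, 0],
          [0, 1, 0]]
Q_true = [[0, 1],
          [1, 0]]
B = matmul(matmul(P_true, A), Q_true)

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
I2 = [[1, 0], [0, 1]]

cost_true = alignment_cost(P_true, A, Q_true, B)  # correct alignment
cost_identity = alignment_cost(I3, A, I2, B)      # no reordering at all
```

The correct permutations drive the cost to zero, while leaving A untouched does not; an exhaustive search over all permutations is what becomes infeasible for large graphs.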


Danai Koutra (CMU) 9

Problem Definition: constraints

INPUT: A, B

OUTPUT: P, Q (correspondence matrices)

s.t. $\min_{P,Q} \|PAQ - B\|_F^2$

CONSTRAINTS:
(a) P_ij, Q_ij = probabilities (not a 1-1 mapping)
(b) sparse matrices P and Q (more efficient for large-scale graphs)

[Figure: A and B as user (u) × group (g) matrices; P aligns the users and Q aligns the groups of A with those of B]

Danai Koutra (CMU) 10

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions


Danai Koutra (CMU) 13

What’s different?

BiG-Align vs. other approaches:

• Focus on bipartite graphs
• New optimization problem/constraints

The hope is that the specific graph structure will lead to more accurate graph alignment.


Danai Koutra (CMU) 17

Why bipartite graphs?

(1) ubiquitous, e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs

(2) coupled alignment: individual (nodes) & community-level (communities)

(3) conversion of unipartite graph to bipartite --> clustering + (2)

(4) general formulation:
(a) match clouds of points (point-feature graph)
(b) tensors (e.g., time-evolving, or other 3rd dimension)

[Figure: tensor with modes users, email, time]

Danai Koutra (CMU) 18

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions

Danai Koutra (CMU) 19

BiG-Align: algorithm

DETAILS

Alternating, projected gradient descent, repeated until convergence

Danai Koutra (CMU) 20

BiG-Align: algorithm

DETAILS

Repeat until convergence.

Probabilistic constraint


Danai Koutra (CMU) 22

BiG-Align: algorithm

DETAILS

Repeat until convergence.

Sparsity constraint:

$\min_{P,Q} f(P,Q) = \min_{P,Q} \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
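The alternating, projected gradient descent on f = ||PAQ - B||_F^2 + λΣP_ij + μΣQ_ij can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' Matlab implementation: the projection is taken to be simple clipping of every entry to [0, 1], the step size `eta` is a small constant, and all function names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def residual(P, A, Q, B):
    """R = PAQ - B."""
    PAQ = matmul(matmul(P, A), Q)
    return [[x - b for x, b in zip(rx, rb)] for rx, rb in zip(PAQ, B)]

def objective(P, A, Q, B, lam, mu):
    R = residual(P, A, Q, B)
    fit = sum(r * r for row in R for r in row)
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))

def projected_step(M, G, eta):
    """Gradient step, then projection (assumed: clipping onto [0, 1])."""
    return [[min(1.0, max(0.0, m - eta * g)) for m, g in zip(rm, rg)]
            for rm, rg in zip(M, G)]

def big_align_sketch(A, B, P, Q, lam=0.01, mu=0.01, eta=0.01, iters=200):
    for _ in range(iters):
        # Update P with Q fixed: grad_P = 2 (PAQ - B)(AQ)^T + lam
        R = residual(P, A, Q, B)
        AQ_t = transpose(matmul(A, Q))
        grad_P = [[2 * g + lam for g in row] for row in matmul(R, AQ_t)]
        P = projected_step(P, grad_P, eta)
        # Update Q with P fixed: grad_Q = 2 (PA)^T (PAQ - B) + mu
        R = residual(P, A, Q, B)
        PA_t = transpose(matmul(P, A))
        grad_Q = [[2 * g + mu for g in row] for row in matmul(PA_t, R)]
        Q = projected_step(Q, grad_Q, eta)
    return P, Q

# Toy input (illustrative only): B is A with users and groups reordered.
A = [[1, 0], [0, 1], [1, 1]]
B = [[1, 1], [0, 1], [1, 0]]
P0 = [[1 / 3] * 3 for _ in range(3)]  # uniform start
Q0 = [[1 / 2] * 2 for _ in range(2)]
P, Q = big_align_sketch(A, B, P0, Q0)
```

With a sufficiently small constant step, each block update does not increase the objective, but the result still depends on the starting point because the joint problem is not convex.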

Danai Koutra (CMU) 23

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    Optimizations
• Uni-Align
• Conclusions


Danai Koutra (CMU) 25

BiG-Align: Optimizations

DETAILS

Alternating, projected gradient descent, repeated until convergence

Danai Koutra (CMU) 26

Optimization 1: Structurally equivalent nodes

DETAILS

• Aggregation to super-nodes

Graph A
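Aggregation to super-nodes can be illustrated with a short sketch. It assumes that "structurally equivalent" means nodes with identical rows in the users × groups matrix; how the original method handles the merged nodes afterwards is not stated on the slide. The function name is hypothetical.

```python
from collections import defaultdict

def aggregate_structurally_equivalent(A):
    """Merge rows of A that are identical into one super-node each.

    Returns the reduced matrix and, for each super-node, the list of
    original row indices it stands for.
    """
    groups = defaultdict(list)
    for node, row in enumerate(A):
        groups[tuple(row)].append(node)
    # Dicts keep insertion order, so super-nodes follow first appearance.
    reduced = [list(pattern) for pattern in groups]
    members = list(groups.values())
    return reduced, members

# Users x groups matrix A from the problem-definition slide.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1]]

reduced, members = aggregate_structurally_equivalent(A)
```

Here the first two users belong to exactly the same groups, so the alignment only has to handle four rows instead of five.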


Danai Koutra (CMU) 28

Optimization 2: Initialization of P and Q

DETAILS

• Why is the initialization important?

[Figure: objective curve with a global minimum and local minima]

Danai Koutra (CMU) 29

Optimization 2: Initialization of P and Q

DETAILS

• Social networks are structured: the degree distribution is power-law like.

[Figure: log(degree) vs. ranked nodes]

Danai Koutra (CMU) 30

Optimization 2: Initialization of P and Q

DETAILS

• Network-inspired initialization of P:
  - 1-1 matching of the top k nodes
  - 1-1 matching of clusters of degrees (cluster 1, cluster 2, ..., cluster n)

[Figure: matrix P indexed by the user degrees of G_A and the user degrees in G_B; degree lists shown: 2000, 1500, 1000, 945, 940, 800, 799, 750, 740, 735, 730, …, 3, 2, 1 and 1000, 800, 500, 450, 449, 445, …, 1. Inset: degree vs. rank of node, with k at the knee.]
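A degree-based initialization of P along these lines might look as follows. This is only a sketch: the slide does not say how the degree clusters are formed, so the code assumes equal-sized buckets over the degree ranking and a uniform probability inside each matched pair of buckets. The function name and parameters are hypothetical.

```python
def degree_init(deg_A, deg_B, k, n_clusters):
    """Initial P with rows = users of B and columns = users of A."""
    order_A = sorted(range(len(deg_A)), key=lambda i: -deg_A[i])
    order_B = sorted(range(len(deg_B)), key=lambda i: -deg_B[i])
    P = [[0.0] * len(deg_A) for _ in deg_B]

    # 1-1 matching of the top-k nodes by degree rank.
    for a, b in zip(order_A[:k], order_B[:k]):
        P[b][a] = 1.0

    # Remaining nodes: match the c-th degree bucket of A with the c-th
    # degree bucket of B (assumption: equal-sized rank buckets).
    rest_A, rest_B = order_A[k:], order_B[k:]
    for c in range(n_clusters):
        bucket_A = rest_A[c * len(rest_A) // n_clusters:
                          (c + 1) * len(rest_A) // n_clusters]
        bucket_B = rest_B[c * len(rest_B) // n_clusters:
                          (c + 1) * len(rest_B) // n_clusters]
        for b in bucket_B:
            for a in bucket_A:
                P[b][a] = 1.0 / len(bucket_A)
    return P

# Toy degree sequences (illustrative only).
deg_A = [5, 1, 9, 2, 1]
deg_B = [1, 8, 2, 6, 1]
P0 = degree_init(deg_A, deg_B, k=2, n_clusters=1)
```

The two highest-degree users of each graph are matched one-to-one, and every low-degree user of B is spread evenly over the low-degree users of A, which keeps P sparse when many buckets are used.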


Danai Koutra (CMU) 32

Optimization 3: Steps of gradient descent

DETAILS

• Constant step: thrashing or slow convergence


Danai Koutra (CMU) 34

Optimization 3: Steps of gradient descent

DETAILS

• Variable step with line search: strategy for local optimum

  $\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$

  $\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$

  (closed formulas)

• BiG-Align-Exact: computes the steps at every iteration
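The slides state that the steps have closed formulas but do not show them. One way such a formula can arise, assuming exact line search along the negative gradient and ignoring the projection step, is sketched below; the symbols G, H, R and D are introduced here for illustration and may not match the authors' g_1 and g_2.

```latex
% Objective: f(P,Q) = \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}
% Notation (assumed): R = PAQ - B, \mathbf{1}\mathbf{1}^T = all-ones matrix.
\begin{align*}
G &= \nabla_P f = 2\,R\,(AQ)^T + \lambda\,\mathbf{1}\mathbf{1}^T, \qquad D = G A Q \\
f(P - \eta G,\, Q) &= \|R - \eta D\|_F^2
   + \lambda \sum_{i,j} (P_{ij} - \eta\, G_{ij}) + \mu \sum_{i,j} Q_{ij} \\
\frac{\partial f}{\partial \eta} &= -2\,\langle R, D\rangle_F
   + 2\,\eta\,\|D\|_F^2 - \lambda \sum_{i,j} G_{ij} = 0 \\
\eta_P &= \frac{2\,\langle R, D\rangle_F + \lambda \sum_{i,j} G_{ij}}{2\,\|D\|_F^2}
        = \frac{\|G\|_F^2}{2\,\|G A Q\|_F^2} \\[4pt]
H &= \nabla_Q f = 2\,(PA)^T R + \mu\,\mathbf{1}\mathbf{1}^T, \qquad
\eta_Q = \frac{\|H\|_F^2}{2\,\|P A H\|_F^2}
\end{align*}
```

The simplification in the last step for η_P uses ⟨R, GAQ⟩_F = ⟨R(AQ)^T, G⟩_F, so the numerator equals ⟨G, G⟩_F; the expression for η_Q follows by the same argument with the roles of P and Q exchanged.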


Danai Koutra (CMU) 36

Optimization 3: Steps of gradient descent

DETAILS

• But: slow change in the steps

• BiG-Align-Skip: compute η’s every m (=500) iterations

[Figure: step size (η), with ticks at 10^-4, 3·10^-4 and 5·10^-4, vs. iterations, with ticks at 10^4, 2·10^4 and 3·10^4]

Danai Koutra (CMU) 37

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    Experiments
• Uni-Align
• Conclusions

Danai Koutra (CMU) 38

Experimental Setup

• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies x 27 genres)
• Setup:
  - random permutations (ground truth)
  - noise level: 0 - 20 % (simulate real-world applications)


Danai Koutra (CMU) 40

State-of-the-art

① Umeyama’s algorithm [Umeyama88]: SVD-based

② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach

③ Net-Align [Bayati+09]: belief propagation

BACKGROUND

Bi-partite Uni-partite

Danai Koutra (CMU) 41

BiG-Align: Accuracy vs. Runtime

[Figure: accuracy vs. runtime for Umeyama, NetAlign, NMF-based, BiG-Align-Skip and BiG-Align-Exact; marker size related to graph size]

Danai Koutra (CMU) 42

BiG-Align: Accuracy vs. Runtime

BiG-Align improves both speed and accuracy.

[Figure: accuracy vs. runtime for Umeyama, NetAlign, NMF-based, BiG-Align-Skip and BiG-Align-Exact]


Danai Koutra (CMU) 44

BiG-Align: Accuracy w.r.t. noise

BiG-Align improves the accuracy for almost all levels of noise.

[Figure: accuracy vs. noise level for BiG-Align-Exact, BiG-Align-Skip, NMF-based, NetAlign-deg, NetAlign-full and Umeyama]

Danai Koutra (CMU) 45

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions

Danai Koutra (CMU) 46

Algorithm: Uni-Align

DETAILS

• Each graph becomes a nodes × features matrix: n nodes, d features (node degree, clustering coeff., …)

• $\min_P \|PAQ - B\|_F^2$, with Q fixed and P to be found

Danai Koutra (CMU) 47

Algorithm: Uni-Align

DETAILS

n nodes, d features

$\min_P \|PAQ - B\|_F^2$

P = g*(A, B, S, U) = closed-form solution

SVD: A = U S V^T

O(n · d^2)

Danai Koutra (CMU) 48

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
    Experiments
• Conclusions

Danai Koutra (CMU) 49

Uni-Align

• Dataset: Facebook friendship graph (64K users)

• Setup: uni-partite --> bi-partite graph, by feature extraction:
  - node degree
  - egonet degree
  - edges in egonet
  - mean degree of node’s neighbors

[Figure: egonet]
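The four features can be computed from an adjacency structure as in the sketch below. It assumes an undirected graph stored as a dict of neighbor sets, and it assumes "egonet degree" means the number of edges leaving the egonet (the node plus its neighbors); the slide does not define the term. The function name is hypothetical.

```python
def node_features(adj):
    """Return {node: (degree, egonet degree, edges in egonet,
    mean degree of neighbors)} for an undirected graph."""
    features = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        # Each edge inside the egonet is seen from both endpoints.
        inside = sum(len(adj[u] & ego) for u in ego) // 2
        # Edges with exactly one endpoint in the egonet (assumed meaning
        # of "egonet degree").
        leaving = sum(len(adj[u] - ego) for u in ego)
        mean_nbr = sum(len(adj[u]) for u in nbrs) / len(nbrs) if nbrs else 0.0
        features[v] = (len(nbrs), leaving, inside, mean_nbr)
    return features

# Toy graph (illustrative only): a triangle 0-1-2 with a tail 2-3-4.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
F = node_features(adj)
```

Stacking these tuples row by row gives the n × d nodes-by-features matrix that turns a unipartite graph into a bipartite alignment input.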

Danai Koutra (CMU) 50

Uni-Align: Accuracy vs. Runtime

Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.

[Figure: accuracy vs. runtime for NMF-based, NetAlign, Umeyama and Uni-Align]

Danai Koutra (CMU) 51

Uni-Align: Runtime

Uni-Align is 2x - 31,700x faster depending on graph size.

[Figure: runtime for Umeyama, Uni-Align, NMF-based, NetAlign-deg and NetAlign-full]

Danai Koutra (CMU) 52

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions


Danai Koutra (CMU) 55

Conclusions

• Formulation: new problem / constraints
• Algorithms:
  - BiG-Align: optimized alternating projected gradient descent
  - Uni-Align: alignment for uni-partite graphs
• Evaluations: more accurate and efficient

Danai Koutra (CMU) 56

Beyond BiG-Align: Multi-way Linkage

(1) All build upon BiG-Align
(2) Led to 7 patents

S1: Dynamic Graph Linkage
S2: Community-level Linkage
S3: Hetero. Graph Linkage
S4: Multi-relational DB Linkage


Danai Koutra (CMU) 57

Thank you!

http://www.cs.cmu.edu/[email protected]