An Efficient Algorithm for Discovering Frequent Subgraphs
Michihiro Kuramochi and George Karypis, ICDM 2001
Presenter: 蔡明瑾
Introduction
Structural patterns
Biology, chemistry
Chemical compounds
Graph
Vertex: item
Edge: relation between items
Undirected, connected, labeled graph
[Figure: a labeled triangle with vertex labels b, a, a and edge labels x, x, y]
Graph Isomorphism
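[Figure: two drawings of the same labeled triangle (vertex labels b, a, a; edge labels x, x, y) with the vertices placed differently]
G1(V1,E1) and G2(V2,E2) are isomorphic if they are topologically identical to each other.
That is, there is a one-to-one mapping from V1 to V2 such that each edge in E1 is mapped to an edge in E2 and vice versa. For labeled graphs the mapping must also preserve vertex and edge labels.
[Figure: two triangles on vertices v0, v1, v2, shown as equal]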
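The definition above can be checked directly on small graphs by trying every vertex mapping. The following is a minimal sketch, not the method of the paper: the function name `is_isomorphic` and the representation (a list of vertex labels plus a dict from vertex pairs to edge labels) are assumptions made for illustration.

```python
from itertools import permutations

def is_isomorphic(labels1, edges1, labels2, edges2):
    """Brute-force test: try every one-to-one mapping from V1 to V2."""
    # Normalize undirected edges so (u, v) and (v, u) are the same key.
    e1 = {frozenset(k): lab for k, lab in edges1.items()}
    e2 = {frozenset(k): lab for k, lab in edges2.items()}
    n = len(labels1)
    if n != len(labels2) or len(e1) != len(e2):
        return False
    for perm in permutations(range(n)):
        # perm[u] is the vertex of G2 that vertex u of G1 is mapped to.
        if any(labels1[u] != labels2[perm[u]] for u in range(n)):
            continue  # vertex labels not preserved
        # Equal edge counts plus "every E1 edge lands on an equally
        # labeled E2 edge" gives the "vice versa" direction as well.
        if all(e2.get(frozenset(perm[u] for u in k)) == lab
               for k, lab in e1.items()):
            return True
    return False

# The triangle from the slides under two different vertex numberings.
g1 = (["b", "a", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})
g2 = (["a", "b", "a"], {(0, 1): "x", (0, 2): "y", (1, 2): "x"})
# Same labels, but the y edge now touches the b vertex.
g3 = (["b", "a", "a"], {(0, 1): "x", (0, 2): "y", (1, 2): "x"})

print(is_isomorphic(*g1, *g2))  # True
print(is_isomorphic(*g1, *g3))  # False
```

The loop visits up to |V|! mappings, which is why a canonical label is computed once per graph instead of comparing graphs pairwise.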
Canonical labeling
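Adjacency matrix
[Figure: the example triangle (vertex labels b, a, a; edge labels x, x, y) with its adjacency matrix under two vertex orderings]
Ordering 1: v0 = b, v1 = a, v2 = a; (v0,v1) = x, (v0,v2) = x, (v1,v2) = y; code = baaxxy
Ordering 2: v0 = a, v1 = b, v2 = a; (v0,v1) = x, (v0,v2) = y, (v1,v2) = x; code = abaxyx
Code = vertex labels in order, followed by the upper triangle of the adjacency matrix read column by column
The two graphs are isomorphic, yet the two orderings give different codes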
Canonical labeling
Different permutations of the vertices lead to different codes.
Canonical label: the largest code among all |V|! vertex permutations
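A minimal sketch of this brute-force definition follows. The names `code` and `canonical_label`, the graph representation, and the `"-"` filler for absent edges are assumptions for illustration; the slides only show fully connected triangles, so no filler appears there.

```python
from itertools import permutations

NO_EDGE = "-"  # assumed filler for a missing edge; must differ from edge labels

def code(labels, edges, order):
    """Code for one vertex ordering: the vertex labels, then the upper
    triangle of the adjacency matrix read column by column."""
    e = {frozenset(k): lab for k, lab in edges.items()}
    parts = [labels[v] for v in order]
    for j in range(1, len(order)):
        for i in range(j):
            parts.append(e.get(frozenset((order[i], order[j])), NO_EDGE))
    return "".join(parts)

def canonical_label(labels, edges):
    """Largest code over all |V|! vertex orderings."""
    return max(code(labels, edges, p) for p in permutations(range(len(labels))))

labels = ["b", "a", "a"]
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}

print(code(labels, edges, (0, 1, 2)))  # baaxxy
print(code(labels, edges, (1, 2, 0)))  # aabyxx
print(canonical_label(labels, edges))  # baaxxy
```

Two isomorphic graphs get the same canonical label, so isomorphism testing reduces to a string comparison once the labels are known.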
Vertex invariants
Properties that do not change across isomorphism mappings:
Vertex degree
Vertex label
Siblings
[Figure: the example triangle with vertex labels b, a, a and edge labels x, x, y]
Vertex Degrees and Labels
Adjacency matrix
Partition the vertices by degree and label, so that every partition contains vertices with the same degree and label
Degree: p0 = {v0, v1, v2}: 2
Degree + label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
Vertex Degrees and Labels
[Figure: the example triangle and its adjacency matrix in the original ordering v0 = b, v1 = a, v2 = a; code = baaxxy]
Vertex Degrees and Labels
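[Figure: the same triangle with vertices ordered by partition, v1 = a, v2 = a, v0 = b; (v1,v2) = y, (v1,v0) = x, (v2,v0) = x; code = aabyxx]
p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
Before: 3! permutations
Now: 2! x 1! permutations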
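The saving can be sketched as follows: vertices are grouped by (degree, label), the groups are placed in a fixed order, and only the orderings inside each group are permuted. The function names, the graph representation, and the choice to sort groups by ascending (degree, label) are assumptions for illustration. Note that the label obtained this way (aabyxx) differs from the unrestricted largest code (baaxxy); it still identifies the graph because the group order depends only on invariants.

```python
from itertools import permutations, product

def code(labels, edges, order):
    """Vertex labels, then the upper triangle column by column."""
    e = {frozenset(k): lab for k, lab in edges.items()}
    parts = [labels[v] for v in order]
    for j in range(1, len(order)):
        for i in range(j):
            parts.append(e.get(frozenset((order[i], order[j])), "-"))
    return "".join(parts)

def partition_vertices(labels, edges):
    """Group vertices by the invariant (degree, label), in sorted key order."""
    degree = [0] * len(labels)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    groups = {}
    for v in range(len(labels)):
        groups.setdefault((degree[v], labels[v]), []).append(v)
    return [groups[key] for key in sorted(groups)]

def restricted_orderings(parts):
    """Permute vertices only inside each partition."""
    for combo in product(*(permutations(p) for p in parts)):
        yield [v for block in combo for v in block]

def canonical_label(labels, edges):
    parts = partition_vertices(labels, edges)
    return max(code(labels, edges, o) for o in restricted_orderings(parts))

labels = ["b", "a", "a"]
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}
parts = partition_vertices(labels, edges)
orderings = list(restricted_orderings(parts))

print(parts)                           # [[1, 2], [0]]
print(len(orderings))                  # 2, i.e. 2! x 1! instead of 3!
print(canonical_label(labels, edges))  # aabyxx
```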
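Running example, minsup = 2
[Figure: three input graphs g0, g1, g2 with numeric vertex and edge labels, and the distinct single-edge subgraphs found in them]
Single-edge subgraphs with canonical label (cl) and Tid_list:
cl 010: Tid_list {0,1,2}
cl 021: Tid_list {0,2}
cl 123: Tid_list {0,1}
Fourth edge: Tid_list {2}, below minsup
Frequent 1-subgraphs: 010, 021, 123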
Running example, minsup = 2
[Figure: frequent 1-subgraphs joined into candidate 2-subgraphs c0 to c4]
Possible tid of the candidates:
c0: {0,1,2}
c1: {0,1,2}
c2: {0,2}
c3: {0,1}
Frequent 1-subgraphs with their children:
cl 010: tid {0,1,2}, child c1, c2, c3
cl 021: tid {0,2}, child c2
cl 123: tid {0,1}, child c3, c4
c0 does not appear among the frequent 2-subgraphs
Frequent 2-subgraphs (x in a cl marks a missing edge):
c1: tid {0,1,2}, cl 10000x
c2: tid {0,2}, cl 01201x
c3: tid {0,1}, cl 10203x
c4: tid {0,1}, cl 21133x
Frequency computing
Id-list: intersect the id-lists of the two k-subgraphs that were joined
Intersection is large enough to be frequent -> find the actual support
Intersection is not frequent -> candidate is pruned
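The two-step counting above can be sketched as below. This is an illustration under assumptions: `count_support` is a made-up name, and `contains` stands in for a subgraph-isomorphism test against one transaction, which is not implemented here.

```python
def count_support(parent_tids, contains, minsup):
    """Return the tid set of a candidate, or None if it is pruned.

    parent_tids: non-empty list of tid sets of the joined k-subgraphs.
    contains:    hypothetical callback, contains(tid) -> bool, that tests
                 whether the candidate occurs in transaction tid.
    """
    # A candidate can only occur where all of its parents occur.
    possible = set.intersection(*parent_tids)
    if len(possible) < minsup:
        return None  # pruned without any subgraph-isomorphism test
    # Run the expensive test only on the surviving transactions.
    tids = {t for t in possible if contains(t)}
    return tids if len(tids) >= minsup else None

# Parents 010 {0,1,2} and 021 {0,2} from the running example.
print(count_support([{0, 1, 2}, {0, 2}], lambda t: True, minsup=2))  # {0, 2}
# Parent tid lists {0,2} and {0,1} share only transaction 0: pruned.
print(count_support([{0, 2}, {0, 1}], lambda t: True, minsup=2))     # None
```

The intersection is only an upper bound on the support, which is why the slides call it the "possible tid" list.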
Candidate generation
Join two frequent k-subgraphs -> candidate (k+1)-subgraph
The two k-subgraphs must have the same (k-1)-subgraph as a core
One join can produce several candidates, for three reasons:
Vertex labeling
Multiple automorphisms
Multiple cores
[Figure: running example continued; frequent 2-subgraphs joined into candidate 3-subgraphs q0, q1, q2]
Frequent 1-subgraphs with their children:
cl 010: tid {0,1,2}, child c1, c2, c3
cl 021: tid {0,2}, child c2
cl 123: tid {0,1}, child c3, c4
Frequent 2-subgraphs:
cl 01201x: tid {0,2}
cl 10000x: tid {0,1,2}
cl 10203x: tid {0,1}
cl 21133x: tid {0,1}
Possible tid shown for the candidates: {0, 2}, {0, 2}, {0,}, {0, 2}
Two candidates in the figure are marked: does not satisfy downward closure
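Downward closure means that every connected k-edge subgraph of a (k+1)-edge candidate must itself be frequent. A minimal sketch of that check follows. The function names, the brute-force canonical label (largest code, with "x" for a missing edge), and both candidate graphs are assumptions for illustration; the candidates are not the q0, q1, q2 of the slides, and the labels computed here are not the cl strings shown there.

```python
from itertools import permutations

def canonical_label(labels, edges):
    """Brute-force label: largest code over all vertex orderings."""
    e = {frozenset(k): lab for k, lab in edges.items()}
    n, best = len(labels), ""
    for order in permutations(range(n)):
        parts = [labels[v] for v in order]
        for j in range(1, n):
            for i in range(j):
                parts.append(e.get(frozenset((order[i], order[j])), "x"))
        best = max(best, "".join(parts))
    return best

def connected(vertices, edge_keys):
    """Depth-first search over an undirected edge list."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edge_keys:
            for s, t in ((a, b), (b, a)):
                if s == u and t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen == vertices

def satisfies_downward_closure(labels, edges, frequent_labels):
    """Drop each edge in turn; every connected remainder must be frequent."""
    for removed in edges:
        rest = {k: lab for k, lab in edges.items() if k != removed}
        used = sorted({v for k in rest for v in k})  # isolated vertices drop out
        if not connected(used, rest):
            continue
        index = {v: i for i, v in enumerate(used)}
        sub_labels = [labels[v] for v in used]
        sub_edges = {(index[a], index[b]): lab for (a, b), lab in rest.items()}
        if canonical_label(sub_labels, sub_edges) not in frequent_labels:
            return False
    return True

# Frequent 2-subgraphs of the running example, rebuilt from their cl strings.
frequent_2 = [
    (["1", "0", "0"], {(0, 1): "0", (0, 2): "0"}),
    (["0", "1", "2"], {(0, 1): "0", (0, 2): "1"}),
    (["1", "0", "2"], {(0, 1): "0", (0, 2): "3"}),
    (["2", "1", "1"], {(0, 1): "3", (0, 2): "3"}),
]
frequent = {canonical_label(l, e) for l, e in frequent_2}

# Hypothetical 3-edge candidates.
star = (["1", "0", "0", "2"], {(0, 1): "0", (0, 2): "0", (0, 3): "3"})
triangle = (["0", "1", "2"], {(0, 1): "0", (0, 2): "1", (1, 2): "3"})

print(satisfies_downward_closure(*star, frequent))      # True
print(satisfies_downward_closure(*triangle, frequent))  # False
```

The triangle fails because removing its 0-1 edge leaves a path centered on the vertex labeled 2 with neighbors labeled 0 and 1, which is not one of the four frequent 2-subgraphs.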
Experiment
AMD 1.53GHz, 2GB main memory, Linux OS
Chemical compound datasets:
PTE (340 compounds): 66 atom types and four bond types, 27 edges/graph on average
DTP (223,644 compounds): 104 atom types and three bond types, 22 edges/graph on average
Synthetic datasets
PTE and DTP
Synthetic datasets
Synthetic datasets: |D| = 10000, |S| = 200, |LE| = 1, minsup = 2%