
An Efficient Algorithm for Discovering Frequent Subgraphs

Michihiro Kuramochi and George Karypis, ICDM 2001

Presenter: 蔡明瑾

Introduction

Structural patterns arise in biology and chemistry, for example in chemical compounds.

Graph model: a vertex represents an item and an edge represents a relation between items.

The graphs considered are undirected, connected, and labeled.

[Figure: an example undirected connected labeled graph with vertex labels b, a, a and edge labels x, x, y]

Graph Isomorphism

[Figure: two drawings of the same labeled graph, with vertex labels a, a, b and edge labels x, x, y]

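Two graphs G1(V1, E1) and G2(V2, E2) are isomorphic if they are topologically identical to each other.

That is, there is a mapping from V1 to V2 such that each edge in E1 is mapped to a single edge in E2 and vice versa. For labeled graphs, the mapping must also preserve the vertex and edge labels.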
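The definition can be checked directly on very small graphs by trying every vertex mapping. The following is a minimal sketch, not the method used in the paper: the function name `is_isomorphic` and the edge representation (a dict keyed by `frozenset` pairs) are assumptions made for illustration. The two test graphs are the triangle from the slides under the two vertex orders that give the codes baaxxy and abaxyx.

```python
from itertools import permutations

def is_isomorphic(vlabels1, edges1, vlabels2, edges2):
    """Brute-force isomorphism test for small undirected labeled graphs.

    vlabels: list of vertex labels, indexed by vertex id.
    edges:   dict mapping frozenset({u, v}) to the edge label.
    """
    n = len(vlabels1)
    if n != len(vlabels2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(range(n)):
        # perm[u] is the vertex of graph 2 that vertex u of graph 1 maps to.
        if any(vlabels1[u] != vlabels2[perm[u]] for u in range(n)):
            continue  # vertex labels are not preserved
        mapped = {frozenset(perm[u] for u in e): lab for e, lab in edges1.items()}
        if mapped == edges2:  # every edge and edge label is preserved
            return True
    return False

# The example triangle under two different vertex orders.
g1_labels = ["b", "a", "a"]
g1_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}
g2_labels = ["a", "b", "a"]
g2_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "y", frozenset({1, 2}): "x"}

print(is_isomorphic(g1_labels, g1_edges, g2_labels, g2_edges))
```

Trying all |V|! mappings is only practical for tiny graphs, which is the motivation for the canonical labeling discussed next.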

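[Figure: the same graph drawn twice with its vertices named v0, v1, v2 in two different orders; the two drawings are equal]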

Canonical labeling

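Adjacency matrix

The diagonal holds the vertex labels and the off-diagonal entries hold the edge labels. The code is the string of vertex labels followed by the entries of the upper triangle.

Vertex order v0 = b, v1 = a, v2 = a:

      v0  v1  v2
v0    b   x   x
v1    x   a   y
v2    x   y   a

code = baaxxy

The same graph with vertex order v0 = a, v1 = b, v2 = a:

      v0  v1  v2
v0    a   x   y
v1    x   b   x
v2    y   x   a

code = abaxyx

Both matrices describe the same graph.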

Canonical labeling

Different permutations of the vertices lead to different codes for the same graph.

The canonical label is the largest code, which in the worst case means examining |V|! permutations.
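The brute-force definition can be sketched as follows. This is an illustration only: the helper names `code` and `canonical_label` are hypothetical, the upper triangle is read column by column (an assumption, since the slide examples give the same string either way), and a missing edge is written as "0" (also an assumption).

```python
from itertools import permutations

def code(vlabels, edges, order):
    """Code of a graph for one vertex order: the vertex labels, then the
    upper triangle of the adjacency matrix read column by column.
    A missing edge is written as '0' (assumption of this sketch)."""
    diag = "".join(vlabels[v] for v in order)
    upper = "".join(
        edges.get(frozenset({order[i], order[j]}), "0")
        for j in range(len(order))
        for i in range(j)
    )
    return diag + upper

def canonical_label(vlabels, edges):
    """Largest code over all |V|! vertex orders."""
    return max(code(vlabels, edges, p) for p in permutations(range(len(vlabels))))

# The example triangle under the two vertex orders shown on the slides.
g1_labels = ["b", "a", "a"]
g1_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}
g2_labels = ["a", "b", "a"]
g2_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "y", frozenset({1, 2}): "x"}

print(code(g1_labels, g1_edges, (0, 1, 2)))   # baaxxy
print(code(g2_labels, g2_edges, (0, 1, 2)))   # abaxyx
print(canonical_label(g1_labels, g1_edges) == canonical_label(g2_labels, g2_edges))
```

Two graphs are isomorphic exactly when their canonical labels are equal, so the comparison on the last line replaces an explicit isomorphism test.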

Vertex invariants

Vertex invariants are properties that do not change across isomorphism mappings.

Examples: vertex degree, vertex label, siblings.

[Figure: the example graph with vertex labels b, a, a and edge labels x, x, y]

Vertex Degrees and Labels

Adjacency matrix: partition the vertices by degree and label, so that every partition contains vertices with the same degree and the same label.

Degree only: p0 = {v0, v1, v2}, degree 2

Degree + label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)


Without partitioning, vertex order v0, v1, v2:

      v0  v1  v2
v0    b   x   x
v1    x   a   y
v2    x   y   a

code = baaxxy

With partitioning, vertex order v1, v2, v0 (partition p0 first, then p1):

      v1  v2  v0
v1    a   y   x
v2    y   a   x
v0    x   x   b

code = aabyxx

Partitions: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)

Before partitioning: 3! permutations

After partitioning: 2! × 1! permutations
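The saving from partitioning can be sketched in code. This is a hedged illustration rather than the paper's implementation: the names `code`, `degree`, and `partitioned_canonical_label` are hypothetical, partitions are ordered by sorting their (degree, label) keys (an assumption that happens to place the a-vertices first, as on the slide), and the upper triangle is read column by column with "0" for a missing edge (both assumptions).

```python
from itertools import permutations, product

def code(vlabels, edges, order):
    """Vertex labels in the given order, then the upper triangle of the
    adjacency matrix column by column ('0' marks a missing edge)."""
    diag = "".join(vlabels[v] for v in order)
    upper = "".join(
        edges.get(frozenset({order[i], order[j]}), "0")
        for j in range(len(order))
        for i in range(j)
    )
    return diag + upper

def degree(v, edges):
    """Number of edges incident to vertex v."""
    return sum(1 for e in edges if v in e)

def partitioned_canonical_label(vlabels, edges):
    """Largest code among vertex orders that keep each partition together.

    Returns (label, number_of_orders_tried)."""
    parts = {}
    for v in range(len(vlabels)):
        parts.setdefault((degree(v, edges), vlabels[v]), []).append(v)
    # Fix the order of the partitions, then permute only inside each one.
    blocks = [parts[key] for key in sorted(parts)]
    best, tried = None, 0
    for combo in product(*(permutations(b) for b in blocks)):
        order = [v for block in combo for v in block]
        c = code(vlabels, edges, order)
        tried += 1
        if best is None or c > best:
            best = c
    return best, tried

labels = ["b", "a", "a"]  # v0 = b, v1 = a, v2 = a
edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}
print(partitioned_canonical_label(labels, edges))  # ('aabyxx', 2)
```

Only 2! × 1! = 2 orders are tried instead of 3! = 6. The resulting label differs from the unrestricted largest code (baaxxy), which is fine as long as the same partitioning rule is applied to every graph being compared.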

Running example, minsup = 2

[Figure: the graphs g0, g1, g2 and the single-edge subgraphs found in them]

Tid_list   {0,1,2}   {0,2}   {0,1}   {2}
cl         010       021     123

Frequent 1-subgraphs

Running example, minsup = 2

Frequent 1-subgraphs and the candidate 2-subgraphs (children) generated from them:

cl     tid        child
010    {0,1,2}    c1, c2, c3
021    {0,2}      c2
123    {0,1}      c3, c4

[Figure: candidate 2-subgraphs c0 to c4. Possible tid lists shown: c0 {0,1,2}, c1 {0,1,2}, c2 {0,2}, c3 {0,1}. Candidate c0 is first listed as a child of 010 and no longer appears in the final child lists.]

tid   {0,2}    {0,1,2}   {0,1}    {0,1}
cl    01201x   10000x    10203x   21133x

Frequent 2-subgraphs

Frequency computing

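Id-list: the possible id-list of a (k+1)-candidate is the intersection of the id-lists of the two k-subgraphs it was joined from.

If the intersection holds at least minsup ids, the candidate may be frequent, so its actual support is computed. Otherwise it cannot be frequent and is pruned.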
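The id-list step can be sketched with plain set operations. The function names `possible_tids` and `count_or_prune`, and the `contains` callback that stands in for the actual subgraph test, are hypothetical names introduced for this illustration. The tid lists are those of the frequent 1-subgraphs in the running example.

```python
def possible_tids(parent_tid_lists):
    """Intersect the tid lists of the frequent k-subgraphs of a candidate."""
    return set.intersection(*map(set, parent_tid_lists))

def count_or_prune(parent_tid_lists, minsup, contains):
    """Return the set of supporting tids, or None if the candidate is pruned.

    contains(tid) is a hypothetical callback that runs the subgraph test
    against transaction tid; it is only called for tids in the intersection.
    """
    tids = possible_tids(parent_tid_lists)
    if len(tids) < minsup:
        return None  # pruned without running any subgraph test
    support = {t for t in tids if contains(t)}
    return support if len(support) >= minsup else None

# Tid lists of the frequent 1-subgraphs in the running example.
tid = {"010": {0, 1, 2}, "021": {0, 2}, "123": {0, 1}}

print(possible_tids([tid["010"], tid["021"]]))  # {0, 2}
print(possible_tids([tid["021"], tid["123"]]))  # {0}
```

With minsup = 2, the first intersection is large enough to go on to support counting, while any candidate whose possible tid list is {0} is pruned immediately.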

Candidate generation

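Join two frequent k-subgraphs to produce (k+1)-subgraph candidates.

The two k-subgraphs must share the same (k-1)-subgraph as a core.

One join can produce more than one candidate, for three reasons: vertex labeling, multiple cores, and multiple automorphisms of the core.

[Figures: one example each for vertex labeling, multiple automorphisms, and multiple cores]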

[Figure: frequent 2-subgraphs c1 to c4 and the candidate 3-subgraphs q0, q1, q2 generated from them]

Frequent 2-subgraphs:

cl        tid
01201x    {0,2}
10000x    {0,1,2}
10203x    {0,1}
21133x    {0,1}

Candidate 3-subgraphs q0, q1, q2: the possible tid lists shown are {0,2}, {0,2}, and {0,}. The child column lists q0 and q1.

Two of the candidates are marked as not satisfying downward closure.

Experiment

Setup: AMD 1.53 GHz processor, 2 GB main memory, Linux.

Chemical compound datasets:

PTE (340 compounds): 66 atom types, four bond types, 27 edges per graph on average

DTP (223,644 compounds): 104 atom types, three bond types, 22 edges per graph on average

Synthetic datasets

PTE and DTP

Synthetic datasets

Synthetic datasets: |D| = 10000, |S| = 200, |LE| = 1, minsup = 2%
