Clustering Pathways Using Graph Mining Approach
Mahmud Shahriar Hossain, Monika Akbar, Pramodh Pochu, Venkata Sesha Sanagavarapu
Design Pipeline
Preprocessor
Frequent Subgraph Discovery
Graph Objects of Pathways
Mined Data
Pathway Clustering
STKE Dataset
NN Search
Pathway Relations
Dataset Properties (size)
Total Pathways = 50
[Figure: number of k-edge pathways (y-axis, 0 to 4) versus size of pathway, k (x-axis, 0 to 110)]
Dataset Properties (size)
Total Pathways = 50
[Figure: number of pathways in each size range (y-axis, 0 to 13) versus size range (x-axis bins: 0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, 100-110)]
pf-ipf (tf-idf)
| Transaction | Items bought |
|---|---|
| David Lopez | Orange Juice (2), Potato chip (3), Pepsi (1) |
| Robbie Lamb | Potato chip (3), Pepsi (3), Beer (1) |
| Jonathan Branden | Potato chip (1), Pepsi (1) |
| John Paxton | Potato chip (2), Coconut Cookies (2), Pepsi (1) |
| Rafal Angryk | Swiss Army Knife (15) |
| Jeannete Radclif | Potato chip (2), Coconut Cookies (3) |
| Rocky Ross | Orange Juice (2), Coconut Cookies (3) |
| Richard MaClaster | Coconut Cookies (3), Beer (1) |
| ... | ... |
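The slide introduces pf-ipf by analogy with tf-idf, using shopping transactions as the example. As a minimal sketch of that analogy, the snippet below computes a textbook tf-idf score for every item in the eight listed transactions. The slides do not give the exact formula, so the normalization (item count divided by basket total), the natural-log idf, and the function name `tf_idf` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Transactions copied from the slide's table: customer -> {item: quantity}.
transactions = {
    "David Lopez": {"Orange Juice": 2, "Potato chip": 3, "Pepsi": 1},
    "Robbie Lamb": {"Potato chip": 3, "Pepsi": 3, "Beer": 1},
    "Jonathan Branden": {"Potato chip": 1, "Pepsi": 1},
    "John Paxton": {"Potato chip": 2, "Coconut Cookies": 2, "Pepsi": 1},
    "Rafal Angryk": {"Swiss Army Knife": 15},
    "Jeannete Radclif": {"Potato chip": 2, "Coconut Cookies": 3},
    "Rocky Ross": {"Orange Juice": 2, "Coconut Cookies": 3},
    "Richard MaClaster": {"Coconut Cookies": 3, "Beer": 1},
}


def tf_idf(transactions):
    """Textbook tf-idf (an assumed formula, not the authors' definition)."""
    n = len(transactions)
    # Document frequency: in how many transactions does each item appear?
    df = Counter(item for items in transactions.values() for item in items)
    scores = {}
    for name, items in transactions.items():
        total = sum(items.values())
        scores[name] = {
            item: (count / total) * math.log(n / df[item])
            for item, count in items.items()
        }
    return scores


scores = tf_idf(transactions)
for name, items in scores.items():
    print(name, {item: round(score, 3) for item, score in items.items()})
```

Under these assumptions, an item that occurs in a single transaction (Swiss Army Knife) receives a much higher weight than one that appears in most baskets (Potato chip), which is the intuition the analogy carries over to edges and pathways.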
Dataset Properties (pf-ipf)
Number of Edges in MPG = 1376
[Figure: number of edges left (y-axis, 0 to 1400) versus min_pfipf (x-axis, 0.00 to 0.20)]
Dataset Properties (pf-ipf)
Total Pathways=50
[Figure: number of pathways left (y-axis, 20 to 50) versus min_pfipf (x-axis, 0.00 to 0.20)]
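The two plots above track how many edges and pathways survive as min_pfipf grows. A minimal sketch of that counting follows, assuming each pathway is stored as a mapping from edge to pf-ipf score and that a pathway counts as "left" while it keeps at least one edge. The toy scores and the names `prune` and `pathways` are hypothetical and are not taken from the STKE dataset.

```python
def prune(pathways, min_pfipf):
    """Drop edges scoring below min_pfipf; drop pathways left with no edges."""
    pruned = {}
    for name, edges in pathways.items():
        kept = {edge: score for edge, score in edges.items() if score >= min_pfipf}
        if kept:
            pruned[name] = kept
    return pruned


# Hypothetical toy data: pathway -> {edge: pf-ipf score}.
pathways = {
    "pathway_a": {("A", "B"): 0.01, ("B", "C"): 0.07},
    "pathway_b": {("A", "B"): 0.03, ("C", "D"): 0.15},
    "pathway_c": {("D", "E"): 0.02},
}

for threshold in (0.00, 0.04, 0.10, 0.20):
    left = prune(pathways, threshold)
    # Count distinct edges that still occur in at least one pathway.
    edges_left = {edge for edges in left.values() for edge in edges}
    print(f"min_pfipf={threshold:.2f} edges_left={len(edges_left)} pathways_left={len(left)}")
```

Raising the threshold shrinks the search space for subgraph discovery, at the cost of losing pathways whose edges all fall below it.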
Subgraph Discovery
| k | # of Subgraphs generated | Time (sec.) |
|---|---|---|
| 1 | 1,376 | Existing |
| 2 | 5,380 | 41 |
| 3 | 29,565 | 149 |
| 4 | 187,508 | 971 |
| 5 | 1,274,852 | 7,518 |
| ... | ... | ... |
min_sup=2%
• What is so novel about pruning edges?
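To make the min_sup setting concrete: with 50 pathways, a 2% minimum support corresponds to a single pathway, so every edge that occurs anywhere qualifies as frequent. The sketch below shows support counting for 1-edge subgraphs only, assuming support is the fraction of pathways that contain an edge. The function name `frequent_edges` and the toy pathways are hypothetical.

```python
import math
from collections import Counter


def frequent_edges(pathways, min_sup):
    """Return edges contained in at least ceil(min_sup * N) of the N pathways."""
    # round() guards against floating-point noise before taking the ceiling.
    needed = math.ceil(round(min_sup * len(pathways), 9))
    support = Counter(edge for edges in pathways for edge in set(edges))
    return {edge for edge, count in support.items() if count >= needed}


# Hypothetical toy pathways, each given as a set of edges.
pathways = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("C", "D")},
    {("A", "B"), ("C", "D")},
]

print(frequent_edges(pathways, 0.50))  # edges found in at least 2 of 4 pathways
print(frequent_edges(pathways, 0.25))  # threshold of 1 pathway keeps every edge
```

With such a permissive threshold nothing is filtered at k = 1, which is one plausible reading of why the candidate count grows so quickly with k in the table above.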
Subgraph Discovery
Contour graph for number of subgraphs
[Figure: contour plot and 3D surface plot of the number of subgraphs (0 to 6000) over min_sup (4 to 20) and pf-ipf threshold (0.01 to 0.10)]
Total runs: 10 × 9
Subgraph Discovery
min_sup = 4.0%, min_tfidf = 0.01
[Figure: time in ms (y-axis, 0 to 400×10^3) versus k (x-axis, 3 to 21); series: FSG, SEM]
Subgraph Discovery
min_sup = 4.0%, min_tfidf = 0.01
[Figure: time in ms (y-axis, 0 to 3000) versus k (x-axis, 3 to 21); series: FSG, SEM]
Subgraph Discovery
min_sup = 4.0%, min_tfidf = 0.01
[Figure: number of attempts (y-axis, 0 to 1,250,000) versus k (x-axis, 3 to 21); series: FSG, SEM]
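| k | Number of Subgraphs | Time Saved (%) | Attempts Saved (%) |
|---|---|---|---|
| 2 | 186 | 99.83 | 98.98 |
| 3 | 246 | 98.33 | 86.15 |
| 4 | 305 | 98.57 | 86.38 |
| 5 | 323 | 98.95 | 86.91 |
| 6 | 313 | 98.96 | 85.64 |
| 7 | 279 | 98.88 | 83.25 |
| 8 | 263 | 98.67 | 78.91 |
| 9 | 292 | 98.38 | 74.76 |
| 10 | 364 | 98.58 | 74.75 |
| 11 | 470 | 98.76 | 78.08 |
| 12 | 608 | 99.04 | 81.84 |
| 13 | 785 | 99.22 | 85.02 |
| 14 | 980 | 99.38 | 87.63 |
| 15 | 1117 | 99.48 | 89.48 |
| 16 | 1075 | 99.53 | 90.26 |
| 17 | 804 | 99.51 | 89.40 |
| 18 | 430 | 99.34 | 85.22 |
| 19 | 141 | 98.76 | 71.22 |
| 20 | 20 | 96.15 | 9.19 |
| 21 | 1 | 75.74 | -574.47 |

Overall attempts saved = 89.52%
Overall time saved = 99.39%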
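The "saved" percentages are most naturally read as relative reductions against a baseline. The sketch below spells out that arithmetic, assuming FSG is the baseline and SEM the compared method, which the slides suggest but do not state. The function name `percent_saved` and the input numbers are hypothetical; the slides do not list the raw times or attempt counts.

```python
def percent_saved(baseline, improved):
    """Relative reduction of `improved` against `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical raw measurements, chosen only to illustrate the formula.
print(percent_saved(2000, 500))  # 75.0: the compared method needs a quarter of the baseline
print(percent_saved(100, 650))   # -550.0: the compared method needs more than the baseline
```

A negative value, like the attempts-saved entry at k = 21 in the table, therefore means the compared method made more attempts than the baseline at that level, even though its time saving there is still positive.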
Clustering
Average SC mesh plot for 10 clusters using different min_sup and pf-ipf threshold values
[Figure: 3D mesh of average SC (0.04 to 0.24) over min_sup (6 to 20) and pf-ipf threshold (0.01 to 0.10)]
Clustering
Average SC contour graph for 10 clusters using different min_sup and pf-ipf threshold values
[Figure: contours of average SC (levels 0.08 to 0.20) over min_sup (x-axis, 4 to 20) and pf-ipf threshold (y-axis, 0.01 to 0.10)]
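The clustering slides score each parameter setting by its average SC. Assuming SC abbreviates the silhouette coefficient, which the slides do not spell out, the sketch below computes the average silhouette from a pairwise distance matrix and a list of cluster labels. The function name `average_silhouette` and the four-point toy data are hypothetical; the distance between pathways used by the authors is not given here.

```python
def average_silhouette(dist, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:  # identical points contribute 0
            total += (b - a) / max(a, b)
    return total / len(labels)


# Hypothetical toy data: four points on a line at positions 0, 1, 10, 11.
positions = [0, 1, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]

print(average_silhouette(dist, [0, 0, 1, 1]))  # compact, well-separated clusters
print(average_silhouette(dist, [0, 1, 0, 1]))  # interleaved clusters score lower
```

Values near 1 indicate compact, well-separated clusters, so a mesh or contour of this score over min_sup and the pf-ipf threshold shows which settings cluster the pathways most cleanly.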
Nearest Neighbors
Each bar shows the time taken by 100 executions of the NN search for one pathway.
[Figure: bar chart of time in ms (y-axis, 0 to 16000) for each sample pathway (x-axis, 0 to 25), comparing the Cover Tree and brute-force methods]
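For reference, the brute-force side of this comparison can be sketched in a few lines: compute the distance from the query pathway to every other pathway and keep the closest ones. The sketch assumes each pathway is represented by a set of features (for example, the frequent subgraphs it contains) and uses Jaccard distance; both choices, the toy data, and the function names are assumptions, since the slides do not name the representation or the distance. A cover tree is not shown.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; defined as 0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def brute_force_nn(query, pathways, k=1):
    """Return the names of the k pathways closest to `query` by linear scan."""
    scored = [
        (jaccard_distance(pathways[query], features), name)
        for name, features in pathways.items()
        if name != query
    ]
    return [name for _, name in sorted(scored)[:k]]


# Hypothetical toy data: pathway -> set of feature identifiers.
pathways = {
    "p1": {"s1", "s2", "s3"},
    "p2": {"s1", "s2"},
    "p3": {"s3", "s4"},
    "p4": {"s5"},
}

print(brute_force_nn("p1", pathways, k=2))
```

Each query scans all pathways, so its cost grows linearly with the dataset size. An index such as a cover tree aims to avoid that full scan, and it requires the distance to be a true metric, which Jaccard distance is.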
Pathway Relations (StoryTelling)
Bidirectional Search
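[Diagram: bidirectional search between pathways S and T, with intermediate pathways p1, p2, p3 on the S side and p7, p8, p9 on the T side]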
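The diagram suggests a search that grows one frontier from S and another from T until they meet. A minimal sketch of bidirectional breadth-first search follows, assuming the neighbor relation between pathways is symmetric and given as an adjacency mapping (for instance, each pathway linked to some of its nearest neighbors). The edges in the toy graph and the function name `bidirectional_search` are hypothetical; only the node names come from the slide.

```python
from collections import deque


def bidirectional_search(neighbors, start, target):
    """Return one chain of nodes from start to target, or None if none exists."""
    if start == target:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {target: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([target])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return the node where the two searches meet.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in parents:
                    parents[nxt] = node
                    if nxt in other_parents:
                        return nxt
                    frontier.append(nxt)
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Walk back to start, then forward to target.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents_fwd[node]
            path.reverse()
            node = parents_bwd[meet]
            while node is not None:
                path.append(node)
                node = parents_bwd[node]
            return path
    return None


# Hypothetical symmetric neighbor graph using the node names from the slide.
neighbors = {
    "S": ["p1", "p2", "p3"], "p1": ["S"], "p2": ["S", "p8"], "p3": ["S"],
    "T": ["p7", "p8", "p9"], "p7": ["T"], "p8": ["p2", "T"], "p9": ["T"],
}

print(bidirectional_search(neighbors, "S", "T"))
```

Searching from both ends keeps each frontier shallow, which matters when every node can branch to several neighbors.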
Pathway Relations (StoryTelling)
Number of stories of varying length for different branching factors
[Figure: number of t-length stories (y-axis, 0 to 350) versus story length, t (x-axis, 3 to 16); series: b=2, b=4, b=6, b=8]
Pathway Relations (StoryTelling)
Number of stories of varying length for different branching factors
[Figure: number of t-length stories (y-axis, 0 to 350) versus story length, t (x-axis, 3 to 16); series: b=2 through b=10]
Pathway Relations (StoryTelling)
[Figure, panel 1: total stories from all pairs (y-axis, 0 to 1000) versus branching factor, b (x-axis, 2 to 10)]
[Figure, panel 2: time to generate all stories in ms (y-axis, 0 to 1.4×10^6) versus branching factor, b (x-axis, 2 to 10)]
[Figure, panel 3: length of the longest story (y-axis, 4 to 16) versus branching factor, b (x-axis, 2 to 10)]
Questions?