KDD’09, June 28-July 1, 2009, Paris, France. Copyright 2009 ACM.



Frequent Pattern Mining with Uncertain Data


Outline

Introduction

Definition

Algorithm

Experiment Results

Conclusion


Introduction

This paper studies the problem of frequent pattern mining with uncertain data by examining the relative behavior of extensions of well-known classes of deterministic algorithms.


Definition
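The definition slides were images, and their content did not survive extraction. In the uncertain-data model this paper studies, each item of a transaction carries an existential probability. This sketch assumes the commonly used definition of support in that model: expected support, with items occurring independently. Under that assumption, the expected support of an itemset is the sum, over all transactions, of the product of its items' probabilities. The names `expected_support` and `UncertainTransaction` and the toy probabilities are hypothetical.

```python
from math import prod

# Hypothetical representation: an uncertain transaction maps each item to its
# existential probability; items are assumed to occur independently.
UncertainTransaction = dict[str, float]


def expected_support(itemset: frozenset[str], db: list[UncertainTransaction]) -> float:
    """Sum over transactions of the probability that every item of `itemset`
    is present (a product of probabilities, under the independence assumption)."""
    total = 0.0
    for t in db:
        if all(x in t for x in itemset):
            total += prod(t[x] for x in itemset)
    return total


# Toy uncertain database (made-up probabilities, for illustration only).
db = [{"a": 0.9, "b": 0.5}, {"a": 0.6}, {"a": 0.4, "b": 0.8}]
print(expected_support(frozenset({"a", "b"}), db))  # 0.9*0.5 + 0.4*0.8
```

If every probability is 1, expected support reduces to the ordinary support count of the deterministic examples later in the deck.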



Algorithm

Step 1. Extending the H-mine Algorithm
Step 2. Extending the FP-growth Algorithm
Step 3. Computation of Support Upper Bounds
Step 4. Mining Frequent Patterns with UFP-tree
Step 5. Determining Support with a Trie Tree
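The slides do not spell out how Step 1 carries H-mine over to uncertain data. This is a minimal sketch under stated assumptions. Support means expected support with independent items. Each projected transaction remembers the probability of the prefix it was projected on, so the expected support of prefix + {x} is the sum of prefix probability × p(x). It uses plain Python lists rather than the in-memory H-struct with hyperlinks. The function name `uh_mine_sketch` is hypothetical and is not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical input: each uncertain transaction maps item -> existential probability.
UncertainDB = list[dict[str, float]]


def uh_mine_sketch(db: UncertainDB, min_esup: float) -> dict[tuple[str, ...], float]:
    """Projection-based pattern growth using expected support instead of counts."""
    result: dict[tuple[str, ...], float] = {}

    def mine(prefix: tuple[str, ...], proj: list[tuple[float, list[tuple[str, float]]]]) -> None:
        # Expected support of prefix + {x}: sum of prefix_prob * p(x) over projected entries.
        esup: dict[str, float] = defaultdict(float)
        for prefix_prob, items in proj:
            for x, p in items:
                esup[x] += prefix_prob * p
        for x in sorted(esup):
            if esup[x] < min_esup:
                continue  # expected support is anti-monotone, so no extension can qualify
            pattern = prefix + (x,)
            result[pattern] = esup[x]
            # Project on x: keep the items after x and fold p(x) into the prefix probability.
            sub = []
            for prefix_prob, items in proj:
                for i, (y, p) in enumerate(items):
                    if y == x:
                        if items[i + 1:]:
                            sub.append((prefix_prob * p, items[i + 1:]))
                        break
            mine(pattern, sub)

    # Empty prefix has probability 1; items are kept in alphabetical order.
    mine((), [(1.0, sorted(t.items())) for t in db])
    return result


db = [{"a": 0.9, "b": 0.5}, {"a": 0.6}, {"a": 0.4, "b": 0.8}]
print(uh_mine_sketch(db, 0.7))
```

With all probabilities set to 1, this reduces to deterministic pattern growth on ordinary support counts.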


H-Mine (Example)

TDB (min_sup_count = 2)

ID    Items
100   c, d, e, f, g, i
200   a, c, d, e, m
300   a, b, d, e, g, k
400   a, c, d, h

Scan TDB: the complete set of frequent items can be found and output: {a:3, c:3, d:4, e:3, g:2}

Following the alphabetical order of frequent items (called the F-list): a-c-d-e-g

Scan TDB again and build the H-struct in main memory:

ID    Frequent-item projection
100   c, d, e, g
200   a, c, d, e
300   a, d, e, g
400   a, c, d
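The two scans in the example can be reproduced in a few lines. The first scan counts items and keeps those meeting min_sup_count. The second scan rewrites each transaction as its frequent-item projection in F-list order. The function names are hypothetical, and the alphabetical F-list follows the slide.

```python
from collections import Counter

# Transaction database from the slide (TID -> items).
TDB = {
    100: ["c", "d", "e", "f", "g", "i"],
    200: ["a", "c", "d", "e", "m"],
    300: ["a", "b", "d", "e", "g", "k"],
    400: ["a", "c", "d", "h"],
}
MIN_SUP_COUNT = 2


def frequent_items(tdb, min_sup):
    """First scan: count each item once per transaction and keep the frequent ones."""
    counts = Counter(item for items in tdb.values() for item in set(items))
    return {item: n for item, n in sorted(counts.items()) if n >= min_sup}


def frequent_projections(tdb, f_list):
    """Second scan: drop infrequent items and order the rest by the F-list."""
    rank = {item: i for i, item in enumerate(f_list)}
    return {tid: sorted((x for x in items if x in rank), key=rank.__getitem__)
            for tid, items in tdb.items()}


freq = frequent_items(TDB, MIN_SUP_COUNT)
f_list = sorted(freq)  # alphabetical F-list, as on the slide
projections = frequent_projections(TDB, f_list)
print(freq)
print(projections)
```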



H-Mine (Example) (Cont.)

H-struct: header table H plus the frequent-item projections

Header table H:
Item    a  c  d  e  g
Count   3  3  4  3  2

Frequent projections:
100   c, d, e, g
200   a, c, d, e
300   a, d, e, g
400   a, c, d
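The H-struct can be sketched as the projections stored as arrays, plus a header table. For each item, the header table holds a support count and a queue of links. This sketch assumes, following the usual description of H-mine, that the initial queue of each item links the projections whose first frequent item is that item. Links are modelled as hypothetical (tid, position) pairs rather than real pointers.

```python
from collections import defaultdict

# Frequent-item projections from the slide, kept as in-memory arrays.
PROJECTIONS = {
    100: ["c", "d", "e", "g"],
    200: ["a", "c", "d", "e"],
    300: ["a", "d", "e", "g"],
    400: ["a", "c", "d"],
}


def build_header_table(projections):
    """Return (counts, queues): per-item support counts and, per item, links
    (tid, position) to projections whose first item is that item."""
    counts = defaultdict(int)
    queues = defaultdict(list)
    for tid, items in projections.items():
        for item in items:
            counts[item] += 1
        if items:
            queues[items[0]].append((tid, 0))
    return dict(counts), dict(queues)


counts, queues = build_header_table(PROJECTIONS)
print(counts)  # support counts of the header table H
print(queues)  # a-queue links 200, 300, 400; c-queue links 100
```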


H-Mine (Example) (Cont.)

Frequent projections:
100   c, d, e, g
200   a, c, d, e
300   a, d, e, g
400   a, c, d

Header table H:
Item    a  c  d  e  g
Count   3  3  4  3  2

Header table H_a (the a-projected database: projections 200, 300, 400):
Item    c  d  e  g
Count   2  3  2  1

Frequent patterns with prefix a: ac: 2, ad: 3, ae: 2
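H_a can be checked by counting, over the projections that contain a, the items that follow a. Items reaching min_sup_count extend the prefix. This simplified sketch scans all projections instead of following the a-queue. The two approaches coincide here because a is first in the F-list. The helper name `projected_header` is hypothetical.

```python
from collections import Counter

PROJECTIONS = {
    100: ["c", "d", "e", "g"],
    200: ["a", "c", "d", "e"],
    300: ["a", "d", "e", "g"],
    400: ["a", "c", "d"],
}
MIN_SUP_COUNT = 2


def projected_header(projections, item):
    """Counts of the items that follow `item` in every projection containing it."""
    counts = Counter()
    for items in projections.values():
        if item in items:
            counts.update(items[items.index(item) + 1:])
    return dict(counts)


h_a = projected_header(PROJECTIONS, "a")
frequent_with_a = {"a" + x: n for x, n in sorted(h_a.items()) if n >= MIN_SUP_COUNT}
print(h_a)              # header table H_a
print(frequent_with_a)  # patterns grown from prefix a
```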


H-Mine (Example) (Cont.)

Output (all frequent patterns of the TDB above, min_sup_count = 2):

a:3, c:3, d:4, e:3, g:2,
ac:2, ad:3, ae:2, acd:2, ade:2,
cd:3, ce:2, cde:2,
de:3, dg:2, deg:2, eg:2
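The complete output can be reproduced by recursive pattern growth over the frequent-item projections. The procedure counts the items in the current projected database, outputs prefix + item for each frequent item, and then recurses on the items that follow it. This list-based sketch copies sub-projections, which H-mine's in-place H-struct links are designed to avoid. It also assumes single-character item names so that patterns can be plain strings.

```python
from collections import Counter

# Frequent-item projections of the slide's TDB (F-list order a-c-d-e-g).
PROJECTIONS = [["c", "d", "e", "g"], ["a", "c", "d", "e"], ["a", "d", "e", "g"], ["a", "c", "d"]]
MIN_SUP_COUNT = 2


def mine(projections, min_sup, prefix=""):
    """Depth-first pattern growth; returns {pattern: support count}."""
    out = {}
    counts = Counter(x for items in projections for x in items)
    for x in sorted(counts):
        if counts[x] < min_sup:
            continue
        pattern = prefix + x
        out[pattern] = counts[x]
        # Project on x: keep only the items after x in each projection containing x.
        sub = [items[items.index(x) + 1:] for items in projections if x in items]
        out.update(mine([s for s in sub if s], min_sup, pattern))
    return out


patterns = mine(PROJECTIONS, MIN_SUP_COUNT)
print(patterns)
```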


FP-growth (Example)

min_support = 3

TID   Items bought                  (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}      {f, c, a, m, p}
200   {a, b, c, f, l, m, o}         {f, c, a, b, m}
300   {b, f, h, j, o, w}            {f, b}
400   {b, c, k, s, p}               {c, b, p}
500   {a, f, c, e, l, p, m, n}      {f, c, a, m, p}

Header table (F-list: f-c-a-b-m-p; each entry's head links the tree nodes carrying that item):
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (node = item:count; children are indented under their parent):
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
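The FP-tree above can be rebuilt by inserting each transaction's frequent items, sorted by the header-table order f, c, a, b, m, p, as a path from the root. Shared prefixes share nodes and have their counts incremented. The class and function names are hypothetical. The F-list is passed explicitly because the slide's tie-breaking among equally frequent items is not stated.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, a parent link and its children."""

    def __init__(self, item, parent):
        self.item = item        # None for the root {}
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode


def build_fp_tree(transactions, f_list):
    """Insert each transaction's frequent items (ordered by f_list) as a path.
    Returns the root and, per item, the list of nodes carrying it (node-links)."""
    rank = {item: i for i, item in enumerate(f_list)}
    root = FPNode(None, None)
    node_links = {item: [] for item in f_list}
    for t in transactions:
        node = root
        for item in sorted((x for x in set(t) if x in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += 1
            node = child
    return root, node_links


TRANSACTIONS = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o", "w"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
MIN_SUPPORT = 3
F_LIST = ["f", "c", "a", "b", "m", "p"]  # header-table order from the slide

# Sanity check: every F-list item meets min_support.
counts = Counter(x for t in TRANSACTIONS for x in set(t))
assert all(counts[x] >= MIN_SUPPORT for x in F_LIST)

root, node_links = build_fp_tree(TRANSACTIONS, F_LIST)
```

Summing the counts along an item's node-links gives that item's support. For example, p appears at p:2 and p:1, which gives 3.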


Computation of Support Upper Bounds

Corollary
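The corollary itself was on a slide image and cannot be recovered. The derivation below is for orientation only and is not necessarily the bound the paper derives. It assumes the expected-support definition with independent items, where p_t(x) is the probability of item x in transaction t, taken as 0 when x is absent. Under that assumption, a simple upper bound follows because every probability lies in [0, 1]: dropping factors from the product can only increase it.

```latex
\operatorname{esup}(X) \;=\; \sum_{t \in D} \prod_{x \in X} p_t(x)
\;\le\; \sum_{t \in D} \prod_{y \in Y} p_t(y) \;=\; \operatorname{esup}(Y)
\qquad \text{for every } Y \subseteq X,
\quad\text{hence}\quad
\operatorname{esup}(X) \;\le\; \min_{Y \subsetneq X} \operatorname{esup}(Y).
```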


Mining Frequent Patterns with UFP-tree

Goal: avoid recursively constructing conditional FP-trees.


Trie Tree
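Step 5 determines support with a trie. This minimal sketch rests on three assumptions. Candidate itemsets, for example those passing an upper-bound filter, are stored in a trie keyed by their sorted items. The database is uncertain, mapping item to probability, with independent items. A single scan accumulates expected support at every trie node reached, multiplying probabilities down each matched path. All names are hypothetical, and the paper's trie may be organised differently.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.esup = 0.0            # accumulated expected support of this path's itemset
        self.is_candidate = False  # True if this path spells a candidate itemset


def build_trie(candidates):
    root = TrieNode()
    for itemset in candidates:
        node = root
        for item in sorted(itemset):
            node = node.children.setdefault(item, TrieNode())
        node.is_candidate = True
    return root


def _walk(node, items, probs, start, prob):
    # Visit every trie path spelled by an increasing subsequence of `items`,
    # adding the product of the matched probabilities to each node reached.
    for i in range(start, len(items)):
        child = node.children.get(items[i])
        if child is not None:
            p = prob * probs[items[i]]
            child.esup += p
            _walk(child, items, probs, i + 1, p)


def count_expected_support(root, db):
    for t in db:  # one pass over the uncertain database
        _walk(root, sorted(t), t, 0, 1.0)


def collect(node, prefix=()):
    """Return {candidate itemset: expected support} for all candidate nodes."""
    out = {}
    for item, child in node.children.items():
        key = prefix + (item,)
        if child.is_candidate:
            out[key] = child.esup
        out.update(collect(child, key))
    return out


root = build_trie([("a",), ("b",), ("a", "b")])
count_expected_support(root, [{"a": 0.9, "b": 0.5}, {"a": 0.6}, {"a": 0.4, "b": 0.8}])
supports = collect(root)
print(supports)
```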


Experiment Results



Conclusion

In these tests, we found that UApriori and UH-mine are both efficient at mining frequent itemsets.
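The deck does not describe UApriori. This is a minimal sketch, assuming UApriori is level-wise Apriori with support counts replaced by expected support under item independence. It generates size-k candidates by joining frequent (k-1)-itemsets that share a prefix, prunes any candidate with an infrequent subset, and then computes expected support. Names and data are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, db):
    return sum(prod(t[x] for x in itemset) for t in db if all(x in t for x in itemset))


def u_apriori(db, min_esup):
    """Level-wise search over sorted item tuples; returns {itemset: expected support}."""
    level = [(x,) for x in sorted({x for t in db for x in t})]
    result = {}
    k = 1
    while level:
        frequent = {}
        for c in level:
            s = expected_support(c, db)
            if s >= min_esup:
                frequent[c] = s
        result.update(frequent)
        k += 1
        keys = sorted(frequent)
        candidates = []
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                if a[:-1] == b[:-1]:  # join step: same first k-2 items
                    c = a + (b[-1],)
                    # Prune step: every (k-1)-subset must already be frequent.
                    if all(sub in frequent for sub in combinations(c, k - 1)):
                        candidates.append(c)
        level = candidates
    return result


db = [{"a": 0.9, "b": 0.5}, {"a": 0.6}, {"a": 0.4, "b": 0.8}]
print(u_apriori(db, 0.7))
```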