Top Down FP-Growth for Association Rule Mining

DESCRIPTION
Top Down FP-Growth for Association Rule Mining, by Ke Wang. Introduction: classically, for a rule A → B, support is computed by count(AB) (the rule is frequent if it passes the minimum support threshold) and confidence by count(AB) / count(A).

TRANSCRIPT
1
Top Down FP-Growth for Association Rule Mining
By Ke Wang
2
Introduction
• Classically, for rule A → B:
  – support: computed by count(AB)
    • frequent: if it passes the minimum support threshold
  – confidence: computed by count(AB) / count(A)
    • confident: if it passes the minimum confidence threshold
• How to mine association rules?
  – find all frequent patterns
  – generate rules from the frequent patterns
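As a minimal sketch of these two measures in plain Python, using the five transactions from the FP-tree example later in the deck (the helper names are illustrative, not from the slides):

```python
# Support and confidence of a rule A -> B over a transaction list.
# count(X) = number of transactions containing every item of X.

def count(itemset, transactions):
    """Number of transactions that contain all items of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    # support of rule A -> B: count(AB)
    return count(a | b, transactions)

def confidence(a, b, transactions):
    # confidence of rule A -> B: count(AB) / count(A)
    return count(a | b, transactions) / count(a, transactions)

transactions = [{"b", "e"}, {"a", "b", "c", "e"}, {"b", "c", "e"},
                {"a", "c", "d"}, {"a"}]
# Rule {b} -> {e}: count({b, e}) = 3, count({b}) = 3
print(support({"b"}, {"e"}, transactions))     # 3
print(confidence({"b"}, {"e"}, transactions))  # 1.0
```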
3
Introduction
• Limitations of current research
  – use a uniform minimum support threshold
  – use only support as the pruning measure
• Our contribution
  – improve efficiency
  – adopt multiple minimum supports
  – introduce confidence pruning
4
Related work -- frequent pattern mining
• Apriori algorithm
  – method: use the anti-monotone property of support for pruning, i.e.
    • if a length-k pattern is infrequent, its length-(k+1) super-patterns can never be frequent
• FP-growth algorithm -- better than Apriori
  – method:
    • build an FP-tree to store the database
    • mine the FP-tree in bottom-up order
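A compact sketch of Apriori's level-wise search with anti-monotone pruning (my own minimal implementation, not the slides' code; run on the deck's five-transaction example):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Minimal Apriori: a length-(k+1) candidate is generated only if
    all of its length-k subsets are frequent (anti-monotone pruning)."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    level = [frozenset([i]) for i in items]  # length-1 candidates
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= minsup}
        freq.update(frequent)
        # generate length-(k+1) candidates, pruning by their k-subsets
        cands = set()
        fs = list(frequent)
        for i in range(len(fs)):
            for j in range(i + 1, len(fs)):
                u = fs[i] | fs[j]
                if len(u) == k + 1 and all(
                        frozenset(s) in frequent for s in combinations(u, k)):
                    cands.add(u)
        level = list(cands)
        k += 1
    return freq

transactions = [{"b", "e"}, {"a", "b", "c", "e"}, {"b", "c", "e"},
                {"a", "c", "d"}, {"a"}]
patterns = apriori(transactions, minsup=2)
print(len(patterns))  # 9 frequent patterns, matching slide 13
```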
5
Related work -- association rule mining
• Fast algorithms that try to guarantee completeness of the frequent patterns
• Parallel algorithms and association-rule-based query languages
• Various association rule mining problems
  – multi-level, multi-dimension rules
  – constraints on specific items
6
TD-FP-Growth for frequent pattern mining
• Similar tree structure to FP-growth
  – a compressed tree stores the database
  – nodes on each path of the tree are globally ordered
• Different mining method vs. FP-growth
  – FP-growth: bottom-up tree mining
  – TD-FP-Growth: top-down tree mining
7
TD-FP-Growth for frequent pattern mining
Construct an FP-tree (minsup = 2) from the transactions:
  b, e
  a, b, c, e
  b, c, e
  a, c, d
  a
[Figure: FP-tree and header table H. The root has children a: 3 and b: 2; under a: 3 are b: 1 (followed by c: 1 and e: 1) and c: 1; under b: 2 are e: 1 and c: 1 (followed by e: 1). Header table H (entry value, count, side-link): a: 3, b: 3, c: 3, e: 3.]
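The construction on this slide can be sketched as follows (an illustrative implementation, assuming the global item order a, b, c, e shown in the header table; names are my own):

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, minsup):
    """Build an FP-tree: drop infrequent items, sort each transaction by
    the global item order, and share common prefixes on the tree. The
    header table keeps (count, side-link node list) per frequent item."""
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    # global order: descending frequency, ties broken alphabetically
    order = sorted((i for i, n in counts.items() if n >= minsup),
                   key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    root = Node(None, None)
    header = {i: (counts[i], []) for i in order}
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i in node.children:
                node.children[i].count += 1
            else:
                node.children[i] = Node(i, node)
                header[i][1].append(node.children[i])  # side-link
            node = node.children[i]
    return root, header

transactions = [["b", "e"], ["a", "b", "c", "e"], ["b", "c", "e"],
                ["a", "c", "d"], ["a"]]
root, header = build_fp_tree(transactions, minsup=2)
print(root.children["a"].count, root.children["b"].count)  # 3 2
```

This reproduces the tree on the slide: a: 3 and b: 2 under the root, with d dropped as infrequent.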
8
TD-FP-Growth for frequent pattern mining
FP-growth: bottom-up mining, in mining order e, c, b, a
[Figure: the FP-tree and header table H from the previous slide, built from transactions b, e; a, b, c, e; b, c, e; a, c, d; a with minsup = 2, with e's conditional pattern base highlighted: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1).]
9
TD-FP-Growth for frequent pattern mining
• FP-growth: bottom-up mining
  – e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)
  – e's conditional FP-tree: root → b: 3 → c: 2, with its own header table over b and c
• Drawback!
  – both e's conditional pattern base and conditional FP-tree are stored in memory
  – e's conditional FP-tree is mined recursively
  – conditional pattern bases and FP-trees are built for all other items and their super-patterns
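A sketch of how FP-growth collects e's conditional pattern base: follow e's side-links in the header table and walk each node's parent chain up to the root. (A compact tree builder is repeated here so the example is self-contained; item order a, b, c, e as on the slides, helper names my own.)

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build(transactions, order):
    """Build the prefix tree and per-item side-link lists."""
    root, links = Node(None, None), {i: [] for i in order}
    rank = {i: r for r, i in enumerate(order)}
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i in node.children:
                node.children[i].count += 1
            else:
                node.children[i] = Node(i, node)
                links[i].append(node.children[i])
            node = node.children[i]
    return root, links

def conditional_pattern_base(item, links):
    base = []
    for node in links[item]:       # side-links of `item`
        prefix, p = [], node.parent
        while p.item is not None:  # walk up to the root
            # each prefix item carries the `item` node's count
            prefix.append((p.item, node.count))
            p = p.parent
        if prefix:
            base.append(sorted(prefix))
    return base

transactions = [["b", "e"], ["a", "b", "c", "e"], ["b", "c", "e"],
                ["a", "c", "d"], ["a"]]
root, links = build(transactions, order=["a", "b", "c", "e"])
print(conditional_pattern_base("e", links))
# [[('b', 1)], [('a', 1), ('b', 1), ('c', 1)], [('b', 1), ('c', 1)]]
```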
10
TD-FP-Growth for frequent pattern mining
• TD-FP-Growth: adopt a top-down mining strategy
  – motivation: avoid building the extra databases and sub-trees that FP-growth builds
  – method: process nodes on the upper levels before those on the lower levels
  – result: any modification to upper-level nodes does not affect lower-level nodes
See example
11
TD-FP-Growth for frequent pattern mining
[Figure: CT-tree and header table H for the transactions b, e; a, b, c, e; b, c, e; a, c, d; a with minsup = 2. The tree and the header table (a: 3, b: 3, c: 3, e: 3) are the same as the FP-tree of slide 7; the difference is the mining order: a, b, c, e.]
12
CT-tree for frequent pattern mining
[Figure: CT-tree and header table H (minsup = 2) for the same transactions during top-down mining. Alongside the global header table (a: 3, b: 3, c: 3, e: 3), a sub-header-table H_c is built with columns (entry value, count, side-link) and entries a: 2 and b: 2, the counts of a and b on paths ending at c.]
13
CT-tree for frequent pattern mining
• Completeness
  – for each entry i in H, we mine all the frequent patterns that end with item i, no more and no less
• Complete set of frequent patterns:
  {a}
  {b}
  {c}, {b, c}, {a, c}
  {e}, {b, e}, {c, e}, {b, c, e}
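The completeness claim can be cross-checked by brute force: enumerate every itemset over the running example and keep those meeting minsup = 2 (an illustrative check, not part of the slides):

```python
from itertools import combinations

transactions = [{"b", "e"}, {"a", "b", "c", "e"}, {"b", "c", "e"},
                {"a", "c", "d"}, {"a"}]
minsup = 2
items = sorted({i for t in transactions for i in t})

# every itemset whose containment count meets the threshold
frequent = [
    set(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if sum(1 for t in transactions if set(c) <= t) >= minsup
]
# Expected, as on the slide:
# {a}; {b}; {c}, {b,c}, {a,c}; {e}, {b,e}, {c,e}, {b,c,e}
print(len(frequent))  # 9
```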
14
TD-FP-Growth for frequent pattern mining
• Compared to FP-growth, TD-FP-Growth is:
  – space-saving:
    • only one tree and a few header tables
    • no extra databases or sub-trees
  – time-saving:
    • does not build extra databases and sub-trees
    • walks up each path only once to update count information for nodes on the tree and to build sub-header-tables
15
TD-FP-Growth for association rule mining
• Assumptions:
  – there is a class attribute in the database
  – items in the class attribute are called class items; the others are non-class items
  – each transaction is associated with one class item
  – only a class item appears on the right-hand side of a rule

Transaction ID | non-class attributes | class attribute
1 | a, b, ... | C1
2 | d, ... | C2
3 | e, d, f, ... | C3
... | ... | ...

Example rule: a, b → Ci
16
TD-FP-Growth for association rule mining -- multiple minimum supports
• Why?
  – with a uniform minimum support, the count computation considers only the number of appearances
  – a uniform minimum support is unfair to items that appear less often but are worth more
    • e.g. responders vs. non-responders
• How?
  – use a different support threshold for each class
17
TD-FP-Growth for association rule mining -- multiple minimum supports
• Multiple vs. uniform minimum supports, on the transactions:
  c, f, C1
  b, e, C2
  b, e, f, C1
  a, c, f, C1
  c, e, C2
  b, c, d, C1
  – class counts are C1: 4, C2: 2
  – rules are mined with relative minsup = 50%, proportional to each class (the multiplier in the performance section)
• uniform minimum support: absolute minsup = 1
  – an 11-node tree, 23 rules
• multiple minimum supports: absolute minsup1 = 2, absolute minsup2 = 1
  – a 7-node tree, 9 rules
  – more effective and space-saving
  – time-saving (shown in the performance section)
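How the per-class absolute thresholds above follow from the relative minsup can be sketched in a few lines (illustrative code; the rounding via ceiling is an assumption, but it reproduces the slide's minsup1 = 2 and minsup2 = 1):

```python
import math

# (items, class item) pairs from the slide
transactions = [
    ("c,f", "C1"), ("b,e", "C2"), ("b,e,f", "C1"),
    ("a,c,f", "C1"), ("c,e", "C2"), ("b,c,d", "C1"),
]

class_counts = {}
for _, cls in transactions:
    class_counts[cls] = class_counts.get(cls, 0) + 1
# class counts: C1: 4, C2: 2

# relative minsup of 50%, proportional to each class's size
relative_minsup = 0.5
absolute_minsup = {cls: math.ceil(n * relative_minsup)
                   for cls, n in class_counts.items()}
print(absolute_minsup)  # {'C1': 2, 'C2': 1}
```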
18
TD-FP-Growth for association rule mining -- confidence pruning
• Motivation
  – make use of the other constraint on association rules, confidence, to speed up mining
• Method
  – confidence itself is not anti-monotone
  – introduce the acting constraint of confidence, which is anti-monotone
  – push it inside the mining process
19
TD-FP-Growth for association rule mining -- confidence pruning
conf(A → B) = count(AB) / count(A) >= minconf
⇒ count(AB) >= count(A) * minconf
⇒ count(AB) >= minsup * minconf   (anti-monotone, and weaker, since count(A) >= minsup for any frequent A)
This last condition is the acting constraint of confidence for the original confidence constraint of rule A → B.
• the support of a rule is computed by count(A)
• count(AB) is the class count of itemset A related to class B
20
TD-FP-Growth for association rule mining -- confidence pruning
Transactions (minsup = 2, minconf = 60%):
  c, f, C1
  b, e, C2
  b, e, f, C1
  a, c, f, C1
  a, c, d, C2

Header table H, where count(i) = count(i, C1) + count(i, C2):

entry value i | count(i) | count(i, C1) | count(i, C2) | side-link
a | 2 | 1 | 1 | ...
b | 2 | 1 | 1 | ...
c | 3 | 2 | 1 | ...
e | 2 | 1 | 1 | ...
f | 3 | 3 | 0 | ...

count(e) >= minsup; however, both count(e, C1) and count(e, C2) < minsup * minconf: terminate mining for e!
Without confidence pruning, the sub-header-table H_e would be built, with entry b: count(b) = 2, count(b, C1) = 1, count(b, C2) = 1.
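The pruning decision on this slide can be reproduced directly (an illustrative sketch; the function names are my own):

```python
# Acting-constraint check: an entry i can be pruned when every class
# count count(i, C) falls below minsup * minconf, even though count(i)
# itself passes minsup.

transactions = [
    ({"c", "f"}, "C1"), ({"b", "e"}, "C2"), ({"b", "e", "f"}, "C1"),
    ({"a", "c", "f"}, "C1"), ({"a", "c", "d"}, "C2"),
]
minsup, minconf = 2, 0.6
classes = ["C1", "C2"]

# build the header table with per-class counts
header = {}  # item -> {"count": n, "C1": n1, "C2": n2}
for items, cls in transactions:
    for i in items:
        row = header.setdefault(i, {"count": 0, "C1": 0, "C2": 0})
        row["count"] += 1
        row[cls] += 1

def prune(item):
    """True if mining can terminate for `item` under confidence pruning."""
    row = header[item]
    return all(row[c] < minsup * minconf for c in classes)

print(header["e"])  # {'count': 2, 'C1': 1, 'C2': 1}
print(prune("e"))   # True: both class counts < 2 * 0.6 = 1.2
print(prune("f"))   # False: count(f, C1) = 3
```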
21
Performance
• Several data sets chosen from the UC Irvine Machine Learning Database Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html

name of dataset | # of transactions | # of items in each transaction | class distribution | # of distinct items
Dna-train | 2000 | 61 | 23.2%, 24.25%, 52.55% | 240
Connect-4 | 67557 | 43 | 9.55%, 24.62%, 65.83% | 126
Forest | 581012 | 13 | 0.47%, 1.63%, 2.99%, 3.53%, 6.15%, 36.36%, 48.76% | 15916
22
Performance -- frequent pattern mining
[Chart: results on Forest; x-axis: support threshold, 0% to 10%; y-axis: 0 to 100; curves for CT-tree, FP-growth, and Apriori.]
23
Performance -- mining rules with multiple minimum supports
[Chart: multiple supports on Forest; x-axis: multiplier, 0% to 5% (minconf = 90%); logarithmic y-axis, 10 to 100000; curves for CT-multi-sup, Apri-uni-sup, and CT-uni-sup. The relative minsup is proportional to each class. FP-growth is omitted because it only mines frequent patterns.]
24
Performance -- mining rules with confidence pruning
[Chart: confidence pruning on Forest; x-axis: support threshold, 0.04% to 0.10% (minconf = 90%); y-axis: 0 to 300; curves for CT-conf-prune, Apriori, and CT-no-conf-prune.]
25
Conclusions and future work
• Conclusions on the TD-FP-Growth algorithm
  – more efficient at finding both frequent patterns and association rules
  – more effective at mining rules, by using multiple minimum supports
  – introduces a new pruning method, confidence pruning, and pushes it inside the mining process, further speeding up mining
26
Conclusions and future work
• Future work
  – explore other constraint-based association rule mining methods
  – mine association rules with item concept hierarchies
  – apply TD-FP-Growth to applications based on association rule mining
    • clustering
    • classification
27
Reference
• (1) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'93), pages 207-216, Washington, D.C., May 1993.
• (2) U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
• (3) H. Toivonen. Sampling large databases for association rules. Proc. 1996 Int. Conf. on Very Large Data Bases (VLDB'96), pages 134-145, Bombay, India, September 1996.
• (4) R. Agrawal and R. Srikant. Mining sequential patterns. Proc. 1995 Int. Conf. on Data Engineering (ICDE'95), pages 3-14, Taipei, Taiwan, March 1995.
• (5) J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), pages 1-12, Dallas, TX, May 2000.
• (6) J. Han, J. Pei, G. Dong, and K. Wang. Efficient computation of iceberg cubes with complex measures. Proc. 2001 ACM-SIGMOD Int. Conf. on Management of Data, Santa Barbara, CA, May 2001.
And more!