Sequential PAttern Mining Using A Bitmap Representation
Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick
Dept. of Computer Science, Cornell University
(SIGKDD 2002)
Presenters: 李佩書 P76034525, 楊璨瑜 P76034672, 陳奕廷 P78031125, 李昕純 Q56034035
2014/11/20
Outline
1. Introduction
2. The SPAM algorithm
3. Data representation
4. Experimental Results
5. Conclusion & Discussion
Introduction
Sequential Patterns
• Introduced by R. Agrawal and R. Srikant (ICDE 1995)
• Algorithms: AprioriAll, AprioriSome, PrefixSpan…
Problem
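• Mining sequential patterns
  • Given a minimum support minSup
  • Find all frequent sequential patterns $S_a$, i.e. those with $sup_D(S_a) \geq minSup$
  • $sup_D(S_a)$ is the number of customers in database D whose sequence contains $S_a$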
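The support test above can be made concrete with a small sketch. It assumes the database is a dict mapping a customer ID to that customer's time-ordered list of itemsets; the function names and the toy database are hypothetical and chosen only for illustration.

```python
def contains(customer_seq, pattern):
    """Return True if `pattern` is a subsequence of `customer_seq`.

    Each itemset of the pattern must be a subset of a distinct
    transaction, and the matching transactions must appear in order.
    """
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest transaction that holds this itemset.
        while pos < len(customer_seq) and not set(itemset) <= set(customer_seq[pos]):
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support(db, pattern):
    """Number of customers whose sequence contains `pattern`."""
    return sum(1 for seq in db.values() if contains(seq, pattern))


def is_frequent(db, pattern, min_sup):
    return support(db, pattern) >= min_sup


# Hypothetical toy database: customer ID -> ordered transactions.
toy_db = {
    1: [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    2: [{"b"}, {"a", "b", "c"}],
    3: [{"a", "b"}, {"b", "c", "d"}],
}

print(support(toy_db, [{"a"}, {"b"}]))         # 2
print(is_frequent(toy_db, [{"a"}, {"b"}], 2))  # True
```

This naive scan revisits every customer sequence for every candidate, which is the cost the bitmap representation is meant to avoid.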
SPAM Algorithm
• Sequential PAttern Mining algorithm
• The first depth-first search (DFS) strategy for mining sequential patterns
• Vertical bitmap representation for simple, efficient counting
The SPAM Algorithm
Lexicographic Tree
• Sequence-extended sequence (S-step)
  • Generated by adding a new transaction consisting of a single item to the end of the sequence
  • Ex: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b}, {a})
• Itemset-extended sequence (I-step)
  • Generated by adding an item to the last itemset in the sequence
  • Ex 1: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b, d})
  • Ex 2: ({a, b, c}, {a, b, d}) → ({a, b, c}, {a, b, d, c})
• Each node n is associated with two sets
  • $S_n$: the set of candidate items for S-step extensions
  • $I_n$: the set of candidate items for I-step extensions
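The two extension steps can be sketched as plain functions. This is a minimal illustration, assuming a sequence is stored as a tuple of itemsets and each itemset as a tuple of items; the names `s_step` and `i_step` are hypothetical.

```python
def s_step(seq, item):
    """Sequence-extend: append a new itemset that holds only `item`."""
    return seq + ((item,),)


def i_step(seq, item):
    """Itemset-extend: add `item` to the last itemset of the sequence."""
    if item in seq[-1]:
        raise ValueError("item is already in the last itemset")
    return seq[:-1] + (seq[-1] + (item,),)


s = (("a", "b", "c"), ("a", "b"))
print(s_step(s, "a"))  # (('a', 'b', 'c'), ('a', 'b'), ('a',))
print(i_step(s, "d"))  # (('a', 'b', 'c'), ('a', 'b', 'd'))
```

Both functions return new tuples and leave the parent sequence untouched, so a parent node can be extended several times during the tree traversal.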
[Figure: lexicographic sequence tree for the item set I = {a, b}]
Pruning
• Apriori-based
• Minimizes the size of $S_n$ and $I_n$
• Prunes candidates during the depth-first search
• Two techniques: S-step pruning and I-step pruning
S-step Pruning
S({a}) = {a, b, c, d}
I({a}) = {b, c, d}
S({a}, {a}) = S({a}, {b}) = {a, b, c, d}
I({a}, {a}) = {b, c, d}
I({a}, {b}) = {c, d}
I-step Pruning
S({a, b}) = S({a, d}) = {a, b}
I({a}, {b}) = {c, d}
I({a}, {d}) = {}
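A depth-first traversal with Apriori-style candidate pruning might look like the sketch below. It is an assumption-laden illustration rather than the authors' code: support is counted with a naive scan instead of bitmaps, items are assumed to be comparable (so I-step candidates are the surviving items greater than the one just added), and all names and the toy database are hypothetical.

```python
def contains(customer_seq, pattern):
    """True if every pattern itemset fits, in order, into distinct transactions."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer_seq) and not set(itemset) <= set(customer_seq[pos]):
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support(db, pattern):
    return sum(1 for seq in db.values() if contains(seq, pattern))


def dfs_pruning(db, min_sup, node, s_cands, i_cands, out):
    """Visit `node`, keep only candidates whose extension is frequent,
    and hand the reduced candidate sets down to the children."""
    out.append(node)
    # S-step pruning: items whose sequence extension is infrequent are dropped.
    s_temp = [i for i in s_cands if support(db, node + ((i,),)) >= min_sup]
    for i in s_temp:
        child = node + ((i,),)
        dfs_pruning(db, min_sup, child, s_temp, [j for j in s_temp if j > i], out)
    # I-step pruning: same idea for extensions of the last itemset.
    i_temp = [i for i in i_cands
              if support(db, node[:-1] + (node[-1] + (i,),)) >= min_sup]
    for i in i_temp:
        child = node[:-1] + (node[-1] + (i,),)
        dfs_pruning(db, min_sup, child, s_temp, [j for j in i_temp if j > i], out)


def mine(db, min_sup):
    items = sorted({x for seq in db.values() for t in seq for x in t})
    frequent_items = [i for i in items if support(db, ((i,),)) >= min_sup]
    out = []
    for i in frequent_items:
        dfs_pruning(db, min_sup, ((i,),), frequent_items,
                    [j for j in frequent_items if j > i], out)
    return out


# Hypothetical toy database: customer ID -> ordered transactions.
toy_db = {
    1: [("a", "b", "d"), ("b", "c", "d"), ("b", "c", "d")],
    2: [("b",), ("a", "b", "c")],
    3: [("a", "b"), ("b", "c", "d")],
}
patterns = mine(toy_db, min_sup=2)
print((("a",), ("b",)) in patterns)  # True
```

The point of passing `s_temp` and `i_temp` down is that a child never tests an item that already failed at its parent, which is where the candidate sets shrink.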
Data Representation
• Each candidate sequence is stored as a vertical bitmap
• Each customer is assigned a fixed slice of each bitmap for all of its transactions
• If the size of a sequence is between $2^k + 1$ and $2^{k+1}$, it is treated as a $2^{k+1}$-bit sequence
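The slice-size rule amounts to rounding the number of transactions up to a power of two. A minimal sketch follows; the function name is hypothetical, and the 4-bit minimum slice is an assumption that is not stated on this slide.

```python
def slice_bits(n_transactions, min_bits=4):
    """Smallest power-of-two slice width that holds `n_transactions` bits.

    `min_bits` is an assumed lower bound and must itself be a power of two.
    """
    bits = min_bits
    while bits < n_transactions:
        bits *= 2
    return bits


for n in (3, 4, 5, 8, 9):
    print(n, slice_bits(n))
# 3 4
# 4 4
# 5 8
# 8 8
# 9 16
```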
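Bitmap of itemset
The bitmap of {a, b} is the bitwise AND (&) of the bitmaps of {a} and {b}:

| {a} | {b} | {a, b} = {a} & {b} |
|---|---|---|
| 1000 | 1110 | 1000 |
| 0100 | 1100 | 0100 |
| 1000 | 1100 | 1000 |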
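The AND on this slide can be reproduced with integer bit operations. The sketch below uses the slide's own bit patterns; treating each row as one customer's slice and counting support as the number of non-zero slices are assumptions made for illustration, as are the function names.

```python
def and_bitmaps(x, y):
    """Slice-wise bitwise AND of two vertical bitmaps."""
    return [p & q for p, q in zip(x, y)]


def support(bitmap):
    """Count slices with at least one bit set (one slice per customer, assumed)."""
    return sum(1 for s in bitmap if s != 0)


b_a = [0b1000, 0b0100, 0b1000]  # bitmap of {a}
b_b = [0b1110, 0b1100, 0b1100]  # bitmap of {b}
b_ab = and_bitmaps(b_a, b_b)    # bitmap of {a, b}

print([format(s, "04b") for s in b_ab])  # ['1000', '0100', '1000']
print(support(b_ab))                     # 3
```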
Bitmap of sequence
• Define B(s) as the bitmap for sequence s.
• For each transaction j of a customer:
  • If the last itemset of s is in transaction j and the other itemsets of s are in transactions before j, set bit j to 1
  • Otherwise set bit j to 0
• Example 1: ({b}, {c})

| Customer ID | Transaction ID | Itemset | ({b}, {c}) |
|---|---|---|---|
| 1 | 1 | {b} | 0 |
| 1 | 2 | {d} | 0 |
| 1 | 3 | {e} | 0 |
| 1 | 4 | {c} | 1 |
• Example 2: ({a}, {b, d})

| Customer ID | Transaction ID | Itemset | ({a}, {b, d}) |
|---|---|---|---|
| 1 | 1 | {a, b, d} | 0 |
| 1 | 3 | {b, c, d} | 1 |
| 1 | 6 | {b, c, d} | 1 |
| -- | -- | -- | 0 |
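The definition of B(s) can be checked directly against both examples. This sketch computes one customer's slice from the definition by brute force; it is not how SPAM builds bitmaps in practice, and the function and parameter names are hypothetical.

```python
def sequence_bitmap(transactions, pattern, n_bits):
    """Bits of B(pattern) for one customer, computed from the definition.

    transactions: the customer's itemsets in time order.
    n_bits: slice width; positions past the last transaction stay 0.
    """
    def prefix_fits(before, prefix):
        # True if the prefix itemsets occur, in order, in transactions[:before].
        pos = 0
        for itemset in prefix:
            while pos < before and not set(itemset) <= set(transactions[pos]):
                pos += 1
            if pos == before:
                return False
            pos += 1
        return True

    bits = []
    for j in range(n_bits):
        hit = (j < len(transactions)
               and set(pattern[-1]) <= set(transactions[j])
               and prefix_fits(j, pattern[:-1]))
        bits.append(1 if hit else 0)
    return bits


example1 = sequence_bitmap([{"b"}, {"d"}, {"e"}, {"c"}], [{"b"}, {"c"}], 4)
example2 = sequence_bitmap(
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}], [{"a"}, {"b", "d"}], 4
)
print(example1)  # [0, 0, 0, 1]
print(example2)  # [0, 1, 1, 0]
```

Bit positions index the customer's transactions in order, so transaction IDs 1, 3, and 6 in Example 2 occupy the first three positions and the fourth is padding.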
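S-step Process
Step 1: Construct the transformed bitmap B({a})_s: in each customer's slice, the bits up to and including the first 1 are set to 0 and all later bits are set to 1
Step 2: AND the transformed bitmap B({a})_s with B({b}) to obtain the bitmap of ({a}, {b})
Support = 2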
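The two steps can be traced on the bit patterns shown earlier for {a} and {b}. In this sketch the transformation rule (clear every bit up to and including the first 1, set every later bit) and the reading of each row as one customer's slice are assumptions relative to this slide, and the helper names are hypothetical.

```python
def to_bits(text):
    """'1000' -> [1, 0, 0, 0]; index 0 is the customer's first transaction."""
    return [int(c) for c in text]


def s_step_transform(bits):
    """Clear bits up to and including the first 1, set all later bits to 1."""
    out = [0] * len(bits)
    if 1 in bits:
        k = bits.index(1)
        for j in range(k + 1, len(bits)):
            out[j] = 1
    return out


def bit_and(x, y):
    return [p & q for p, q in zip(x, y)]


def support(bitmap):
    """Number of customer slices with at least one bit set."""
    return sum(1 for s in bitmap if any(s))


b_a = [to_bits(s) for s in ("1000", "0100", "1000")]  # B({a})
b_b = [to_bits(s) for s in ("1110", "1100", "1100")]  # B({b})

b_a_s = [s_step_transform(s) for s in b_a]         # Step 1
b_ab = [bit_and(x, y) for x, y in zip(b_a_s, b_b)]  # Step 2

print(b_a_s)          # [[0, 1, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(b_ab)           # [[0, 1, 1, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
print(support(b_ab))  # 2
```

The transformed bitmap marks every transaction that comes strictly after the first occurrence of {a}, so the AND keeps only occurrences of {b} that follow {a}.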
I-step Process
Support=2
Experimental Results
Comparison With SPADE and PrefixSpan
• Method 1: compare for various minimum support values on
  • Small datasets
  • Medium datasets
  • Large datasets
• Method 2: compare while varying several parameters of the dataset
  • Number of customers
  • Number of transactions per customer
  • Number of items per transaction
  • Average length of the maximal sequences
Conclusion & Discussion
CONCLUSION
• ALGORITHM
  • Outperforms SPADE and PrefixSpan on large datasets
  • Faster than SPADE and PrefixSpan
• DATA REPRESENTATION
  • Bitmap representation
  • S-step/I-step traversal
  • S-step/I-step pruning
  • Especially efficient when the sequential patterns are very long
Implementation of the SPAM algorithm
• SPMF is an open-source data mining framework written in Java
• http://www.philippe-fournier-viger.com/spmf/index.php
DISCUSSION
1. SPAM assumes that the entire database fits completely into main memory. What can be done when it does not?
2. Why is a sequence whose size is between $2^k + 1$ and $2^{k+1}$ stored as a $2^{k+1}$-bit sequence?