MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams
Franklin Anderson de Amorim
17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016
Marco Antonio Casanova
Gisele Rabello Lopes
Bernardo Pereira Nunes
Agenda
1. Frequent Itemsets and Data Streams
2. MFI-TransSW+ algorithm
3. ClickRec Recommendation System
4. Experiments and results
Frequent Itemsets
Transactions (each a set of items): {bread,milk,coffee}, {bread,milk,cheese}, {bread,cheese}

Itemsets (k=2)    Support
bread, milk       2
bread, coffee     1
milk, coffee      1
bread, cheese     2
milk, cheese      1
X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called the minimum support.
Example: s = 0.5 and N = 3, so N · s = 1.5.
Frequent itemsets (k=2): {bread, milk} and {bread, cheese}, each with support 2.
If a set I of items is frequent, then so is every subset of I.
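The definition above can be checked on the bread/milk example with a few lines of Python. This is a minimal sketch, not the algorithm presented in the talk: the function name `frequent_itemsets` and its parameters are illustrative, and it simply counts every k-item combination in every transaction.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, k, s):
    """Return {itemset: support} for all k-itemsets with support >= N * s."""
    n = len(transactions)
    support = Counter()
    for transaction in transactions:
        # Sort so that the same itemset always yields the same tuple.
        for itemset in combinations(sorted(transaction), k):
            support[itemset] += 1
    return {x: c for x, c in support.items() if c >= n * s}


transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]
frequent_pairs = frequent_itemsets(transactions, k=2, s=0.5)
print(frequent_pairs)
```

With s = 0.5 and N = 3 the threshold is 1.5, so only the pairs with support 2 survive.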
Data Streams
{a,b,c},{a,b,d}, {c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}...
Data stream
Data Stream - Sliding Windows
Data stream: {a,b,c},{a,b,d},{c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}
Sliding window: window size = 6
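A transaction-based sliding window keeps only the most recent transactions of the stream. The sketch below illustrates the idea with the stream shown on the slide and a window size of 6; the generator name `sliding_windows` is an assumption made for this example.

```python
from collections import deque


def sliding_windows(stream, window_size):
    """Yield the window contents each time the window is full."""
    window = deque(maxlen=window_size)
    for transaction in stream:
        # When the deque is full, appending drops the oldest transaction.
        window.append(transaction)
        if len(window) == window_size:
            yield list(window)


stream = ["abc", "abd", "cdf", "ad", "de", "bce", "af", "acf", "abc"]
windows = list(sliding_windows(stream, window_size=6))
for w in windows:
    print(w)
```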
MFI-TransSW & MFI-TransSW+
MFI-TransSW (original algorithm)
• Processes sliding windows
• Uses bit vectors, e.g. bit(x)=101001
MFI-TransSW
Phases
1. Load window
2. Slide window
3. Generate frequent itemsets
Loading and sliding window
Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)
window size=3
After loading T1: bit(a)=1, bit(b)=0, bit(c)=1, bit(d)=1, bit(e)=0
Sliding the window: left bit-shift of every bit vector
[Figure: bit vectors of items a to e as transactions T1 to T4 are loaded and the window slides]
bit(a)=101
bit(b)=011
bit(c)=111
bit(d)=100
bit(e)=011
freq(a)=2
freq(b)=2
freq(c)=3
freq(d)=1
freq(e)=2
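The loading and sliding phases can be sketched with Python integers as bit vectors, where the leftmost bit is the oldest transaction in the window. This is an illustration under assumptions, not the authors' code: the helper names `slide_in` and `freq` are invented here, and dropping items whose vector becomes all zeros is an assumed clean-up step.

```python
def slide_in(bits, transaction, window_size):
    """Left-shift every bit vector, then set the new rightmost bit."""
    mask = (1 << window_size) - 1  # keeps only the last window_size bits
    for item in set(bits) | set(transaction):
        shifted = (bits.get(item, 0) << 1) & mask
        bits[item] = shifted | (1 if item in transaction else 0)
    # Assumed clean-up: forget items that no longer occur in the window.
    for item in [i for i, v in bits.items() if v == 0]:
        del bits[item]


def freq(vector):
    """Support of an item = number of 1 bits in its vector."""
    return bin(vector).count("1")


bits = {}
for t in ["acd", "bce", "abce"]:  # T1, T2, T3
    slide_in(bits, t, window_size=3)
loaded = {i: format(v, "03b") for i, v in bits.items()}
freqs = {i: freq(v) for i, v in bits.items()}

slide_in(bits, "be", window_size=3)  # T4 arrives, T1 leaves the window
slid = {i: format(v, "03b") for i, v in bits.items()}
print(loaded, freqs, slid)
```

After T1 to T3 the vectors match the slide (bit(a)=101, bit(b)=011, and so on); after T4 the bit for T1 has been shifted out of every vector.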
MFI-TransSW
window size=3 s=0.5
Mining frequent itemsets
bit(a)=101
bit(b)=011
MFI-TransSW
freq(a)=2
freq(b)=2
bitwise AND
bit(a <and> b)=001
freq(a <and> b)=1
window size=3 s=0.5
Mining frequent itemsets
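The mining phase can be sketched as a level-wise search in which the bit vector of an itemset is the bitwise AND of the vectors of its items. The code below is a minimal illustration on the window from the slides (window size 3, s = 0.5); the function name `mine` and the prefix-extension strategy are assumptions of this sketch rather than the exact candidate generation of MFI-TransSW.

```python
def count_ones(vector):
    return bin(vector).count("1")


def mine(bits, window_size, s):
    """Return {itemset: support} for all itemsets with support >= w * s."""
    threshold = window_size * s
    singles = {i: v for i, v in sorted(bits.items()) if count_ones(v) >= threshold}
    level = {(i,): v for i, v in singles.items()}
    result = {}
    while level:
        result.update({itemset: count_ones(v) for itemset, v in level.items()})
        next_level = {}
        for itemset, vector in level.items():
            for item, item_vector in singles.items():
                if item > itemset[-1]:  # extend in sorted order, no duplicates
                    joined = vector & item_vector  # bitwise AND
                    if count_ones(joined) >= threshold:
                        next_level[itemset + (item,)] = joined
        level = next_level
    return result


bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}
frequent = mine(bits, window_size=3, s=0.5)
for itemset, support in frequent.items():
    print(itemset, support)
```

Here {a, b} is rejected because bit(a) AND bit(b) = 001 has a single 1 bit, below the threshold of 1.5.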
MFI-TransSW
• Fast
• Finds all frequent itemsets
• No false positives or false negatives
• On-demand generation of frequent itemsets
• Small memory footprint
Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b)
Transactions: ({a}),({b,c}),({a,b})
After the click (user-2,a) arrives:
Transactions: ({a}),({a,b,c}),({a,b})
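In a clickstream, a transaction is not delivered whole: it grows as the same user clicks on more pages. A small sketch of this grouping follows, using the clicks from the slide; the function name `to_transactions` is illustrative.

```python
def to_transactions(clicks):
    """Group (user, page) clicks into one set of pages per user."""
    transactions = {}  # dicts keep users in first-seen order
    for uid, page in clicks:
        transactions.setdefault(uid, set()).add(page)
    return transactions


clicks = [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
          ("user-2", "c"), ("user-3", "b")]
before = to_transactions(clicks)
after = to_transactions(clicks + [("user-2", "a")])
print(before)
print(after)
```

The extra click (user-2,a) changes an existing transaction instead of adding a new one, which is the case the original transaction-based algorithm does not handle.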
MFI-TransSW+
Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b),(user-2,a),(user-4,b)
window size=3

List of UIDs (positions 0 1 2): user-1 user-2 user-3

Bit vectors (positions 0 1 2), after the click (user-2,a):
bit(a) 1 1 1
bit(b) 0 1 1
bit(c) 0 1 0

List of Bit Vectors per User: user-2: 0,1,2; user-3: 0,1

When (user-4,b) arrives, user-4 replaces user-1 at position 0: bit(a) becomes 0 1 1 and bit(b) becomes 1 1 1.
MFI-TransSW+
• Processes clickstreams
• Uses bit vectors as circular lists
• More efficient "clean and update"
• Faster
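The circular-list idea can be sketched as follows. This is an interpretation of the slides, not the authors' implementation: the class name `CircularWindow`, its attributes, and the rule that a user keeps the same position until evicted are all assumptions. Each user owns one position in every bit vector; when the window is full, a new user overwrites the oldest position, and only the bit vectors that the evicted user had touched are cleaned, so no vector is shifted.

```python
class CircularWindow:
    def __init__(self, window_size):
        self.size = window_size
        self.uids = [None] * window_size  # list of UIDs, one per position
        self.position = {}                # uid -> position in the window
        self.items_of = {}                # uid -> items whose bit is set
        self.bits = {}                    # item -> list of 0/1 per position
        self.next_slot = 0                # position to overwrite next

    def click(self, uid, item):
        if uid not in self.position:
            slot = self.next_slot
            old = self.uids[slot]
            if old is not None:
                # "Clean": reset only the vectors the evicted user touched.
                for old_item in self.items_of.pop(old):
                    self.bits[old_item][slot] = 0
                del self.position[old]
            self.uids[slot] = uid
            self.position[uid] = slot
            self.items_of[uid] = set()
            self.next_slot = (slot + 1) % self.size
        # "Update": set the bit of this item at the user's position.
        slot = self.position[uid]
        self.bits.setdefault(item, [0] * self.size)[slot] = 1
        self.items_of[uid].add(item)

    def support(self, *items):
        """Number of positions where all the given items have a 1 bit."""
        vectors = [self.bits.get(i, [0] * self.size) for i in items]
        return sum(all(column) for column in zip(*vectors))


window = CircularWindow(window_size=3)
for uid, item in [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
                  ("user-2", "c"), ("user-3", "b"), ("user-2", "a")]:
    window.click(uid, item)
before = {item: list(vector) for item, vector in window.bits.items()}

window.click("user-4", "b")  # window is full: user-4 takes user-1's position
print(before, window.bits, window.uids)
```

Replaying the clickstream from the slides gives bit(a)=1 1 1 before user-4 arrives and bit(a)=0 1 1, bit(b)=1 1 1 afterwards.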
ClickRec
A news article realtime recommendation system based on web clickstreams and semantic annotations.
ClickRec
Clickstream
1) Data Streams Processor
2) Frequent Itemsets Miner
3) Recommender
[Figure: ClickRec architecture; two components are labeled MFI-TransSW+]
ClickRec
(user-1, {a,b,c})
(user-1, {<tag1>, <tag2>, <tag3>, <tag4>})
(user-1, {<neymar>, <messi>, <c.ronaldo>, <barcelona>})

ClickRec
<messi> <neymar>
<c.ronaldo> <barcelona> <messi>
TF-IDF
<neymar> <barcelona> <messi>
<c.ronaldo> <chelsea> <messi>
<c.ronaldo> <barcelona> <robben>
<neymar> <chelsea> <robben>
Frequent itemsets
Experiments
1. Real world clickstream from one of the largest news Web sites in Brazil
2. Total = 24 hours of clickstream = 25 million "clicks" (pageviews)
3. Two editorials: sports and entertainment
Experiments
1. Load a window with w transactions
2. Execute 10k slidings
3. Measure the time to execute step 2
Experiments: MFI-TransSW vs MFI-TransSW+
[Figure: execution time (seconds) for window size = 1,000: MFI-TransSW 41.45, MFI-TransSW+ 0.41 (100x faster)]
[Figure: times faster, by window size: 1,000: 102x; 2,000: 216x; 3,000: 286x; 4,000: 337x; 5,000: 413x; 6,000: 476x; 7,000: 521x; 8,000: 623x; 9,000: 666x; 10,000: 816x]
Execution time (seconds)

Window size   MFI-TransSW   MFI-TransSW+
1,000         41.45         0.41
2,000         136.74        0.63
3,000         272.24        0.95
4,000         395.55        1.18
5,000         533.10        1.29
6,000         761.31        1.60
7,000         996.10        1.91
8,000         1,295.16      2.08
9,000         1,484.10      2.23
10,000        1,928.76      2.36
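The "times faster" figures follow from dividing the two execution-time columns. The short calculation below recomputes them from the table; because the table values are rounded to two decimals, the results may differ by a unit or so from figures such as 102x and 816x reported in the slides.

```python
# window size -> (MFI-TransSW seconds, MFI-TransSW+ seconds), from the table
times = {
    1000: (41.45, 0.41),
    2000: (136.74, 0.63),
    3000: (272.24, 0.95),
    4000: (395.55, 1.18),
    5000: (533.10, 1.29),
    6000: (761.31, 1.60),
    7000: (996.10, 1.91),
    8000: (1295.16, 2.08),
    9000: (1484.10, 2.23),
    10000: (1928.76, 2.36),
}
speedup = {w: original / plus for w, (original, plus) in times.items()}
for w, ratio in speedup.items():
    print(f"window size {w}: {ratio:.0f}x faster")
```

The ratio grows with the window size, from roughly 100x at 1,000 transactions to roughly 800x at 10,000.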
1. Divide the clickstream into pairs of consecutive hours
   A. The first hour is used to mine the frequent itemsets
   B. The second hour is used to extract a sample of 10k users (the sample users must have accessed more than one page)
2. Test recommendations
   C. Feed the first page accessed by the user to ClickRec, which recommends 10 pages to the user
   D. Verify if the user accessed one of the recommendations
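The evaluation procedure above amounts to computing a hit rate: the fraction of sampled users who later accessed at least one recommended page. A minimal sketch follows; the function `hit_rate`, the `recommend` callable, and the toy data are hypothetical stand-ins, since the slides do not show ClickRec's interface.

```python
def hit_rate(sessions, recommend, k=10):
    """sessions: uid -> ordered list of pages accessed in the second hour.
    recommend: hypothetical callable mapping a page to a ranked page list."""
    hits = 0
    total = 0
    for pages in sessions.values():
        if len(set(pages)) < 2:
            continue  # sample users must have accessed more than one page
        total += 1
        recommended = set(recommend(pages[0])[:k])  # step C
        if recommended & set(pages[1:]):            # step D
            hits += 1
    return hits / total if total else 0.0


# Toy recommender and sessions, invented for illustration only.
toy_rules = {"a": ["b"], "b": ["a"], "c": ["a"]}
sessions = {
    "u1": ["a", "b"],  # hit: b was recommended after a
    "u2": ["a", "c"],  # miss
    "u3": ["b"],       # skipped: only one page
    "u4": ["c", "d"],  # miss
}
rate = hit_rate(sessions, lambda page: toy_rules.get(page, []))
print(f"hit rate = {rate:.0%}")
```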
Experiments: ClickRec
[Figure: hit rate (axis 0% to 40%) for the sports editorial, by hour pair: 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00, 18:00 vs 19:00; period labels: Morning, Afternoon, Night, Late Night]
Experiments: ClickRec
[Figure: hit rate (axis 0% to 50%) for the entertainment editorial, by hour pair: 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00, 18:00 vs 19:00; period labels: Morning, Afternoon, Night, Late Night]
Conclusion
MFI-TransSW+
• Processes clickstreams
• Uses bit vectors as circular lists
• Up to 2 orders of magnitude faster than the original algorithm (MFI-TransSW)
Conclusion
ClickRec
• Based on MFI-TransSW+
• Uses semantic annotations
• Generates recommendations in real time
• Hit rate > 20%
References
[Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases, VLDB, p. 1–32, 1994.
[Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.
[Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.
Thanks!