MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

Franklin Anderson de Amorim

17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016

Marco Antonio Casanova

Gisele Rabello Lopes

Bernardo Pereira Nunes

Agenda

1. Frequent Itemsets and Data Streams
2. MFI-TransSW+ algorithm
3. ClickRec Recommendation System
4. Experiments and results

Frequent Itemsets

Transactions (each transaction is a set of items):

{bread,milk,coffee}, {bread,milk,cheese}, {bread,cheese}

Itemsets (k=2)    Support
bread, milk       2
bread, coffee     1
milk, coffee      1
bread, cheese     2
milk, cheese      1

X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called minimum support.

N = 3, s = 0.5

With N · s = 1.5, the frequent itemsets for k=2 are {bread, milk} and {bread, cheese}.

If a set I of items is frequent, then so is every subset of I.
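The support rule above can be checked on the three example transactions with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `frequent_itemsets` is an illustrative assumption.

```python
from collections import Counter
from itertools import combinations

# The three example transactions from the slide.
transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]

def frequent_itemsets(transactions, k, s):
    """Return {itemset: support} for every k-itemset with support >= N * s."""
    counts = Counter()
    for t in transactions:
        # Sort the items so each itemset has one canonical tuple form.
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    threshold = len(transactions) * s
    return {itemset: sup for itemset, sup in counts.items() if sup >= threshold}

pairs = frequent_itemsets(transactions, k=2, s=0.5)
print(pairs)
```

With N = 3 and s = 0.5 the threshold is 1.5, so only the two pairs with support 2 survive.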

Data Streams

Data stream

{a,b,c}, {a,b,d}, {c,d,f}, {a,d}, {d,e}, {b,c,e}, {a,f}, {a,c,f}, {a,b,c}, ...

Data Stream - Sliding Windows

Sliding window (window size = 6) over the stream {a,b,c}, {a,b,d}, {c,d,f}, {a,d}, {d,e}, {b,c,e}, {a,f}, {a,c,f}, {a,b,c}
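As an illustration of the sliding-window idea, the sketch below keeps only the 6 most recent transactions of the example stream. It uses Python's `collections.deque` with `maxlen`; the helper name `sliding_windows` is an assumption made for this example and does not come from the slides.

```python
from collections import deque

# The example stream from the slide (first nine transactions).
stream = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d", "f"},
    {"a", "d"}, {"d", "e"}, {"b", "c", "e"},
    {"a", "f"}, {"a", "c", "f"}, {"a", "b", "c"},
]

def sliding_windows(stream, size):
    """Yield the window contents each time the window is full."""
    window = deque(maxlen=size)  # appending to a full deque drops the oldest entry
    for transaction in stream:
        window.append(transaction)
        if len(window) == size:
            yield list(window)

windows = list(sliding_windows(stream, size=6))
print(len(windows))
```

Nine transactions and a window of six give four successive windows; each new transaction pushes the oldest one out.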

MFI-TransSW & MFI-TransSW+

MFI-TransSW (original algorithm)

• Processes sliding windows
• Uses bit vectors, e.g. bit(x)=101001

MFI-TransSW

Phases
1. Load window
2. Slide window
3. Generate frequent itemsets

MFI-TransSW

Loading and sliding window

Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)

window size=3

After loading T1:

bit(a)=1
bit(b)=0
bit(c)=1
bit(d)=1
bit(e)=0

After loading T1, T2, and T3 (window full):

bit(a)=101
bit(b)=011
bit(c)=111
bit(d)=100
bit(e)=011

Sliding the window: a left bit-shift on every bit vector drops the oldest transaction (T1) to make room for T4.
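The load and slide phases can be sketched with Python integers as bit vectors. This is a simplified illustration, not the published implementation: the function name `slide` is an assumption, and here the same shift-and-mask step is used both to load the window and to slide it.

```python
def slide(bits, transaction, window_size):
    """Shift every bit vector left by one and append the bit for `transaction`.

    The mask keeps only the `window_size` lowest bits, so the bit of the
    oldest transaction falls off once the window is full.
    """
    mask = (1 << window_size) - 1
    for item in set(bits) | set(transaction):
        new_bit = 1 if item in transaction else 0
        bits[item] = ((bits.get(item, 0) << 1) | new_bit) & mask
    return bits

bits = {}
for t in ["acd", "bce", "abce"]:          # T1, T2, T3 fill the window
    slide(bits, set(t), window_size=3)
loaded = {item: format(v, "03b") for item, v in sorted(bits.items())}

slide(bits, set("be"), window_size=3)     # T4 arrives, T1 is dropped
slid = {item: format(v, "03b") for item, v in sorted(bits.items())}

print(loaded)
print(slid)
```

After T1 to T3 the vectors match the slide (bit(a)=101, bit(b)=011, bit(c)=111, bit(d)=100, bit(e)=011). After T4 the window holds T2, T3, T4, giving bit(a)=010, bit(b)=111, bit(c)=110, bit(d)=000, bit(e)=111.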

MFI-TransSW

Mining frequent itemsets

window size=3 s=0.5

The frequency of an item is the number of 1 bits in its bit vector:

freq(a)=2
freq(b)=2
freq(c)=3
freq(d)=1
freq(e)=2

MFI-TransSW

Mining frequent itemsets

window size=3 s=0.5

bit(a)=101, freq(a)=2
bit(b)=011, freq(b)=2

bitwise AND: bit(a <and> b)=001, freq(a <and> b)=1
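The bitwise-AND step extends naturally to a level-wise search. The sketch below is an illustrative Apriori-style loop over the example bit vectors; the function name `mine` and the prefix-join candidate generation are assumptions made for this example, not details confirmed by the slides.

```python
from itertools import combinations

# Bit vectors of the full window T1..T3 (window size 3).
bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}

def mine(bits, window_size, s):
    """Return {itemset: support} for all itemsets with support >= window_size * s."""
    min_count = window_size * s
    all_ones = (1 << window_size) - 1
    frequent = {}
    level = [(item,) for item in sorted(bits)]   # candidate 1-itemsets
    while level:
        survivors = []
        for itemset in level:
            vec = all_ones
            for item in itemset:
                vec &= bits[item]                # bitwise AND of the item vectors
            support = bin(vec).count("1")        # number of transactions containing all items
            if support >= min_count:
                frequent[itemset] = support
                survivors.append(itemset)
        # Join frequent k-itemsets that share their first k-1 items.
        level = [x + (y[-1],) for x, y in combinations(survivors, 2) if x[:-1] == y[:-1]]
    return frequent

frequent = mine(bits, window_size=3, s=0.5)
print(frequent)
```

With s = 0.5 the threshold is 1.5, so {a, b} (support 1) is rejected, while {b, c, e} (support 2) is reported as frequent.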

MFI-TransSW

• Fast
• Finds all frequent itemsets
• No false positives or false negatives
• On-demand generation of frequent itemsets
• Small memory footprint

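MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b)

Transactions: ({a}),({b,c}),({a,b})

After the click (user-2,a) arrives:

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b),(user-2,a)

Transactions: ({a}),({a,b,c}),({a,b})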
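The mapping from a clickstream to per-user transactions can be sketched as follows. This is a minimal illustration; the function name `to_transactions` is an assumption, and a plain dictionary stands in for whatever structure the real system uses.

```python
def to_transactions(clicks):
    """Group (uid, page) clicks into one set of pages per user.

    Python dicts keep insertion order, so users appear in order of their first click.
    """
    transactions = {}
    for uid, page in clicks:
        transactions.setdefault(uid, set()).add(page)
    return transactions

clicks = [
    ("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
    ("user-2", "c"), ("user-3", "b"),
]
before = to_transactions(clicks)
after = to_transactions(clicks + [("user-2", "a")])
print(before)
print(after)
```

Unlike a stream of complete transactions, a click can modify a transaction that is already in the window: the click (user-2,a) turns {b,c} into {a,b,c}.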

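MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b),(user-2,a)

window size=3

List of UID's

Position    0        1        2
UID         user-1   user-2   user-3

Bit vectors (one bit per window position)

Position    0   1   2
bit(a)      1   1   1
bit(b)      0   1   1
bit(c)      0   1   0

The click (user-2,a) changes bit(a) from 1 0 1 to 1 1 1.

List of Bit Vectors per User (0 = bit(a), 1 = bit(b), 2 = bit(c))

Position       0   1       2
Bit vectors    0   0,1,2   0,1

When the click (user-4,b) arrives, the window is full, so user-4 replaces user-1 at position 0:

Position       0        1        2
UID            user-4   user-2   user-3
bit(a)         0        1        1
bit(b)         1        1        1
bit(c)         0        1        0
Bit vectors    1        0,1,2    0,1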

MFI-TransSW+

• Processes clickstreams
• Uses bit vectors as circular lists
• More efficient "clean and update"
• Faster
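The circular-list idea can be sketched as follows. This is a hypothetical illustration, not the authors' code: the class name `ClickWindow`, its attributes, and the eviction policy (overwrite positions in round-robin order of first arrival) are all assumptions. The per-position item list plays the role of the "list of bit vectors per user", so cleaning a position touches only the vectors that user had set.

```python
class ClickWindow:
    """Hypothetical sketch: a window over users with bit vectors used as circular lists."""

    def __init__(self, size):
        self.size = size
        self.uids = [None] * size                      # list of UIDs, one per position
        self.items_at = [set() for _ in range(size)]   # items whose bit is set at each position
        self.bits = {}                                 # item -> list of 0/1, one per position
        self.pos_of = {}                               # uid -> position in the window
        self.next_pos = 0                              # position overwritten next (circular)

    def add_click(self, uid, item):
        pos = self.pos_of.get(uid)
        if pos is None:
            # New user: take the next position and evict whoever was there.
            pos = self.next_pos
            self.next_pos = (self.next_pos + 1) % self.size
            old = self.uids[pos]
            if old is not None:
                del self.pos_of[old]
                # "Clean": reset only the bits this user had set.
                for it in self.items_at[pos]:
                    self.bits[it][pos] = 0
                self.items_at[pos].clear()
            self.uids[pos] = uid
            self.pos_of[uid] = pos
        # "Update": set the bit of the clicked item at the user's position.
        self.bits.setdefault(item, [0] * self.size)[pos] = 1
        self.items_at[pos].add(item)


window = ClickWindow(size=3)
for uid, item in [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
                  ("user-2", "c"), ("user-3", "b"), ("user-2", "a"),
                  ("user-4", "b")]:
    window.add_click(uid, item)

print(window.uids)
print(window.bits)
```

No bit vector is shifted when the window slides: user-4 simply overwrites position 0, which is why the clean-and-update step is cheaper than a left shift of every vector.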

ClickRec

A real-time news article recommendation system based on web clickstreams and semantic annotations.

ClickRec

Input: Clickstream

1) Data Streams Processor (MFI-TransSW+)
2) Frequent Itemsets Miner (MFI-TransSW+)
3) Recommender

ClickRec

The pages in a user's transaction are replaced by their semantic annotations (tags):

(user-1, {a,b,c})

(user-1, {<tag1>, <tag2>, <tag3>, <tag4>})

For example:

(user-1, {<neymar>, <messi>, <c.ronaldo>, <barcelona>})

ClickRec

<messi> <neymar>

<c.ronaldo> <barcelona> <messi>

TF-IDF

<neymar> <barcelona> <messi>

<c.ronaldo> <chelsea> <messi>

<c.ronaldo> <barcelona> <robben>

<neymar> <chelsea> <robben>

Frequent itemsets

Experiments

1. Real-world clickstream from one of the largest news Web sites in Brazil
2. Total = 24 hours of clickstream = 25 million "clicks" (pageviews)
3. Two editorial sections: sports and entertainment

Experiments

1. Load a window with w transactions
2. Execute 10k slidings
3. Measure the time to execute step 2
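The three-step protocol can be sketched as a small timing harness. Everything here is an assumption made for illustration: the synthetic random transactions, the shift-based `slide` function standing in for the algorithm under test, and the reduced sizes (w = 100 and 1,000 slidings instead of 10k) chosen so the example runs quickly.

```python
import random
import time

def slide(bits, transaction, window_size):
    """Stand-in for the algorithm under test: shift-based window slide."""
    mask = (1 << window_size) - 1
    for item in set(bits) | set(transaction):
        new_bit = 1 if item in transaction else 0
        bits[item] = ((bits.get(item, 0) << 1) | new_bit) & mask

def random_transaction(rng, n_items=50, max_len=5):
    """Synthetic transaction: a random set of 1..max_len item ids."""
    return set(rng.sample(range(n_items), rng.randint(1, max_len)))

def benchmark(w, slidings, seed=0):
    rng = random.Random(seed)
    bits = {}
    for _ in range(w):                    # step 1: load a window with w transactions
        slide(bits, random_transaction(rng), w)
    start = time.perf_counter()
    for _ in range(slidings):             # step 2: execute the slidings
        slide(bits, random_transaction(rng), w)
    elapsed = time.perf_counter() - start  # step 3: time spent in step 2 only
    return elapsed, bits

elapsed, bits = benchmark(w=100, slidings=1000)
print(f"{elapsed:.4f} s")
```

Only step 2 is inside the timed region, so the cost of the initial load does not distort the comparison.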

Experiments: MFI-TransSW vs MFI-TransSW+

[Figure: execution time (seconds) for window size = 1,000. MFI-TransSW: 41.45; MFI-TransSW+: 0.41 (100x faster).]

[Figure: times faster by window size. 1,000: 102x; 2,000: 216x; 3,000: 286x; 4,000: 337x; 5,000: 413x; 6,000: 476x; 7,000: 521x; 8,000: 623x; 9,000: 666x; 10,000: 816x.]

Experiments: MFI-TransSW vs MFI-TransSW+

Execution time (seconds)

Window size    MFI-TransSW    MFI-TransSW+
1,000          41.45          0.41
2,000          136.74         0.63
3,000          272.24         0.95
4,000          395.55         1.18
5,000          533.10         1.29
6,000          761.31         1.60
7,000          996.10         1.91
8,000          1,295.16       2.08
9,000          1,484.10       2.23
10,000         1,928.76       2.36
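The speedup at each window size follows directly from the timings in the table. The short calculation below recomputes it; the variable names are illustrative. The ratios come from the rounded values shown in the table, so they can differ by a unit or two from the figures reported on the chart, which were presumably computed from unrounded timings.

```python
# window size -> (MFI-TransSW seconds, MFI-TransSW+ seconds), from the table
timings = {
    1000: (41.45, 0.41),
    2000: (136.74, 0.63),
    3000: (272.24, 0.95),
    4000: (395.55, 1.18),
    5000: (533.10, 1.29),
    6000: (761.31, 1.60),
    7000: (996.10, 1.91),
    8000: (1295.16, 2.08),
    9000: (1484.10, 2.23),
    10000: (1928.76, 2.36),
}

speedup = {w: original / plus for w, (original, plus) in timings.items()}
for w, ratio in speedup.items():
    print(f"{w:>6}: {ratio:.0f}x")
```

The speedup grows with the window size, from roughly 100x at 1,000 transactions to roughly 800x at 10,000.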

Experiments: ClickRec

1. Divide the clickstream into pairs of consecutive hours
   A. The first hour is used to mine the frequent itemsets
   B. The second hour is used to extract a sample of 10k users (the sampled users must have accessed more than one page)
2. Test recommendations
   C. Feed the first page accessed by the user to ClickRec, which recommends 10 pages to the user
   D. Verify whether the user accessed one of the recommendations
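The evaluation in steps C and D amounts to a hit-rate computation, sketched below. The function name `hit_rate`, the session format, and the toy recommender are assumptions made for this example; ClickRec itself is not reproduced here.

```python
def hit_rate(sessions, recommend, k=10):
    """Fraction of users who later accessed at least one recommended page.

    sessions:  {uid: [first_page, second_page, ...]} in access order
    recommend: function mapping a page to a ranked list of recommended pages
    """
    hits = 0
    total = 0
    for pages in sessions.values():
        if len(pages) < 2:
            continue                       # users must have accessed more than one page
        total += 1
        recommendations = recommend(pages[0])[:k]      # step C: feed the first page
        if set(recommendations) & set(pages[1:]):      # step D: any recommendation accessed?
            hits += 1
    return hits / total if total else 0.0

# Toy data: a lookup table standing in for the recommender.
toy_rules = {"a": ["b"], "b": ["c"]}
sessions = {
    "u1": ["a", "b"],   # hit: b was recommended after a
    "u2": ["a", "c"],   # miss
    "u3": ["d"],        # skipped: only one page
    "u4": ["b", "a"],   # miss
}
rate = hit_rate(sessions, lambda page: toy_rules.get(page, []))
print(rate)
```

One of the three eligible users follows a recommendation, so the toy hit rate is 1/3.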

Experiments: ClickRec

[Figure: Sports editorial. Hit rate (axis from 0% to 40%) for the hour pairs 0:00 vs 1:00 (Late Night), 6:00 vs 7:00 (Morning), 12:00 vs 13:00 (Afternoon), and 18:00 vs 19:00 (Night).]

Experiments: ClickRec

[Figure: Entertainment editorial. Hit rate (axis from 0% to 50%) for the hour pairs 0:00 vs 1:00 (Late Night), 6:00 vs 7:00 (Morning), 12:00 vs 13:00 (Afternoon), and 18:00 vs 19:00 (Night).]

Conclusion

MFI-TransSW+
• Processes clickstreams
• Uses bit vectors as circular lists
• Up to 2 orders of magnitude faster than the original algorithm (MFI-TransSW)

ClickRec
• Based on MFI-TransSW+
• Uses semantic annotations
• Generates recommendations in real time
• Hit rate > 20%

Thanks!
