STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows

Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid

ICDM 2006

Chen Yi-Chun, 2007/10/02


Outline

• Motivation

• Previous Approach

– SPD algorithm

– Max-Subpattern Tree

• Approximate Incremental Technique

• Conclusion

Motivation

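• Example stream: abcabcabcabcabc…, in which a sliding window detects the period p=3

• Single sliding window: a smaller window width w supports real-time output; a larger w makes it possible to find long periods

• Goal: real-time output and detection of long periods

• Multiple sliding windows are proposed

[Figure: several sliding windows over the stream; the smaller windows report p=3 with patterns abc, *b*, a**, …; the largest window reports p=3,5 with patterns abc, *b*, a**, …]

• Period detection: the SPD algorithm is used

• Pattern mining: the max-subpattern tree is used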

Periodicity Detection

• $\pi_{p,l}(S) = e_l, e_{l+p}, e_{l+2p}, \ldots, e_{l+(m-1)p}$ with $m = \lceil (n-l)/p \rceil$: the projection of a data stream S according to a period p starting from position l, where n is the length of S.

• Ex. If S = abcabbabdb, then $\pi_{4,1}(S) = bbb$ and $\pi_{3,0}(S) = aaab$ (the final b is an outlier).
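The projection can be illustrated with a few lines of Python. This is only a sketch: the function name `projection` and the use of a plain string for the stream are assumptions made for illustration, not part of the slides.

```python
from math import ceil

def projection(stream: str, p: int, l: int) -> str:
    """Return the symbols of `stream` at positions l, l+p, l+2p, ..."""
    n = len(stream)
    m = ceil((n - l) / p)  # number of projected symbols
    return "".join(stream[l + k * p] for k in range(m))

S = "abcabbabdb"
print(projection(S, 4, 1))  # bbb
print(projection(S, 3, 0))  # aaab
```

In Python the same result is also available through slicing, `stream[l::p]`; the explicit loop is kept here to mirror the index form of the definition.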

Cont.

• $F_2(s, S)$: the number of times the symbol s occurs in two consecutive positions in the data stream S

• Ex. If S = abbaaabaa, then $F_2(a, S) = 3$ and $F_2(b, S) = 1$

• $\dfrac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1}$ indicates how often the symbol s occurs every p timestamps in a data stream S
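A minimal sketch of the two quantities on this slide, assuming the stream is a Python string. The names `f2` and `periodicity_ratio` are hypothetical, and `Fraction` is used only to keep the ratio exact.

```python
from fractions import Fraction

def f2(symbol: str, seq: str) -> int:
    """Count positions where `symbol` occurs twice in a row in `seq`."""
    return sum(1 for x, y in zip(seq, seq[1:]) if x == y == symbol)

def periodicity_ratio(symbol: str, stream: str, p: int, l: int) -> Fraction:
    """F2 on the projection, divided by ceil((n - l) / p) - 1."""
    proj = stream[l::p]      # projection for period p, start position l
    pairs = len(proj) - 1    # equals ceil((n - l) / p) - 1
    return Fraction(f2(symbol, proj), pairs) if pairs > 0 else Fraction(0)

S = "abbaaabaa"
print(f2("a", S), f2("b", S))  # 3 1
```

The denominator is the number of consecutive pairs in the projection, so the ratio always lies between 0 and 1.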

Cont.

• If a data stream S of length n contains a symbol s and $\dfrac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1} \ge \tau$,

• then s is said to be periodic in S with a period of length p at position l with respect to the periodicity threshold $\tau$

• Ex. S = abcabbabdb, $\dfrac{F_2(a, \pi_{3,0}(S))}{\lceil (10-0)/3 \rceil - 1} = \dfrac{2}{3}$

– The symbol a is periodic with a period of length 3 at position 0 with respect to a periodicity threshold $\tau \le 2/3$

– The pattern a * * is a frequent single periodic pattern of length 3
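The definition can be applied exhaustively for one period length. The sketch below is an illustration under assumptions: the function name `periodic_symbols` and the brute-force scan over all start positions are not taken from the slides, and the threshold symbol is written `tau`.

```python
from fractions import Fraction

def periodic_symbols(stream: str, p: int, tau: Fraction) -> dict:
    """Map each single-symbol pattern of length p to its ratio, if ratio >= tau."""
    result = {}
    for l in range(p):
        proj = stream[l::p]  # projection for start position l
        if len(proj) < 2:
            continue         # no consecutive pair to count
        for s in set(proj):
            count = sum(1 for x, y in zip(proj, proj[1:]) if x == y == s)
            ratio = Fraction(count, len(proj) - 1)
            if ratio >= tau:
                pattern = "*" * l + s + "*" * (p - l - 1)
                result[pattern] = ratio
    return result

hits = periodic_symbols("abcabbabdb", 3, Fraction(2, 3))
print(sorted(hits.items()))  # [('*b*', Fraction(1, 1)), ('a**', Fraction(2, 3))]
```

With a threshold of 2/3 the example stream yields a** (ratio 2/3) and also *b* (ratio 1), since b occupies positions 1, 4 and 7.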

SPD-algorithm

• To detect the symbols that are periodic with period length p within S

• Shift S by p positions, denoted as $S^{(p)}$

• Ex. If S = a b c a b b a b c b, then $S^{(3)}$ = * * * a b c a b b a
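One way to read the shift is as a position-by-position comparison of S with its shifted copy. The following sketch assumes that reading; the names `shift` and `spd_matches` are hypothetical, and the stream is assumed not to contain the padding character `*`.

```python
from collections import Counter

def shift(stream: str, p: int, pad: str = "*") -> str:
    """Shift the stream right by p positions, padding the front."""
    assert 0 < p < len(stream)
    return pad * p + stream[: len(stream) - p]

def spd_matches(stream: str, p: int) -> Counter:
    """Count, per (symbol, position mod p), where S agrees with its shifted copy."""
    matches = Counter()
    for i, (cur, prev) in enumerate(zip(stream, shift(stream, p))):
        if cur == prev:
            matches[(cur, i % p)] += 1
    return matches

S = "abcabbabcb"
print(shift(S, 3))               # ***abcabba
print(dict(spd_matches(S, 3)))   # {('a', 0): 2, ('b', 1): 2}
```

Each count equals $F_2$ of that symbol on the projection starting at that position, so a single pass over the shifted copy covers all start positions for one period length.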

SPD algorithm in Time-Series

Binary encoding of the symbols: a:001 b:010 c:100

Example series: (a c c c a b b)

[Figure: the binary-encoded series processed for shifts P=1 through P=4]

Reference “Periodicity Detection in Time Series Databases” [TKDE05]
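The encoding on this slide gives each symbol a code with a single 1 bit, so a bitwise AND of two codes is non-zero only when the symbols are equal. The sketch below uses that property to count matches for one shift; it is an assumption-laden illustration (the helper names are hypothetical) and not necessarily the exact computation used in [TKDE05].

```python
CODES = {"a": "001", "b": "010", "c": "100"}  # encoding from the slide
WIDTH = 3                                     # bits per symbol

def encode(series: str) -> str:
    """Concatenate the binary codes of all symbols."""
    return "".join(CODES[s] for s in series)

def matches_at_shift(series: str, p: int) -> int:
    """AND the encoded series with itself shifted by p symbols; count the 1 bits."""
    bits = encode(series)
    shifted = "0" * (WIDTH * p) + bits[: len(bits) - WIDTH * p]
    return sum(1 for x, y in zip(bits, shifted) if x == y == "1")

series = "acccabb"
print(encode(series))               # 001100100100001010010
print(matches_at_shift(series, 1))  # 3
print(matches_at_shift(series, 4))  # 1
```

For shift 1 the three matches are the pairs cc, cc and bb; for shift 4 the only match is the a at position 4 against the a at position 0.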

Single Window with SPD

[Figure: a single window over the binary-encoded series (a c c c a b b); annotations: shift 1, slide 2]

Multi Windows with SPD

[Figure: multiple sliding windows, each producing its own output]

Smaller w: real-time output is supported. Larger w: long periods can be found.

Max-Subpattern Tree

Reference “Incremental, Online, and Merge Mining of Partial Periodic Patterns in Time-Series Databases” [TKDE04]

Reference “Efficient Mining of Partial Periodic Patterns in Time Series Database” [ICDE99]

Ex. abdeacdfabdjacdsabdxakdy, for p=4

F1 = {a***, *b**, *c**, **d*}

Cmax = a{b,c}d*

[Figure: max-subpattern tree rooted at Cmax, with child edges labelled c and b and a count at each node]
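The sets F1 and Cmax for this example can be reproduced with a short sketch. The minimum count of 2 is an assumption chosen so that the result matches the slide (the slide does not state the support threshold), and the function names are hypothetical.

```python
from collections import Counter

def frequent_1_patterns(series: str, p: int, min_count: int) -> list:
    """Return sorted (position, symbol) pairs occurring in >= min_count segments."""
    segments = [series[i:i + p] for i in range(0, len(series) - p + 1, p)]
    counts = Counter((pos, sym) for seg in segments for pos, sym in enumerate(seg))
    return sorted(key for key, c in counts.items() if c >= min_count)

def max_pattern(f1: list, p: int) -> list:
    """Merge the frequent 1-patterns into one candidate pattern (a set per position)."""
    slots = [set() for _ in range(p)]
    for pos, sym in f1:
        slots[pos].add(sym)
    return slots

def render(slots: list) -> str:
    """Write a pattern in the slide's notation, e.g. a{b,c}d*."""
    parts = []
    for s in slots:
        if not s:
            parts.append("*")
        elif len(s) == 1:
            parts.append(next(iter(s)))
        else:
            parts.append("{" + ",".join(sorted(s)) + "}")
    return "".join(parts)

f1 = frequent_1_patterns("abdeacdfabdjacdsabdxakdy", 4, min_count=2)
print(f1)                          # [(0, 'a'), (1, 'b'), (1, 'c'), (2, 'd')]
print(render(max_pattern(f1, 4)))  # a{b,c}d*
```

The six segments are abde, acdf, abdj, acds, abdx and akdy: a and d appear in all six, b in three, c in two, and every other symbol only once.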

Approximate Incremental Tech.

Streaming data => maintain the max-subpattern tree over the new data

Q=a{b,c}d* Q’=a{b,e}df

The intersection of Q and Q’ is abd* (equal to Q without c)

The difference between Q’ and abd* is e and f (Q’ equals abd* with e and f added)

The approximation happens in the insertion step
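The intersection and difference in this example can be checked by representing each pattern as one set of symbols per position. This representation and the helper names are assumptions made for illustration; the sketch covers only the pattern arithmetic, not the tree update itself.

```python
def intersect(q1: list, q2: list) -> list:
    """Position-wise intersection of two patterns."""
    return [a & b for a, b in zip(q1, q2)]

def difference(q: list, common: list) -> list:
    """Symbols of q, per position, that are missing from `common`."""
    return [a - b for a, b in zip(q, common)]

def render(slots: list) -> str:
    """Write a pattern in the slide's notation; an empty position becomes *."""
    parts = []
    for s in slots:
        if not s:
            parts.append("*")
        elif len(s) == 1:
            parts.append(next(iter(s)))
        else:
            parts.append("{" + ",".join(sorted(s)) + "}")
    return "".join(parts)

Q = [{"a"}, {"b", "c"}, {"d"}, set()]      # a{b,c}d*
Q_new = [{"a"}, {"b", "e"}, {"d"}, {"f"}]  # a{b,e}df

common = intersect(Q, Q_new)
print(render(common))                     # abd*
print(render(difference(Q, common)))      # *c**
print(render(difference(Q_new, common)))  # *e*f
```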

Hysteresis Threshold

• A pattern q will lose all its history information as soon as it becomes infrequent. When q becomes frequent again, it will be treated as a newly appeared frequent pattern.

• With two thresholds, a pattern is

– Frequent, i.e. its frequency is above the higher threshold

– Infrequent, i.e. its frequency is below the lower threshold

– Patterns whose frequencies are above the lower threshold are kept in the tree.
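The two-threshold rule can be sketched as a three-way classification. The label "candidate" for the band between the thresholds, the handling of values exactly on a threshold, and the function names are assumptions made for illustration.

```python
def classify(freq: float, low: float, high: float) -> str:
    """Classify a pattern by its frequency using two thresholds."""
    if freq >= high:
        return "frequent"
    if freq < low:
        return "infrequent"  # dropped from the tree, history is lost
    return "candidate"       # not frequent, but still kept in the tree

def track(freqs: list, low: float, high: float) -> list:
    """Follow one pattern over time; history length resets when it is dropped."""
    history = 0
    out = []
    for f in freqs:
        status = classify(f, low, high)
        history = 0 if status == "infrequent" else history + 1
        out.append((status, history))
    return out

for step in track([0.9, 0.6, 0.9, 0.3, 0.9], low=0.5, high=0.8):
    print(step)
```

In this run the dip to 0.6 keeps the pattern in the tree, so its history continues, while the dip to 0.3 drops it and the following 0.9 starts a new history of length 1.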

Conclusion

• Discover potential periodicity rates in data streams

• Use an incremental tree structure to mine periodic patterns

• Use two thresholds to preserve the history of candidate frequent patterns