STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows
Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid
ICDM 2006
2007/10/02, Chen Yi-Chun
Outline
• Motivation
• Previous Approach
– SPD algorithm
– Max-Subpattern Tree
• Approximate Incremental Technique
• Conclusion
Motivation
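Example stream: abcabcabcabcabc… (period p = 3)
• Single sliding window:
– Smaller w: real-time output is supported
– Larger w: longer periods can be found
• Goal: both real-time output and detection of long periods
• Multiple sliding windows are proposed
[Figure: windows of increasing size over the stream, each reporting its detected periods and patterns: p=3 (abc, *b*, a**, …); p=3 (abc, *b*, a**, …); p=3, 5 (abc, *b*, a**, …)]
• Period detection: the SPD algorithm is used
• Pattern mining: the max-subpattern tree is used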
Periodicity Detection
• $\pi_{p,l}(S)$: the projection of a data stream S according to a period p starting from position l, where n is the length of S:
$\pi_{p,l}(S) = e_l, e_{l+p}, \ldots, e_{l+(\lceil (n-l)/p \rceil - 1)p}$
• Ex. If S = abcabbabdb, then
– $\pi_{4,1}(S)$ = bbb
– $\pi_{3,0}(S)$ = aaab (the final b is an outlier)
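A minimal Python sketch of the projection, assuming S is a string of single-character symbols and positions are 0-based as in the example; `projection` is a hypothetical helper name, not from the paper. A slice with step p returns exactly $\lceil (n-l)/p \rceil$ symbols, matching the definition.

```python
def projection(s: str, p: int, l: int) -> str:
    """pi_{p,l}(S): the symbols of s at positions l, l+p, l+2p, ... (0-based)."""
    if not 0 <= l < p:
        raise ValueError("expected 0 <= l < p")
    # A slice with step p yields ceil((n - l) / p) symbols
    return s[l::p]


S = "abcabbabdb"
print(projection(S, 4, 1))  # bbb
print(projection(S, 3, 0))  # aaab
```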
Cont.
• $F_2(s, S)$: the number of times the symbol s occurs in two consecutive positions in the data stream S
• Ex. If S = abbaaabaa, then $F_2(a, S) = 3$ and $F_2(b, S) = 1$
• The ratio
$\frac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1}$
indicates how often the symbol s occurs every p timestamps in a data stream S
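The count and the ratio can be sketched in Python as follows, assuming string input and 0-based positions; `f2` and `periodicity_ratio` are hypothetical names. The denominator $\lceil (n-l)/p \rceil - 1$ is the number of adjacent pairs in $\pi_{p,l}(S)$, so a ratio of 1 means s fills every position of the projection.

```python
import math


def f2(symbol: str, s: str) -> int:
    """F2(s, S): how many adjacent pairs (i, i+1) of s both equal `symbol`."""
    return sum(1 for x, y in zip(s, s[1:]) if x == symbol == y)


def periodicity_ratio(symbol: str, s: str, p: int, l: int) -> float:
    """F2(symbol, pi_{p,l}(S)) / (ceil((n - l) / p) - 1)."""
    proj = s[l::p]                                 # pi_{p,l}(S)
    pairs = math.ceil((len(s) - l) / p) - 1        # adjacent pairs in the projection
    return f2(symbol, proj) / pairs if pairs > 0 else 0.0


print(f2("a", "abbaaabaa"), f2("b", "abbaaabaa"))  # 3 1
```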
Cont.
• If a data stream S of length n contains a symbol s and
$\frac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1} \geq \tau$
• Then s is said to be periodic in S with a period of length p at position l with respect to the periodicity threshold τ
• Ex. S = abcabbabdb, τ = 2/3:
$\frac{F_2(a, \pi_{3,0}(S))}{\lceil (10-0)/3 \rceil - 1} = \frac{2}{3} \geq \tau$
– The symbol a is periodic with a period of length 3 at position 0 with respect to the periodicity threshold τ = 2/3
– The pattern a** is a frequent single periodic pattern of length 3
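Applying the threshold test at every position l gives all frequent single periodic patterns of length p. The sketch below is an illustration of the definition, not the paper's code; it uses `fractions.Fraction` so that a ratio of exactly 2/3 compares exactly against τ = 2/3, and `single_periodic_patterns` is a hypothetical name.

```python
import math
from fractions import Fraction


def f2(symbol: str, seq: str) -> int:
    # Count adjacent pairs where both symbols equal `symbol`
    return sum(1 for x, y in zip(seq, seq[1:]) if x == symbol == y)


def single_periodic_patterns(s: str, p: int, tau: Fraction) -> list[str]:
    """Patterns like 'a**' whose ratio F2(sym, pi_{p,l}(S)) / (ceil((n-l)/p) - 1) >= tau."""
    patterns = []
    for l in range(p):
        proj = s[l::p]                              # pi_{p,l}(S)
        pairs = math.ceil((len(s) - l) / p) - 1     # adjacent pairs in proj
        if pairs <= 0:
            continue
        for sym in sorted(set(proj)):
            if Fraction(f2(sym, proj), pairs) >= tau:
                # Place the symbol at position l, '*' elsewhere
                patterns.append("*" * l + sym + "*" * (p - l - 1))
    return patterns


print(single_periodic_patterns("abcabbabdb", 3, Fraction(2, 3)))  # ['a**', '*b*']
```

Besides a**, the example stream also yields *b*, since $\pi_{3,1}(S)$ = bbb has a ratio of 2/2 = 1.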
SPD-algorithm
• Goal: detect the symbols that are periodic with period length p within S
• Shift S by p positions, denoted as $S^{(p)}$, and compare it with S position by position
• Ex. If S = a b c a b b a b c b, then $S^{(3)}$ = * * * a b c a b b a
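A sketch of the shift-and-compare idea, assuming the padding symbol '*' never occurs in S; `shift` and `spd_matches` are hypothetical names. A match at position i means $S[i] = S[i-p]$, i.e. two consecutive symbols of $\pi_{p,\, i \bmod p}(S)$ agree, so grouping matches by (symbol, i mod p) gives $F_2$ for every position l in one pass.

```python
def shift(s: str, p: int, pad: str = "*") -> str:
    """S^(p): S shifted right by p positions, padded with `pad`, cut to len(S)."""
    return (pad * p + s)[:len(s)]


def spd_matches(s: str, p: int) -> dict[tuple[str, int], int]:
    """Count matches between S and S^(p), keyed by (symbol, i mod p).
    Assumes `pad` ('*') is not a symbol of S, so padding never matches."""
    counts: dict[tuple[str, int], int] = {}
    for i, (x, y) in enumerate(zip(s, shift(s, p))):
        if x == y:
            key = (x, i % p)
            counts[key] = counts.get(key, 0) + 1
    return counts


S = "abcabbabcb"
print(shift(S, 3))        # ***abcabba
print(spd_matches(S, 3))  # {('a', 0): 2, ('b', 1): 2}
```

Here ('a', 0): 2 agrees with $F_2(a, \pi_{3,0}(S)) = F_2(a, \text{aaab}) = 2$, and ('b', 1): 2 with $F_2(b, \text{bbb}) = 2$.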
SPD algorithm in Time-Series
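• Each symbol is mapped to a binary code: a = 001, b = 010, c = 100
• Example series: (a c c c a b b)
[Figure: the encoded series compared with its shifted copies for p = 1 through p = 4]
Reference: "Periodicity Detection in Time Series Databases" [TKDE05]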
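Why the binary codes help: because each code is one-hot, the dot product of two codes is 1 exactly when the symbols are equal, so the number of matches between the series and its p-shift is a sum of dot products. The referenced method, as we understand it, obtains these sums for all p at once via a convolution; the sketch below instead uses a plain O(n·P) loop for clarity, and `match_counts` is a hypothetical name.

```python
# One-hot codes as on the slide: a = 001, b = 010, c = 100
CODES = {"a": (0, 0, 1), "b": (0, 1, 0), "c": (1, 0, 0)}


def match_counts(series: list[str], max_p: int) -> dict[int, int]:
    """For each shift p, count positions i with series[i] == series[i - p],
    computed as a sum of dot products of the encoded series with its shift."""
    enc = [CODES[sym] for sym in series]
    result = {}
    for p in range(1, max_p + 1):
        result[p] = sum(
            sum(u * v for u, v in zip(enc[i], enc[i - p]))  # 1 iff equal symbols
            for i in range(p, len(enc))
        )
    return result


print(match_counts(list("acccabb"), 4))  # {1: 3, 2: 1, 3: 0, 4: 1}
```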
Single Window with SPD
[Figure: a single sliding window over the binary-encoded series (a c c c a b b); labels: "Shift 1", "slide 2"]
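A naive sketch of running period detection over a single sliding window of size w: after each arrival the window slides by one symbol and the periods are recomputed from scratch. It assumes that only periods p ≤ w/2 are tried (so every projection holds at least two symbols) and does not show the incremental bookkeeping an efficient implementation would need; `periodic_at` and `single_window_periods` are hypothetical names.

```python
from collections import deque


def periodic_at(s: str, p: int, l: int, tau: float) -> bool:
    """True if some symbol's ratio F2(sym, pi_{p,l}(S)) / (len(proj) - 1) >= tau."""
    proj = s[l::p]
    if len(proj) < 2:
        return False
    best = max(sum(1 for x, y in zip(proj, proj[1:]) if x == y == sym)
               for sym in set(proj))
    return best / (len(proj) - 1) >= tau


def single_window_periods(stream, w: int, tau: float):
    """Yield the detected periods each time a new symbol arrives (window full)."""
    window = deque(maxlen=w)  # appending to a full deque drops the oldest symbol
    for symbol in stream:
        window.append(symbol)
        if len(window) < w:
            continue  # no output until the window is full
        s = "".join(window)
        yield [p for p in range(1, w // 2 + 1)
               if any(periodic_at(s, p, l, tau) for l in range(p))]


print(list(single_window_periods("abcabcabc", 6, 1.0)))  # [[3], [3], [3], [3]]
```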
Multi Windows with SPD
[Figure: several windows over the stream, each producing its own output]
• Smaller w: real-time output is supported
• Larger w: longer periods can be found
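The trade-off can be illustrated by running several windows of different sizes side by side, all ending at the newest symbol. The window sizes below are hypothetical and the sketch does not reproduce STAGGER's actual expansion schedule; it only shows that a small window reports as soon as it fills while a larger window can expose a longer period. `multi_window_periods` is a hypothetical name, and periods p ≤ w/2 are assumed per window.

```python
from collections import deque


def periodic_at(s: str, p: int, l: int, tau: float) -> bool:
    # Some symbol's ratio F2(sym, pi_{p,l}(S)) / (len(proj) - 1) reaches tau
    proj = s[l::p]
    if len(proj) < 2:
        return False
    best = max(sum(1 for x, y in zip(proj, proj[1:]) if x == y == sym)
               for sym in set(proj))
    return best / (len(proj) - 1) >= tau


def multi_window_periods(stream, sizes, tau: float):
    """After each arrival, yield {window size: detected periods} for every full window."""
    history = deque(maxlen=max(sizes))
    for symbol in stream:
        history.append(symbol)
        report = {}
        for w in sorted(sizes):
            if len(history) < w:
                continue  # this window has not filled yet
            s = "".join(history)[-w:]  # the last w symbols
            report[w] = [p for p in range(1, w // 2 + 1)
                         if any(periodic_at(s, p, l, tau) for l in range(p))]
        yield report


reports = list(multi_window_periods("abcde" * 3, (6, 10), 1.0))
print(reports[5])  # {6: []}
print(reports[9])  # {6: [], 10: [5]}
```

The size-6 window outputs from the sixth symbol on but can never see period 5 (it only tries p ≤ 3); the size-10 window finds it once full.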
Max-Subpattern Tree
Reference: "Incremental, Online, and Merge Mining of Partial Periodic Patterns in Time-Series Databases" [TKDE04]
Reference: "Efficient Mining of Partial Periodic Patterns in Time Series Database" [ICDE99]
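• Ex. S = abdeacdfabdjacdsabdxakdy, p = 4 (segments: abde, acdf, abdj, acds, abdx, akdy)
• $F_1$ = {a***, *b**, *c**, **d*}
• $C_{max}$ = a{b,c}d*
[Figure: the max-subpattern tree rooted at $C_{max}$ = a{b,c}d* (count 0), with child nodes obtained by removing b or c]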
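A simplified sketch of the quantities behind the tree, not an implementation of the tree itself: it splits S into period segments, finds $F_1$, forms $C_{max}$, and counts each segment's "hit" (the subpattern of $C_{max}$ it matches) in a flat counter instead of tree nodes. The threshold min_count = 2 is an assumption chosen so that $F_1$ matches the slide; tree layout and pruning rules are not modeled, and all function names are hypothetical.

```python
from collections import Counter


def segments(s: str, p: int) -> list[str]:
    # Consecutive period segments of length p (a partial tail is dropped)
    return [s[i:i + p] for i in range(0, len(s) - p + 1, p)]


def frequent_1_patterns(segs, p, min_count):
    """F1 as (position, symbol) pairs found in at least min_count segments."""
    counts = Counter((i, seg[i]) for seg in segs for i in range(p))
    return {key for key, c in counts.items() if c >= min_count}


def max_pattern(f1, p):
    # C_max: the sorted frequent symbols at each position (empty list means '*')
    return [sorted(sym for (i, sym) in f1 if i == pos) for pos in range(p)]


def format_pattern(cmax) -> str:
    parts = []
    for syms in cmax:
        if not syms:
            parts.append("*")
        elif len(syms) == 1:
            parts.append(syms[0])
        else:
            parts.append("{" + ",".join(syms) + "}")
    return "".join(parts)


def hit_counts(segs, f1) -> Counter:
    """Keep a segment's symbol at position i only if (i, symbol) is in F1."""
    hits = Counter()
    for seg in segs:
        hits["".join(ch if (i, ch) in f1 else "*" for i, ch in enumerate(seg))] += 1
    return hits


segs = segments("abdeacdfabdjacdsabdxakdy", 4)
f1 = frequent_1_patterns(segs, 4, min_count=2)
print(format_pattern(max_pattern(f1, 4)))  # a{b,c}d*
print(dict(hit_counts(segs, f1)))          # {'abd*': 3, 'acd*': 2, 'a*d*': 1}
```

No segment hits the root a{b,c}d* itself (a single position cannot hold both b and c), which is consistent with its count of 0.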
Approximate Incremental Tech.
• Streaming data ⇒ the max-subpattern tree must be maintained over the newly arriving data
• Ex. Q = a{b,c}d* and Q′ = a{b,e}df
– The intersection of Q and Q′ is abd* (Q without c)
– The letters of Q′ not in abd* are e and f (Q′ is abd* with e and f added)
• The approximation happens in the insertion step
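The intersection and difference on this slide are position-wise set operations on the letters of each pattern. The sketch below computes only these set operations; how STAGGER approximates the counts during insertion is not shown. `parse`, `render`, `intersect`, and `minus` are hypothetical helper names, and patterns are assumed to use the slide's notation with single-letter symbols.

```python
def parse(pattern: str) -> list[set[str]]:
    """Parse 'a{b,c}d*' into one set of letters per position ('*' -> empty set)."""
    positions, i = [], 0
    while i < len(pattern):
        if pattern[i] == "{":
            j = pattern.index("}", i)
            positions.append(set(pattern[i + 1:j].split(",")))
            i = j + 1
        else:
            positions.append(set() if pattern[i] == "*" else {pattern[i]})
            i += 1
    return positions


def render(positions: list[set[str]]) -> str:
    parts = []
    for letters in positions:
        if not letters:
            parts.append("*")
        elif len(letters) == 1:
            parts.append(next(iter(letters)))
        else:
            parts.append("{" + ",".join(sorted(letters)) + "}")
    return "".join(parts)


def intersect(q, r):
    # Letters common to q and r at each position
    return [a & b for a, b in zip(q, r)]


def minus(q, r):
    # Letters of q at each position that are not in r
    return [a - b for a, b in zip(q, r)]


Q, Q_new = parse("a{b,c}d*"), parse("a{b,e}df")
common = intersect(Q, Q_new)
print(render(common))                # abd*
print(render(minus(Q, common)))      # *c**
print(render(minus(Q_new, common)))  # *e*f
```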
Hysteresis Threshold
• With a single threshold, a pattern q loses all its history as soon as it becomes infrequent; when q becomes frequent again, it is treated as a newly appeared frequent pattern.
• With two thresholds, a pattern is:
– Frequent, i.e., its frequency is above the higher threshold
– Infrequent, i.e., its frequency is below the lower threshold
– Patterns whose frequencies are above the lower threshold are kept in the tree
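A minimal sketch of the two-threshold rule, assuming frequencies are counts divided by the number of segments seen so far; `apply_hysteresis`, the pattern names q1 to q3, their counts, and the threshold values are all made up for illustration. A pattern between the two thresholds is not reported but keeps its count, so if it rises above the higher threshold again it resumes from its history instead of starting over.

```python
def apply_hysteresis(counts: dict[str, int], total: int, low: float, high: float):
    """Keep patterns with frequency above `low` (history retained);
    report those above `high` as frequent. Others are dropped."""
    if not 0 <= low < high <= 1:
        raise ValueError("expected 0 <= low < high <= 1")
    kept = {q: c for q, c in counts.items() if c / total > low}
    frequent = sorted(q for q, c in kept.items() if c / total > high)
    return kept, frequent


# Made-up counts over 10 segments, for illustration only
kept, frequent = apply_hysteresis({"q1": 6, "q2": 3, "q3": 1}, 10, low=0.2, high=0.5)
print(frequent)  # ['q1']
print(kept)      # {'q1': 6, 'q2': 3}
```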