part 1 similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and...
TRANSCRIPT
Kumamoto U CMU CS
Similarity search, pattern discovery and summarization
Yasushi Sakurai (Kumamoto University), Yasuko Matsubara (Kumamoto University), Christos Faloutsos (Carnegie Mellon University)
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos
Part 1

Outline
• Motivation
• Similarity Search and Indexing
• Feature extraction
• Linear forecasting
• Streaming pattern discovery
• Automatic mining

Stream mining
• Applications
– Sensor monitoring
– Network analysis
– Financial and/or business transaction data
– Web access and media service logs
– Moving object tracking
– Industrial manufacturing

Stream mining
• Requirements
– Fast: high performance and quick response
– Nimble: low memory consumption, single scan
– Accurate: good approximation for pattern discovery and feature extraction

Monitoring data streams
• Correlation coefficient

$$\sigma(x) = \sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}, \qquad \rho = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}{\sigma(x) \cdot \sigma(y)}$$

• Correlation coefficient and the (Euclidean) distance

$$\rho = 1 - \frac{1}{2} \sum_{t=1}^{n} (\hat{x}_t - \hat{y}_t)^2, \qquad \hat{x}_t = (x_t - \bar{x}) / \sigma(x)$$
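The link between the correlation coefficient and the Euclidean distance can be checked numerically. The sketch below is only an illustration: the two short series are made-up values, and the helper names `sigma` and `normalize` are not from the tutorial.

```python
import numpy as np

def sigma(x):
    # Root of the summed squared deviations (no 1/n factor), as on the slide.
    return np.sqrt(np.sum((x - x.mean()) ** 2))

def normalize(x):
    # x_hat_t = (x_t - mean(x)) / sigma(x); the result has zero mean and unit norm.
    return (x - x.mean()) / sigma(x)

# Made-up example series (assumption, for illustration only).
x = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 90.0, 70.0, 50.0, 30.0, 10.0, 25.0])
y = np.array([25.0, 35.0, 65.0, 70.0, 95.0, 85.0, 75.0, 40.0, 35.0, 15.0, 20.0])

# Direct definition of the correlation coefficient.
rho_direct = np.sum((x - x.mean()) * (y - y.mean())) / (sigma(x) * sigma(y))

# Same value from the squared Euclidean distance of the normalized series.
rho_from_distance = 1.0 - 0.5 * np.sum((normalize(x) - normalize(y)) ** 2)

print(rho_direct, rho_from_distance)
```

Both printed values should agree up to floating-point error, which is why a distance threshold on normalized series can stand in for a correlation threshold.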

Monitoring data streams
• Correlation monitoring [Zhu+, vldb02]
– DFT coefficients for each basic window
– Correlation coefficient of each sliding window computed from the "sketch" (DFT coefs)
[Figure: Sequence X and Sequence Y divided into basic windows, each summarized by its DFT coefs, with a sliding window over them. Photo: Dennis Shasha]
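As a rough illustration of computing a correlation from a DFT sketch, the snippet below works on a single window rather than on the basic-window scheme of [Zhu+, vldb02]. The function names, the test signals and the choice of m are assumptions. It relies on Parseval's theorem: the distance between normalized series equals the distance between their unitary DFT coefficients, so keeping only the first m coefficients gives an approximation.

```python
import numpy as np

def normalize(x):
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return c / np.sqrt(np.sum(c ** 2))

def dft_sketch(x, m):
    # Unitary DFT coefficients k = 1..m of the normalized window.
    # k = 0 is dropped: it is zero once the mean is removed.
    coefs = np.fft.rfft(normalize(x)) / np.sqrt(len(x))
    return coefs[1:m + 1]

def corr_from_sketches(sx, sy, n):
    # Real signals have conjugate-symmetric spectra, so each kept coefficient
    # counts twice, except the Nyquist term k = n/2 when n is even.
    k = np.arange(1, len(sx) + 1)
    weights = np.where((n % 2 == 0) & (k == n // 2), 1.0, 2.0)
    d2 = np.sum(weights * np.abs(sx - sy) ** 2)
    return 1.0 - 0.5 * d2

n = 64
t = np.arange(n)
x = np.sin(2 * np.pi * t / 16) + 0.3 * np.sin(2 * np.pi * t / 5)
y = np.sin(2 * np.pi * t / 16 + 0.5) + 0.2 * np.cos(2 * np.pi * t / 7)

rho_true = np.corrcoef(x, y)[0, 1]
rho_full = corr_from_sketches(dft_sketch(x, n // 2), dft_sketch(y, n // 2), n)
rho_few = corr_from_sketches(dft_sketch(x, 4), dft_sketch(y, 4), n)
print(rho_true, rho_full, rho_few)
```

With all coefficients the sketch reproduces the exact correlation. With fewer coefficients the distance can only shrink, so the estimate never falls below the true value, which makes it usable as a filter that produces no false dismissals.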

Monitoring data streams
• Grid structure (to avoid checking all pairs)
– DFT coefficients yield a vector
– High correlation -> closeness in the vector space
• Correlation coefficients and the Euclidean distance

$$\rho = 1 - \frac{1}{2} \sum_{t=1}^{n} (\hat{x}_t - \hat{y}_t)^2$$

[Figure: grid containing vector V_X of sequence X and vector V_Y of sequence Y, with distance ε marked]
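One way to picture the grid idea is the sketch below, which is an assumption-laden illustration rather than the structure used in the cited work: each vector is hashed to a cell of side ε, and only vectors in the same or adjacent cells become candidate pairs. The function name `candidate_pairs` and the sample vectors are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def candidate_pairs(vectors, eps):
    """Return pairs of names whose vectors fall in the same or adjacent grid cells."""
    grid = defaultdict(list)
    for name, v in vectors.items():
        cell = tuple(math.floor(c / eps) for c in v)
        grid[cell].append(name)

    pairs = set()
    for cell, names in grid.items():
        # Visit the cell itself and its neighbours (offsets -1, 0, +1 per dimension).
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for a in names:
                for b in grid.get(neighbour, ()):
                    if a < b:
                        pairs.add((a, b))
    return pairs

# Hypothetical 2-d feature vectors (e.g., two retained DFT coefficients each).
vectors = {"X": (0.13, 0.24), "Y": (0.16, 0.27), "Z": (0.93, 0.95)}
pairs = candidate_pairs(vectors, eps=0.1)
print(pairs)
```

Any two vectors within distance ε of each other land in the same or neighbouring cells, so only those candidates need an exact correlation check; the far-away vector Z is never compared.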

Monitoring data streams
• Lag correlation [Sakurai+, sigmod05]
[Figure: two sequences correlated with lag l=1300, and their CCF (Cross-Correlation Function) with l=1300 marked]

Lag correlation
• Definition of 'score', absolute value of R(l)

$$score(l) = |R(l)|, \qquad R(l) = \frac{\sum_{t=l+1}^{n} (x_t - \bar{x})(y_{t-l} - \bar{y})}{\sqrt{\sum_{t=l+1}^{n} (x_t - \bar{x})^2} \, \sqrt{\sum_{t=1}^{n-l} (y_t - \bar{y})^2}}$$

• Lag correlation
– Given a threshold γ, score(l) > γ
– A local maximum
– The earliest such maximum, if more maxima exist
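The definition can be exercised with a direct (naive) computation. The following is a minimal sketch under stated assumptions: the means are taken over the overlapping parts of the two sequences, the data are synthetic white noise with a built-in lag of 5, and the names `lag_corr` and `earliest_lag` are illustrative.

```python
import numpy as np

def lag_corr(x, y, l):
    # R(l): correlate x_{l+1..n} with y_{1..n-l}, i.e. x_t against y_{t-l}.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xs, ys = x[l:], y[:n - l]
    xc, yc = xs - xs.mean(), ys - ys.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

def earliest_lag(x, y, gamma, max_lag):
    # Earliest lag whose score exceeds gamma and is a local maximum.
    scores = [abs(lag_corr(x, y, l)) for l in range(max_lag + 1)]
    for l, s in enumerate(scores):
        left = scores[l - 1] if l > 0 else -np.inf
        right = scores[l + 1] if l < max_lag else -np.inf
        if s > gamma and s >= left and s >= right:
            return l
    return None

# Synthetic data (assumption): x repeats y with a delay of 5 time ticks.
rng = np.random.default_rng(0)
base = rng.standard_normal(205)
x, y = base[:200], base[5:]

detected = earliest_lag(x, y, gamma=0.9, max_lag=100)
print(detected)
```

This scans every lag up to n/2, which is exactly the cost that a streaming method needs to avoid.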

Lag correlation
• Why not naïve?
– Compute correlation coefficient for each lag l = {0, 1, 2, 3, …, n/2}
• But
– O(n) space
– O(n^2) time
– or O(n log n) time w/ FFT
[Figure: correlation vs. lag, for lags 0 to n/2, at time t=n]

Lag correlation
• BRAID
– Geometric lag probing + smoothing
– Use colored windows
– Keep track of only a geometric progression of the lag values: l = {0, 1, 2, 4, 8, …, 2^h, …}
– Use a cubic spline to interpolate
[Figure: correlation vs. lag by level, with multi-scale windows of sizes 2^0, 2^1, 2^2, 2^3]
[Figure sequence: sequences X and Y compared at level h=0 (t=n) with lag l=0, then l=1; at level h=1 (t_h=n/2) with l=2; at level h=2 (t_h=n/4) with l=4; at level h=3 (t_h=n/8) with l=8]
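The geometric probing idea can be sketched in batch form. This is a simplified illustration and not the BRAID algorithm itself: it recomputes block averages from the full sequences instead of maintaining them incrementally over a stream, it skips the cubic-spline interpolation between probed lags, and the names and the `min_points` cutoff are assumptions.

```python
import numpy as np

def pearson(a, b):
    ac, bc = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(ac ** 2) * np.sum(bc ** 2))
    return 0.0 if denom == 0 else float(np.sum(ac * bc) / denom)

def block_means(x, w):
    # Smoothing: average over non-overlapping windows of length w = 2^h.
    m = len(x) // w
    return x[:m * w].reshape(m, w).mean(axis=1)

def geometric_lag_scores(x, y, min_points=16):
    """Scores at lags 0, 1, 2, 4, ..., 2^h, using level-h smoothed sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {0: abs(pearson(x, y))}
    h = 0
    while len(x) // 2 ** h - 1 >= min_points:
        w = 2 ** h
        xs, ys = block_means(x, w), block_means(y, w)
        # A shift of one smoothed point at level h stands for lag 2^h.
        scores[w] = abs(pearson(xs[1:], ys[:-1]))
        h += 1
    return scores

# Synthetic data (assumption): x repeats y with a delay of 8 time ticks.
rng = np.random.default_rng(0)
base = rng.standard_normal(520)
x, y = base[:512], base[8:]

scores = geometric_lag_scores(x, y)
best_lag = max(scores, key=scores.get)
print(sorted(scores), best_lag)
```

Only a logarithmic number of lags is probed, which is where the space saving comes from; lags that fall between the probed values are what the interpolation step is meant to recover.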

Lag correlation
• Why not naïve?
– Compute correlation coefficient for each lag l = {0, 1, 2, 3, …, n/2}
• But
– O(n) space
– O(n^2) time
– or O(n log n) time w/ FFT
• BRAID
– O(log n) space
– O(1) time
[Figure: correlation vs. lag (0 to n/2) at time t=n, with multi-scale windows]

BRAID in the real world
• Bridge structural health monitoring
– Structural monitoring using vibration/shock sensors
– Keep track of lag correlations for sensor data streams
– Goal: real-time anomaly detection for disaster prevention
– Several thousand readings (per sec) from several hundred sensor nodes
• Bridge structural health monitoring with BRAID
– Metropolitan Expressway (Tokyo, Japan)
– Tokyo Gate Bridge (Tokyo, Japan)
– Can Tho Bridge (Vietnam)
[Figure: structural health monitoring with a vibration/shock sensor]

Feature extraction from streams
• Find hidden variables from streams [Papadimitriou+, vldb2005]
May have hundreds of measurements, but it is unlikely they are completely unrelated!
[Figure: water distribution network under normal operation and with a major leak; chlorine concentrations over Phase 1, Phase 2 and Phase 3 for sensors near the leak and sensors away from the leak, together with the hidden variables]

Motivation
We would like to discover a few "hidden (latent) variables" that summarize the key trends
[Figure sequence: chlorine concentrations, showing the actual measurements (n streams) and the k hidden variable(s): k=1 with Phase 1 only, k=2 once Phase 2 is shown, k=1 once Phase 3 is shown]
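To make the notion of hidden variables concrete, here is a batch illustration using an SVD on synthetic data. It is a stand-in under stated assumptions: the cited work tracks hidden variables incrementally over streams, whereas this sketch processes all data at once, and the stream weights, the 95% energy threshold and all names are made up.

```python
import numpy as np

# Synthetic data (assumption): 5 streams driven by a single hidden trend.
t = np.arange(300)
hidden = np.sin(2 * np.pi * t / 50)
weights = np.array([1.0, 0.5, 2.0, 1.5, 0.8])
noise = 0.01 * np.sin(np.outer(t, np.arange(1, 6)) * 1.7)
streams = np.outer(hidden, weights) + noise          # shape (300, 5)

# Batch PCA via SVD of the mean-centred measurements.
centered = streams - streams.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
energy = s ** 2 / np.sum(s ** 2)

# Smallest k whose components retain at least 95% of the energy.
k = int(np.searchsorted(np.cumsum(energy), 0.95) + 1)
hidden_estimate = centered @ Vt[:k].T                # shape (300, k)
print(k, energy[:2])
```

Because every stream here is a scaled copy of one trend, a single hidden variable captures nearly all of the energy; a second independent trend would raise k to 2.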

How to capture correlations?
[Figure: first sensor, temperature T1 (20°C to 30°C) vs. time]
[Figure: second sensor, temperature T2 (20°C to 30°C) vs. time]
Correlations: Let's take a closer look at the first three value-pairs…
[Figure: temperature T2 vs. temperature T1, both from 20°C to 30°C, with the points for time=1, time=2, time=3]
First three lie (almost) on a line in the space of value-pairs…
• O(n) numbers for the slope, and
• One number for each value-pair (offset on line)
Other pairs also follow the same pattern: they lie (approximately) on this line

Incremental update
For each new point
• Project onto current line
• Estimate error
• Rotate line in the direction of the error and in proportion to its magnitude
→ O(n) time
[Figure: temperature T2 vs. temperature T1 (20°C to 30°C), showing the new value, its error relative to the current line, and the rotated line]
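The three steps above can be sketched as a single update function. The specific update rule below (a gain of y/d, where d accumulates the squared projections) is an assumption chosen for illustration and is not taken from the slides; the synthetic temperatures and the names `update_direction` and `forgetting` are likewise hypothetical.

```python
import numpy as np

def update_direction(w, d, x, forgetting=1.0):
    """One incremental update of the line direction w for a new point x."""
    y = float(w @ x)               # project onto the current line
    d = forgetting * d + y * y     # running energy of the projections
    error = x - y * w              # estimate the error
    w = w + (y / d) * error        # rotate toward the error, scaled by its size
    return w, d, error

# Synthetic readings (assumption): T2 tracks T1 closely, both in 20..30 degrees.
n_points = 200
t1 = 20.0 + 10.0 * ((np.arange(n_points) * 7) % 10) / 10.0
t2 = t1 + 0.1 * np.sin(np.arange(n_points))

w = np.array([1.0, 0.0])           # arbitrary starting direction
d = 1e-3                           # small positive start to avoid division by zero
for x in np.column_stack([t1, t2]):
    w, d, err = update_direction(w, d, x)

direction = w / np.linalg.norm(w)
print(direction)
```

Each update touches every coordinate once, so the cost per new point is linear in the number of streams, matching the O(n) note on the slide.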

Related work
• Wavelet over streams [Gilbert+, vldb01] [Guha+, vldb04]
• Fourier representations [Gilbert+, stoc02]
• KNN [Koudas+, 04] [Korn+, vldb02]
• Histograms [Guha+, stoc01]
• Clustering [Guha+, focs00] [Aggarwal+, vldb03]
• Sketches [Indyk+, vldb00] [Cormode+, J. Algorithms 05]
• Heavy hitters [Cormode+, vldb03]
• Data embedding [Indyk+, focs00]
• Burst detection [Zhu+, kdd03]
• Segmentation [Keogh+, icdm01]
• Multiple scale analysis [Papadimitriou+, sigmod06]
• Fractal [Korn+, sigmod06]
• Time warping [Sakurai+, icde07]
• …
Graham Cormode: Tutorial @ PODS'15
Divesh Srivastava: Tutorial @ SIGMOD'15

Outline
• Motivation
• Similarity Search and Indexing
• Feature extraction
• Streaming pattern discovery
• Linear forecasting
• Automatic mining

Motivation
Given: co-evolving time-series
– e.g., MoCap (leg/arm sensors)
[Figure: "Chicken dance" MoCap sequences over time]
Q. Any distinct patterns?
Q. If yes, how many?
Q. What kind?

Motivation
Challenges: co-evolving sequences
– Unknown # of patterns (e.g., beaks)
– Different durations

Motivation
Goal: find patterns that agree with human intuition
[Figure: Input sequences; Output segments labeled Beaks, Wings, Tail feathers, Claps, Beaks, Wings, Tail feathers, Claps]
NO magic numbers! Automatic!

Why: Automatic mining
No magic numbers! ... because,
Big data mining: -> we cannot afford human intervention!!
Manual (use magic)
- sensitive to the parameter tuning
- long tuning steps (hours, days, …)
Automatic (no magic numbers)
- no expert tuning required

How: Automatic mining
Goal: fully-automatic modeling
• Given: data X
• Find: a compact description (model M) of X
[Figure: Data (X) and Ideal model (M)]
Q. How can we find the best model M?
Answer: MDL!

Solution: MDL (Minimum description length)
Solution: Minimize total encoding cost $ !
– Occam's razor (i.e., law of parsimony)
– Fully automatic parameter optimization
– No over-fitting

$$\min \big( Cost_T(X; M) \big), \qquad Cost_T(X; M) = Cost_M(M) + Cost_C(X \mid M)$$

Total cost = model cost + coding cost (error)
[Figure, from [Bishop: PR&ML]: four fits with M=0, M=1, M=3, M=9, one of them marked as the ideal model; model costs C_M=0, C_M=1, C_M=3, C_M=9; coding costs C_C=$$$$$, C_C=$$$, C_C=$, C_C=0]
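A toy version of the total-cost trade-off can be written for polynomial fitting. The bit costs below are assumptions picked for illustration (half of log2(n) bits per coefficient, and a Gaussian-style residual cost that is only defined up to an additive constant); they are not the encoding scheme used in the tutorial, and the data are synthetic.

```python
import math
import numpy as np

def total_cost_bits(x, y, degree):
    """Two-part cost: model cost plus coding cost of the residual errors."""
    n = len(x)
    coefs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coefs, x) - y) ** 2))
    model_cost = 0.5 * (degree + 1) * math.log2(n)   # assumed cost per coefficient
    coding_cost = 0.5 * n * math.log2(rss / n)       # relative cost; may be negative
    return model_cost + coding_cost

# Synthetic data (assumption): a cubic trend plus a small deterministic ripple.
n = 50
x = np.linspace(-1.0, 1.0, n)
y = x ** 3 - x + 0.05 * np.sin(37.0 * np.arange(n))

costs = {m: total_cost_bits(x, y, m) for m in range(7)}
best_degree = min(costs, key=costs.get)
print(best_degree, {m: round(c, 1) for m, c in costs.items()})
```

Low degrees pay a large coding cost because the fit is poor, while high degrees pay a growing model cost for little gain, so the minimum of the sum picks the complexity without a user-set parameter.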

AutoPlait: Automatic Mining of Co-evolving Time Sequences
Yasuko Matsubara (Kumamoto University), Yasushi Sakurai (Kumamoto University), Christos Faloutsos (CMU)
[Matsubara+ SIGMOD'14]
![Page 51: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/51.jpg)
Kumamoto U CMU CS
Problem definition Goal: find patterns that agree with human intuition
http://www.cs.kumamoto-u.ac.jp/~yasuko/TALKS/15-SIGMOD-tut/ © 2015 Sakurai, Matsubara & Faloutsos 51
Beaks
Wings
Tailfeathers
ClapsBeaks
Wings
Tailfeathers
Claps
Input
Output
![Page 52: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/52.jpg)
Problem definition
• Bundle: set of d co-evolving sequences (given)
$X = \{x_1, \dots, x_n\}$, of size $d \times n$
[Figure: bundle X (d=4)]
![Page 53: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/53.jpg)
Problem definition
• Segment: convert X -> m segments, S (hidden)
$S = \{s_1, \dots, s_m\}$
[Figure: segments s1, s2, ..., s8 (m=8)]
![Page 54: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/54.jpg)
Problem definition
• Regime: segment groups (hidden)
$\Theta = \{\theta_1, \theta_2, \dots, \theta_r, \Delta_{r \times r}\}$
$\theta_r$: model of regime r
[Figure: regimes θ1, θ2, θ3, θ4 (r=4), with labels such as wings and beaks]
![Page 55: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/55.jpg)
Problem definition
• Segment-membership: assignment (hidden)
$F = \{f_1, \dots, f_m\}$
Example (m=8): F = {2, 4, 1, 3, 2, 4, 1, 3}
![Page 56: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/56.jpg)
![Page 57: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/57.jpg)
Problem definition
• Given: bundle X, $X = \{x_1, \dots, x_n\}$
• Find: compact description C of X, $C = \{m, r, S, \Theta, F\}$
– m segments
– r regimes
– Segment-membership
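As an illustration only, the compact description C = {m, r, S, Θ, F} can be held in a small record. The sketch below is a hypothetical Python layout: the class name `CompactDescription`, the field names, and the half-open segment intervals are assumptions made for this example, and the regime models Θ are left as opaque placeholders.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class CompactDescription:
    """Hypothetical container for C = {m, r, S, Theta, F}."""
    segments: List[Tuple[int, int]]   # S: half-open [start, end) intervals over time ticks
    regimes: List[Any]                # Theta: one model per regime (opaque here)
    membership: List[int]             # F: 1-based regime index for each segment

    @property
    def m(self) -> int:
        return len(self.segments)

    @property
    def r(self) -> int:
        return len(self.regimes)

    def validate(self, n: int) -> None:
        """Check that S tiles [0, n) and that F refers to existing regimes."""
        if len(self.membership) != self.m:
            raise ValueError("F must have one entry per segment")
        pos = 0
        for start, end in self.segments:
            if start != pos or end <= start:
                raise ValueError("segments must be contiguous and non-empty")
            pos = end
        if pos != n:
            raise ValueError("segments must cover the whole bundle")
        if any(f < 1 or f > self.r for f in self.membership):
            raise ValueError("membership must lie in 1..r")


# Example with m=8 segments and r=4 regimes, using the F shown in the slides.
c = CompactDescription(
    segments=[(i * 10, (i + 1) * 10) for i in range(8)],  # made-up cut points
    regimes=["theta1", "theta2", "theta3", "theta4"],      # placeholders
    membership=[2, 4, 1, 3, 2, 4, 1, 3],
)
c.validate(n=80)
print(c.m, c.r)
```

Deriving m and r from S and Θ, rather than storing them, keeps the record from becoming inconsistent.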
![Page 58: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/58.jpg)
![Page 59: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/59.jpg)
Main ideas
Goal: compact description of X, C = {m, r, S, Θ, F}, without user intervention!!
Challenges:
Q1. How to generate ‘informative’ regimes? Idea (1): Multi-level chain model
Q2. How to decide # of regimes/segments? Idea (2): Model description cost
![Page 60: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/60.jpg)
![Page 61: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/61.jpg)
Idea (1): MLCM: multi-level chain model
Q1. How to generate ‘informative’ regimes?
Idea (1): Multi-level chain model
– HMM-based probabilistic model
– with “across-regime” transitions
[Figure: sequences (beaks, wings, claps) mapped to a model of regimes]
![Page 62: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/62.jpg)
Idea (1): MLCM: multi-level chain model
$\Theta = \{\theta_1, \theta_2, \dots, \theta_r, \Delta_{r \times r}\}$
– r regimes (HMMs); single HMM parameters: $\theta_i = \{\pi, A, B\}$
– $\Delta_{r \times r}$: across-regime transition prob.
[Figure: example with r=2 regimes. Regime 1 “beaks” (θ1, k=3 states) with transitions a1;11, a1;12, a1;22, a1;23, a1;33, a1;31. Regime 2 “wings” (θ2, k=2 states) with transitions a2;11, a2;12, a2;21, a2;22. Regime switches: δ11, δ12, δ21, δ22.]
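To make the two-level structure concrete, here is a minimal generative sketch, not the authors' code. It assumes that at every tick the chain first draws the next regime from Δ, then either moves within the current HMM (using A) or restarts in the new one (using π). The class `Regime`, the function `sample_mlcm`, the 1-D Gaussian stand-in for the output model B, and all probabilities are made up for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Regime:
    """One HMM theta_i = {pi, A, B}; B is simplified to a 1-D Gaussian mean per state."""
    pi: List[float]          # initial state probabilities
    A: List[List[float]]     # within-regime state transitions a_{i;uv}
    means: List[float]       # stand-in for the output model B (assumption)


def draw(probs: List[float], rng: random.Random) -> int:
    """Sample an index according to a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_mlcm(regimes: List[Regime], delta: List[List[float]],
                n: int, rng: random.Random) -> List[Tuple[int, int, float]]:
    """Generate n ticks of (regime, state, value) from the two-level chain."""
    out = []
    reg = 0
    state = draw(regimes[reg].pi, rng)
    for _ in range(n):
        out.append((reg, state, rng.gauss(regimes[reg].means[state], 0.1)))
        nxt = draw(delta[reg], rng)                    # across-regime transition (Delta)
        if nxt == reg:
            state = draw(regimes[reg].A[state], rng)   # stay: move within the HMM
        else:
            reg = nxt
            state = draw(regimes[reg].pi, rng)         # switch: restart in the new HMM
    return out


# Regime 1 with k=3 states and regime 2 with k=2 states, as in the figure.
beaks = Regime(pi=[1.0, 0.0, 0.0],
               A=[[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.2, 0.0, 0.8]],
               means=[0.0, 1.0, 2.0])
wings = Regime(pi=[1.0, 0.0],
               A=[[0.7, 0.3], [0.3, 0.7]],
               means=[5.0, 6.0])
delta = [[0.95, 0.05], [0.05, 0.95]]
path = sample_mlcm([beaks, wings], delta, n=200, rng=random.Random(0))
print(len(path))
```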
![Page 63: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/63.jpg)
Idea (2): model description cost
Q2. How to decide # of regimes/segments?
Idea (2): Model description cost
• Minimize encoding cost
• Find “optimal” # of segments/regimes
![Page 64: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/64.jpg)
Idea (2): model description cost
Idea: Minimize encoding cost!
$\min \left( Cost_M(M) + Cost_C(X \mid M) \right)$ (model cost + coding cost)
Good compression, good description
[Figure: CostM, CostC and CostT plotted against the # of r, m (x-axis from 1 to 10)]
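The trade-off in the figure can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not the tutorial's cost function: the model is a piecewise-constant fit with k equal-width pieces, the model cost is a flat 32 bits per parameter, and the coding cost is the Gaussian code length of the residuals. The function name `total_cost` and the synthetic data are made up.

```python
import math
import random


def total_cost(x, k, bits_per_param=32):
    """Cost_T = Cost_M + Cost_C for a k-piece constant model (equal-width pieces)."""
    n = len(x)
    sq_err = 0.0
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        piece = x[lo:hi]
        mean = sum(piece) / len(piece)
        sq_err += sum((v - mean) ** 2 for v in piece)
    var = max(sq_err / n, 1e-6)                                # floor avoids log(0)
    cost_m = k * bits_per_param                                # model cost
    cost_c = 0.5 * n * math.log2(2 * math.pi * math.e * var)   # Gaussian coding cost
    return cost_m + cost_c


# Synthetic sequence: four flat levels of 100 ticks each, plus unit-variance noise.
rng = random.Random(0)
levels = [0.0, 30.0, 10.0, 40.0]
x = [levels[t // 100] + rng.gauss(0.0, 1.0) for t in range(400)]

costs = {k: total_cost(x, k) for k in range(1, 11)}
best_k = min(costs, key=costs.get)
print(best_k)
```

Too few pieces leave a large coding cost, while extra pieces add model cost without reducing the error much, so the total is smallest at the true number of levels.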
![Page 65: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/65.jpg)
![Page 66: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/66.jpg)
Idea (2): model description cost
Total cost of bundle X, given C = {m, r, S, Θ, F}, is the sum of:
• duration/dimensions
• # of segments/regimes
• segment lengths
• segment-membership F
• model description cost of Θ
• coding cost of X given Θ
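Read together, these labels suggest a total cost of the following shape. This is a reconstruction under assumptions, not a formula copied from the slide: it assumes each integer is encoded with the universal code length $\log^*$, that each of the m segment-membership entries takes $\log r$ bits, and that the last segment length follows from n. $|s_i|$ denotes the length of segment $s_i$.

```latex
\begin{align*}
Cost_T(X; C) ={}& \underbrace{\log^*(n) + \log^*(d)}_{\text{duration/dimensions}}
  + \underbrace{\log^*(m) + \log^*(r)}_{\text{\# of segments/regimes}} \\
 &+ \underbrace{\sum_{i=1}^{m-1} \log^*|s_i|}_{\text{segment lengths}}
  + \underbrace{m \log r}_{\text{segment-membership } F} \\
 &+ \underbrace{Cost_M(\Theta)}_{\text{model description cost}}
  + \underbrace{Cost_C(X \mid \Theta)}_{\text{coding cost}}
\end{align*}
```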
![Page 67: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/67.jpg)
AutoPlait Overview
Iteration 0 (r=1, m=1): Start!
[Figure: bundle X as a single segment; number of regimes (r=1) plotted against iteration (0 to 7)]
![Page 68: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/68.jpg)
![Page 69: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/69.jpg)
![Page 70: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/70.jpg)
AutoPlait Overview
Iteration 1 (r=2, m=4): regimes θ1, θ2; f1=2, f2=1, f3=2, f4=1
Iteration 2 (r=3, m=6): regimes θ1, θ2, θ3; f1=2, f2=3, f3=1, f4=2, f5=3, f6=1
Iteration 4 (r=4, m=8): regimes θ1, θ2, θ3, θ4; f1=2, f2=4, f3=3, f4=1, f5=2, f6=4, f7=3, f8=1
[Figure: bundle X after each split; number of regimes (r=1 to r=4) plotted against iteration (0 to 7)]
![Page 71: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/71.jpg)
AutoPlait Algorithms
1. CutPointSearch (inner-most loop): find good cut-points/segments
2. RegimeSplit (inner loop): estimate good regime parameters Θ
3. AutoPlait (outer loop): search for the best number of regimes (r=2,3,4…)
![Page 72: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/72.jpg)
1. CutPointSearch (inner-most loop)
Given:
- bundle X
- parameters of two regimes, $\Theta = \{\theta_1, \theta_2, \Delta\}$
Find: cut-points of segment sets S1, S2
$\{S_1, S_2\} = \arg\max_{S_1, S_2} P(X \mid S_1, S_2, \Theta)$
Example: S1 = {s2, s4}, S2 = {s1, s3}
![Page 73: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/73.jpg)
1. CutPointSearch
DP algorithm to compute likelihood: $P(X \mid \Theta)$
[Figure: trellis over time ticks t=1, ..., 6 of X, with the states of θ1 (states 1 to 3) and θ2 (states 1 and 2). Each state keeps a set of candidate cut points, e.g., L1;1(1) = ∅, L1;3(2) = ∅, L1;1(3) = ∅, L2;1(3) = {3}, L2;1(4) = {3}, L2;2(4) = {4}, L2;1(5) = {4}, L2;2(5) = {3}, L2;2(6) = {4}. Possible regime switches (“switch??”) use δ12.]
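The trellis above can be sketched as a Viterbi-style single scan that carries a list of cut points along with each state's best score. The code below is a simplified illustration, not the published algorithm: it assumes exactly two regimes, 1-D Gaussian emissions with a fixed standard deviation, and an equally likely starting regime. The names `cut_point_search`, `safe_log` and `log_gauss` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def log_gauss(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def cut_point_search(x, regimes, delta):
    """Single scan over two regimes; regimes[i] = (pi, A, means, std).

    Returns (best log-likelihood, cut points of the best path).
    """
    # score[i][u]: best log-likelihood of any path ending in state u of regime i
    # cuts[i][u]: cut points (regime switches) along that path
    score, cuts = [], []
    for pi, A, means, std in regimes:
        score.append([safe_log(0.5) + safe_log(p) + log_gauss(x[0], m, std)
                      for p, m in zip(pi, means)])
        cuts.append([[] for _ in pi])
    for t in range(1, len(x)):
        new_score, new_cuts = [], []
        for i, (pi, A, means, std) in enumerate(regimes):
            j = 1 - i  # the other regime
            v_out = max(range(len(score[j])), key=lambda v: score[j][v])
            row_s, row_c = [], []
            for u in range(len(pi)):
                # Option 1: stay in regime i and move between its states.
                v_in = max(range(len(pi)), key=lambda v: score[i][v] + safe_log(A[v][u]))
                stay = safe_log(delta[i][i]) + score[i][v_in] + safe_log(A[v_in][u])
                # Option 2: switch from regime j and restart with pi.
                switch = safe_log(delta[j][i]) + score[j][v_out] + safe_log(pi[u])
                emit = log_gauss(x[t], means[u], std)
                if switch > stay:
                    row_s.append(switch + emit)
                    row_c.append(cuts[j][v_out] + [t])   # record a new cut point
                else:
                    row_s.append(stay + emit)
                    row_c.append(cuts[i][v_in])
            new_score.append(row_s)
            new_cuts.append(row_c)
        score, cuts = new_score, new_cuts
    bi, bu = max(((i, u) for i in range(2) for u in range(len(score[i]))),
                 key=lambda iu: score[iu[0]][iu[1]])
    return score[bi][bu], cuts[bi][bu]


# Two made-up regimes with k=2 states each: one near {0, 1}, one near {5, 6}.
half = [[0.5, 0.5], [0.5, 0.5]]
low = ([0.5, 0.5], half, [0.0, 1.0], 0.5)
high = ([0.5, 0.5], half, [5.0, 6.0], 0.5)
delta = [[0.9, 0.1], [0.1, 0.9]]
x = [0, 1, 0, 1, 5, 6, 5, 6, 0, 1, 0, 1]
loglik, cut_points = cut_point_search(x, [low, high], delta)
print(cut_points)
```

Each tick updates 2k states, and each update looks at k predecessors, so for one dimension the scan does O(nk²) work, in line with the stated O(ndk²) bound.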
![Page 74: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/74.jpg)
1. CutPointSearch
Theoretical analysis
Scalability
- It takes $O(ndk^2)$ time (only a single scan)
- n: length of X
- d: dimension of X
- k: # of hidden states in regime
Accuracy
- It guarantees the optimal cut points (details in paper)
![Page 75: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/75.jpg)
2. RegimeSplit (inner loop)
Given:
- bundle X
Find: two regimes
1. find cut-points of segment sets: S1, S2
2. estimate parameters of two regimes: $\Theta = \{\theta_1, \theta_2, \Delta\}$
![Page 76: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/76.jpg)
2. RegimeSplit
Two-phase iterative approach
- Phase 1 (CutPointSearch): split segments into two groups, S1, S2
- Phase 2 (BaumWelch): update model parameters, $\Theta = \{\theta_1, \theta_2, \Delta\}$
Example: S1 = {s2, s4}, S2 = {s1, s3}
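The alternation between the two phases can be sketched with a much simpler model. In the code below each regime is reduced to a single mean, Phase 1 is a two-state Viterbi pass with a fixed switch penalty, and Phase 2 is a plain re-estimation of the means from the assigned ticks. This hard-assignment update is a stand-in for BaumWelch, not an implementation of it; `split_two_regimes` and `switch_penalty` are hypothetical names.

```python
def split_two_regimes(x, mean_a, mean_b, switch_penalty=4.0, max_iter=20):
    """Alternate (1) labelling ticks given the models and (2) refitting the models."""
    means = [mean_a, mean_b]
    labels = []
    for _ in range(max_iter):
        # Phase 1: two-state Viterbi with squared-error cost and a switch penalty.
        cost = [(x[0] - means[0]) ** 2, (x[0] - means[1]) ** 2]
        back = []
        for v in x[1:]:
            step, new_cost = [], []
            for i in (0, 1):
                j = 1 - i
                prev = i if cost[i] <= cost[j] + switch_penalty else j
                step.append(prev)
                new_cost.append(min(cost[i], cost[j] + switch_penalty)
                                + (v - means[i]) ** 2)
            back.append(step)
            cost = new_cost
        state = 0 if cost[0] <= cost[1] else 1
        new_labels = [state]
        for step in reversed(back):      # trace the best path backwards
            state = step[state]
            new_labels.append(state)
        new_labels.reverse()
        if new_labels == labels:
            break                        # assignments stopped changing
        labels = new_labels
        # Phase 2: refit each regime's mean from the ticks assigned to it.
        for i in (0, 1):
            pts = [v for v, lab in zip(x, labels) if lab == i]
            if pts:
                means[i] = sum(pts) / len(pts)
    return labels, means


x = [0.1, -0.1, 0.0, 5.2, 4.9, 5.0, 0.2, 0.0]
labels, means = split_two_regimes(x, mean_a=1.0, mean_b=4.0)
print(labels, means)
```

Starting from rough initial means, the first pass already separates the two groups, the refit tightens the means, and the second pass confirms that the labels no longer change.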
![Page 77: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/77.jpg)
3. AutoPlait (outer loop)
Given:
- bundle X
Find: r regimes (r=2, 3, 4, …)
- i.e., find full parameter set C = {m, r, S, Θ, F}
![Page 78: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/78.jpg)
3. AutoPlait
Split regimes r=2,3,…, as long as cost keeps decreasing
- Find appropriate # of regimes: $r = \arg\min_{r} Cost_T(X; m, r, S, \Theta, F)$
- r=2, m=4: regimes θ1, θ2; f1=2, f2=1, f3=2, f4=1
- r=3, m=6: regimes θ1, θ2, θ3; f1=2, f2=3, f3=1, f4=2, f5=3, f6=1
- r=4, m=8: regimes θ1, θ2, θ3, θ4; f1=2, f2=4, f3=3, f4=1, f5=2, f6=4, f7=3, f8=1
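The "split as long as cost keeps decreasing" rule can be sketched as a greedy loop over a stack of candidate groups. The example below is a toy under loud assumptions: it groups plain values rather than time segments, `split_in_two` (split at the largest gap) is only a stand-in for RegimeSplit, and `group_cost` is a simple Gaussian code length with a flat 32-bit model cost. None of these names come from the tutorial.

```python
import math


def group_cost(values, bits_per_param=32.0):
    """Toy Cost_T of one group: model cost plus Gaussian coding cost."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-2)  # floor avoids log(0)
    return bits_per_param + 0.5 * n * math.log2(2 * math.pi * math.e * var)


def split_in_two(values):
    """Stand-in for RegimeSplit: cut the sorted values at the largest gap."""
    s = sorted(values)
    if len(s) < 2:
        return None
    _, idx = max((s[i + 1] - s[i], i) for i in range(len(s) - 1))
    return s[:idx + 1], s[idx + 1:]


def auto_split(values):
    """Keep splitting groups while the total cost decreases."""
    stack = [list(values)]
    regimes = []
    while stack:
        group = stack.pop()
        parts = split_in_two(group)
        if parts and group_cost(parts[0]) + group_cost(parts[1]) < group_cost(group):
            stack.extend(parts)       # the split pays off: try to split further
        else:
            regimes.append(group)     # no cheaper description: accept as a regime
    return regimes


# Three well-separated groups of ten values each.
base = [-2.0, -1.0, 0.0, 1.0, 2.0] * 2
data = [offset + b for offset in (0.0, 50.0, 100.0) for b in base]
regimes = auto_split(data)
print(len(regimes))
```

The loop needs no target number of regimes: a split is kept only when describing the two parts is cheaper than describing the whole.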
![Page 79: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/79.jpg)
Results
• Mocap data
• WebClick data
• Google Trends
![Page 80: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/80.jpg)
Q1. Sense-making
MoCap data
[Figure: results of AutoPlait (NO magic numbers), DynaMMo (Li et al., KDD’09) and pHMM (Wang et al., SIGMOD’11)]
![Page 81: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/81.jpg)
Q1. Sense-making
MoCap data
[Figure: result of AutoPlait (NO magic numbers)]
![Page 82: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/82.jpg)
Q2. Accuracy
(a) Segmentation
(b) Clustering
AutoPlait needs “no magic numbers”
[Figure: accuracy plots for (a) and (b), with the “Target” marked]
![Page 83: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/83.jpg)
Q3. Scalability
Wall clock time vs. data size (length): n
AutoPlait scales linearly, i.e., O(n)
[Figure: wall clock time of AutoPlait, DynaMMo and pHMM]
![Page 84: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/84.jpg)
![Page 85: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/85.jpg)
App. Event discovery (GoogleTrend)
Anomaly detection (flu-related topics, 10 years)
AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu)
![Page 86: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/86.jpg)
![Page 87: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/87.jpg)
App. Event discovery (GoogleTrend)
Turning point detection (seasonal sweets topics)
Trend suddenly changed in 2010 (release of Android OS versions “Gingerbread” and “Ice Cream Sandwich”)
![Page 88: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/88.jpg)
![Page 89: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/89.jpg)
App. Event discovery (GoogleTrend)
Trend discovery (game-related topics)
It discovers 3 phases of “game console war” (Xbox & PlayStation / Wii / Mobile social games)
![Page 90: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/90.jpg)
Industrial contribution
• Automobile sensor data
– location, velocity, longitudinal/lateral acceleration
![Page 91: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/91.jpg)
Code at
• http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
![Page 92: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/92.jpg)
Part 1 – Conclusions
• Motivation
• Similarity Search and Indexing
• Feature extraction
• Linear forecasting
• Streaming pattern discovery
• Automatic mining
![Page 93: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/93.jpg)
Part 1 – Conclusions
• Motivation
• Similarity Search and Indexing
– Euclidean/time-warping
– extract features
– index (SAM, R-tree)
• Feature extraction
– SVD, ICA, DFT, DWT (multi-scale windows)
![Page 94: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/94.jpg)
Part 1 – Conclusions
• Linear forecasting
– AR, RLS
• Streaming pattern discovery
– RLS, “incremental” wavelet transform
– Multi-scale windows
• Automatic mining
– MDL
![Page 95: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/95.jpg)
References
• Yunyue Zhu, Dennis Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. VLDB 2002, pp. 358-369.
• Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos. Streaming Pattern Discovery in Multiple Time-Series. VLDB 2005.
• Yasushi Sakurai, Spiros Papadimitriou, Christos Faloutsos. BRAID: Stream Mining through Group Lag Correlations. SIGMOD 2005.
• Anna C. Gilbert, Yannis Kotidis, S. Muthukrishnan, Martin Strauss. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. VLDB 2001.
• Sudipto Guha, Nick Koudas, Amit Marathe, Divesh Srivastava. Merging the Results of Approximate Match Operations. VLDB 2004.
• Anna C. Gilbert, Sudipto Guha, Piotr Indyk, S. Muthukrishnan, Martin Strauss. Near-optimal Sparse Fourier Representations via Sampling. STOC 2002.
![Page 96: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/96.jpg)
References
• Nick Koudas, Beng Chin Ooi, Kian-Lee Tan, Rui Zhang. Approximate NN queries on Streams with Guaranteed Error/performance Bounds. VLDB 2004.
• Flip Korn, S. Muthukrishnan, Divesh Srivastava. Reverse Nearest Neighbor Aggregates Over Data Streams. VLDB 2002.
• Sudipto Guha, Nick Koudas, Kyuseok Shim. Data-streams and histograms. STOC 2001.
• Sudipto Guha, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan. Clustering Data Streams. FOCS 2000.
• Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A Framework for Clustering Evolving Data Streams. VLDB 2003.
![Page 97: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/97.jpg)
References
• Piotr Indyk, Nick Koudas, S. Muthukrishnan. Identifying Representative Trends in Massive Time Series Data Sets Using Sketches. VLDB 2000.
• Graham Cormode, S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55 (1), 2005.
• Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava. Finding Hierarchical Heavy Hitters in Data Streams. VLDB 2003.
• Piotr Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. FOCS 2000.
• Yunyue Zhu, Dennis Shasha. Efficient elastic burst detection in data streams. KDD 2003.
![Page 98: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/98.jpg)
References
• Eamonn J. Keogh, Selina Chu, David M. Hart, Michael J. Pazzani. An Online Algorithm for Segmenting Time Series. ICDM 2001.
• Spiros Papadimitriou, Philip S. Yu. Optimal multi-scale patterns in time series streams. SIGMOD 2006.
• Flip Korn, S. Muthukrishnan, Yihua Wu. Modeling skew in data streams. SIGMOD 2006.
• Yasushi Sakurai, Christos Faloutsos, Masashi Yamamuro. Stream Monitoring under the Time Warping Distance. ICDE 2007.
![Page 99: Part 1 Similarity search pattern discovery andyasuko/...tut-part1-b.pdf · pattern discovery and summarization Yasushi Sakurai (Kumamoto University) ... , Matsubara & Faloutsos 1](https://reader034.vdocuments.net/reader034/viewer/2022042805/5f5d65a4a1b5c52e1802c574/html5/thumbnails/99.jpg)
Similarity search, pattern discovery and summarization
Part 1
Yasushi Sakurai (Kumamoto University), Yasuko Matsubara (Kumamoto University), Christos Faloutsos (Carnegie Mellon University)