University of Macau
Quick-Motif: An Efficient and Scalable Framework for Exact Motif Discovery
Yuhong Li
Department of Computer and Information Science
University of Macau, Macau
Quick-Motif: What is a Motif?
■ The most similar subsequence pair in a time series
■ Applications
  A core subroutine for activity discovery, e.g., elder care, surveillance, and sports training.
  Clustering enumerated motifs is more meaningful than clustering all the subsequences in a long time series.
Quick-Motif: Formal Definition
■ Exact Motif Discovery
  Input: time series and target motif length.
  Output: most similar subsequence pair in terms of normalized Euclidean distance.
■ Avoid trivial matches
  The two subsequences must be non-overlapping: adjacent subsequence pairs are naturally expected to be similar to each other.
[Figure: timeline of a time series spanning positions 0 to m−1, with a subsequence spanning positions i to i+ℓ−1.]
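The definition above can be sketched in a few lines. This is only an illustration, assuming that "normalized" means z-normalization (zero mean, unit standard deviation), which is the usual convention in motif discovery; the function names are hypothetical and not taken from the Quick-Motif implementation.

```python
import math

def z_normalize(seq):
    """Rescale seq to zero mean and unit (population) standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        # Constant subsequence: mapping it to all zeros is a convention
        # chosen for this sketch only.
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def normalized_distance(ts, i, j, length):
    """Euclidean distance between the z-normalized subsequences at i and j."""
    a = z_normalize(ts[i:i + length])
    b = z_normalize(ts[j:j + length])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_trivial_match(i, j, length):
    """True when the two subsequences overlap (start offsets closer than length)."""
    return abs(i - j) < length
```

Because of the normalization, two subsequences with the same shape but different offset and scale, such as [1, 2, 3, 4] and [10, 20, 30, 40], are at distance (numerically close to) zero.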
Quick-Motif: Naïve Solution
[Figure: a sliding window (size = ℓ, step size = 1) extracts all subsequences of length ℓ; each subsequence is normalized, and all subsequence pairs are tested.]
Motif: the most similar subsequence pair.
Time complexity is O(m²ℓ) for a time series of length m.
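The naïve procedure can be written out directly. The sketch below is an assumption-laden illustration (z-normalization, step size 1, hypothetical function names), not the authors' code: it extracts every subsequence, normalizes it, and tests all non-overlapping pairs.

```python
import math

def z_normalize(seq):
    """Rescale seq to zero mean and unit (population) standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        return [0.0] * n  # convention for constant subsequences (sketch only)
    return [(x - mean) / std for x in seq]

def naive_motif(ts, length):
    """Return (distance, i, j) of the closest non-overlapping subsequence pair."""
    # Sliding window with step size 1: one normalized subsequence per start offset.
    subs = [z_normalize(ts[i:i + length]) for i in range(len(ts) - length + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        # Starting j at i + length skips overlapping (trivial) matches.
        for j in range(i + length, len(subs)):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best
```

The two nested loops visit a quadratic number of pairs, and each distance costs time proportional to the subsequence length, which is where the cost of the naïve solution comes from.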
Quick-Motif: Existing Solutions
■ Reference-based Index (MK) [Mueen & Keogh, SDM 2009]
  Good: prunes unpromising pairs in batches.
  Bad: O(ℓ)-time distance computations.
■ Smart Brute Force (SBF) [Mueen, ICDM 2013]
  Good: O(1)-time distance computations.
  Bad: examines all subsequence pairs.
Quick-Motif: Fast Distance Computation
■ Incremental distance computation.
[Figure: two groups of consecutive subsequences, indexed 0 to 4 and 20 to 24, with their pairwise comparisons; panel labels: "9 subsequence pairs" and "16 subsequence pairs".]
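The slide does not spell out the update rule, so the following is a sketch of one standard way to obtain constant-time distances, not necessarily the exact scheme used in Quick-Motif. It assumes z-normalization and no constant windows, and relies on two facts: the dot product of windows (i+1, j+1) follows from that of (i, j) by dropping one product and adding one, and the normalized distance d satisfies d² = 2ℓ(1 − corr), where corr is the Pearson correlation of the two windows. All function names are hypothetical.

```python
import math

def window_stats(ts, length):
    """Mean and population std of every window of the given length (prefix sums)."""
    prefix, prefix_sq = [0.0], [0.0]
    for x in ts:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    means, stds = [], []
    for i in range(len(ts) - length + 1):
        mean = (prefix[i + length] - prefix[i]) / length
        var = (prefix_sq[i + length] - prefix_sq[i]) / length - mean * mean
        means.append(mean)
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds

def diagonal_distances(ts, length, gap):
    """Distances of all pairs (i, i + gap); each one costs O(1) after the first."""
    n = len(ts) - length + 1
    if gap >= n:
        return []
    means, stds = window_stats(ts, length)
    # Only the first dot product on the diagonal is computed from scratch.
    dot = sum(ts[k] * ts[k + gap] for k in range(length))
    out = []
    for i in range(n - gap):
        j = i + gap
        if i > 0:
            # Slide both windows by one: add the newest product, drop the oldest.
            dot += ts[i + length - 1] * ts[j + length - 1] - ts[i - 1] * ts[j - 1]
        corr = (dot - length * means[i] * means[j]) / (length * stds[i] * stds[j])
        out.append(math.sqrt(max(2.0 * length * (1.0 - corr), 0.0)))
    return out

def direct_distance(ts, i, j, length):
    """Reference O(length) computation, used only to check the incremental one."""
    def z(seq):
        mean = sum(seq) / len(seq)
        std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
        return [(x - mean) / std for x in seq]
    a, b = z(ts[i:i + length]), z(ts[j:j + length])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Calling `diagonal_distances` once per gap value covers every subsequence pair, so the per-pair cost no longer depends on the subsequence length.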
Quick-Motif: Pruning of Subsequence Pairs
■ Group every w consecutive subsequences as a PAA MBR.
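[Figure: PAA feature space with axes f1 and f2 and w = 5, showing the MBRs M_1^5, M_2^5, and M_3^5 and the minDist between two MBRs.]
■ The minimum distance between two PAA MBRs gives a distance lower bound (LB).
■ If the distance LB is smaller than the best distance found so far, the pair needs further refinement.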
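A minimal sketch of the lower bound, under stated assumptions: PAA uses equal-width segments (the subsequence length is assumed divisible by the PAA dimensionality), and the MBR gap is scaled by sqrt(ℓ / dims), the standard PAA lower-bounding factor, which the slide itself does not show. Function names are hypothetical.

```python
import math

def paa(seq, dims):
    """Piecewise aggregate approximation: the mean of each of `dims` equal segments."""
    seg = len(seq) // dims  # assumes len(seq) is divisible by dims
    return [sum(seq[k * seg:(k + 1) * seg]) / seg for k in range(dims)]

def mbr(points):
    """Minimum bounding rectangle of PAA points, as (lows, highs) per dimension."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return lows, highs

def mbr_lower_bound(box_a, box_b, length):
    """Lower bound on the distance between any subsequence in box_a and any in box_b."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    dims = len(lo_a)
    gap_sq = 0.0
    for d in range(dims):
        # Per-dimension gap between the two rectangles (0 if they overlap there).
        gap = max(lo_a[d] - hi_b[d], lo_b[d] - hi_a[d], 0.0)
        gap_sq += gap * gap
    return math.sqrt(length / dims * gap_sq)
```

A pair of MBRs whose lower bound is not below the best distance found so far cannot contain a better motif, so all of its subsequence pairs can be skipped without computing any exact distance.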
Quick-Motif: Filter-and-Refinement
■ Naïve Solution.
  Check the distance LBs for all w-MBR pairs.
  The time complexity is quadratic in the number of w-MBRs and linear in the PAA dimensionality.
■ How to Efficiently Find Surviving w-MBR Pairs?
  Enable batch pruning.
  Discover the true motif as soon as possible to improve the pruning ability.
Quick-Motif: Filter-and-Refinement
■ Enable Batch Pruning
  Hierarchical Structure
  Offers reasonable grouping quality, thus good pruning ability.
  Can be constructed very efficiently.
[Figure: PAA feature space with axes f1 and f2 containing the w-MBRs M_0^w to M_8^w. The Hilbert curve sort list orders them as M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w; they are grouped into nodes M_a, M_b, M_c under M_root (labels: Level 1, Level 2), with minDist marked between nodes.]
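The construction can be sketched as follows. This is a simplified illustration rather than the Quick-Motif implementation: it assumes a 2-D PAA space, orders MBRs by the Hilbert index of their quantized centers (using the classic bit-manipulation formula for the Hilbert curve), and merges a fixed number of consecutive MBRs into each parent. The grid size, the fanout, and all function names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n-by-n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_sort(boxes, grid=16):
    """Indices of 2-D MBRs ordered by the Hilbert index of their quantized centers."""
    centers = [((lo[0] + hi[0]) / 2.0, (lo[1] + hi[1]) / 2.0) for lo, hi in boxes]
    min_x = min(c[0] for c in centers)
    min_y = min(c[1] for c in centers)
    span_x = (max(c[0] for c in centers) - min_x) or 1.0
    span_y = (max(c[1] for c in centers) - min_y) or 1.0

    def cell(value, low, span):
        return min(int((value - low) / span * grid), grid - 1)

    keys = [hilbert_index(grid, cell(cx, min_x, span_x), cell(cy, min_y, span_y))
            for cx, cy in centers]
    return [i for _, i in sorted(zip(keys, range(len(boxes))))]

def group_level(boxes, order, fanout=3):
    """Merge every `fanout` consecutive MBRs of the sorted list into one parent MBR."""
    parents = []
    for start in range(0, len(order), fanout):
        members = [boxes[i] for i in order[start:start + fanout]]
        lows = [min(lo[d] for lo, _ in members) for d in range(2)]
        highs = [max(hi[d] for _, hi in members) for d in range(2)]
        parents.append((lows, highs))
    return parents
```

Sorting costs O(n log n) and grouping is a single pass, which fits the slide's point that the structure can be built very efficiently; because neighbours on the Hilbert curve tend to be close in space, each parent MBR stays reasonably tight.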
Quick-Motif: Filter-and-Refinement
■ Discover the true motif as soon as possible
  Locality-based Search Strategy
[Figure: examples of good locality and bad locality.]
Locality-based search vs Best-first search
                   Locality-based     Best-first
Surviving pairs    0.1256 M           0.1249 M
Heap size          N/A                2.78 M
# pushes           11.73 M (queue)    6.75 M (heap)
Resp. time         1.56 s             6.32 s
[Figure: Hilbert curve sort list M_4^w, M_6^w, M_0^w, M_2^w, M_7^w, M_5^w, M_3^w, M_1^w, M_8^w, grouped into nodes M_a, M_b, M_c under M_root (labels: Level 1, Level 2, Leaf nodes).]
Quick-Motif: Experimental Evaluation
■ Programming Language: C++
  Machine: Ubuntu 12.04, 4GB RAM
■ Datasets
  RW: Randomly generated.
  EEG: Reflects the activity of neurons, length 180204.
  ECG: The Koski ECG, length 144002.
  EPG: Sequence that traces insect behaviour, length 106950.
  TAO: Sea surface temperatures, length 374071.
Quick-Motif: Performance Evaluation
[Figure: four performance plots showing the effect of a varied parameter on (a) ECG, (b) EEG, (c) EPG, and (d) TAO.]
Thanks
Q & A