![Page 1: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/1.jpg)
Time Series Shapelets: A New Primitive for Data Mining
Lexiang Ye and Eamonn Keogh
University of California, Riverside
KDD 2009
Presented by: Zhenhui Li
![Page 2: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/2.jpg)
Classification in Time Series
• Application: Finance, Medicine
• 1-Nearest Neighbor
– Pros: accurate, robust, simple
– Cons: time and space complexity (lazy learning); results are not interpretable
![Page 3: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/3.jpg)
Solution
• Shapelets
– time series subsequence
– representative of a class
– discriminative from other classes
![Page 4: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/4.jpg)
MOTIVATING EXAMPLE
![Page 5: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/5.jpg)
[Figure: leaf outlines of false nettles and stinging nettles with the discriminating shapelet highlighted. Shapelet Dictionary: shapelet I with distance threshold 5.1. Leaf Decision Tree: node I with yes/no branches leading to classes 0 and 1.]
![Page 6: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/6.jpg)
BRUTE-FORCE ALGORITHM
![Page 7: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/7.jpg)
Candidates Pool
Extract subsequences of all possible lengths
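As a rough illustration of how the candidate pool can be built, the sketch below enumerates every subsequence of every allowed length. The function name `generate_candidates` and the `(series_index, start, length)` representation are assumptions made for this example, not something taken from the slides.

```python
from typing import Iterator, Sequence, Tuple


def generate_candidates(
    dataset: Sequence[Sequence[float]], min_len: int, max_len: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (series_index, start, length) for every subsequence in the pool.

    Hypothetical helper: candidates are referenced by position rather than
    copied, so the pool costs no extra memory per candidate.
    """
    for length in range(min_len, max_len + 1):
        for idx, series in enumerate(dataset):
            # A series of length m has m - length + 1 windows of this length
            # (none at all if the series is shorter than the window).
            for start in range(len(series) - length + 1):
                yield idx, start, length


# Toy data: one series of length 4 and one of length 3.
toy = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0]]
pool = list(generate_candidates(toy, min_len=2, max_len=3))
print(len(pool))  # 8 candidates: 3 + 2 of length 2, and 2 + 1 of length 3
```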
![Page 8: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/8.jpg)
Testing the utility of a candidate shapelet
• Arrange the time series objects
– based on the distance from candidate
• Find the optimal split point (maximal information gain)
• Pick the candidate achieving best utility as the shapelet
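[Figure: time series objects ordered along a line by their distance from the candidate (starting at 0), with the split point and its information gain marked.]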
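The utility test can be sketched as follows: sort the objects by distance to the candidate, try each gap between neighbouring distances as a split point, and keep the one with the highest information gain. The names `entropy` and `best_split`, and the choice of the midpoint between neighbours as the threshold, are assumptions for this illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(
    distances: Sequence[float], labels: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (information gain, split distance) of the best split point."""
    order = sorted(zip(distances, labels), key=lambda pair: pair[0])
    sorted_labels = [lab for _, lab in order]
    n = len(order)
    base = entropy(sorted_labels)
    best_gain, best_threshold = 0.0, order[0][0]
    for i in range(1, n):
        if order[i - 1][0] == order[i][0]:
            continue  # equal distances cannot be separated by a threshold
        left, right = sorted_labels[:i], sorted_labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain = gain
            best_threshold = (order[i - 1][0] + order[i][0]) / 2
    return best_gain, best_threshold


# Toy example: class "A" is close to the candidate, class "B" is far from it.
gain, threshold = best_split([0.2, 0.3, 1.1, 1.4], ["A", "A", "B", "B"])
```

In this toy example the two classes separate perfectly, so the gain equals the full entropy of the labels (1 bit) and the threshold falls between 0.3 and 1.1.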
![Page 9: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/9.jpg)
Problem
• Total number of candidates: $\sum_{l=\mathit{MINLEN}}^{\mathit{MAXLEN}} \sum_{T_i \in D} (|T_i| - l + 1)$
• Each candidate: compute the distance between this candidate and each training sample
• Trace dataset
– 200 instances, each of length 275
– 7,480,200 shapelet candidates
– approximately three days
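The candidate count quoted for the Trace dataset can be checked with a few lines of arithmetic: each series of length m contributes m - l + 1 candidates of length l. The slide does not state MINLEN and MAXLEN; the values 3 and 275 used below are assumptions, chosen because they reproduce the quoted figure.

```python
from typing import Sequence


def count_candidates(series_lengths: Sequence[int], min_len: int, max_len: int) -> int:
    """Total number of subsequences with min_len <= length <= max_len."""
    return sum(
        max(m - length + 1, 0)  # a series shorter than the window contributes none
        for length in range(min_len, max_len + 1)
        for m in series_lengths
    )


# Trace dataset as described on the slide: 200 instances, each of length 275.
# MINLEN = 3 and MAXLEN = 275 are assumed values.
trace_total = count_candidates([275] * 200, min_len=3, max_len=275)
print(trace_total)  # 7480200
```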
![Page 10: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/10.jpg)
Speedup
• Distance calculations from time series objects to shapelet candidates are the most expensive part
• Reduce the time in two ways
– Distance Early Abandon
  • reduce the distance computation time between two time series
– Admissible Entropy Pruning
  • reduce the number of distance calculations
![Page 11: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/11.jpg)
DISTANCE EARLY ABANDON
![Page 12: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/12.jpg)
[Figure: time series T and candidate subsequence S; x-axis from 0 to 100.]
![Page 13: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/13.jpg)
[Figure: S aligned with T at the best matching location, Dist = 0.4; x-axis from 0 to 100.]
![Page 14: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/14.jpg)
[Figure: S compared with T at another location; the calculation is abandoned at the point where Dist > 0.4; x-axis from 0 to 100.]
![Page 15: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/15.jpg)
Distance Early Abandon
• We only need the minimum Dist
• Method
– Keep the best-so-far distance
– Abandon the calculation if the current distance is larger than the best-so-far distance
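A minimal sketch of early abandoning, assuming plain Euclidean distance with no normalisation (the slides do not specify the distance in detail). The function name `subsequence_dist` is hypothetical. The running sum of squared differences for one alignment is dropped as soon as it can no longer beat the best alignment found so far.

```python
import math
from typing import Sequence


def subsequence_dist(series: Sequence[float], candidate: Sequence[float]) -> float:
    """Minimum Euclidean distance between `candidate` and any equally long
    window of `series`, abandoning hopeless windows early."""
    m = len(candidate)
    best_sq = math.inf  # best-so-far squared distance
    for start in range(len(series) - m + 1):
        acc = 0.0
        for j in range(m):
            diff = series[start + j] - candidate[j]
            acc += diff * diff
            if acc >= best_sq:
                break  # this window cannot improve on the best so far
        else:
            # Loop finished without abandoning, so acc < best_sq.
            best_sq = acc
    return math.sqrt(best_sq)


# The candidate occurs exactly inside the first series, so the distance is 0.
d_exact = subsequence_dist([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], [2.0, 3.0, 2.0])
d_other = subsequence_dist([0.0, 0.0, 3.0], [1.0, 1.0])
```

The result is identical to the exhaustive computation; only the amount of work per window changes.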
![Page 16: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/16.jpg)
ADMISSIBLE ENTROPY PRUNING
![Page 17: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/17.jpg)
Admissible Entropy Pruning
• We only need the best shapelet for each class
• For a candidate shapelet
– We don’t need to calculate the distance for each training sample
– After calculating the distance for some training samples, check whether the upper bound of the candidate's information gain is below the information gain of the best-so-far candidate shapelet
– If so, stop the calculation
– Try the next candidate
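The pruning test can be sketched as below. The optimistic bound used here places all not-yet-measured objects of each class at one extreme end of the distance line and takes the best gain over all such placements; this is my reading of the idea, stated as an assumption rather than as the authors' exact procedure. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, str]  # (distance to candidate, class label)


def entropy(labels: List[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_gain(points: List[Point]) -> float:
    """Best information gain over all split points of the ordered points."""
    order = sorted(points, key=lambda p: p[0])
    labels = [lab for _, lab in order]
    n, base, best = len(labels), entropy(labels), 0.0
    for i in range(1, n):
        if order[i - 1][0] == order[i][0]:
            continue  # equal distances cannot be separated
        gain = base - (i / n) * entropy(labels[:i]) - ((n - i) / n) * entropy(labels[i:])
        best = max(best, gain)
    return best


def optimistic_gain(seen: List[Point], remaining: Dict[str, int]) -> float:
    """Optimistic information gain if each unmeasured class lands at one end."""
    left = -1.0  # sentinel left of every real distance (distances are >= 0)
    right = max((d for d, _ in seen), default=0.0) + 1.0
    classes = [c for c, k in remaining.items() if k > 0]
    bound = 0.0
    for ends in product((left, right), repeat=len(classes)):
        hypothetical = list(seen)
        for cls, end in zip(classes, ends):
            hypothetical += [(end, cls)] * remaining[cls]
        bound = max(bound, best_gain(hypothetical))
    return bound


def can_prune(seen: List[Point], remaining: Dict[str, int], best_so_far: float) -> bool:
    """True if the candidate cannot beat the best-so-far information gain."""
    return optimistic_gain(seen, remaining) < best_so_far


# Interleaved classes: even the most favourable completion cannot reach gain 1.0.
mixed = [(0.5, "A"), (0.6, "B"), (0.7, "A"), (0.8, "B")]
prune_mixed = can_prune(mixed, {"A": 1, "B": 1}, best_so_far=1.0)
# Cleanly separated so far: a perfect split is still possible, so keep going.
clean = [(0.5, "A"), (0.9, "B")]
prune_clean = can_prune(clean, {"A": 1, "B": 1}, best_so_far=0.42)
```

When `can_prune` returns true, the remaining distance computations for that candidate can be skipped.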
![Page 18: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/18.jpg)
[Figure: false nettles and stinging nettles ordered along a distance line starting at 0.]
![Page 19: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/19.jpg)
[Figure: two distance orderlines starting at 0, annotated with information gain I = 0.42 and I = 0.29.]
![Page 20: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/20.jpg)
Classification
[Figure: the Leaf Decision Tree (node I, yes/no branches, classes 0 and 1) and the Shapelet Dictionary (shapelet I, threshold 5.1), shown with false nettles and stinging nettles leaves and the shapelet.]
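Classification with a shapelet decision tree can be sketched as a walk from the root: at each node, compute the distance from the series to that node's shapelet and follow the "yes" branch if it is within the threshold. The dictionary layout of the tree, the toy shapelet, and the class names below are made up for illustration; they are not the leaf-dataset values from the slides.

```python
import math
from typing import Sequence, Union


def min_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[s + j] - shapelet[j]) ** 2 for j in range(m)))
        for s in range(len(series) - m + 1)
    )


def classify(series: Sequence[float], node: Union[dict, str]) -> str:
    """Follow yes/no branches until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        near = min_dist(series, node["shapelet"]) <= node["threshold"]
        node = node["yes"] if near else node["no"]
    return node


# Hypothetical one-node tree: a small "bump" shapelet with threshold 0.5.
tree = {
    "shapelet": [0.0, 1.0, 0.0],
    "threshold": 0.5,
    "yes": "class 0",
    "no": "class 1",
}
with_bump = classify([0.0, 0.0, 1.0, 0.0, 0.0], tree)
flat = classify([0.0, 0.0, 0.0, 0.0, 0.0], tree)
```

Classification needs only one distance computation per tree node on the path, which is where the speed advantage over nearest-neighbour search comes from.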
![Page 21: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/21.jpg)
EXPERIMENTAL EVALUATION
![Page 22: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/22.jpg)
Performance Comparison
Original Lightning Dataset
Length 2000
Training 2000
Testing 18000
![Page 23: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/23.jpg)
Projectile Points
![Page 24: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/24.jpg)
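[Figure: Shapelet Dictionary with shapelets I (Clovis) and II (Avonlea) and threshold values 11.24 and 85.47; x-axis from 0 to 400, y-axis from 0 to 1.0. Arrowhead Decision Tree with nodes I and II; classes shown include Clovis and Avonlea.]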
Method Accuracy Time
Shapelet 0.80 0.33
Rotation Invariant Nearest Neighbor 0.68 1013
![Page 25: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/25.jpg)
Wheat Spectrography
[Figure: one sample from each class; x-axis from 0 to 1200, y-axis from 0 to 1.]
Wheat Dataset
Length 1050
Training 49
Testing 276
![Page 26: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/26.jpg)
[Figure: Shapelet Dictionary with shapelets I to VI; x-axis from 0 to 300, y-axis from 0.0 to 0.4. Wheat Decision Tree with nodes I to VI and leaf classes 0 to 6.]
Method Accuracy Time
Shapelet 0.720 0.86
Nearest Neighbor 0.543 0.65
![Page 27: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/27.jpg)
The Gun/NoGun Problem
Method Accuracy Time
Shapelet 0.933 0.016
Rotation Invariant Nearest Neighbor 0.913 0.064
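[Figure: Shapelet Dictionary with shapelet I (No Gun); x-axis from 0 to 100; threshold label extracted as 238.94. Gun Decision Tree with node I and leaves 1 and 0 for the classes Gun and No Gun.]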
![Page 28: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/28.jpg)
Conclusions
• Interpretable results
• More accurate/robust
• Significantly faster at classification
![Page 29: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/29.jpg)
Discussions - Comparison
Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, "Discriminative Frequent Pattern Analysis for Effective Classification" (ICDE'07)
Hong Cheng, Xifeng Yan, Jiawei Han, and Philip S. Yu, "Direct Discriminative Pattern Mining for Effective Classification" (ICDE'08)
Similarities:
• motivation: Discriminative frequent pattern = Shapelet
• technique: Use upper bound of information gain to speed up
Differences:
• application: general feature selection vs. time series (no explicit features)
• split node: binary (contain/not contain a pattern) vs. numeric value (smaller/larger than a value)
![Page 30: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/30.jpg)
Discussions – other topics
• Similar ideas could be applied to other research topics
– graph
– image
– spatio-temporal
– social network
– …
![Page 31: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/31.jpg)
Discussions – other topics
• Graph classification:
Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu, "Mining Significant Graph Patterns by Scalable Leap Search", Proc. 2008 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'08), Vancouver, BC, Canada, June 2008.
![Page 32: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/32.jpg)
Discussions – other topics
• moving object classification
Discriminative sub-movement
![Page 33: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/33.jpg)
Discussions – other topics
• Social network
– classify normal/spamming users
![Page 34: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/34.jpg)
Discussions – other topics
![Page 35: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/35.jpg)
Discussions – other topics
• Social network
– classify normal/spamming users
– How to find discriminative features on social network?
  • social network structure
  • user behaviour
![Page 36: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/36.jpg)
Discussions – other topics
• For different applications, this idea could be adapted to improve performance, but the adaptation is not straightforward.
![Page 37: Time Series Shapelets: A New Primitive for Data Mining Lexiang Ye and Eamonn Keogh University of California, Riverside KDD 2009 Presented by: Zhenhui Li](https://reader035.vdocuments.net/reader035/viewer/2022070306/55176e1f55034645368b4b48/html5/thumbnails/37.jpg)
Thank You
Questions?