abdullah mueen eamonn keogh university of california, riverside

18
Abdullah Mueen Eamonn Keogh University of California, Riverside Online Discovery and Maintenance of Time Series Motifs

Upload: michi

Post on 05-Jan-2016

44 views

Category:

Documents


2 download

DESCRIPTION

Online Discovery and Maintenance of Time Series Motifs. Abdullah Mueen Eamonn Keogh University of California, Riverside. Time Series Motifs. Repeated pattern in a time series Utility: Activity/Event discovery Summarization Classification Application - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Abdullah Mueen Eamonn KeoghUniversity of California, Riverside

Online Discovery and Maintenance

of Time Series Motifs

Page 2: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Time Series Motifs

• Repeated pattern in a time series

• Utility:– Activity/Event discovery– Summarization– Classification

• Application– Data Center Chiller Management (Patnaik, et al. KDD 09)– Designing Smart Cane for Elders (Vahdatpour, et al. IJCAI 09)

0 2000 4000 6000 80000

10

20

30

Page 3: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Online Time Series Motifs

• Streaming time series• Sliding window of the recent history

– What minute long trace repeated in the last hour?

Page 4: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Problem FormulationDiscovery

100 200 300 400 500 600 700 800 900 1000

-8000

-7500

-7000

. . .

12345678...

873

The most similar pair of non-overlappingsubsequences

Maintenance

12345678...

873874

time:100112345678...

873874875

time:100212345678...

873874875876

time:100312345678...

873874875876877

time:100412345678...

873874875876877878

time:1005

Update motif pair afterevery time tick

time

Page 5: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Challenge

• A subsequence is a high dimensional point– The dynamic closest pair of points problem

• Closest pair may change upon every update• Naïve approach: Do quadratic comparisons.

4

5

2

7

3

6

89

4

5

2

7

3

6

8

1

4

5

2

7

3

6

8

1

Deletion Insertion

Page 6: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Our Approach

• Goal: Algorithm with Linear update time• Previous method for dynamic closest pair

(Eppstein,00) – A matrix of all-pair distances is maintained

• O(w2) space required

– Quad-tree is used to update the matrix• Maintain a set of neighbors and reverse

neighbors for all points• We do it in O( ) spaceww

Page 7: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Outline

• Methods• Evaluation• Extensions• Case Studies• Conclusion

Page 8: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Maintaining Motif• Smallest nearest neighbor → Closest pair• Upon insertion

– Find the nearest neighbor; Needs O(w) comparisons.

• Upon deletion– Find the next NN of all the reverse NN

4

5

2

7

3

6

8

1

4

5

2

7

3

6

8

1

Insertion

9

4

5

2

7

3

6

8

1

Deletion

9

Page 9: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Data Structure

1 2 3 4 5 6 7 8FIFO Sliding window

N-listsNeighbors in

order of distances

8 7 21 8 1 24

5 1 17 3 3 67

7 6 75 2 4 55

1 4 83 7 5 72

4 8 42 5 2 13

6 1 38 1 8 38

3 5 66 4 6 46

R-listsPoints that should be updated

Total number of nodes in R-lists is w

5 1 3 24

8 67

4

5

2

7

3

6

8

1

4

7

Still O(w2) space

(ptr, distance)

Page 10: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

ObservationsWhile inserting

Updating NN of old points is not necessary A point can be removed from the neighbor list if it

violates the temporal order

Averagelength O( )w

Page 11: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Example4

5

2

7

3

6

8

1

1 1 21 3 1 2

52

83

2 43 5 3 6

4 7

5

6

4

7

6

1 2 3 4 5 6 7 8

2 3 32 3 2 6

3 7 9

5

54 4 6 8

5 7

6

68

4

2 3 4 5 6 7 8 91

4

5

2

7

3

6

8

1

9

4

5

2

7

3

6

89

10

3 34 3 6 6

57 9

54 4 7 8

5

6

4

6 8

5

7

8

9

3 4 5 6 7 8 9 1021

10

Page 12: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Evaluation

0

0.02

0.04

0.06

0.08

0.1

0.12

0.140.16

0.18

Fast Pair Insect

EEGEOGRW

Ave

rage

Upd

ate

Tim

e (S

ec)

0 0.5 1 1.5 2 2.5 3 3.5 4x 104

Window Size (w)0 200 400 600 800 1000

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Insect

EEG

EOGRW

Fast Pair

Ave

rage

Upd

ate

Tim

e (S

ec)

Motif Length (m)

8x

0 0.5 1 1.5 2 2.5 3 3.5 4x 104

102030405060708090100110

Insect

EEG

EOG

RW

Ave

rage

Len

gth

of N

-lis

t

Window Size (w)

•Up to 8x speedup from general dynamic closest pair• Stable space cost per point with increasing window size

0 200 400 600 800 10000

50

100

150

200

250

300

350

InsectEEG

EOG

RW

Motif Length (m)

Ave

rage

Len

gth

of N

-lis

t

Page 13: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Outline

• Methods• Evaluation• Extensions• Case Studies• Conclusion

Page 14: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Extension

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

1 1 21 3 1 2

52

83

2 43 5 3 6

4 5

5 7

6

4

7

6

1 2 3 4 5 6 7 8

• Multidimensional Motif• Maintain motif in Multiple Synchronous time series• Points in one series can have neighbors in others

• Arbitrary Data Rate• Load shedding: Skip points if can’t handle

0.40.50.60.70.80.91.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Fraction of Data Used

Fraction of M

otifs Discovered

EOGRandom Walk

EEGInsect FIFO for each streams

Page 15: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Case Study 1: Online Compression

• Replace all the occurrences of a motif by a pointer to that motif.

• For signals with regular repetitions– Higher compression rate with less error.

0 500 1000 1500 2000

Raw

PLA

MR

A

BA AB B B B B

Dictionary

Page 16: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Case Study 2: Closing The Loop

• Check if a robot closed a loop• Convert the stream of video frames to stream

of color histograms.• Most similar color histograms are good

candidates for loop detection.

0 400 800 1200 16000481216

0 400 800 1200 16000481216

Start Loop Closure Detected

Page 17: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Conclusion

• First attempt to maintain time series motif online– Maintains minutes long repetition in hours long

sliding window• Linear update time with less space cost than

quad-tree based method– O( ) Vs O(w2)

• Faster than general dynamic closest pair solution

ww

Page 18: Abdullah Mueen     Eamonn  Keogh University of California, Riverside

Thank You

Code and Data: http://www.cs.ucr.edu/~mueen/OnlineMotif