
Page 1: Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns

Xiao Liu, Jinjun Chen, Ke Liu, Yun Yang
CS3: Centre for Complex Software Systems and Services

Swinburne University of Technology, Melbourne, Australia

{xliu, jchen, kliu, yyang}@swin.edu.au


Page 2: Contents

X. Liu, J. Chen, K. Liu and Y. Yang, Time-Series Patterns (eScience08), Indianapolis USA, 12 December 2008

Introduction: Time-Series Forecasting, Time-Series Patterns
Motivation
The Pattern Game
Evaluation
Conclusion

Page 3: Time Series Forecasting

A time series is a set of observations made sequentially through time, e.g., marketing time series, temperature time series and system performance time series (CPU load, network load, activity durations).

Time-series forecasting predicts the likely outcome of a time series in the near future, given knowledge of its most recent outcomes.

What is this time series about? Mind taking a guess? (Figure: AUD/USD exchange rate, one value per day over one year; source: www.xe.com)

Page 4: Time Series Forecasting

It was on the rise, but who knew about the crisis? #%#&…

(Figure annotation: Homer Simpson's forecasting line)

Page 5: Time-Series Patterns

A pattern is a theme of recurring events or objects that repeats in a predictable manner.

Time-series patterns can be regarded as a set of time-series segments that recur in a statistical sense.

Page 6: Where Are We

Introduction: Time-Series Forecasting, Time-Series Patterns
Motivation
Pattern Based Time-Series Forecasting Strategy
Evaluation
Conclusion

Page 7: Motivation

Scientific workflow activity durations are important for scientific workflow scheduling, temporal verification and many other time-related QoS functionalities.

An activity duration runs from the initial job submission to the final completion, comprising the execution time and numerous scientific workflow overheads, e.g., data transfer overheads, middleware overheads and loss-of-parallelism overheads*.

Durations also depend on the dynamic performance of the underlying infrastructures, e.g., grid computing, peer-to-peer and cloud computing.

* R. Prodan and T. Fahringer, Analysis of Scientific Workflow Overheads in Grid Environments, TPDS, 2008.

Page 8: Problems

Current work mainly uses linear time-series models such as MA (Moving Average), AR (Autoregressive) and Box-Jenkins models. These approaches have several limitations:

Focusing on CPU load prediction for the execution time of computation-intensive activities. What about data-intensive activities, and the many other overheads?

Forecasting point values. Duration intervals are more applicable in practice.

Requiring a large sample size. This is difficult for scientific workflow activities with a limited number of concurrent instances and long durations.

Frequent turning points. These significantly deteriorate the effectiveness of linear time-series models.

Page 9: Where Are We

Page 10: Duration-Series Patterns

Page 11: Strategy Overview

Duration series building: a periodical sampling plan to increase the sample size.

Duration pattern recognition: a non-linear time-series segmentation algorithm identifies the potential pattern set, which is then checked for validity to obtain the final pattern set.

Duration pattern matching: a similarity search for the closest pattern, given the latest duration sequence.

Duration interval forecasting: duration intervals are forecast from the statistics of the matched duration pattern.

(Figure: Pattern based time-series forecasting strategy)

Page 12: Step 1: Duration Series Building

A periodical sampling plan addresses the problem of limited sample size: samples whose submission times fall into the same observation time unit of each period are joined together.

A representative duration series is then built from the sample mean of each unit.

(Figure: Periodical sampling)
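The periodical sampling plan can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes that one period is one day, observed from 8:00am to 8:00pm in 15-minute units (the settings reported for the evaluation), and `build_duration_series` is a hypothetical helper name. Units that receive no samples are returned as None.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_duration_series(samples, unit_minutes=15, start_hour=8, end_hour=20):
    """Pool samples from many periods by observation unit and average each unit.

    samples: iterable of (submission_time, duration) pairs with datetime submission times.
    Assumptions (hypothetical defaults): one period = one day, observed window
    8:00am to 8:00pm, 15-minute observation units.
    """
    n_units = (end_hour - start_hour) * 60 // unit_minutes
    pooled = defaultdict(list)
    for submitted, duration in samples:
        # Minutes since the start of the observed window on the sample's own day.
        offset = (submitted.hour - start_hour) * 60 + submitted.minute
        if 0 <= offset < n_units * unit_minutes:
            # Samples from different days with the same unit index are joined together.
            pooled[offset // unit_minutes].append(duration)
    # Representative series: the sample mean of each unit (None if the unit is empty).
    return [mean(pooled[u]) if pooled[u] else None for u in range(n_units)]


samples = [
    (datetime(2008, 12, 1, 8, 5), 10.0),   # day 1, unit 0
    (datetime(2008, 12, 2, 8, 10), 20.0),  # day 2, unit 0
    (datetime(2008, 12, 1, 8, 20), 30.0),  # day 1, unit 1
]
series = build_duration_series(samples)
print(len(series), series[0], series[1])  # 48 15.0 30.0
```

Pooling by unit index rather than by absolute time is what lets samples from different days contribute to the same point of the duration series.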

Page 13: Step 2: Pattern Recognition

Discovering the potential pattern set with the K-MaxSDev time-series segmentation algorithm.

K-MaxSDev is a hybrid time-series segmentation algorithm based on Bottom-Up, Sliding Window and Top-Down segmentation.

K: the initial number of segments for equal segmentation.

MaxSDev (Maximum Standard Deviation): the testing criterion for time-series segmentation.

K and MaxSDev can be specified with the empirical functions provided in the paper (Formula 1 and Formula 2).

Page 14: K-MaxSDev: Bottom-Up Process

Initial equal segmentation into K segments.
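The initial equal segmentation can be sketched as below, assuming the duration series is a plain Python list; `equal_segmentation` is a hypothetical name. With 48 units (a 12-hour window of 15-minute units, as in the evaluation) and K = 12, each initial segment holds 4 values.

```python
def equal_segmentation(series, k):
    """Split a duration series into k contiguous segments of near-equal length."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    # Integer boundaries: segment lengths differ by at most one when k does not divide n.
    bounds = [i * n // k for i in range(k + 1)]
    return [list(series[bounds[i]:bounds[i + 1]]) for i in range(k)]


segments = equal_segmentation(list(range(48)), 12)  # 48 units, K = 12
print(len(segments), segments[0])  # 12 [0, 1, 2, 3]
```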

Page 15: K-MaxSDev: Sliding Window Process

A sliding window is used to merge neighbouring segments.

Page 16: K-MaxSDev: Sliding Window Process

The standard deviation of the new merged segment, SDev, is tested against MaxSDev.

Page 17: K-MaxSDev: Sliding Window Process

If SDev ≥ MaxSDev, the test fails and the segments stay separated.

(Figure: Failed)

Page 18: K-MaxSDev: Sliding Window Process

If SDev < MaxSDev, the test succeeds and the segments are merged to form a larger segment.

(Figure: Successful)
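A single Sliding Window pass with the MaxSDev test might look like this. The slides do not fix the details, so this sketch assumes that SDev is the population standard deviation (`statistics.pstdev`) and that merging is greedy from left to right; `sliding_window_pass` is a hypothetical name.

```python
from statistics import pstdev


def sliding_window_pass(segments, max_sdev):
    """One left-to-right Sliding Window pass over neighbouring segments.

    Assumptions: SDev is the population standard deviation, and each segment is
    tested against the (possibly already enlarged) segment to its left.
    Returns the new segment list and whether any merge happened.
    """
    result = [list(segments[0])]
    merged_any = False
    for seg in segments[1:]:
        candidate = result[-1] + list(seg)
        if pstdev(candidate) < max_sdev:
            result[-1] = candidate       # SDev < MaxSDev: test successful, merge
            merged_any = True
        else:
            result.append(list(seg))     # SDev >= MaxSDev: test failed, stay separated
    return result, merged_any


print(sliding_window_pass([[1, 1], [1, 2], [9, 9]], max_sdev=2.24))
# ([[1, 1, 1, 2], [9, 9]], True)
```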

Page 19: K-MaxSDev: Top-Down Process

After the Sliding Window process, segments that cannot be merged with any of their neighbours are split.

Page 20: K-MaxSDev: Iteration

The Sliding Window and Top-Down processes are repeated until no segment can be merged with a neighbouring segment.
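Putting the Bottom-Up, Sliding Window and Top-Down processes together gives the following sketch of the K-MaxSDev loop. It illustrates the idea under stated assumptions and is not the published algorithm. SDev is taken as the population standard deviation, and merging is greedy from left to right. Top-Down splits an unmerged segment at its midpoint, and only when that segment's own SDev is at least MaxSDev. That last restriction is added here so that split halves cannot keep re-merging and re-splitting. The `max_iter` cap is likewise hypothetical.

```python
from statistics import pstdev


def k_maxsdev(series, k, max_sdev, max_iter=1000):
    """Sketch of K-MaxSDev: K equal segments, then Sliding Window / Top-Down rounds.

    Assumptions (not specified on the slides): population SDev, greedy left-to-right
    merging, midpoint splits of unmerged segments whose own SDev >= MaxSDev,
    and a hypothetical max_iter safety cap.
    """
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    bounds = [i * n // k for i in range(k + 1)]
    segments = [list(series[bounds[i]:bounds[i + 1]]) for i in range(k)]  # Bottom-Up start

    for _ in range(max_iter):
        # Sliding Window: merge a segment into its left neighbour if SDev < MaxSDev.
        merged, was_merged = [segments[0]], [False]
        for seg in segments[1:]:
            candidate = merged[-1] + seg
            if pstdev(candidate) < max_sdev:
                merged[-1], was_merged[-1] = candidate, True
            else:
                merged.append(seg)
                was_merged.append(False)

        # Top-Down: split segments that could not be merged with any neighbour.
        segments, split_any = [], False
        for seg, m in zip(merged, was_merged):
            if not m and len(seg) > 1 and pstdev(seg) >= max_sdev:
                mid = len(seg) // 2
                segments += [seg[:mid], seg[mid:]]
                split_any = True
            else:
                segments.append(seg)

        # Stop once a full round neither merges nor splits anything.
        if not any(was_merged) and not split_any:
            break
    return segments


print(k_maxsdev([1] * 8 + [9] * 8, k=4, max_sdev=2.24))
# [[1, 1, 1, 1, 1, 1, 1, 1], [9, 9, 9, 9, 9, 9, 9, 9]]
```

Under these assumptions, once the loop ends before `max_iter` and MaxSDev > 0, every segment has SDev below MaxSDev. Any violation of MaxSDev therefore arises only across a segment boundary.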

Page 21: Pattern Validation

The final segments are validated against Min_pattern_length to ensure their statistical effectiveness. A segment that fails is marked 'invalid'; otherwise it is marked 'valid'.
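Pattern validation can be sketched as follows. The sketch assumes a segment is valid when it contains at least Min_pattern_length values; the slide does not say whether the comparison is strict. It also records each segment's mean and standard deviation under the names PMean and PSDev used in the matching step. `validate_patterns` and the dictionary layout are hypothetical.

```python
from statistics import mean, pstdev


def validate_patterns(segments, min_pattern_length):
    """Mark each final segment 'valid' or 'invalid' and record its statistics.

    Assumption: a segment is valid when len(segment) >= Min_pattern_length.
    PSDev uses the population standard deviation (also an assumption).
    """
    patterns = []
    for seg in segments:
        label = "valid" if len(seg) >= min_pattern_length else "invalid"
        patterns.append({"label": label, "PMean": mean(seg), "PSDev": pstdev(seg)})
    return patterns


for p in validate_patterns([[10, 11, 12], [20, 20]], min_pattern_length=3):
    print(p["label"], p["PMean"], p["PSDev"])
```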

Page 22: Turning Points Discovery

Turning points are specified as either the mean of an invalid pattern or the first value of the next valid pattern.

K-MaxSDev ensures that violations of MaxSDev occur only at the boundary between two adjacent segments, where the deviation exceeds the MaxSDev threshold.
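The slide leaves open which of the two rules applies in which case. The hypothetical `discover_turning_points` helper below sketches one possible reading. An invalid segment yields a turning point equal to its mean. Where two valid patterns are directly adjacent, the turning point is the first value of the next valid pattern. Validity again assumes length ≥ Min_pattern_length.

```python
from statistics import mean


def discover_turning_points(segments, min_pattern_length):
    """Hypothetical sketch returning (segment_index, TP) pairs.

    Assumed reading of the slide: an invalid segment gives TP = its mean;
    at the boundary of two adjacent valid patterns, TP = first value of the next one.
    """
    valid = [len(seg) >= min_pattern_length for seg in segments]
    turning_points = []
    for i, seg in enumerate(segments):
        if not valid[i]:
            turning_points.append((i, mean(seg)))   # invalid pattern: its mean
        elif i > 0 and valid[i - 1]:
            turning_points.append((i, seg[0]))      # next valid pattern: its first value
    return turning_points


print(discover_turning_points([[1, 1, 1], [5], [9, 9, 9], [3, 3, 3]], 3))
```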

Page 23: Step 3: Pattern Matching

The latest duration sequence, with standard deviation SDev and mean Mean, can be classified into three types.

Type 1: SDev > MaxSDev. The sequence cannot match any valid pattern and must contain at least one turning point. First locate the turning points, then conduct pattern matching.

If SDev < MaxSDev, search for the matched pattern based on Mean. Let the matched pattern have standard deviation PSDev and mean PMean. Then:

Type 2: SDev ≥ PSDev
Type 3: SDev < PSDev
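The classification can be sketched as below. This sketch makes three assumptions the slide does not state. SDev is the population standard deviation. The boundary case SDev = MaxSDev is treated as Type 1. The matched pattern is the valid pattern whose PMean is closest to Mean. Valid patterns are passed as hypothetical (PMean, PSDev) pairs. Type 1 handling (locating the turning points first) is left out.

```python
from statistics import mean, pstdev


def match_pattern(latest, valid_patterns, max_sdev):
    """Classify the latest duration sequence and return (type, matched_pattern).

    valid_patterns: list of (PMean, PSDev) pairs (hypothetical representation).
    Assumptions: population SDev, SDev == MaxSDev counts as Type 1, and the
    matched pattern is the one whose PMean is closest to Mean.
    """
    sdev, m = pstdev(latest), mean(latest)
    if sdev >= max_sdev:
        # Type 1: contains a turning point; locating it first is not shown here.
        return 1, None
    pmean, psdev = min(valid_patterns, key=lambda p: abs(p[0] - m))
    return (2 if sdev >= psdev else 3), (pmean, psdev)


patterns = [(10.0, 1.0), (20.0, 0.5)]
print(match_pattern([19, 20, 21], patterns, max_sdev=2.24))  # (2, (20.0, 0.5))
print(match_pattern([10, 10, 10], patterns, max_sdev=2.24))  # (3, (10.0, 1.0))
```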

Page 24: Step 4: Interval Forecasting

Let the user-specified confidence level be α%, with λ the corresponding probability percentile, and let M be the predicted mean of the next value and S its standard deviation. The interval of the next value is then predicted to be (M - λS, M + λS).

For Type 2 (PSDev ≤ SDev < MaxSDev): the next value of the sequence will probably be a turning point, since the sequence is on the edge of two different patterns. With TP denoting the value of that turning point, M = TP and S = MaxSDev.

For Type 3 (SDev < PSDev): the next value of the sequence can be predicted from the statistical features of the matched pattern, so M = PMean and S = PSDev.
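The interval computation can be sketched with Python's standard `statistics.NormalDist`. The slide only calls λ a probability percentile. This sketch therefore assumes a normal distribution and takes λ as the two-sided standard normal percentile for α%, which is about 1.96 for α = 95%. `forecast_interval` and its keyword arguments are hypothetical.

```python
from statistics import NormalDist


def forecast_interval(match_type, alpha_percent, max_sdev, pmean=None, psdev=None, tp=None):
    """Return (M - λS, M + λS) for a Type 2 or Type 3 sequence.

    Assumption: λ is the two-sided standard normal percentile for α%.
    """
    if match_type == 2:        # next value is probably a turning point
        m, s = tp, max_sdev
    elif match_type == 3:      # next value follows the matched pattern
        m, s = pmean, psdev
    else:
        raise ValueError("Type 1 sequences need their turning points located first")
    lam = NormalDist().inv_cdf(0.5 + alpha_percent / 200)
    return m - lam * s, m + lam * s


low, high = forecast_interval(3, 95, max_sdev=2.24, pmean=20.0, psdev=0.5)
print(round(low, 2), round(high, 2))  # 19.02 20.98
```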

Page 25: Where Are We

Page 26: Simulation Environment

SwinDeW-G: a peer-to-peer based grid workflow system running on the SwinGrid (Swinburne service Grid) platform.

Page 27: Duration Series Building

Sample: 15 duration series, each covering 8:00am to 8:00pm (12 hours), with an observation unit of 15 minutes.

Parameters: K = 12, MaxSDev = 2.24, Min_Pattern_Length = 3.

Page 28: Duration Series Building

Page 29: Pattern Recognition

Page 30: Pattern Validation and Turning Points Discovery

Page 31: Forecasting Performance

Testing on 30 duration sequences with random lengths of 3 to 5.

(Figure: Predicted Duration Intervals. Lower limit, actual value and upper limit of the activity duration for each test sequence; x-axis: Sequence No; y-axis: Activity Durations.)

Page 32: Comparison of Prediction Errors

(Figure: Prediction Errors of the Pattern Based, MEAN and LAST methods for each test sequence; x-axis: Sequence No.)

(Figure: Sum of Errors of the Pattern Based, MEAN and LAST methods; x-axis: Number of Sequences, 5 to 30.)

MEAN: use the mean value of the duration sequence as the prediction.

LAST: use the last value of the duration sequence as the prediction.
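The MEAN and LAST baselines and a summed prediction error can be sketched as follows. The slide does not define the error measure. This sketch assumes it is the absolute difference between the point prediction and the actual next duration, summed over sequences. The function names are hypothetical.

```python
from statistics import mean


def mean_prediction(sequence):
    """MEAN baseline: the mean value of the duration sequence."""
    return mean(sequence)


def last_prediction(sequence):
    """LAST baseline: the last value of the duration sequence."""
    return sequence[-1]


def sum_of_errors(sequences, actual_next_values, predictor):
    """Assumption: error = |prediction - actual next duration|, summed over sequences."""
    return sum(abs(predictor(seq) - actual) for seq, actual in zip(sequences, actual_next_values))


sequences = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
actual = [4.0, 5.0]
print(sum_of_errors(sequences, actual, mean_prediction))  # 3.0
print(sum_of_errors(sequences, actual, last_prediction))  # 2.0
```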

Page 33: Where Are We

Page 34: Conclusion

Scientific workflow activity durations are much more complicated than those of conventional computation tasks.

Conventional linear time-series forecasting models suffer from limited sample sizes and frequent turning points.

Time-series pattern based forecasting strategy:
Duration series building
Duration pattern recognition and turning point discovery
Duration pattern matching
Duration interval forecasting

Our strategy is more scalable with respect to sample size and more robust to turning points.

Page 35: The End

Thanks! Any Questions?