Introduction to Time Series Data and Analysis
Simon Taylor, Department of Mathematics and Statistics
20 April 2016


Page 1: Introduction to Time Series Data and Analysis

Introduction to Time Series Data and Analysis

Simon Taylor
Department of Mathematics and Statistics

20 April 2016

Page 2: Introduction to Time Series Data and Analysis

Contents

What is Time Series Data?

Analysis Tools
  Trace Plot
  Auto-Correlation Function
  Spectrum

Time Series Models
  Moving Average
  Auto-Regressive

Further Topics

Page 3: Introduction to Time Series Data and Analysis

What is Time Series Data?

A time series is a set of observations made sequentially through time.

Examples:
• Changes in execution time, RAM or bandwidth usage.
• Number of times a piece of software has run in consecutive periods of time.
• Financial, geophysical, marketing, demographic data, etc.

The objectives in time series analysis are:
Description – How does the data vary over time?
Explanation – What causes the observed variation?
Prediction – What are the future values of the series?
Control – Aim to improve control over the process.

Page 4: Introduction to Time Series Data and Analysis

Common Questions

Q: How important is preserving data order?

A: Very! Changing the data order breaks the dependence between measurements.

Q: How frequently do I need to take measurements?

A: It depends:

• Too sparse: you risk missing the dependence structure.

• Too frequent: you are swamped with noise.

Figure: Sampling Frequency – the same signal observed over 0–50 s at 1 Hz, 10 Hz and 100 Hz.
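The trade-off above can be illustrated with a small sketch. This is not the data behind the figure; it is a hypothetical example using NumPy, with an assumed 0.1 Hz sinusoid, noise level and seed chosen purely for illustration.

```python
import numpy as np

# A slow sinusoid observed over 50 s. Sampling it at different rates
# illustrates the trade-off: too sparse and the oscillation is missed;
# very dense and each extra point mostly adds noise.
def sample_signal(rate_hz, duration_s=50.0, noise_sd=0.2, seed=0):
    """Sample sin(2*pi*0.1*t) plus noise at `rate_hz` samples per second."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    return t, np.sin(2 * np.pi * 0.1 * t) + noise_sd * rng.normal(size=t.size)

t1, x1 = sample_signal(1)        # 50 points: coarse view of the cycle
t100, x100 = sample_signal(100)  # 5000 points: same cycle, far more noise samples
```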

Page 5: Introduction to Time Series Data and Analysis

Why is time series important in benchmarking?

Q: Can I use simple summary statistics?

A: You can, but they only describe overall properties.

Q: Can’t I just interpolate between data points?

A: Signals are often subject to uncontrollable random noise. The error from interpolation may be large if the noise is large.

Figure: Three time series X, Y and Z, each with x̄ = 0 and s² = 1.

Page 6: Introduction to Time Series Data and Analysis

Analysis Tools – Trace Plot

A trace plot is a graph of the measurements against time.

Easy to visually identify key features:

• Trends – Long-term trend in the mean level.

• Seasonality – Regular peaks & falls in the measurements.

• Outliers – Unusual measurements that are inconsistent with the rest of the data.

• Discontinuities – Abrupt change to the underlying process.
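A minimal sketch of producing a trace plot, assuming matplotlib is available. The series is synthetic, built so that it contains the features listed above (a linear trend, a seasonal cycle, noise and one injected outlier); all parameter values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic series with the features a trace plot reveals:
# a linear trend, a period-20 seasonal cycle, noise, and one outlier.
rng = np.random.default_rng(1)
t = np.arange(200)
x = 0.02 * t + np.sin(2 * np.pi * t / 20) + 0.3 * rng.normal(size=t.size)
x[120] += 5.0  # injected outlier

fig, ax = plt.subplots()
ax.plot(t, x)
ax.set_xlabel("Time")
ax.set_ylabel("Measurement")
ax.set_title("Trace plot")
# fig.savefig("trace.png") or plt.show() to view
```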

Page 7: Introduction to Time Series Data and Analysis

Analysis Tools – Auto-correlation function

Correlation measures the linear dependence between two data sets.

Auto-correlation measures the correlation between all data pairs at lag k apart.

r_k = \frac{\sum_{t=1}^{T-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{(T-1)\, s^2},

where \bar{x} and s^2 are the sample mean and variance.

Figure: Lag 5 ACF calculation.
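The formula above translates directly into NumPy. A sketch, using the slide's (T − 1)s² denominator (so r₀ = 1 by construction); the white-noise input is an illustrative choice.

```python
import numpy as np

def acf(x, k):
    """Sample autocorrelation r_k with the (T-1)*s^2 denominator."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xbar = x.mean()
    s2 = x.var(ddof=1)  # sample variance
    return np.sum((x[: T - k] - xbar) * (x[k:] - xbar)) / ((T - 1) * s2)

rng = np.random.default_rng(0)
z = rng.normal(size=500)
r0 = acf(z, 0)  # exactly 1 by construction
r5 = acf(z, 5)  # near 0 for white noise
```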

Page 8: Introduction to Time Series Data and Analysis

Analysis Tools – Spectrum

The spectrum describes how the power in a time series varies across frequencies.

I(\omega) = \frac{1}{\pi T} \left| \sum_{t=1}^{T} x_t e^{i 2\pi t \omega} \right|^2, \quad \omega \in (0, 1/2].

Identifies prominent seasonal and cyclic variation.
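A direct NumPy sketch of the periodogram above, using the slide's 1/(πT) normalisation and evaluating it at the Fourier frequencies j/T. The period-5 test signal is an illustrative choice: its spectrum should spike at ω = 1/5.

```python
import numpy as np

def periodogram(x):
    """I(omega) = |sum_t x_t exp(i 2 pi t omega)|^2 / (pi T),
    evaluated at the Fourier frequencies omega_j = j/T, j = 1..T//2."""
    x = np.asarray(x, dtype=float)
    T = x.size
    t = np.arange(1, T + 1)
    freqs = np.arange(1, T // 2 + 1) / T
    I = np.array([
        np.abs(np.sum(x * np.exp(1j * 2 * np.pi * t * w))) ** 2 / (np.pi * T)
        for w in freqs
    ])
    return freqs, I

# A pure cycle of period 5 produces a spike at omega = 1/5 = 0.2.
T = 100
t = np.arange(1, T + 1)
x = np.cos(2 * np.pi * t / 5)
freqs, I = periodogram(x)
peak = freqs[np.argmax(I)]  # 0.2
```

In practice one would use an FFT-based routine (e.g. `numpy.fft` or `scipy.signal.periodogram`) rather than this O(T²) loop; the loop is kept only to mirror the formula.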

Figure: Fourier decomposition (components at frequencies 1/4, 1/5, 1/6, 1/7, 1/8) and spectrum of time series X_t.

Page 9: Introduction to Time Series Data and Analysis

Time series models

Let X_{1:T} = \{X_1, \ldots, X_T\} denote a sequence of T measurements.

A time series is stationary if, for any lag k, the subsets X_{1:t} and X_{1+k:t+k} have the same joint distribution.

A time series is weakly stationary if the first two moments are constant over time:

E[X_t] = \mu \qquad \text{and} \qquad \mathrm{Cov}(X_t, X_{t+k}) = \gamma(k).

Gaussian White Noise Process, GWNP

The time series {Zt} follows a Gaussian white noise process if:

Z_t \sim N(0, \sigma^2), \qquad t = 1, \ldots, T.
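Simulating a GWNP is one line with NumPy's random generator; σ, T and the seed below are illustrative choices.

```python
import numpy as np

# Simulate a Gaussian white noise process Z_t ~ N(0, sigma^2).
sigma = 1.5
T = 10_000
rng = np.random.default_rng(42)
z = rng.normal(loc=0.0, scale=sigma, size=T)

sample_mean = z.mean()        # close to 0
sample_sd = z.std(ddof=1)     # close to sigma
```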

Page 10: Introduction to Time Series Data and Analysis

Gaussian White Noise Process

Figure: Gaussian white noise process.

Page 11: Introduction to Time Series Data and Analysis

MA(q) process

Moving Average Process of Order q, MA(q)

The process {Xt} is a moving average process of order q if:

X_t = \beta_0 Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q},

where {Zt} is a GWNP and β0, . . . , βq are constants (β0 = 1).

Expectation: E[Xt ] = 0.

Auto-covariance:

\mathrm{Cov}(X_t, X_{t+k}) =
\begin{cases}
\sigma^2 \sum_{i=0}^{q-|k|} \beta_i \beta_{i+|k|}, & |k| = 0, \ldots, q; \\
0, & \text{otherwise.}
\end{cases}

Page 12: Introduction to Time Series Data and Analysis

MA(q) process

Figure: Left: MA(1), β1 = 0.9. Right: MA(2), (β1, β2) = (−0.4,0.9).

Page 13: Introduction to Time Series Data and Analysis

AR(p) process

Autoregressive Process of Order p, AR(p)

The process {Yt} is an autoregressive process of order p if:

Yt = α1Yt−1 + · · ·+ αpYt−p + Zt

where {Zt} is a GWNP and α1, . . . , αp are constants.

Expectation: E[Y_t] = 0.

Auto-covariance for AR(1):

\mathrm{Cov}(Y_t, Y_{t+k}) = \frac{\sigma^2 \alpha_1^{|k|}}{1 - \alpha_1^2}, \quad \text{provided } |\alpha_1| < 1.
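The AR(1) autocovariance can be verified by simulation: iterate the recursion Y_t = α₁Y_{t−1} + Z_t and compare the sample variance with the theoretical γ(0). A sketch with illustrative parameters (α₁ = 0.9, σ² = 1, seed chosen arbitrarily):

```python
import numpy as np

def ar1_autocov(alpha, sigma2, k):
    """gamma(k) = sigma^2 * alpha^|k| / (1 - alpha^2), valid for |alpha| < 1."""
    assert abs(alpha) < 1, "AR(1) is stationary only for |alpha| < 1"
    return sigma2 * alpha ** abs(k) / (1 - alpha ** 2)

# Simulate an AR(1) with alpha_1 = 0.9 and sigma^2 = 1.
rng = np.random.default_rng(7)
alpha, T = 0.9, 50_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha * y[t - 1] + rng.normal()

gamma0 = ar1_autocov(alpha, 1.0, 0)  # 1 / (1 - 0.81) ~= 5.26
# y.var() should be close to gamma0 for a long simulated series.
```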

Page 14: Introduction to Time Series Data and Analysis

AR(p) process

Figure: Left: AR(1), α1 = 0.9. Right: AR(2), (α1, α2) = (0.8,−0.64).

Page 15: Introduction to Time Series Data and Analysis

Non-stationary process

Figure: Examples of non-stationary processes – a random walk, a non-stationary AR(2), an AR(1) with a mean change, and a concatenated Haar MA.

Page 16: Introduction to Time Series Data and Analysis

On-going Research in Time Series

[Keyword cloud: ARIMA, ARCH/GARCH, state-space models, unit roots, cointegration, changepoints, long memory, bootstrap, spectral analysis, Kalman filtering, MCMC, and many more.]

Figure: Keyword cloud from the Journal of Time Series Analysis, 2002–2015. Red: Models, Navy: Properties, Grey: Inference & Methods.

Page 17: Introduction to Time Series Data and Analysis

Further Reading

• Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008) Time Series Analysis: Forecasting and Control. 4th ed., John Wiley & Sons.

• Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 6th ed., CRC Press.

• Signal Processing Toolbox, MATLAB® (http://uk.mathworks.com/products/signal/)

• Statsmodels, Python (http://statsmodels.sourceforge.net/)

Page 18: Introduction to Time Series Data and Analysis

Simon Taylor

Department of Mathematics and Statistics

20 April 2016