
Bootstrapping EXPOS methods: a case study

Clara Cordeiro

DM/FCT, University of Algarve, Portugal

based on joint work with Professor M. Manuela Neves (supervisor), DM/ISA, Technical University of Lisbon, Portugal

June 23, 2009

ISF 2009, Hong Kong 21-24 June


Outline

1 Introduction

2 Overview of the EXPOS methods

3 Overview of the Bootstrap

4 Procedure

5 Case study

6 Closing comments

7 References


Introduction

Motivation

Bootstrap (Boot) motivation:

It is a resampling scheme with applications in many fields;

It is a very popular methodology because of its simplicity and nice properties;

There has been great development in the area of dependent data.

Exponential smoothing (EXPOS) motivation:

It refers to a set of methods that can be used to model time series and to obtain forecasts;

It is a versatile approach that continually updates a forecast, emphasizing the most recent experience;

These are among the most widely used forecasting methods.


Introduction

Idea

The idea is to join these two approaches and to construct a computational algorithm to obtain:

forecasts;

forecast intervals (work in progress);

and, in the longer term, imputation of missing data (work in progress).

What to use?

Use EXPOS methods to select the model that best fits a data set;

Use Boot to resample and reconstruct a replica of the original data set;

Use software and packages to build the Boot.EXPOS procedure.


Introduction

Work chronology

Past studies:

forecasting using the Holt-Winters (HW) method and some dependent bootstrap approaches; the sieve bootstrap proved to be a good option;

model selection was widened to include SES, Holt’s linear, and the additive and multiplicative HW methods;

first sketch of a computational algorithm, in which some considerations are made: stationarity, Box-Cox transformations, differencing, ...;

applied to the M3 competition; good behavior among six well-known methods, but problems with small data sets;

...


Introduction

Work chronology

Recent studies:

the EXPOS selection has been extended to a choice of thirty methods (with additive and multiplicative error terms);

a case study of 40 well-known time series was performed;

the Box-Cox transformation with 0 < λ < 1 is used;

the procedure was run on the M3 competition time series;

with this larger set of candidate models, “better” point forecasts were obtained. Bootstrap intervals are narrower. BCa bootstrap intervals can improve the results (work in progress);

...
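
The Box-Cox transformation with 0 < λ < 1 mentioned in this list can be sketched as follows. This is only an illustration, not the authors' code: the function names `box_cox` and `inv_box_cox`, the data and the value λ = 0.5 are assumptions made for the example.

```python
def box_cox(y, lam):
    """Box-Cox transform of positive values for lam != 0: (y**lam - 1) / lam."""
    if lam == 0:
        raise ValueError("this sketch covers lam != 0 only")
    return [(v ** lam - 1.0) / lam for v in y]


def inv_box_cox(z, lam):
    """Inverse transform: (lam * z + 1) ** (1 / lam)."""
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]


series = [100.0, 121.0, 144.0, 169.0]   # hypothetical positive data
z = box_cox(series, 0.5)                # work on the transformed scale
back = inv_box_cox(z, 0.5)              # map results back to the original scale
```

Forecasts computed on the transformed scale would be passed through the inverse transform before being compared with the observed values.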


Introduction

Objective of the project

In sum:

Consider EXPOS for modeling time series, instead of the traditional ARIMA class.

Combine the use of EXPOS methods with the bootstrap methodology: Boot.EXPOS.

Use the Boot.EXPOS procedure to obtain forecasts.

Test it on some well-known data sets and then use it on large data sets.

Evaluate the forecast performance using some accuracy measures and compare the results with other EXPOS forecasting methods.


Follow up: EXPOS methods


Overview of the EXPOS methods

Some notes...

A time series is a combination of a pattern and some random error.

According to the characteristics that a time series reveals, one of the EXPOS methods is chosen by the AIC criterion.

The exponential smoothing parameters (α, β, γ) are estimated by minimizing the MSE.

The goal is to find the best EXPOS model and to separate the pattern (trend and/or seasonality) from the error term.
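
As a concrete illustration of estimating a smoothing parameter by minimizing the MSE, the sketch below fits simple exponential smoothing (level only, so only α is needed) with a grid search. It is a simplified stand-in, not the authors' implementation: the grid search, the initialisation with the first observation, the function names and the data are all assumptions.

```python
def ses_fit(y, alpha):
    """Simple exponential smoothing: one-step-ahead fitted values and final level."""
    level = y[0]                      # initialise the level with the first observation
    fitted = []
    for obs in y:
        fitted.append(level)          # forecast for this period, made before seeing obs
        level = alpha * obs + (1.0 - alpha) * level
    return fitted, level


def mse(y, fitted):
    """Mean squared one-step-ahead error."""
    return sum((a - f) ** 2 for a, f in zip(y, fitted)) / len(y)


def best_alpha(y, grid_size=99):
    """Pick alpha on a grid in (0, 1) that minimises the in-sample MSE."""
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(candidates, key=lambda a: mse(y, ses_fit(y, a)[0]))


y = [12.0, 13.5, 12.8, 14.1, 15.0, 14.6, 15.8, 16.2]   # hypothetical series
alpha = best_alpha(y)
fitted, level = ses_fit(y, alpha)
residuals = [a - f for a, f in zip(y, fitted)]          # error term left after the pattern
```

The `residuals` list plays the role of the error term that is separated from the pattern; methods with trend or seasonality would add β and γ to the search in the same way.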


Overview of the EXPOS methods

The exponential smoothing classification

Historical evolution: Pegels (1969), extended by Gardner (1985), Hyndman et al. (2002) and Taylor (2003).

                                 Seasonal component
Trend component                  N (None)    A (Additive)    M (Multiplicative)

N  (None)                        N,N         N,A             N,M
A  (Additive)                    A,N         A,A             A,M
Ad (Additive damped)             Ad,N        Ad,A            Ad,M
M  (Multiplicative)              M,N         M,A             M,M
Md (Multiplicative damped)       Md,N        Md,A            Md,M

For each method in the framework, an additive error version and a multiplicative error version are considered.
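
The count of thirty candidate models follows directly from the table: 5 trend types times 3 seasonal types gives 15 methods, and each has two error versions. The short enumeration below makes this explicit; the ETS(error,trend,seasonal) label format follows the ETS(A,Ad,A) notation used later in the slides, while the variable names are illustrative.

```python
from itertools import product

TREND = ["N", "A", "Ad", "M", "Md"]   # none, additive, additive damped, multiplicative, multiplicative damped
SEASONAL = ["N", "A", "M"]            # none, additive, multiplicative
ERROR = ["A", "M"]                    # additive or multiplicative error

# The 15 cells of the classification table, as "trend,seasonal" pairs
methods = [f"{t},{s}" for t, s in product(TREND, SEASONAL)]

# Adding the error type doubles the count
models = [f"ETS({e},{t},{s})" for e, t, s in product(ERROR, TREND, SEASONAL)]

print(len(methods))   # 15
print(len(models))    # 30
```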


Follow up: Bootstrap methodology


Overview of the Bootstrap

What is Bootstrap?

The bootstrap methodology (Efron 1979), also called the IID bootstrap, is a versatile approach and so can be applied to many models.

Advantage: it relies on few assumptions and is easy to implement.

Disadvantage: it is time consuming.

Data structure: i.i.d. or dependent.

It is generally not hard to carry out:

Draw random samples with replacement;

Extract results, such as predictions;

Iterate the calculation;

Accumulate the results.
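
The four steps above can be sketched for the i.i.d. case as follows. The statistic (the sample mean), the data, the number of replications and the function name `iid_bootstrap` are assumptions chosen for illustration.

```python
import random
import statistics


def iid_bootstrap(data, statistic, n_boot=1000, seed=0):
    """Return the statistic computed on n_boot resamples of data."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):                          # iterate the calculation
        sample = [rng.choice(data) for _ in data]    # draw with replacement
        results.append(statistic(sample))            # extract the result
    return results                                   # accumulated results


data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]     # hypothetical i.i.d. sample
boot_means = iid_bootstrap(data, statistics.mean)
boot_se = statistics.stdev(boot_means)               # bootstrap standard error of the mean
```

The spread of `boot_means` approximates the sampling variability of the mean; the same loop works for any statistic passed as a function.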


Overview of the Bootstrap

Questioning Boot

Does it really work?

Yes!

The key to success is to make sure that the bootstrap resampling correctly mimics the original sampling.

The key assumption is independence.

Does the bootstrap always work?

No!

It just works much more often than any of the common alternatives.

Cases when it fails:

Resampling is done incorrectly, failing to preserve the original sampling structure;

Data are dependent, but resampling is done as though they were independent;

Some really weird statistics, like the maximum, that depend on very small features of the data.


Overview of the Bootstrap

Bootstrap and Dependent Data

The majority of bootstrap methods for dependent data suggest the use of blocks, in order to keep the dependence structure.

The blocks are resampled as in the independent case, and within the blocks the dependence structure is kept.

However, if the time series process is driven by i.i.d. innovations, another way of resampling can be used; the IID bootstrap can then easily be extended to the dependent case.

The autoregressive process AR(p) is a common example of such a process.
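
To make the block idea concrete, here is a minimal sketch of one common variant, the moving block bootstrap. The slides do not say which block scheme is meant, so the choice of overlapping blocks, the block length and the function name are assumptions.

```python
import random


def moving_block_bootstrap(series, block_len, seed=0):
    """Resample overlapping blocks of length block_len and concatenate them."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)        # every admissible block start
    out = []
    while len(out) < n:
        s = rng.choice(starts)               # blocks are drawn as in the i.i.d. case
        out.extend(series[s:s + block_len])  # order inside a block is preserved
    return out[:n]                           # trim to the original length


series = list(range(20))                     # hypothetical data; values show the ordering
replica = moving_block_bootstrap(series, block_len=4)
```

Within each block the original ordering survives, which is how the dependence structure is kept; dependence across block boundaries is lost.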


Connecting Boot & EXPOS

Any good model should yield residuals that do not show significant patterns.

Most exponential smoothing models do not yield white noise residuals.

In fact, some pattern is commonly left in the residuals.

In order to model such left-over patterns, an autoregressive process is used to filter the EXPOS residual series.

Because of the i.i.d. nature of the AR residuals, the IID bootstrap can easily be extended to the dependent case.

This model-based resampling for time series is based on resampling from the AR residuals.
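
The filtering step can be illustrated with the simplest case, an AR(1) model fitted by least squares. This is a simplification: the procedure described in the slides uses an AR(p) model with p chosen by AIC, and the simulated "EXPOS residuals", the coefficient 0.6 and the function names below are assumptions.

```python
import random


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def fit_ar1(x):
    """Least-squares AR(1) coefficient and innovations for a mean-centred series."""
    m = sum(x) / len(x)
    c = [v - m for v in x]
    phi = sum(c[t] * c[t - 1] for t in range(1, len(c))) / sum(v * v for v in c[:-1])
    innovations = [c[t] - phi * c[t - 1] for t in range(1, len(c))]
    return phi, innovations


# Hypothetical residuals with left-over AR(1) structure (true coefficient 0.6)
rng = random.Random(1)
resid = [0.0]
for _ in range(499):
    resid.append(0.6 * resid[-1] + rng.gauss(0.0, 1.0))

phi, innov = fit_ar1(resid)
print(round(lag1_autocorr(resid), 2), round(lag1_autocorr(innov), 2))
```

The printed pair compares the lag-1 autocorrelation before and after filtering; the innovations should be much closer to white noise, which is what makes resampling them as i.i.d. reasonable.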


Example

Unemployment benefits in Australia (series dole)

[Figure: four groups of panels. (1) The dole time series: monthly total of people against year. (2) Decomposition by the ETS(A,Ad,A) method: observed, level, slope and season against time, 1955 to 1990. (3) dole: EXPOS residuals, with their ACF and PACF for lags 0 to 25. (4) dole: AR residuals, with their ACF and PACF for lags 0 to 25.]

1 time series dole

2 “best” EXPOS selected by AIC

3 EXPOS residuals

4 AR residuals


Inside the procedure

Simulation examples

[Figure: two simulation examples, each shown as four panels. Simulation 1: Resampling (#1); AR reconstruction (#1); AR reconstruction (#1) + components; Forecasts from ETS(A,Ad,A). Simulation 2: Resampling (#2); AR reconstruction (#2); AR reconstruction (#2) + components; Forecasts from ETS(A,Ad,A).]


Inside the procedure

In each of the b = 1, ..., B simulations, h forecasts are obtained and recorded in a matrix

$$
\begin{pmatrix}
f_{11} & f_{12} & \cdots & f_{1h} \\
f_{21} & f_{22} & \cdots & f_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
f_{B1} & f_{B2} & \cdots & f_{Bh}
\end{pmatrix}
$$

At the end of the B replications, the forecasts obtained through the Boot.EXPOS procedure are the means over each column.

For example, for the dole time series

[Figure: Forecast comparison for the series dole. Monthly total against forecast horizon (1 to 12) for the real values, the EXPOS forecasts and the Boot.EXPOS forecasts.]
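
The column-wise averaging of the B × h forecast matrix described on this slide amounts to the following. The numbers are invented for illustration (B = 3, h = 4) and the function name `column_means` is hypothetical.

```python
def column_means(forecast_matrix):
    """Average a B x h matrix (list of B rows of h forecasts) column by column."""
    b = len(forecast_matrix)
    return [sum(col) / b for col in zip(*forecast_matrix)]


# Hypothetical B = 3 bootstrap replications, h = 4 forecast steps
f = [
    [10.0, 11.0, 12.0, 13.0],
    [12.0, 11.0, 14.0, 15.0],
    [11.0, 14.0, 13.0, 14.0],
]
print(column_means(f))   # [11.0, 12.0, 13.0, 14.0]
```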


Procedure Boot.EXPOS

Complete description

Remark: previous EXPOS fit required

1 Fit an AR(p) model to the EXPOS residuals, trying increasing orders p and selecting the order by the AIC criterion;

2 Obtain the AR residuals and center them;

3 Draw a random sample, with replacement, from the centered residuals;

4 Use the AR model recursively to obtain a bootstrap series of the residuals;

5 Construct a replica of the time series from the EXPOS components and the bootstrap residual series;

6 Obtain h-step-ahead forecasts from the new time series using the EXPOS fit;

7 Repeat Steps 3 to 6 B times;

8 For each forecast horizon up to h, obtain the mean of the B forecasts.
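
The eight steps can be traced in a compact sketch. It is not the authors' Boot.EXPOS code: simple exponential smoothing with a fixed α stands in for the AIC-selected EXPOS model, an AR(1) fitted by least squares stands in for the AIC-selected AR(p), and the function names, defaults and data are assumptions.

```python
import random


def ses(y, alpha):
    """Simple exponential smoothing: one-step fitted values and the final level."""
    level, fitted = y[0], []
    for obs in y:
        fitted.append(level)
        level = alpha * obs + (1.0 - alpha) * level
    return fitted, level


def boot_expos_sketch(y, h, alpha=0.3, n_boot=200, seed=0):
    rng = random.Random(seed)
    # Remark: previous EXPOS fit (here SES with a fixed alpha, an assumption)
    fitted, _ = ses(y, alpha)
    resid = [a - f for a, f in zip(y, fitted)]
    # Step 1: fit an AR model to the residuals (AR(1) only, no AIC search)
    mean_r = sum(resid) / len(resid)
    c = [r - mean_r for r in resid]
    denom = sum(v * v for v in c[:-1])
    phi = sum(c[t] * c[t - 1] for t in range(1, len(c))) / denom if denom else 0.0
    # Step 2: AR residuals, centered
    e = [c[t] - phi * c[t - 1] for t in range(1, len(c))]
    mean_e = sum(e) / len(e)
    e = [v - mean_e for v in e]
    forecasts = []
    for _ in range(n_boot):                       # Step 7: repeat B times
        # Step 3: resample the centered AR residuals with replacement
        e_star = [rng.choice(e) for _ in y]
        # Step 4: rebuild a bootstrap residual series recursively
        r_star = [e_star[0]]
        for t in range(1, len(y)):
            r_star.append(phi * r_star[-1] + e_star[t])
        # Step 5: replica = EXPOS components + bootstrap residuals
        y_star = [f + mean_r + r for f, r in zip(fitted, r_star)]
        # Step 6: h-step-ahead forecasts from the replica (SES forecasts are flat)
        _, level = ses(y_star, alpha)
        forecasts.append([level] * h)
    # Step 8: mean of the B forecasts at each horizon
    return [sum(col) / n_boot for col in zip(*forecasts)]


y = [20.0, 21.5, 19.8, 22.0, 23.1, 22.4, 24.0, 23.5, 25.2, 24.8, 26.1, 25.7]  # hypothetical
print(boot_expos_sketch(y, h=3))
```

Because SES has no trend or seasonal component, its forecasts are identical at every horizon; with a richer EXPOS model the B × h matrix would vary across columns and Step 8 would give a different mean per horizon.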


Follow up: Case study


Case study

The selection

Six time series were chosen:

the first series, nav, refers to the number of airplanes that fly in the Flight Information Region (FIR) of Lisbon;

pigs, ukdeaths and writing are three series that can be found in the package fma;

UKDriverDeaths is in the package datasets;

gas is in the package forecast.


Case study

The data

[Figure: time plots of the six series against year.]

Number of airplanes in the FIR Lisbon (monthly total; axis 1985 to 2005)

Number of pigs slaughtered in Victoria, Australia (monthly total; axis 1980 to 1995)

Deaths and serious injuries on UK roads (monthly total; axis 1976 to 1984)

Sales of printing and writing paper (thousands of French francs; axis 1968 to 1978)

Road casualties in Great Britain (monthly totals of car drivers; axis 1970 to 1985)

Australian gas production (axis 1960 to 1990)


Case study

Computing

for each time series, the “best” EXPOS method is selected using the AIC;

the EXPOS residuals are extracted and tested against the white noise hypothesis;

Boot.EXPOS is applied;

forecasts are obtained for the h periods ahead;

the performance of the Boot.EXPOS procedure is evaluated using some accuracy measures:

Acronym    Definition                        Formula

RMSE       Root Mean Squared Error           $\sqrt{\mathrm{mean}(E_t^2)}$

MAE        Mean Absolute Error               $\mathrm{mean}(|E_t|)$

MAPE       Mean Absolute Percentage Error    $\mathrm{mean}(100\,|E_t / Y_t|)$
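
The three accuracy measures translate directly into code. In the sketch below, E_t is taken to be the observed value minus the forecast and Y_t the observed value, which is the usual reading of the table; the function names and the data are assumptions.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean(E_t^2))."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(actual, forecast):
    """Mean absolute error: mean(|E_t|)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean of 100 * |E_t / Y_t|; requires non-zero actual values."""
    return sum(100.0 * abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]      # hypothetical observed values Y_t
forecast = [110.0, 190.0, 380.0]    # hypothetical forecasts
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

RMSE and MAE are in the units of the series, so they cannot be compared across series of different scale; MAPE is scale-free but undefined when an observed value is zero.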


Case study

Forecast results

[Figure: six panels titled “Forecast comparison”, one each for the series Nav, pigs, ukdeaths, writing, UKDriverDeaths and gas. Each panel plots the monthly total against the forecast horizon (1 to 12) for the real values, the EXPOS forecasts and the Boot.EXPOS forecasts.]


Case study

Accuracy results

Accuracy measures

Series            n     s    h    EXPOS fit    Method        RMSE       MAE        MAPE

nav               279   12   12   (M,A,M)      EXPOS         3661.23    3369.51    10.15
                                               Boot.EXPOS    3456.60    3128.17     9.44

pigs              188   12   12   (A,N,A)      EXPOS         9377.28    7554.21     7.48
                                               Boot.EXPOS    8653.24    6475.87     6.49

ukdeaths          120   12   12   (M,N,M)      EXPOS          156.84     143.16    10.13
                                               Boot.EXPOS      89.84      71.28     4.89

writing           120   12   12   (A,A,A)      EXPOS           58.61      44.96     5.97
                                               Boot.EXPOS      57.21      43.95     5.92

UKDriverDeaths    192   12   12   (M,N,A)      EXPOS          205.63     198.49    14.68
                                               Boot.EXPOS      87.78      70.60     5.09

gas               476   12   12   (M,Md,M)     EXPOS         2773.72    2097.73     4.22
                                               Boot.EXPOS    2348.16    1908.15     3.84


Closing comments

About this study:

For these time series, Boot.EXPOS behaves well in obtaining forecasts.

The “optimal” combination of EXPOS methods and bootstrap resampling could provide more accurate forecasts.

About other studies:

For other data sets that we have analyzed, Boot.EXPOS also performs well.

If a transformation is applied to the data, more series have good accuracy results.


Acknowledgements: To the IIF for the travel award that made it possible to participate in ISF 2009.

Thank you!

Complete references on the next slide.


References

Brockwell, P. and Davis, R., Introduction to Time Series and Forecasting, 2nd edition, Springer-Verlag New York, 2002.

Cordeiro, C. and Neves, M., Bootstrap and exponential smoothing working together in forecasting time series, Proceedings in Computational Statistics (COMPSTAT 2008), Paula Brito (editor), Physica-Verlag, 2008, pp. 891–899.

Cordeiro, C. and Neves, M., Forecasting time series with Boot.EXPOS procedure, to appear in REVSTAT Statistical Journal, June, 2009.

Davison, A.C. and Hinkley, D.V., Bootstrap Methods and their Application, Cambridge University Press, 1998.

Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, 7 (1979), pp. 1–26.

Gardner Jr, E.S., Exponential Smoothing: The State of the Art-Part II, International Journal of Forecasting, 22 (2006), pp. 637–666.

Hyndman, R., forecast: Forecasting functions for time series; software available at http://www.robjhyndman.com/Rlibrary/forecast/.

Hyndman, R., expsmooth: Data sets for “Forecasting with exponential smoothing”; software available at http://www.robjhyndman.com/Rlibrary/forecast/.

Hyndman, R., fma: Data sets from “Forecasting: methods and applications” by Makridakis, Wheelwright & Hyndman (1998); software available at http://www.robjhyndman.com/Rlibrary/forecast/.

Hyndman, R., Koehler, A., Ord, J. and Snyder, R., Forecasting with Exponential Smoothing: The State Space Approach, Springer-Verlag Inc, 2008.

Lahiri, S.N., Resampling Methods for Dependent Data, Springer Verlag Inc, 2003.

R Development Core Team, R: A Language and Environment for Statistical Computing; software available at http://www.R-project.org.

Trapletti, A. and Hornik, K., tseries: Time Series Analysis and Computational Finance; R package version 0.10-18, 2009.
