Case study: the bootstrap method for time series
Bootstrapping EXPOS methods: a case study
Clara Cordeiro
DM/FCT, University of Algarve, Portugal
based on joint work with Professor M. Manuela Neves (supervisor), DM/ISA, Technical University of Lisbon, Portugal
June 23, 2009
ISF 2009, Hong Kong 21-24 June
Clara Cordeiro Bootstrapping EXPOS methods 1 / 28
Outline
1 Introduction
2 Overview of the EXPOS methods
3 Overview of the Bootstrap
4 Procedure
5 Case study
6 Closing comments
7 References
Introduction
Motivation
Boot motivation:
It is a resampling scheme with wide application in many fields;
It is a very popular methodology because of its simplicity and nice properties;
There has been great development in the area of dependent data.
EXPOS motivation:
It refers to a set of methods that can be used to model time series and to obtain forecasts;
It is a versatile approach that continually updates a forecast, emphasizing the most recent experience;
They are the most widely used forecasting methods.
Introduction
Idea
The idea is to join these two approaches and to construct a computational algorithm to obtain:
forecasts;
forecast intervals (work in progress);
and, in the long run, to use it for missing data imputation (work in progress).
What to use?
Use EXPOS methods to select the model that best fits a data set;
Use Boot to resample and reconstruct a replica of the original data set;
Use software and packages to build the Boot.EXPOS procedure.
Introduction
Work chronology
Past studies:
forecasting using the Holt-Winters method and some dependent bootstrap approaches; the sieve bootstrap proved to be a good option;
model selection extended to include SES, Holt's linear, and HW additive and multiplicative methods;
first sketch of a computational algorithm in which some considerations are made: stationarity, Box-Cox transformations, differencing, ...;
applied to the M3 competition; good behavior among six well-known methods, but problems with small data sets;
...
Introduction
Work chronology
Recent studies:
EXPOS selection has been extended to a choice of thirty methods (additive and multiplicative error term);
a case study of 40 well-known time series was performed;
a Box-Cox transformation with 0 < λ < 1 is used;
the procedure was run on the M3 competition time series;
with this larger set of candidate models, “better” point forecasts were obtained. Bootstrap intervals are narrower. BCa bootstrap intervals can improve the results (work in progress);
...
Introduction
Objective of the project
In sum:
Consider EXPOS for modeling time series, instead of the traditional ARIMA class.
Combine the use of EXPOS methods with the bootstrap methodology: Boot.EXPOS.
Use the Boot.EXPOS procedure to obtain forecasts.
Test it on some well-known data sets and then use it on large data sets.
Evaluate the forecast performance using some accuracy measures and compare the results with other EXPOS forecasting methods.
Follow up: EXPOS methods
Overview of the EXPOS methods
Some notes...
A time series is a combination of a pattern and some random error.
According to the characteristics that a time series reveals, one of the EXPOS methods is chosen by the AIC criterion.
The exponential smoothing parameters (α, β, γ) are estimated by minimizing the MSE.
The goal is to find the best EXPOS model and to separate the pattern (trend and/or seasonality) from the error term.
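As a minimal sketch of estimating a smoothing parameter by minimizing the MSE, the snippet below fits simple exponential smoothing (one parameter, α) on a synthetic series with a grid search. The function names, the initialisation of the level at the first observation, the grid and the data are all assumptions made for illustration; the talk itself relies on R packages for this step.

```python
import random

def ses_one_step_errors(y, alpha):
    """One-step-ahead errors of simple exponential smoothing.

    The level is initialised at the first observation (an assumption).
    """
    level = y[0]
    errors = []
    for obs in y[1:]:
        errors.append(obs - level)              # the forecast is the current level
        level = alpha * obs + (1 - alpha) * level
    return errors

def mse(errors):
    """Mean squared error of a list of errors."""
    return sum(e * e for e in errors) / len(errors)

def fit_ses(y, grid=None):
    """Choose alpha on a grid by minimising the in-sample one-step MSE."""
    if grid is None:
        grid = [k / 100 for k in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    return min(grid, key=lambda a: mse(ses_one_step_errors(y, a)))

# Synthetic series: a slowly wandering level plus noise (illustrative data only).
rng = random.Random(0)
level, y = 50.0, []
for _ in range(200):
    level += rng.gauss(0, 0.5)
    y.append(level + rng.gauss(0, 2.0))

alpha_hat = fit_ses(y)
print(f"alpha = {alpha_hat:.2f}, MSE = {mse(ses_one_step_errors(y, alpha_hat)):.3f}")
```

A numerical optimiser would normally replace the grid search, and methods with trend or seasonality would add β and γ to the search.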
Overview of the EXPOS methods
The exponential smoothing classification
Historical evolution: Pegels (1969), extended by Gardner (1985), Hyndman et al. (2002) and Taylor (2003).

                                 Seasonal component
Trend component                  N (None)   A (Additive)   M (Multiplicative)
N  (None)                        N,N        N,A            N,M
A  (Additive)                    A,N        A,A            A,M
Ad (Additive damped)             Ad,N       Ad,A           Ad,M
M  (Multiplicative)              M,N        M,A            M,M
Md (Multiplicative damped)       Md,N       Md,A           Md,M

For each method in the framework, additive error and multiplicative error versions are considered.
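The count of thirty candidate models follows directly from the classification: five trend types times three seasonal types gives fifteen methods, each with two error versions. The short enumeration below checks this; the `ETS(error,trend,seasonal)` labels follow the notation used in the figure captions of this talk, and the variable names are illustrative.

```python
from itertools import product

# Component codes from the classification table.
trend = ["N", "A", "Ad", "M", "Md"]
seasonal = ["N", "A", "M"]
error = ["A", "M"]

# Fifteen (trend, seasonal) methods, then thirty models once the error type is added.
methods = [f"{t},{s}" for t, s in product(trend, seasonal)]
models = [f"ETS({e},{t},{s})" for e, t, s in product(error, trend, seasonal)]

print(len(methods), len(models))  # 15 30
```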
Follow up: Bootstrap methodology
Overview of the Bootstrap
What is Bootstrap?
The bootstrap methodology (Efron 1979), also called the IID bootstrap, is a versatile approach that can be applied to many models.
Advantage: relies on few assumptions and is easy to implement.
Disadvantage: is time consuming.
Data structure: i.i.d. or dependent.
Generally it is not too hard to do:
Draw random samples with replacement;
Extract results, such as predictions;
Iterate the calculation;
Accumulate the results.
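The four steps above can be sketched for the simplest case, an i.i.d. sample and the sample mean as the statistic of interest. The data values, the function name `iid_bootstrap` and the choice of B are made up for illustration.

```python
import random
import statistics

def iid_bootstrap(data, statistic, B=1000, seed=0):
    """Return B bootstrap replicates of `statistic` under i.i.d. resampling."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(B):                          # iterate the calculation
        resample = rng.choices(data, k=n)       # draw a sample with replacement
        replicates.append(statistic(resample))  # extract and accumulate the result
    return replicates

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]  # illustrative values
reps = iid_bootstrap(data, statistics.mean, B=2000)
boot_se = statistics.stdev(reps)  # bootstrap estimate of the standard error of the mean
print(f"bootstrap standard error of the mean: {boot_se:.3f}")
```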
Overview of the Bootstrap
Questioning Boot
Does it really work?
Yes!
The key to success is making sure that the bootstrap resampling correctly mimics the original sampling.
The key assumption is independence.
Does the bootstrap always work?
No!
It just works much more often than any of the common alternatives.
Cases when it fails:
Resampling done incorrectly, failing to preserve the original sampling structure;
Data are dependent, but resampling is done as though they were independent;
Some really weird statistics, like the maximum, that depend on very small features of the data.
Overview of the Bootstrap
Bootstrap and Dependent Data
The majority of bootstrap methods for dependent data suggest the use of blocks, in order to keep the dependence structure.
The blocks are resampled as in the independent case, and within the blocks the dependence structure is kept.
However, if the time series process is driven by iid innovations, another way of resampling can be used; the IID bootstrap can then easily be extended to the dependent case.
The autoregressive process AR(p) is a common example of such a process.
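To make the block idea concrete, here is a minimal sketch of one block scheme, the moving block bootstrap, in which overlapping blocks of fixed length are drawn with replacement and concatenated. The function name, the block length and the toy series are assumptions; this is not the resampling scheme used by Boot.EXPOS, which resamples AR residuals instead.

```python
import random

def moving_block_bootstrap(series, block_length, seed=0):
    """Build a replica of `series` by concatenating randomly chosen overlapping blocks."""
    rng = random.Random(seed)
    n = len(series)
    n_starts = n - block_length + 1             # number of admissible block starts
    replica = []
    while len(replica) < n:
        start = rng.randrange(n_starts)         # blocks are drawn as in the iid case
        replica.extend(series[start:start + block_length])  # order inside a block is kept
    return replica[:n]                          # trim to the original length

series = list(range(100))                       # toy series: 0, 1, ..., 99
replica = moving_block_bootstrap(series, block_length=10)
print(replica[:20])
```

Within each block of ten the values stay consecutive, which is how the short-range dependence of the original series is carried over to the replica.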
Connecting Boot & EXPOS
Any good model should yield residuals that do not show significant patterns.
Most exponential smoothing models do not yield white noise residuals.
In fact, some pattern is commonly left in the residuals.
To model such left-over patterns, an autoregressive process is used to filter the EXPOS residual series.
Because of the iid nature of the AR residuals, the IID bootstrap can easily be extended to the dependent case.
This model-based resampling for time series is based on resampling from the AR residuals.
Example
Unemployment benefits in Australia
[Figure: dole series; x-axis: Year, y-axis: monthly total of people.]
[Figure: Decomposition by ETS(A,Ad,A) method; panels: observed, level, slope, season; x-axis: Time.]
[Figure: dole: EXPOS resid; residual series with ACF and PACF against Lag.]
[Figure: dole: AR resid; residual series with ACF and PACF against Lag.]
1 time series dole
2 “best” EXPOS method selected by AIC
3 EXPOS residuals
4 AR residuals
Inside the procedure
Simulation examples
[Figure: Simulation 1; panels: Resampling (#1), AR reconstruction (#1), AR reconstruction (#1)+components, Forecasts from ETS(A,Ad,A).]
[Figure: Simulation 2; panels: Resampling (#2), AR reconstruction (#2), AR reconstruction (#2)+components, Forecasts from ETS(A,Ad,A).]
Inside the procedure
In each of the b = 1, ..., B simulations, h forecasts are obtained and recorded in a matrix

$$
\begin{pmatrix}
f_{11} & f_{12} & \cdots & f_{1h} \\
f_{21} & f_{22} & \cdots & f_{2h} \\
\vdots & \vdots & \ddots & \vdots \\
f_{B1} & f_{B2} & \cdots & f_{Bh}
\end{pmatrix}
$$

At the end of the B replications, the forecasts obtained through the Boot.EXPOS procedure are the means over each column.
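As a small worked example of the column means, take a made-up forecast matrix with B = 4 replications and h = 3 horizons. The numbers are invented purely to show the arithmetic.

```python
# Rows are bootstrap replications b = 1..B, columns are horizons 1..h (invented values).
forecasts = [
    [10.0, 11.0, 12.0],
    [12.0, 13.0, 11.0],
    [11.0, 12.0, 13.0],
    [11.0, 12.0, 12.0],
]

# Boot.EXPOS point forecast for each horizon: the mean of the corresponding column.
boot_forecast = [sum(column) / len(column) for column in zip(*forecasts)]
print(boot_forecast)  # [11.0, 12.0, 12.0]
```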
For example, for the dole time series
[Figure: Forecast comparison: series dole; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
Procedure Boot.EXPOS
Complete description
Remark: previous EXPOS fit required
1 Fit AR models of increasing order p to the residuals and select p by the AIC criterion;
2 Obtain the AR residuals and center them;
3 Draw a random sample from the centered residuals;
4 Use the AR model recursively to obtain a bootstrap series of the residuals;
5 Construct a time series replica using the EXPOS components and the previous bootstrap series;
6 Obtain h-step-ahead forecasts from the new time series using the EXPOS fit;
7 Repeat Step 3 to Step 6, B times;
8 For each h, obtain the mean of the B forecasts.
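The eight steps can be sketched end to end under strong simplifying assumptions. Simple exponential smoothing with a fixed α stands in for the AIC-selected EXPOS model, the AR model is fitted by Yule-Walker equations (Levinson-Durbin recursion) with the AIC approximated by n log(σ²) + 2p, and the recursion in step 4 starts from the observed residuals. All function names, parameter values and the synthetic series are hypothetical; this is not the authors' implementation.

```python
import math
import random

def ses_fit(y, alpha):
    """Simple exponential smoothing: one-step fitted values and the final level."""
    level, fitted = y[0], []
    for obs in y:
        fitted.append(level)                    # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level
    return fitted, level

def autocov(x, lag):
    """Biased sample autocovariance at the given lag."""
    n, m = len(x), sum(x) / len(x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

def fit_ar_by_aic(x, max_order=8):
    """Yule-Walker AR fits for p = 0..max_order; return coefficients with lowest AIC."""
    n = len(x)
    gamma = [autocov(x, k) for k in range(max_order + 1)]
    phi, sigma2 = [], gamma[0]
    best_aic, best_phi = n * math.log(sigma2), []
    for p in range(1, max_order + 1):
        k = (gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))) / sigma2
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        sigma2 *= 1 - k * k
        aic = n * math.log(sigma2) + 2 * p
        if aic < best_aic:
            best_aic, best_phi = aic, phi[:]
    return best_phi

def boot_expos(y, alpha, h=12, B=200, seed=0):
    rng = random.Random(seed)
    fitted, _ = ses_fit(y, alpha)               # stand-in for the previous EXPOS fit
    resid = [obs - f for obs, f in zip(y, fitted)]
    mean_r = sum(resid) / len(resid)
    r = [v - mean_r for v in resid]
    phi = fit_ar_by_aic(r)                      # step 1
    p = len(phi)
    eps = [r[t] - sum(phi[j] * r[t - 1 - j] for j in range(p)) for t in range(p, len(r))]
    mean_e = sum(eps) / len(eps)
    eps = [e - mean_e for e in eps]             # step 2
    forecasts = []
    for _ in range(B):                          # step 7
        draws = rng.choices(eps, k=len(r))      # step 3
        r_star = r[:p]                          # start values: observed residuals (assumption)
        for t in range(p, len(r)):              # step 4
            r_star.append(sum(phi[j] * r_star[t - 1 - j] for j in range(p)) + draws[t])
        replica = [f + mean_r + v for f, v in zip(fitted, r_star)]  # step 5
        _, level = ses_fit(replica, alpha)      # step 6
        forecasts.append([level] * h)           # SES forecasts are flat over the horizon
    return [sum(col) / B for col in zip(*forecasts)]  # step 8

# Synthetic series: constant level with AR(1) noise (illustrative data only).
rng = random.Random(1)
y, noise = [], 0.0
for _ in range(150):
    noise = 0.6 * noise + rng.gauss(0, 1.0)
    y.append(100.0 + noise)

point_forecasts = boot_expos(y, alpha=0.2, h=12, B=200)
print([round(f, 2) for f in point_forecasts])
```

Because the stand-in model has no trend or seasonal component, all twelve forecasts coincide; with a seasonal EXPOS model, step 5 would add the bootstrap residual series to the fitted level, trend and seasonal components, and step 6 would yield a different forecast for each horizon.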
Follow up: Case study
Case study
The selection
Six time series were chosen:
the first series, nav, is the number of airplanes that flew in the Flight Information Region (FIR) of Lisbon;
pigs, ukdeaths and writing are three series that can be found in the package fma;
UKDriverDeaths is in the package datasets;
gas is in the package forecast.
Case study
The data
[Figure: Number of airplanes in the FIR Lisbon; x-axis: Year, y-axis: monthly total.]
[Figure: Number of pigs slaughtered in Victoria, Australia; x-axis: Year, y-axis: monthly total.]
[Figure: Deaths and serious injuries on UK roads; x-axis: Year, y-axis: monthly total.]
[Figure: Sales of printing and writing paper; x-axis: Year, y-axis: thousands of French francs.]
[Figure: Road casualties in Great Britain; x-axis: Year, y-axis: monthly totals of car drivers.]
[Figure: Australian gas production; x-axis: Year.]
Case study
Computing
for each time series, the “best” EXPOS method is selected using AIC
EXPOS residuals are extracted and tested against the white noise hypothesis
apply Boot.EXPOS
obtain the forecasts for the h periods ahead
evaluate the performance of the Boot.EXPOS procedure using some accuracy measures:
Acronym   Definition                       Formula
RMSE      Root Mean Squared Error          $\sqrt{\mathrm{mean}(E_t^2)}$
MAE       Mean Absolute Error              $\mathrm{mean}(|E_t|)$
MAPE      Mean Absolute Percentage Error   $\mathrm{mean}(100\,|E_t / Y_t|)$
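The three measures can be computed as in the sketch below, assuming the forecast error is defined as E_t = Y_t - F_t, with Y_t the observed value and F_t the forecast. The slide does not spell this definition out, and the numbers are invented for illustration.

```python
import math

def accuracy(actual, forecast):
    """RMSE, MAE and MAPE of a set of forecasts, with E_t = Y_t - F_t (assumed)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(100 * abs(e / a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}

actual = [100.0, 200.0, 400.0]     # invented observed values
forecast = [110.0, 190.0, 380.0]   # invented forecasts
measures = accuracy(actual, forecast)
print({name: round(value, 3) for name, value in measures.items()})
```

Here the errors are -10, 10 and 20, so RMSE = sqrt(200), MAE = 40/3 and MAPE = (10 + 5 + 5)/3. MAPE is undefined when any observed value is zero.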
Case study
Forecast results
[Figure: Forecast comparison: series nav; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
[Figure: Forecast comparison: series pigs; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
[Figure: Forecast comparison: series ukdeaths; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
[Figure: Forecast comparison: series writing; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
[Figure: Forecast comparison: series UKDriverDeaths; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
[Figure: Forecast comparison: series gas; x-axis: Forecast horizon (1 to 12), y-axis: Monthly total; lines: real values, EXPOS, Boot.EXPOS.]
Case study
Accuracy results
Accuracy measures
Series           n     s    h    EXPOS fit   Method       RMSE      MAE       MAPE
nav              279   12   12   (M,A,M)     EXPOS        3661.23   3369.51   10.15
                                             Boot.EXPOS   3456.60   3128.17    9.44
pigs             188   12   12   (A,N,A)     EXPOS        9377.28   7554.21    7.48
                                             Boot.EXPOS   8653.24   6475.87    6.49
ukdeaths         120   12   12   (M,N,M)     EXPOS         156.84    143.16   10.13
                                             Boot.EXPOS     89.84     71.28    4.89
writing          120   12   12   (A,A,A)     EXPOS          58.61     44.96    5.97
                                             Boot.EXPOS     57.21     43.95    5.92
UKDriverDeaths   192   12   12   (M,N,A)     EXPOS         205.63    198.49   14.68
                                             Boot.EXPOS     87.78     70.60    5.09
gas              476   12   12   (M,Md,M)    EXPOS        2773.72   2097.73    4.22
                                             Boot.EXPOS   2348.16   1908.15    3.84
Closing comments
About this study:
For these time series, Boot.EXPOS behaves well in obtaining forecasts.
The “optimal” combination of EXPOS methods and bootstrap resampling could provide more accurate forecasts.
About other studies:
On the other data sets that we have analyzed, Boot.EXPOS also performs well.
If a transformation is applied to the data, more series show good accuracy results.
Acknowledgements: To the IIF for the travel award that made it possible to participate in the ISF 2009.
Thank you!
Complete references on the next slide.
References
Brockwell, P. and Davis, R., Introduction to Time Series and Forecasting, 2nd edition, Springer-Verlag New York, 2002.
Cordeiro, C. and Neves, M., Bootstrap and exponential smoothing working together in forecasting time series, Proceedings in Computational Statistics (COMPSTAT 2008), Paula Brito (editor), Physica-Verlag, 2008, pp. 891–899.
Cordeiro, C. and Neves, M., Forecasting time series with Boot.EXPOS procedure, to appear in REVSTAT Statistical Journal, June, 2009.
Davison, A.C. and Hinkley, D.V., Bootstrap Methods and their Application, Cambridge University Press, 1998.
Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, 7 (1979), pp. 1–26.
Gardner Jr, E.S., Exponential Smoothing: The State of the Art-Part II, International Journal of Forecasting, 22 (2006), pp. 637–666.
Hyndman, R., forecast: Forecasting functions for time series; software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., expsmooth: Data sets for "Forecasting with exponential smoothing"; software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., fma: Data sets from "Forecasting: methods and applications" by Makridakis, Wheelwright & Hyndman (1998); software available at http://www.robjhyndman.com/Rlibrary/forecast/.
Hyndman, R., Koehler, A., Ord, J. and Snyder, R., Forecasting with Exponential Smoothing: The State Space Approach, Springer-Verlag Inc, 2008.
Lahiri, S.N., Resampling Methods for Dependent Data, Springer Verlag Inc, 2003.
R Development Core Team, R: A Language and Environment for Statistical Computing; software available at http://www.R-project.org.
Trapletti, A. and Hornik, K., tseries: Time Series Analysis and Computational Finance; R package version 0.10-18, 2009.