Forecast Evaluation
NATCOR
Forecasting with ARIMA models
Nikolaos Kourentzes
Outline
1. Bias measures
2. Accuracy measures
3. Evaluation schemes
4. Prediction intervals
5. Parameter selection
6. Method selection
Forecast Evaluation
Forecast errors lead to inaccurate results, which lead to loss (performance, financial, etc.). Measuring the loss is important but often hard to track, so the forecasting error can be used as a proxy. Therefore it is important to track and evaluate forecast errors.

Forecast evaluation is a key activity in the forecasting process. It is at the core of:
• Forecast monitoring
• Method selection and parameterisation
[Figure: SKU A weekly sales (At) with the forecast (Ft), and the corresponding forecast errors by week.]
Forecast Error - Definition
$e_t = A_t - F_t$

[Figure: forecast errors over time; positive errors correspond to under-forecasting and negative errors to over-forecasting.]
Measures of Bias
Instead of considering the complete vector of errors we can aggregate them using:

Mean Error, the most common measure of forecast bias: $ME = \frac{1}{n}\sum_{t=1}^{n} e_t$

Median Error: $MdE = \mathrm{median}(e_t)$
Measures of bias show whether we typically over- or under-forecast. Ideally this should be as close to zero as possible.
Forecasting method A ME: 12.67
Forecasting method B ME: -23.12
Forecasting method C ME: -0.076
[Figure: three sales series with forecasts, illustrating positive bias (under-forecasting), negative bias (over-forecasting), and an unbiased forecast.]
Measures of Bias
[Figure: three sales series with forecasts and the corresponding error plots.]

Mean error = 149.9: in this case we typically forecast less than what we should. This forecast will lead to biased decisions.
Mean error = -150.1: in this case we typically forecast more than what we should. This forecast will also lead to biased decisions.
Mean error = 0.1: this forecast shows no preference, therefore it is useful for objective decision making.
A Note on Mean and Median Errors

[Figure: distribution of forecast errors with the mean and median errors marked; outliers affect the mean strongly.]
It is well known that the mean is affected by outliers and asymmetric distributions more than the median. In the context of forecasting:
• The median is insensitive to extremes (outliers) and better summarises typical performance.
• The mean is sensitive to extremes (outliers), which is useful when we are interested in them.
• Substantial differences between the mean and median errors suggest that the error distribution may have outliers or be asymmetric.
Bias and Magnitude of Errors - Accuracy
Mean Error = 0 does not tell us if we are accurate, merely whether we are biased. To overcome this we can calculate squared errors ($e_t^2$) or absolute errors ($|e_t|$), which do not cancel out once aggregated.
        $e_t$     $e_t^2$    $|e_t|$
e1      -7        +49        7
e2      +12       +144       12
e3      -5        +25        5
Sum     0         +218       24
Mean    0         +72.67     8

The signed errors cancel out to a mean of zero ("no error?"), whereas the squared and absolute errors do not.
Measures of Accuracy – Scale Dependent
Some common errors that can be defined using these operators are:

1. Mean Squared Error (MSE): $MSE = \frac{1}{n}\sum_{i=1}^{n}(A_i - F_i)^2$
• Sensitive to outliers (squares)
• Non-intuitive (units are squared)
• Scale dependent

2. Root Mean Squared Error (RMSE): $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - F_i)^2}$
• As MSE, but the resulting units are not squared
• Scale dependent

3. Mean Absolute Error (MAE): $MAE = \frac{1}{n}\sum_{i=1}^{n}|A_i - F_i|$
• Robust to outliers
• Scale dependent
Scale dependent errors can only be used to compare different methods on the same time series. If a time series is measured in "cars", the errors are also in "cars"! Similar problems occur due to the scale. They should not be used for comparisons across different time series!
Obviously we can define median versions of the above errors.
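The scale dependent measures translate directly into code. A minimal sketch, assuming actuals and forecasts are numpy arrays of equal length:

```python
import numpy as np

def mae(actuals, forecasts):
    """Mean Absolute Error: robust to outliers, but scale dependent."""
    return np.mean(np.abs(actuals - forecasts))

def mse(actuals, forecasts):
    """Mean Squared Error: sensitive to outliers; units are squared."""
    return np.mean((actuals - forecasts) ** 2)

def rmse(actuals, forecasts):
    """Root Mean Squared Error: as MSE, but back on the original scale."""
    return np.sqrt(mse(actuals, forecasts))
```

The median versions follow by replacing np.mean with np.median.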
A Note on Absolute and Squared Errors
[Figure: distributions of the errors ($e_t$; mean 158.8 units, median 117.8 units), the absolute errors ($|e_t|$; mean 224.0 units, median 118.4 units) and the squared errors ($e_t^2$; mean 50,179 units², median 14,022 units²), with means and medians marked. Notice how extreme the outliers become once squared.]
Squared errors are sensitive to outliers, as large errors are inflated disproportionately relative to smaller ones. On the other hand, absolute errors do not rescale the errors and the contribution of outliers is not exaggerated.
Percentage Errors – Scale Independent
In order to compare across different time series we define a series of scale independent measures; neither the level nor the units of the original time series matter.

Percentage errors (PE): $PE_t = \frac{A_t - F_t}{A_t}$
• Expresses errors as a ratio to the actual level
• Free of units
• Requires a "meaningful" zero (0°C is not a "meaningful" zero), so that the actuals do not become negative.

Based on the percentage errors we can define percentage bias and accuracy metrics. These are scale and unit independent and therefore allow comparisons and aggregations across time series.
Percentage Errors – Scale Independent
Some common errors that can be defined using these operators are:

1. Mean Absolute Percentage Error (MAPE): $MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{A_i - F_i}{A_i}\right|$
• Scale independent
• Very intuitive (method is % wrong)
• Biased: positive and negative errors do not count equally!
• Requires a non-zero and positive denominator

2. Symmetric Mean Absolute Percentage Error (sMAPE)
• If you see it, avoid it! It has too many issues!

There are also median versions of the absolute percentage errors.

MAPE bias example:
Actual = 100, Forecast = 90: MAPE = |10|/100 = 10%
Actual = 90, Forecast = 100: MAPE = |-10|/90 = 11.111%
Percentage Error Example
Period | Actual (At) | Forecast (Ft) | et = At - Ft | PE = et/At | APE = |PE|
t+1 | 106 | 101 | 5 | 4.72% | 4.72%
t+2 | 0 | 101 | -101 | infinite | infinite
t+3 | 102 | 101 | 1 | 0.98% | 0.98%

MAPE = infinite!

If there is a zero (or a value close to zero) MAPE becomes infinite (or extremely large). The median versions will typically still allow you to calculate a figure, as the infinite errors are ignored!

Period | Actual (At) | Forecast (Ft) | et = At - Ft | PE = et/At | APE = |PE|
t+1 | 106 | 101 | 5 | 4.72% | 4.72%
t+2 | 100 | 101 | -1 | -1.00% | 1.00%
t+3 | 102 | 101 | 1 | 0.98% | 0.98%

MAPE = 2.23%
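A sketch of MAPE that reproduces the two tables above; note how a single zero actual sends the result to infinity:

```python
import numpy as np

def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in %. Infinite if any actual is 0."""
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    with np.errstate(divide="ignore"):
        ape = np.abs((actuals - forecasts) / actuals)  # APE = |PE|
    return 100 * np.mean(ape)

print(mape([106, 0, 102], [101, 101, 101]))    # inf
print(mape([106, 100, 102], [101, 101, 101]))  # ~2.23
```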
Example of Scale Independent Errors
[Figure: SKU A weekly sales (At) and forecast (Ft), in hundreds of units, and SKU C weekly sales and forecast, in tens of millions of units.]

SKU A (computers): MAE = 134.0 computers, MAPE = 179.7%.
SKU C (iron nails): MAE = 6,117,390 iron nails, MAPE = 26.6%.
The MAE of the first series is dwarfed by the second's, because the scale of the second series is in millions. MAE also has the issue of units.
Relative Errors
In order to compare against a benchmark method we define a series of relative error measures.

Relative errors (RE): $RE_t = \frac{A_t - F_t}{A_t - F_t^{Benchmark}}$
• Expresses errors as a ratio to the errors of another forecasting model, typically the naive
• Free of units (scale independent)
• Directly compares forecasting methods

Geometric Mean Relative Absolute Error (GMRAE): $GMRAE = \left(\prod_{i=1}^{n}\left|\frac{A_i - F_i}{A_i - F_i^{Naive}}\right|\right)^{1/n}$
• Absolute form of relative errors
• Error < 1: method better than benchmark
• Error > 1: method worse than benchmark
• Error = 1: method as good as benchmark
To summarise across time series we again use a geometric mean of the per-series GMRAEs.
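A sketch of GMRAE, computed as a geometric mean via the mean of logarithms (anticipating the formulation on the next slide); it assumes the benchmark errors are never exactly zero:

```python
import numpy as np

def gmrae(actuals, forecasts, benchmark_forecasts):
    """Geometric Mean Relative Absolute Error against a benchmark,
    typically the naive. < 1: better than the benchmark; > 1: worse.
    Assumes no benchmark error is exactly zero."""
    rae = np.abs(actuals - forecasts) / np.abs(actuals - benchmark_forecasts)
    return np.exp(np.mean(np.log(rae)))  # geometric mean via the log-mean
```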
Relative Summary Errors
Another way to calculate GMRAE is to use the mean of the logarithms of the RAEs:

$GMRAE = \left(\prod_{i=1}^{n}\left|\frac{A_i - F_i}{A_i - F_i^{Naive}}\right|\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\log\left|\frac{A_i - F_i}{A_i - F_i^{Naive}}\right|\right)$

An alternative error metric is the AvRelMAE (Average Relative MAE). This is calculated as follows:

$AvRelMAE = \left(\prod_{i=1}^{n}\frac{MAE_i^{Forecast}}{MAE_i^{Benchmark}}\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n}\log\frac{MAE_i^{Forecast}}{MAE_i^{Benchmark}}\right)$
The idea is the following:
1. Calculate the MAE for each series (i = 1, …, n).
2. Calculate the ratios with the benchmark MAE.
3. Average across the different series with a geometric mean (as we use ratios).
• More robust to calculate than GMRAE, but less sensitive to individual errors
• Same interpretation as GMRAE
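A sketch of AvRelMAE, taking one MAE per series for the forecast and for the benchmark; the numbers are illustrative:

```python
import numpy as np

def avrel_mae(mae_forecast, mae_benchmark):
    """Average Relative MAE: geometric mean of the per-series MAE ratios.
    Interpreted like GMRAE (< 1 means better than the benchmark)."""
    ratios = np.asarray(mae_forecast) / np.asarray(mae_benchmark)
    return np.exp(np.mean(np.log(ratios)))

# Three series: forecast MAEs vs benchmark (naive) MAEs
print(avrel_mae([120.0, 80.0, 95.0], [150.0, 90.0, 70.0]))  # ~0.99
```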
Error Measure Remarks
• Can be calculated across any forecast horizon or aggregation of forecast horizons.
• Scale independent errors can be aggregated across time series as well.
• There is no best error measure! Depends on the data and question at hand.
• Error measures can help in assessing the bias, accuracy and robustness of a forecasting method, as well as its ranking against other methods.
• Different forecasting error measures may output different model rankings.
Forecast Monitoring
Let us assume that we run two different forecasts and track their performance at every period. The more reactive forecast (SES with the higher smoothing parameter) is better. But if this SKU were forecast every month automatically, without any human intervention, could we identify the problem?
[Figure: SKU B sales with SES(0.05) and SES(0.25) forecasts, and the out-of-sample t+1 to t+6 forecast error (MAE) of each tracked by week. The poor forecast produces very high errors for prolonged periods; very high errors should signal an alert for manual intervention, and here the alert occurs very fast.]
Forecast Monitoring
We can monitor automated forecasts by tracking their errors. This can be implemented in an unstructured way or in a control chart approach. The errors can be used raw or smoothed.
[Figure: control chart of the MAE by period with ±1.96 standard deviation limits, highlighting periods with unexpectedly high errors.]
Outline
1. Bias measures
2. Accuracy measures
3. Evaluation schemes
4. Prediction intervals
5. Parameter selection
6. Method selection
In- and Out-of-Sample
The historical observations can be split into two subsets: • In-sample: used for model building and parameterisation. • Out-of-sample: used for model evaluation. This is not used in building the model
and is not “seen” by the model. We use it to simulate true forecasts, instead of waiting for new unobserved values in order to evaluate the forecasting performance of alternative forecasting models.
[Figure: monthly series split at the forecast origin into the in-sample, used to build the model, and the out-of-sample, used to evaluate the model; note that the forecast is multiple steps ahead.]
Static Origin Evaluation
[Figure: monthly series with a single forecast origin at month 60; the 12-month forecast covers only part of the 24-month out-of-sample.]
The simplest evaluation produces a single forecast in the out-of-sample subset. In the figure above the forecast horizon is 12 months and the holdout (out-of-sample) is 24 months. We have a forecast for t+1, t+2, ..., t+12, from a single forecast origin, month 60. Suppose we are interested in forecasting t+12 accurately: we have only one measurement, and therefore low confidence in our accuracy estimate. This evaluation scheme is called static origin evaluation.
Limitations:
1. One forecast per lead time: needs a long track history.
2. Forecasts are susceptible to corruption: "strange" origins or targets may affect the quality of the forecasts.
3. Averaging over different lead times corrupts the summary error statistic.
Static Origin Evaluation

[Diagram: the series is split at the origin into the in-sample, used to fit the model, and the out-of-sample holdout, used to evaluate the model; the forecasts span the holdout.]
Rolling Origin Evaluation
A way to overcome these limitations is the Rolling Origin Evaluation scheme.

[Diagram: the forecast origin is rolled forward repeatedly; with each roll the in-sample increases and the remaining out-of-sample shrinks.]

We roll the forecast origin until there is not enough out-of-sample left to use for evaluation.
Rolling Origin Evaluation
Rolling origin evaluation:
1. Provides more forecasts per lead time (see the table below).
2. Overcomes the limitations of fixed origin evaluation:
• Provides more forecasting history per lead time for an equal holdout sample
• Does not need to average over lead times
• Can overcome "strange" origins or targets
Number of forecasts per forecast lead time, for holdouts of 5 and 10 periods:

Lead time | Fixed origin (holdout 5) | Rolling origin (holdout 5) | Fixed origin (holdout 10) | Rolling origin (holdout 10)
t+1 | 1 | 5 | 1 | 10
t+2 | 1 | 4 | 1 | 9
t+3 | 1 | 3 | 1 | 8
t+4 | 1 | 2 | 1 | 7
t+5 | 1 | 1 | 1 | 6
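A minimal sketch of a rolling origin evaluation. The forecasting method is passed in as a function; a naive forecast and a simulated series stand in here for illustration:

```python
import numpy as np

def rolling_origin_abs_errors(series, forecast_fn, min_train, horizon):
    """Roll the origin forward; at each origin fit on the data so far
    and forecast `horizon` steps. Returns one row of |errors| per origin."""
    rows = []
    for origin in range(min_train, len(series) - horizon + 1):
        fcst = forecast_fn(series[:origin], horizon)
        actual = series[origin:origin + horizon]
        rows.append(np.abs(actual - fcst))
    return np.array(rows)

naive = lambda train, h: np.repeat(train[-1], h)  # benchmark method
series = np.random.default_rng(1).normal(500, 50, 84)
abs_errors = rolling_origin_abs_errors(series, naive, min_train=60, horizon=6)
print(abs_errors.mean(axis=0))  # MAE per lead time, t+1 ... t+6
```

Each lead time now receives many error measurements instead of one.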
Rolling Origin Evaluation
Using the previous example, a rolling origin evaluation would look like:

[Figure: rolling origin forecasts on the example series for SES with alpha = 0.05 and alpha = 0.20; the black dots are the forecast origins. Visualising the rolling origin forecasts makes it easier to appreciate the importance of smooth forecasts that filter noise.]
Evaluation Schemes – Sample Size
Having enough sample for model building and evaluation is crucial. Lack of sample severely restricts the selection of alternative models, as many require an abundance of data. On really short time series we can eventually only apply naive and simple average models.

Models that require parameterisation perform better when there is ample data: the more data available, the better the estimation of the parameters. Consider the issues with setting the gamma parameter of seasonal exponential smoothing models.

The same is true for evaluating forecasts. With a large sample many errors can be calculated, giving higher confidence in the estimated figure.

Sample size also affects our understanding of the time series components. How many observations are required to identify a seasonal time series?
Evaluation Schemes – Sample Size
Let's try to find the t+1 forecast error of a method using absolute errors...

[Figure: histogram of 84 absolute errors; MAE = 45.75.]

For 84 errors the MAE is 45.75, but let us assume that this is unknown.
The 1st error is 65.96. Mean: 65.96. Are we confident?
The 2nd error is 68.18. Mean: 67.07. Are we confident?
The 3rd error is 99.35. Mean: 77.83. Are we confident?
...
The 20th error is 112.68. Mean: 57.30. Are we confident?
The 30th error is 18.33. Mean: 49.65. Are we confident?

[Figure: absolute errors (AE) by observation, with the cumulative MAE converging towards the final MAE.]
Evaluation Scheme Remarks
• Forecasting accuracy of one model is meaningful only relative to another model or benchmark. An error of 5% or 50% is uninformative without comparing it to benchmarks.
• Naive methods make simple and effective benchmarks.
• There are no set rules determining the size of the out-of-sample (or holdout), however:
o It should be at least as long as the forecast horizon
o Leave enough in-sample data for model building
o Provide enough forecasts of the forecast horizon of interest
o A simple heuristic is 80% in-sample, 20% out-of-sample, but it is often inappropriate.
Outline
1. Bias measures
2. Accuracy measures
3. Evaluation schemes
4. Prediction intervals
5. Parameter selection
6. Method selection
Prediction Intervals
[Figure: a series with an SES forecast; the forecast errors $e_t = A_t - F_t$ over time; and the error PDF against a normal density, whose ±1σ, ±2σ and ±3σ regions cover 68.2%, 95.5% and 99.7% of the probability.]
We can use the error distribution of valid models to formulate prediction intervals for the forecasting methods:

$PI_{t+h} = F_{t+h} \pm z_{\alpha/2} \cdot s_{e,h}$
Prediction Intervals
[Figure: standard normal density with ±1σ, ±2σ and ±3σ regions covering 68.2%, 95.5% and 99.7% of the probability.]

Starting from the sample standard deviation of the errors:

$s_{e,h} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(e_{t+h,i} - \bar{e}_h\right)^2}$

For an unbiased model the mean of the errors is zero ($\bar{e}_h = 0$), so:

$s_{e,h} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}e_{t+h,i}^2} = \sqrt{MSE_h}$

The forecast is the expected value. Adding to and subtracting from the forecast the quantity $z_{\alpha/2} \cdot s_{e,h}$ gives us the prediction intervals. The standard score for normal distributions (valid models) is easy to calculate.
Prediction interval | z_{α/2} score
50% | 0.67
90% | 1.64
95% | 1.96
99% | 2.58
Using prediction intervals we can visualise the confidence in our forecasts.
Prediction Intervals
[Figure: 80% and 90% prediction intervals for SKU A, SKU C, the UK Android Market Share and US Air Passengers.]

Observe that the prediction intervals vary with the series, method and horizon. We have more confidence in forecasts with tight PIs.
Prediction Intervals
Calculating the PI formula can be complicated, as it requires the in-sample MSE at multiple forecast horizons (h):

$PI_{t+h} = F_{t+h} \pm z_{\alpha/2} \cdot s_{e,h} = F_{t+h} \pm z_{\alpha/2}\sqrt{MSE_h}$

This can be obtained by calculating the rolling origin in-sample MSE at the relevant forecast horizon. Alternatively it can be approximated using the following formula:

$s_h \approx \sqrt{h}\sqrt{MSE_{t+1}}$

This is the square root of the horizon multiplied by the square root of the 1-step-ahead in-sample mean squared error. Note that there is substantial empirical evidence that this is a very rough approximation.
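A sketch of the approximate interval, assuming normally distributed, unbiased errors and using scipy for the z-score:

```python
import numpy as np
from scipy.stats import norm

def approx_pi(forecast, mse_1step, h, level=0.95):
    """PI_{t+h} = F_{t+h} +/- z_{a/2} * sqrt(h) * sqrt(MSE_{t+1}).
    Only a rough approximation, as noted above."""
    z = norm.ppf(1 - (1 - level) / 2)  # e.g. 1.96 for a 95% interval
    half_width = z * np.sqrt(h) * np.sqrt(mse_1step)
    return forecast - half_width, forecast + half_width

lo, hi = approx_pi(forecast=500.0, mse_1step=2500.0, h=6)
print(f"95% PI at t+6: [{lo:.1f}, {hi:.1f}]")  # [260.0, 740.0]
```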
Outline
1. Bias measures
2. Accuracy measures
3. Evaluation schemes
4. Prediction intervals
5. Parameter selection
6. Method selection
Parameter Selection
We have seen for the exponential smoothing methods that we can select the smoothing parameters based on the theoretical properties of the methods and the characteristics of the time series:
• Low parameters imply long weighted averages and therefore robustness against outliers and increased noise.
• Higher parameters imply shorter weighted averages, reacting faster to new information and handling better breaks in the series.
However it may be desirable to automate the parameter selection process. We can use in-sample error metrics for this purpose.
Parameter Selection
[Figure: SKU A sales (At) and SES forecasts (Ft) with alpha = 0.10 (MSE = 27,337) and alpha = 0.40 (MSE = 35,580). The alpha = 0.10 parameter is better, resulting in a better fit (lower error).]
Based on this idea we can optimise model parameters (for any exponential smoothing method or generally).
Parameter Selection: SES example
[Figure: in-sample MSE of SES as a function of alpha, with the minimum marked.]
We can calculate the in-sample MSE for various values of alpha and identify the value that gives the lowest error.
α = 0.106
This result is very close to the one we chose manually (0.1). Obviously for more complex methods we can optimise several parameters simultaneously.
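A sketch of this grid search for SES; the SES implementation and the simulated series are assumptions for illustration:

```python
import numpy as np

def ses_insample_mse(series, alpha):
    """One-step-ahead in-sample MSE of SES, initialised at the first value."""
    level = series[0]
    sq_err = []
    for actual in series[1:]:
        sq_err.append((actual - level) ** 2)  # the forecast is the current level
        level = alpha * actual + (1 - alpha) * level
    return np.mean(sq_err)

series = np.random.default_rng(7).normal(500, 50, 80)
alphas = np.arange(0.01, 1.01, 0.01)
best_alpha = min(alphas, key=lambda a: ses_insample_mse(series, a))
print(f"alpha minimising in-sample MSE: {best_alpha:.2f}")
```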
Parameter Selection: SES example
The same principle can be applied to choose the initialisation level value as well. Now we vary both initialisation and smoothing parameter.
Alpha = 0.3237, Level = 1697

[Figure: SKU B sales (At) with the SES forecast (Ft) using the jointly optimised parameters.]
As the number of parameters (including initialisation values) increases optimisation becomes more time consuming and requires more data.
Parameter Selection
Will the optimal parameters always be the best? … In fact, no.
There are many reasons for this:
• Optimisation is done in-sample. The correlation between in-sample and out-of-sample errors has been shown to be low.
• Optimisation is (typically) done using t+1 MSE. In practice we forecast for longer horizons and our pragmatic cost functions are different.
• MSE is by construction very reactive to extreme errors, which may distort the error surface that we search for the optimal values.
• Sample limitations as the number of parameters to optimise increases.
• Minimum error may not be the business objective. Companies may prefer consistency of forecasts across origins instead.
Optimisation is very useful for automation, however human experts should override identified parameters if they violate theory or objectives.
Parameter Selection – Try it out

https://kourentzes.shinyapps.io/shinySES
Experiment with setting the alpha parameter:
• Do you agree with the optimal value?
• Do the in-sample and out-of-sample errors behave the same way?
Parameter Selection Remarks
• Optimisation for complex models is sensitive to the starting conditions (local optima). Different sets of initial values and parameters may give different results.
• Optimising on bias does not make sense, as positive and negative errors cancel out.
• Optimisation results may change depending on the error metric used. MSE is common, but other metrics may be useful.
[Figure: the log(MSE) surface over alpha and the initial level, with the minima under MAE, MSE and MAPE marked, and SKU B with the SES forecasts resulting from each.]

Error metric | Alpha | Level
MSE | 0.3238 | 1697
MAE | 0.3423 | 2046
MAPE | 0.2574 | 2046
Method Selection
We can use similar principles to select the appropriate forecasting method.
There are two major approaches:
• Using information criteria (usable only within a family of methods, e.g. exponential smoothing).
• Using a validation holdout sample and rolling origin evaluation.
These complement manual selection, based on understanding the characteristics of a time series.
Method Selection – Why Fit Errors Do Not Work
For selecting between methods we cannot use the in-sample errors as we did for parameter selection. This is because more complex models will tend to have lower fit errors, even if their forecasts perform worse.
[Figure: SKU B sales with the fits of level exponential smoothing and trend-seasonal exponential smoothing across the in-sample and holdout periods.]
Level Exponential Smoothing: in-sample MAE (t+1) = 707.56; out-of-sample MAE (t+1 to t+6) = 321.28.
Trend-Seasonal Exponential Smoothing: in-sample MAE (t+1) = 528.55; out-of-sample MAE (t+1 to t+6) = 1095.00.
The more complex model fits better in-sample yet forecasts worse in the holdout: more complex models are more flexible and have a higher potential to overfit compared to simpler models.
Method Selection – Information Criteria
One approach is to penalise the fit of more complex models for the number of parameters they have (= complexity). We define information criteria as:
$IC = \ln(MSE) + p \cdot Q(n)$

• $\ln(MSE)$ is the logarithm of the 1-step-ahead in-sample MSE.
• $p$ is the number of parameters (including initial values).
• $Q(n)$ is the penalty function.
• $n$ is the in-sample size.

Akaike Information Criterion (AIC): $Q(n) = 2/n$
Bayesian Information Criterion (BIC): $Q(n) = \ln(n)/n$

BIC penalises larger models more. For exponential smoothing there are no significant differences in performance between the two.
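A sketch of the two criteria. The usage example plugs in the level exponential smoothing MSE from the example that follows, assuming p = 2 (alpha and the initial level) and n = 40, values which reproduce that slide's AIC and BIC:

```python
import numpy as np

def aic(mse_1step, p, n):
    """Akaike Information Criterion: ln(MSE) + p * 2/n."""
    return np.log(mse_1step) + p * 2 / n

def bic(mse_1step, p, n):
    """Bayesian Information Criterion: ln(MSE) + p * ln(n)/n.
    Penalises larger models more than the AIC."""
    return np.log(mse_1step) + p * np.log(n) / n

# p = 2 and n = 40 are assumptions consistent with the slide's figures.
print(round(aic(812506.27, p=2, n=40), 4))  # 13.7079
print(round(bic(812506.27, p=2, n=40), 4))  # 13.7923
```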
Information Criteria Example
[Figure: SKU B sales with the fits of level exponential smoothing and trend-seasonal exponential smoothing across the in-sample and holdout periods.]
Level Exponential Smoothing: in-sample MSE (t+1) = 812,506.27; AIC = 13.7079; BIC = 13.7923; out-of-sample MAE (t+1 to t+6) = 321.28.
Trend-Seasonal Exponential Smoothing: in-sample MSE (t+1) = 413,584.09; AIC = 13.8326; BIC = 14.5926; out-of-sample MAE (t+1 to t+6) = 1095.00.
Both information criteria give us the correct answer.
Method Selection – Holdout sample
For the other approach we simply measure the error on a subset of the series that is not used for fitting the models. Both the validation and out-of-sample errors are produced with rolling origin forecasts.

Level Exponential Smoothing: in-sample MAE (t+1) = 707.56; validation MAE (t+1 to t+6) = 655.64; out-of-sample MAE (t+1 to t+6) = 321.28.
Trend-Seasonal Exponential Smoothing: in-sample MAE (t+1) = 528.55; validation MAE (t+1 to t+6) = 1028.61; out-of-sample MAE (t+1 to t+6) = 1095.00.
Again we get the correct answer!
Method Selection
Information Criteria
• Pros
  - Easy to calculate.
• Cons
  - Applicable only within a single family of methods.
  - Cannot always be aligned with the true cost function of the company.

Holdout Set
• Pros
  - Universal, can be used to select between any methods.
  - Can be fully aligned with the true cost function.
• Cons
  - Loses sample to the validation set.
  - If the sample size is not adequate for a reasonable rolling origin evaluation then the results may not be reliable.
  - Computationally complex.
Method and Parameter Selection Remarks
• Forecast evaluation can help us automate both method and parameter selection.
• There are several alternative options; they often produce similar results.
• For reliable, fully automatic performance, use forecast monitoring.
• The key benefit of statistics is automation, which is essential for modern business forecasting problems.
• Experienced human experts can outperform automatic methods: they understand the structure of the time series and can choose methods and parameters appropriately.
Thank you for your attention!
Questions?
Nikolaos Kourentzes Lancaster University Management School
Lancaster Centre for Forecasting - Lancaster, LA1 4YX email: [email protected]
Forecasting blog: http://nikolaos.kourentzes.com
www.forecasting-centre.com/
Full or partial reproduction of the slides is not permitted without author’s consent. Please contact [email protected] for more information.