National Technical University of Athens – Forecasting & Strategy Unit
38th International Symposium on Forecasting
Boulder, Colorado, USA – June 2018
The M4 Competition in Progress
Evangelos Spiliotis, Spyros Makridakis, Vassilios Assimakopoulos
National Technical University of Athens, Forecasting & Strategy Unit | University of Nicosia, Institute for the Future
Forecast. Compete. Excel.
The quest for the holy grail
What do we forecast?
The performance of forecasting methods strongly depends on the
o Domain
o Frequency
o Length
o Characteristics
o ???
of the time series being examined,
as well as on various strategic decisions, such as the forecasting horizon, the computation time (complexity), and the relevant information available
The quest for the holy grail
What kind of method should we use?
Too many types of methods and alternatives
o Statistical
o Machine Learning
o Combination
o Judgmental
with contradictory results in the literature
Even if we knew which method is best for the examined application in general, lots of work would still be needed to properly select and parameterize our forecasting model, as well as to pre-process our data
The quest for the holy grail
Is there a golden rule or some best practices?
“ignorance of research findings, bias, sophisticated statistical procedures, and the proliferation of big data, have led forecasters to violate the Golden Rule. As a result, …, forecasting practice in many fields has failed to improve over the past half-century”.
Golden rule of forecasting: Be conservative (Armstrong et al., 2015)
“identify the main determinants of forecasting accuracy considering seven time series features and the forecasting horizon”
‘Horses for Courses’ in demand forecasting (Petropoulos et al., 2014)
“investigate which individual model selection is beneficial and when this approach should be preferred to aggregate selection or combination”
Simple versus complex selection rules for forecasting many time series (Fildes & Petropoulos, 2015)
Evaluating Forecasting Performance
We need benchmarks....
New methods and forecasting approaches must perform well on well-known, diverse and representative data sets
This is exactly the scope of forecasting competitions: learn how to improve forecasting accuracy, and how such learning can be applied to advance the theory and practice of forecasting
✓ Encourage researchers and practitioners to develop new and more accurate forecasting methods
✓ Compare popular forecasting methods with new alternatives
✓ Document state-of-the-art methods and forecasting techniques used in academia and industry
✓ Identify best practices
✓ Set new research questions and try to provide proper answers
Evaluating Forecasting Performance
Competitions will always be helpful....
➢ There will always be features of time series forecasting not previously studied under competition conditions
➢ There will always be new methods to be evaluated and validated
➢ As new performance metrics and statistical tests come to light, the results of previous competitions will always be put under question
➢ Technological advances affect the way forecasting is performed and enable more advanced, complex and computationally intensive approaches, previously inapplicable
➢ Exploding data volumes influence forecasting and its applications (more data to learn from, unstructured data sources, abnormal time series, new forecasting needs)
The history of time series forecasting competitions
Makridakis and Hibon (1979)
• No participants
• 111 time series (yearly, quarterly & monthly)
• 22 methods
Major findings
• Simple methods do as well or better than sophisticated ones
• Combining forecasts may improve forecasting accuracy
• Special events have a negative impact on forecasting performance
Establishing the idea of forecasting competitions
The history of time series forecasting competitions
Establishing the idea of forecasting competitions
G. Jenkins
G.J.A. Stern
Automatic forecasting may be useless and less accurate than humans, while combining forecasts is quite risky
No one wants forecasts that accurate, nor has enough data to estimate them
M. B. Priestley
A model (simple data generation process) can perfectly describe and extrapolate your time series if identified and applied correctly
The history of time series forecasting competitions
Makridakis et al. (1982)
• Seven participants
• 1001 time series (yearly, quarterly & monthly)
• 15 methods (plus 9 variations)
• Not real-time
M1: The first forecasting competition
Major findings
• Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
• The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
• The accuracy when various methods are combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods.
• The accuracy of the various methods depends on the length of the forecasting horizon involved.
What's new?
• Real participants
• Many accuracy measures
The history of time series forecasting competitions
Makridakis and Hibon (1993)
• 29 time series
• 16 methods (human forecasters, automatic methods and combinations)
• Real time
Major findings
• In most cases, forecasters failed to improve statistical forecasts based on their judgment
• Simple methods perform better in most cases, with the results being in agreement with previous studies
What's new?
• Combine statistical methods with judgment
• Ask questions to the companies involved
• Learn from previous errors and revise subsequent forecasts accordingly
M2: Incorporating judgment
The history of time series forecasting competitions
Makridakis and Hibon (2000)
• 3003 time series
• 24 methods
• Not real time
Major findings
• The results of the previous studies and competitions were largely confirmed.
• New methods, such as the Theta of Assimakopoulos & Nikolopoulos (2000), and FSSs, such as ForecastPro, have proven their forecasting capabilities
• ANNs were relatively inaccurate
What's new?
• More methods (NNs and FSSs)
• More series
M3: The forecasting benchmark
“The M3 series have become the de facto standard test base in forecasting research. When any new univariate forecasting method is proposed, if it does not perform well on the M3 data compared to the results on other published algorithms, it is unlikely to receive any further attention or adoption.” (Kang, Hyndman & Smith-Miles, 2017)
The history of time series forecasting competitions
Modern forecasting competitions
Neural network competitions (NN3, 2006)
Crone, Hibon & Nikolopoulos (2011): 111 monthly M3 series & 59 submissions
✓ No CI method outperformed the original M3 contestants
✓ NNs may be inadequate for time series forecasting, especially for short series
✓ No “best practices” identified for utilizing CI methods
Kaggle Competitions
Tourism Forecasting Competition: Athanasopoulos & Hyndman (2010)
Web traffic (Wikipedia) competition: Anava & Kuznetsov (2017)
✓ Leaderboards significantly improve forecasting accuracy by providing motivation and fruitful feedback
✓ Fast results and conclusions
Status quo and next steps
✓ Forecasting and time series analysis are two different things
✓ Models that produce more accurate forecasts should be preferred over those with better statistical properties
✓ Simple models work – especially for short series
✓ Out-of-sample and in-sample accuracy may significantly differ (Avoid over-fitting)
✓ Automatic forecasting algorithms work rather well – especially for long time series
✓ Combining methods helps us deal with uncertainty
So, what did we learn?
Status quo and next steps
✓ Which are the “best practices” nowadays?
✓ How have advances in technology and algorithms affected forecasting?
✓ Are there any new methods that could really make a difference?
✓ How about prediction intervals?
✓ Similarities and differences between the various forecasting methods, including ML ones?
✓ Are the data of the forecasting competitions representative? Do other larger datasets support previous findings?
What would also be useful to learn (or verify) through M4?
The M4 Competition
The dates
• Competition Announced: Nov 1, 2017
• Competition Starts: Jan 1, 2018
• Competition Ends: May 31, 2018
• Preliminary Results: Jun 18, 2018
• Final Results and Winners: Sep 28, 2018
• There was also a deadline extension (1 week) to encourage more participation
• Late submissions are not eligible for any prize
The M4 Competition
The dataset (1/2)
| Frequency | Micro | Industry | Macro | Finance | Demographic | Other | Total |
|---|---|---|---|---|---|---|---|
| Yearly | 6,538 | 3,716 | 3,903 | 6,519 | 1,088 | 1,236 | 23,000 |
| Quarterly | 6,020 | 4,637 | 5,315 | 5,305 | 1,858 | 865 | 24,000 |
| Monthly | 10,975 | 10,017 | 10,016 | 10,987 | 5,728 | 277 | 48,000 |
| Weekly | 112 | 6 | 41 | 164 | 24 | 12 | 359 |
| Daily | 1,476 | 422 | 127 | 1,559 | 10 | 633 | 4,227 |
| Hourly | - | - | - | - | - | 414 | 414 |
| Total | 25,121 | 18,798 | 19,402 | 24,534 | 8,708 | 3,437 | 100,000 |
✓ The largest forecasting competition, involving 100,000 business time series to provide conclusions of statistical significance
✓ High-frequency data, including Weekly, Daily and Hourly series
✓ Diverse time series collected from 23 reliable data sources & classified in 6 domains
*Data available at https://www.m4.unic.ac.cy/the-dataset/ or through the M4comp2018 R package
The M4 Competition
[Figure: 2D visualization of the time series in the Feature Space of Kang et al., 2017 (Frequency, Seasonality, Trend, Randomness, ACF1 & Box-Cox λ); panels for Yearly, Quarterly, Monthly and Hourly series]
The dataset (2/2)
The M4 Competition
✓ Produce point forecasts for the whole dataset (mandatory). Forecasting horizons as follows:
• 6 for yearly
• 8 for quarterly (2 years)
• 18 for monthly (1.5 years)
• 13 for weekly (3 months)
• 14 for daily (2 weeks)
• 48 for hourly data (2 days)
✓ Estimate prediction intervals (95% confidence) for the whole dataset (optional)
✓ Submit before the deadline through the M4 site using a pre-defined file format
✓ Submit the code used to generate the forecasts, as well as a detailed method description, for reasons of reproducibility (optional but highly recommended). The supplementary material must be uploaded to the M4 GitHub* repo no later than June 10, 2018
The rules
* https://github.com/M4Competition
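The per-frequency horizons in the rules above can be captured in a small lookup; this is an illustrative helper of our own (not part of the official M4 tooling):

```python
# Forecasting horizons per frequency, as specified in the M4 rules.
M4_HORIZONS = {
    "Yearly": 6,
    "Quarterly": 8,   # 2 years
    "Monthly": 18,    # 1.5 years
    "Weekly": 13,     # ~3 months
    "Daily": 14,      # 2 weeks
    "Hourly": 48,     # 2 days
}

def required_horizon(frequency: str) -> int:
    """Number of point forecasts a submission must provide for a series."""
    return M4_HORIZONS[frequency]
```

A submission validator could use this to check that each forecast file contains exactly the required number of values per series.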
The M4 Competition
Evaluation: Point Forecasts
Overall Weighted Average (OWA) of two accuracy measures:
• Mean Absolute Scaled Error (MASE)
• symmetric Mean Absolute Percentage Error (sMAPE)

$$\mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=1}^{h}\left|Y_t - \hat{Y}_t\right|}{\frac{1}{n-m}\sum_{t=m+1}^{n}\left|Y_t - Y_{t-m}\right|}$$

where $Y_t$ is the post-sample value of the time series at point $t$, $\hat{Y}_t$ the estimated forecast, $h$ the forecasting horizon and $m$ the frequency of the data

$$\mathrm{sMAPE} = \frac{1}{h}\sum_{t=1}^{h}\frac{2\left|Y_t - \hat{Y}_t\right|}{\left|Y_t\right| + \left|\hat{Y}_t\right|}$$
➢ Estimate MASE and sMAPE per series by averaging the errors computed per forecasting horizon
➢ Divide all errors by that of Naïve 2 (Relative MASE and Relative sMAPE)
➢ Compute the OWA by averaging the Relative MASE and the Relative sMAPE
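The point-forecast evaluation can be sketched as below. This is a minimal illustration with our own function names, assuming the post-sample values, forecasts and in-sample history are plain numeric arrays; the sMAPE here is expressed in percent:

```python
import numpy as np

def smape(y, f):
    """sMAPE (%): mean of 2|Y - F| / (|Y| + |F|) over the horizon."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return np.mean(200.0 * np.abs(y - f) / (np.abs(y) + np.abs(f)))

def mase(y, f, insample, m):
    """MASE: MAE over the horizon, scaled by the in-sample MAE of the
    seasonal naive method (lag m)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    s = np.asarray(insample, float)
    scale = np.mean(np.abs(s[m:] - s[:-m]))
    return np.mean(np.abs(y - f)) / scale

def owa(smape_method, mase_method, smape_naive2, mase_naive2):
    """OWA: average of the sMAPE and MASE of a method, each divided by
    the corresponding error of the Naive 2 benchmark."""
    return 0.5 * (smape_method / smape_naive2 + mase_method / mase_naive2)
```

By construction, a method identical to Naïve 2 gets OWA = 1, and values below 1 indicate an improvement over the benchmark.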
The M4 Competition
Evaluation: Prediction Intervals
Mean Scaled Interval Score (MSIS)
➢ A penalty is calculated at the points where the real values are outside the specified bounds
➢ The width of the prediction interval is added to the penalty, if any, to get the Interval Score (IS)
➢ The IS values estimated at the individual points are averaged to get the MIS value
➢ The MIS is scaled by dividing its value by the mean absolute seasonal difference of the series
➢ The MSIS of all series is averaged to evaluate the total performance of the method

$$\mathrm{MSIS} = \frac{\frac{1}{h}\sum_{t=1}^{h}\left[(U_t - L_t) + \frac{2}{a}(L_t - Y_t)\mathbf{1}\{Y_t < L_t\} + \frac{2}{a}(Y_t - U_t)\mathbf{1}\{Y_t > U_t\}\right]}{\frac{1}{n-m}\sum_{t=m+1}^{n}\left|Y_t - Y_{t-m}\right|}$$

where $L_t$ and $U_t$ are the Lower and Upper bounds of the prediction intervals, $Y_t$ are the future observations of the series, $a$ is the significance level (0.05) and $\mathbf{1}$ is the indicator function (being 1 if the condition holds and 0 otherwise).
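The MSIS computation above can be sketched as follows; a minimal illustration with our own function names, for a single series:

```python
import numpy as np

def msis(y, lower, upper, insample, m, a=0.05):
    """Mean Scaled Interval Score for a (1 - a) prediction interval."""
    y = np.asarray(y, float)
    L, U = np.asarray(lower, float), np.asarray(upper, float)
    s = np.asarray(insample, float)
    # Interval score per point: interval width, plus a 2/a penalty
    # whenever the observation falls outside the bounds.
    score = (U - L) \
        + (2.0 / a) * (L - y) * (y < L) \
        + (2.0 / a) * (y - U) * (y > U)
    # Scale by the in-sample mean absolute seasonal difference, as for MASE.
    scale = np.mean(np.abs(s[m:] - s[:-m]))
    return np.mean(score) / scale
```

Note how a point inside the bounds contributes only the interval width, so narrow intervals are rewarded only as long as they still cover the observations.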
The M4 Competition
The benchmarks
1. Naïve 1 (S) – used to compare all methods (Prediction Intervals)
2. Seasonal Naïve (S)
3. Naïve 2 (S) – reference for estimating OWA
4. Simple Exponential Smoothing (S)
5. Holt’s Exponential Smoothing (S)
6. Damped Exponential Smoothing (S)
7. Combination of 4, 5 and 6 (C) – used to compare all methods (Point Forecasts)*
8. Theta (S)
9. MLP (ML)
10. RNN (ML)

10 benchmarks were used to facilitate comparisons between the participating methods: 7 classic Statistical methods, 1 Combination and 2 simplified Machine Learning ones

*Accurate, robust, simple & easy to understand
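A few of the simpler benchmarks can be sketched as below. These are deliberately simplified versions of our own: the official benchmarks optimise the smoothing parameters per series, and Naïve 2 additionally deseasonalises the data, both of which are omitted here:

```python
import numpy as np

def naive1(y, h):
    """Naive 1: repeat the last observed value h times."""
    return np.full(h, float(y[-1]))

def snaive(y, h, m):
    """Seasonal naive: repeat the last full seasonal cycle of length m."""
    y = np.asarray(y, float)
    season = y[-m:]
    return np.array([season[t % m] for t in range(h)])

def ses(y, h, alpha=0.5):
    """Simple exponential smoothing with a fixed (not optimised) alpha."""
    level = float(y[0])
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level)

def comb(y, h, forecasters):
    """Equal-weight combination of the given forecasting functions,
    analogous in spirit to the Comb benchmark (mean of SES, Holt, Damped)."""
    return np.mean([f(y, h) for f in forecasters], axis=0)
```

For example, `comb(series, h, [ses, naive1])` averages two sets of point forecasts element-wise over the horizon.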
The M4 Competition
The prizes
| Prize | Description | Amount |
|---|---|---|
| 1st Prize | Best performing method according to OWA | 9,000 € |
| 2nd Prize | Second-best performing method according to OWA | 4,000 € |
| 3rd Prize | Third-best performing method according to OWA | 2,000 € |
| Prediction Intervals Prize | Best performing method according to MSIS | 5,000 € |
| The UBER Student Prize | Best performing method according to OWA | 5,000 € |
| The Amazon Prize | The best reproducible forecasting method according to OWA | 2,000 € |

Six prizes, standing at 27,000 € in total
Sponsorships
The M4 Competition
The participants (1/2)
✓ 50 submissions (20 with PIs)
✓ 17 countries
[Bar chart: number of participants per country]
The M4 Competition
The participants (2/2)
✓ The majority utilized statistical methods or combinations (of both Statistical and ML models); only a few used pure ML ones*.
✓ More than half of the participants were affiliated with academia; the rest were either companies or individuals
[Bar chart: # of Participants per Affiliation Type – University, Company-Organization, Individual]
[Bar chart: # of Participants per Method Type – Combination, Statistical, Machine Learning, Other]
*These are rough classifications – more work is needed to verify them
Evaluation of submissions – Point Forecasts
Rankings (1/5)
| Rank | Team | Affiliation | Method | sMAPE | MASE | OWA | Diff from Comb (%) |
|---|---|---|---|---|---|---|---|
| 1 | Smyl | Uber Technologies | Hybrid | 11.37 | 1.54 | 0.821 | -8.52 |
| 2 | Montero-Manso et al. | University of A Coruña & Monash University | Comb (S & ML) | 11.72 | 1.55 | 0.838 | -6.65 |
| 3 | Pawlikowski et al. | ProLogistica Soft | Comb (S) | 11.84 | 1.55 | 0.841 | -6.25 |
| 4 | Jaganathan & Prakash | Individual | Comb (S & ML) | 11.70 | 1.57 | 0.842 | -6.17 |
| 5 | Fiorucci, J. A. & Louzada | University of Brasilia & University of São Paulo | Comb (S) | 11.84 | 1.55 | 0.843 | -6.10 |
| 6 | Petropoulos & Svetunkov | University of Bath & Lancaster University | Comb (S) | 11.89 | 1.57 | 0.848 | -5.55 |
| 7 | Shaub | Harvard Extension School | Comb (S) | 12.02 | 1.60 | 0.860 | -4.13 |
| 8 | Legaki & Koutsouri | National Technical University of Athens | Statistical | 11.99 | 1.60 | 0.861 | -4.11 |
| 9 | Doornik et al. | University of Oxford | Comb (S) | 11.92 | 1.63 | 0.865 | -3.62 |
| 10 | Pedregal et al. | University of Castilla-La Mancha | Comb (S) | 12.11 | 1.61 | 0.869 | -3.19 |
| 11 | 4Theta (Benchmark) | - | Statistical | 12.15 | 1.63 | 0.874 | -2.65 |
| 12 | Roubinchtein | Washington State Employment Security Department | Comb (S) | 12.18 | 1.63 | 0.876 | -2.38 |
| 13 | Ibrahim | Georgia Institute of Technology | Statistical | 12.20 | 1.64 | 0.880 | -1.97 |
| 14 | Tartu M4 seminar | University of Tartu | Comb (S & ML) | 12.50 | 1.63 | 0.888 | -1.09 |
| 15 | Waheeb | Universiti Tun Hussein Onn Malaysia | Comb (S) | 12.15 | 1.71 | 0.894 | -0.40 |
Evaluation of submissions – Point Forecasts
Rankings (2/5)
| Rank | Team | Affiliation | Method | sMAPE | MASE | OWA | Diff from Comb (%) |
|---|---|---|---|---|---|---|---|
| 16 | Darin & Stellwagen | Business Forecast Systems (Forecast Pro) | Statistical | 12.28 | 1.69 | 0.895 | 0.25 |
| 17 | Dantas & Cyrino Oliveira | Pontifical Catholic University of Rio de Janeiro | Comb (S) | 12.55 | 1.66 | 0.896 | 0.19 |
| 18 | Theta (Benchmark) | - | Statistical | 12.31 | 1.70 | 0.897 | 0.03 |
| 19 | Comb (Benchmark) | - | Comb (S) | 12.55 | 1.66 | 0.898 | 0.00 |
| 20 | Nikzad, A. | Scarsin (i2e) | Comb (S) | 12.37 | 1.72 | 0.907 | -1.01 |
| 21 | Damped (Benchmark) | - | Statistical | 12.66 | 1.68 | 0.907 | -1.02 |
| 22 | Segura-Heras et al. | Universidad Miguel Hernández & Universitat de Valencia | Comb (S) | 12.51 | 1.72 | 0.910 | -1.38 |
| 23 | Trotta | Individual | Machine Learning | 12.89 | 1.68 | 0.915 | -1.94 |
| 24 | Chen & Francis | Fordham University | Comb (S) | 12.55 | 1.73 | 0.915 | -1.96 |
| 25 | Svetunkov et al. | Lancaster University & University of Newcastle | Comb (S) | 12.46 | 1.74 | 0.916 | -2.01 |
| 26 | Talagala et al. | Monash University | Statistical | 12.90 | 1.69 | 0.917 | -2.12 |
| 27 | Sui & Rengifo | Fordham University | Comb (S) | 12.85 | 1.74 | 0.930 | -3.56 |
| 28 | Kharaghani | Individual | Comb (S) | 13.06 | 1.72 | 0.930 | -3.63 |
| 29 | Smart Forecast | Smart Cube | Comb (S) | 13.21 | 1.79 | 0.955 | -6.34 |
| 30 | Wainwright et al. | Oracle Corporation (Crystal Ball) | Statistical | 13.34 | 1.80 | 0.962 | -7.15 |
Evaluation of submissions – Point Forecasts
Rankings (3/5): Top 6 performing methods

Smyl, S.
• Hybrid model mixing Exponential Smoothing with LSTM – estimated concurrently
• Hierarchical modeling – parameters estimated using information both from the whole dataset and individual series | Combinations are also considered

Montero-Manso, P., Talagala, T., Hyndman, R. J. & Athanasopoulos, G.
• Weighted average of ARIMA, ETS, TBATS, Theta, naïve, seasonal naïve, NN and LSTM
• Weights estimated through a gradient boosting tree (xgboost) using holdout tests

Pawlikowski, M., Chorowska, A. & Yanchuk, O.
• Weighted average of several statistical methods using holdout tests
• Pool defined based on time series characteristics / manual selection

Jaganathan, S. & Prakash, P.
• Combination of statistical methods as described in Armstrong, J. S. (2001)

Fiorucci, J. A. & Louzada, F.
• Weighted average of ARIMA, ETS & Theta
• Weights estimated using cross-validation

Petropoulos, F. & Svetunkov, I.
• Median of ETS, CES, ARIMA & Theta
Evaluation of submissions – Point Forecasts
Rankings (4/5)
Spearman’s correlation coefficient of the rankings
| Correlation | sMAPE | MASE | OWA |
|---|---|---|---|
| sMAPE | - | - | - |
| MASE | 0.88 | - | - |
| OWA | 0.94 | 0.98 | - |
The final ranks according to both MASE and sMAPE are highly correlated with OWA, meaning that either can be used as a proxy to measure the relative performance of the individual methods
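Spearman's coefficient used above is simply the Pearson correlation of the rank positions; a minimal sketch of our own (no handling of tied ranks):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no ties (ties would need average ranks)."""
    def ranks(v):
        order = np.argsort(np.asarray(v))
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)  # rank 1 = smallest value
        return r
    return np.corrcoef(ranks(x), ranks(y))[0, 1]
```

Applied to two metrics' per-method scores (e.g. sMAPE and OWA across the 30 submissions), values near 1 mean the two metrics rank the methods almost identically.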
Evaluation of submissions – Point Forecasts
Rankings (5/5): Multiple Comparisons with the Best (MCB), by OWA rank
#1 Smyl
#2 Montero-Manso
#3 Pawlikowski
#4 Jaganathan
#5 Fiorucci
#6 Petropoulos
✓ The forecasts of the first six methods did not statistically differ
✓ Apart from these methods, the improvements of the rest over the benchmarks were minor
[MCB plot: benchmarks (Naive, sNaive, Naive2, SES, Holt, Damped, Comb, Theta, MLP, RNN) and the top submissions (Smyl, Montero-Manso, Pawlikowski, Jaganathan, Fiorucci, Petropoulos)]
Evaluation of submissions – Point Forecasts
What about Complexity? – Future Work
• Does sub-optimality matter? (Nikolopoulos & Petropoulos, 2017)
• Forecasting performance (sMAPE) versus computational complexity (Makridakis et al., 2018)
Comparing different types of methods
| Type of Method | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly | Total |
|---|---|---|---|---|---|---|---|
| Statistical | 0.93 | 0.93 | 0.95 | 0.97 | 1.00 | 1.00 | 0.97 |
| Machine Learning | 1.27 | 1.16 | 1.20 | 1.00 | 1.93 | 0.92 | 1.48 |
| Combination | 0.87 | 0.90 | 0.92 | 0.90 | 1.02 | 0.65 | 0.91 |
| Other | 0.99 | 1.92 | 1.77 | 8.88 | 9.16 | 2.79 | 1.80 |

| Type of Method | Macro | Micro | Demographic | Industry | Finance | Other | Total |
|---|---|---|---|---|---|---|---|
| Statistical | 0.95 | 0.98 | 0.95 | 0.99 | 0.97 | 0.97 | 0.98 |
| Machine Learning | 1.20 | 1.16 | 1.44 | 1.43 | 1.41 | 1.56 | 1.48 |
| Combination | 0.90 | 0.89 | 0.90 | 0.93 | 0.92 | 0.91 | 0.91 |
| Other | 1.64 | 1.81 | 1.93 | 1.55 | 2.04 | 1.76 | 1.80 |
Median performance per Frequency & Domain
✓ In general, Combinations produced more accurate forecasts than the rest of the methods, regardless of the frequency and the domain of the data
✓ Out of the 17 methods that did better than the benchmarks, 12 were Comb, 4 were Statistical and 1 was Hybrid
✓ Only 1 pure ML method performed better than Naïve 2
Comparing different types of methods
Top 3 per Frequency & Domain
| Frequency | 1st | 2nd | 3rd |
|---|---|---|---|
| Yearly | Smyl, S. (#1) | Legaki, N. Z. (#8) | Montero-Manso, P. (#2) |
| Quarterly | Montero-Manso, P. (#2) | Smyl, S. (#1) | Petropoulos, F. (#6) |
| Monthly | Smyl, S. (#1) | Jaganathan, S. (#4) | Montero-Manso, P. (#2) |
| Weekly | Darin, S. (#16) | Petropoulos, F. (#6) | Pawlikowski, M. (#3) |
| Daily | Pawlikowski, M. (#3) | Tartu M4 seminar (#14) | Fiorucci, J. A. (#5) |
| Hourly | Doornik, J. (#9) | Smyl, S. (#1) | Pawlikowski, M. (#3) |

| Domain | 1st | 2nd | 3rd |
|---|---|---|---|
| Macro | Smyl, S. (#1) | Jaganathan, S. (#4) | Montero-Manso, P. (#2) |
| Micro | Smyl, S. (#1) | Legaki, N. Z. (#8) | Pawlikowski, M. (#3) |
| Demographic | Montero-Manso, P. (#2) | Smyl, S. (#1) | Pawlikowski, M. (#3) |
| Industry | Montero-Manso, P. (#2) | Smyl, S. (#1) | Jaganathan, S. (#4) |
| Finance | Smyl, S. (#1) | Montero-Manso, P. (#2) | Fiorucci, J. A. (#5) |
| Other | Smyl, S. (#1) | Pawlikowski, M. (#3) | Montero-Manso, P. (#2) |
➢ Although the best-performing methods for the whole dataset were also very accurate for the individual subsets, in many cases they were outperformed by methods with a much lower overall rank – no single method fits them all
(Figure: Spearman’s correlation coefficient of the rankings)
Impact of forecasting horizon
Average sMAPE across 60 methods (benchmarks & submissions)
| Frequency | Deterioration per period (%) |
|-----------|------------------------------|
| Yearly    | 20                           |
| Quarterly | 13                           |
| Monthly   | 6                            |
| Weekly    | 7                            |
| Daily     | 14                           |
| Hourly    | 1                            |
✓ The length of the forecasting horizon has a great impact on forecasting accuracy
✓ Only for hourly data did ML methods become competitive
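The per-period deterioration figures above can be estimated from per-step error curves. The sketch below (hypothetical helper names, not the study's code) averages the step-to-step percentage increase in sMAPE:

```python
import numpy as np

def smape_by_step(actual, forecast):
    """Per-forecast-step sMAPE (%) over many series; inputs shaped (n_series, h)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(forecast - actual)
                           / (np.abs(actual) + np.abs(forecast)), axis=0)

def avg_deterioration(smape_per_step):
    """Mean percentage increase in sMAPE from each forecast step to the next."""
    s = np.asarray(smape_per_step, dtype=float)
    return 100.0 * np.mean(s[1:] / s[:-1] - 1.0)

# Toy error curve growing 20% per step, like the yearly figure in the table above.
curve = np.array([10.0, 12.0, 14.4])
```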
Impact of time series characteristics*
Average impact on forecasting accuracy (regression coefficient) per time-series characteristic – k methods per type × 100,000 observations
| Type of Method   | Randomness | Trend | Seasonality | Linearity | Stability | Length |
|------------------|-----------|-------|-------------|-----------|-----------|--------|
| Machine Learning | 0.20      | -0.10 | -0.04       | 0.14      | -0.05     | -0.08  |
| Statistical      | 0.18      | -0.08 | -0.02       | 0.09      | -0.04     | 0.15   |
| Combination      | 0.17      | -0.09 | -0.02       | 0.10      | -0.03     | -0.02  |
| Total            | 0.18      | -0.08 | -0.02       | 0.10      | -0.04     | 0.06   |

Machine Learning:
• More data, better forecasts
• Not robust for noisy and linear series
• Good for seasonal series

Statistical:
• Bad for trended & seasonal series
• Good at modeling linear patterns
• The less data the better (use only the most recent observations)

Combinations:
• Robust for noisy data
• Bad at capturing seasonality

\* sMAPE = a · Randomness + b · Trend + … + f · Length
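The footnoted regression can be reproduced in miniature with ordinary least squares on synthetic data. Everything below is illustrative: the features are random stand-ins for the series characteristics, and the "true" coefficients are simply the Total row of the table:

```python
import numpy as np

rng = np.random.default_rng(42)
n_series = 500
# Hypothetical standardized per-series features:
# randomness, trend, seasonality, linearity, stability, length.
X = rng.normal(size=(n_series, 6))
coef_total = np.array([0.18, -0.08, -0.02, 0.10, -0.04, 0.06])  # "Total" row above
smape_vals = X @ coef_total + rng.normal(scale=0.01, size=n_series)

# Least-squares estimate of the per-characteristic impact on sMAPE.
est, *_ = np.linalg.lstsq(X, smape_vals, rcond=None)
```

With enough series, the estimated coefficients recover the assumed ones, which is the sense in which the table's entries measure each characteristic's average impact on accuracy.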
Evaluation of submissions – Prediction Intervals
Rankings
| Rank | Team                     | Affiliation                                            | Method        | MSIS  | Coverage | Diff from Naïve (%) |
|------|--------------------------|--------------------------------------------------------|---------------|-------|----------|---------------------|
| 1    | Smyl                     | Uber Technologies                                      | Hybrid        | 12.23 | 94.78%   | 49.2%               |
| 2    | Montero-Manso et al.     | University of A Coruña & Monash University             | Comb (S & ML) | 14.33 | 95.96%   | 40.4%               |
| 3    | Doornik et al.           | University of Oxford                                   | Comb (S)      | 15.18 | 90.70%   | 36.9%               |
| 4    | ETS (benchmark)          | -                                                      | Statistical   | 15.68 | 91.27%   | 34.8%               |
| 5    | Fiorucci & Louzada       | University of Brasilia & University of São Paulo       | Comb (S)      | 15.69 | 88.52%   | 34.8%               |
| 6    | Petropoulos & Svetunkov  | University of Bath & Lancaster University              | Comb (S)      | 15.98 | 87.81%   | 33.6%               |
| 7    | Roubinchtein             | Washington State Employment Security Department        | Comb (S)      | 16.50 | 88.93%   | 31.4%               |
| 8    | Talagala et al.          | Monash University                                      | Statistical   | 18.43 | 86.48%   | 23.4%               |
| 9    | ARIMA (benchmark)        | -                                                      | Statistical   | 18.68 | 85.80%   | 22.3%               |
| 10   | Ibrahim                  | Georgia Institute of Technology                        | Statistical   | 20.20 | 85.62%   | 16.0%               |
| 11   | Iqbal et al.             | Wells Fargo Securities                                 | Statistical   | 22.00 | 86.41%   | 8.5%                |
| 12   | Reilly                   | Automatic Forecasting Systems, Inc. (AutoBox)          | Statistical   | 22.37 | 82.87%   | 7.0%                |
| 13   | Wainwright et al.        | Oracle Corporation (Crystal Ball)                      | Statistical   | 22.67 | 82.99%   | 5.7%                |
| 14   | Segura-Heras et al.      | Universidad Miguel Hernández & Universitat de València | Comb (S)      | 22.72 | 90.10%   | 5.6%                |
| 15   | Naïve (benchmark)        | -                                                      | Statistical   | 24.05 | 86.40%   | 0.0%                |
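The two metrics in the table can be computed as in the sketch below, assuming the M4-style definitions: MSIS is the mean interval score of Gneiting & Raftery (2007) with α = 0.05 (95% intervals), scaled by the in-sample seasonal naive MAE; coverage is the fraction of actuals falling inside the intervals. Function names are hypothetical:

```python
import numpy as np

def msis(insample, actual, lower, upper, m=1, alpha=0.05):
    """Mean Scaled Interval Score, assuming the M4 definition."""
    insample = np.asarray(insample, dtype=float)
    actual, lower, upper = (np.asarray(a, dtype=float)
                            for a in (actual, lower, upper))
    # Scale: in-sample MAE of the seasonal naive method (period m).
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    # Interval width plus penalties for actuals outside the interval.
    score = ((upper - lower)
             + (2.0 / alpha) * (lower - actual) * (actual < lower)
             + (2.0 / alpha) * (actual - upper) * (actual > upper))
    return np.mean(score) / scale

def coverage(actual, lower, upper):
    """Fraction of actuals inside the prediction interval (target: 95%)."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    return np.mean((actual >= lower) & (actual <= upper))

# Toy example: in-sample steps of 1 (scale = 1); both actuals inside the intervals.
insample = np.arange(1.0, 11.0)
actual = np.array([11.0, 12.0])
lower = np.array([10.0, 10.0])
upper = np.array([12.0, 12.0])
```

The (2/α) terms penalize actuals that fall outside the interval, so overly narrow intervals – the dominant failure mode in the table above – are punished heavily.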
Evaluation of submissions – Prediction Intervals
(Figure: Median performance per Frequency)
✓ Apart from the first two methods, the rest underestimated uncertainty considerably
✓ On average, the coverage of the methods was only 86.4% (the target is 95%)
✓ Estimating uncertainty was more difficult for low-frequency data, especially for the yearly series – limited sample & longer forecasting horizon
Evaluation of submissions – Prediction Intervals
(Figure: Median performance per Domain)
✓ Demographic and Industry data were easier to predict – slower changes and fewer fluctuations
✓ Micro & Finance data are characterized by the highest levels of uncertainty – a challenge for business forecasting
Impact of forecasting horizon
(Figure: Average Coverage across 23 methods – benchmarks & submissions)
✓ The length of the forecasting horizon has a great impact on estimating the PIs correctly, especially for yearly, quarterly & monthly data
Conclusions
Five major findings

✓ Hybrid methods, utilizing basic principles of statistical models together with ML components, have great potential
✓ Combining forecasts of different methods significantly improves forecasting accuracy
✓ Pure ML methods are inadequate for time series forecasting
✓ Prediction intervals underestimate uncertainty considerably
✓ The accuracy of individual statistical or ML methods is low; hybrid approaches and combinations of methods are the way forward to improve forecasting accuracy and make forecasting more valuable
Conclusions
…and some minor, yet important ones

✓ Complex methods did better than simple ones, but the improvements were not exceptional. Given the computational resources used, one can question whether they are also practical.
✓ The forecasting horizon has a negative effect on forecasting accuracy – both for point forecasts and PIs
✓ When using large samples, the variations reported between different error measures were insignificant
✓ Different methods should be used per series according to its characteristics, frequency and domain. Yet, learning from the masses seems mandatory.
✓ The majority of the forecasters exploited traditional forecasting approaches and mostly experimented with how to combine them
Next Steps
➢ Understand why hybrid methods work better in order to advance them further and improve their forecasting performance
➢ Figure out how combinations should be performed and where the emphasis should be given – pool or weights?
➢ Study the elements of the top-performing methods in terms of PIs and learn how to exploit and advance their features to better capture uncertainty
➢ Accept the drawbacks of ML methods and reveal ways to utilize their advantages in time series forecasting
➢ Experiment and discover new, more accurate forecasting approaches
Thank you for your attention. Questions?
If you would like to learn more about M4 visit
https://www.m4.unic.ac.cy/
or contact me at
References

• Armstrong, J. S., Green, K. C. & Graefe, A. (2015). Golden rule of forecasting: Be conservative. Journal of Business Research, 68(8), 1717-1731
• Armstrong, J. S. (2001). Combining forecasts. Retrieved from https://repository.upenn.edu/marketing_papers/34
• Athanasopoulos, G., Hyndman, R. J., Song, H. & Wu, D. C. (2011). The tourism forecasting competition. International Journal of Forecasting, 27(3), 822-844
• Athanasopoulos, G. & Hyndman, R. J. (2011). The value of feedback in forecasting competitions. International Journal of Forecasting, 27(3), 845-849
• Crone, S. F., Hibon, M. & Nikolopoulos, K. (2011). Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. International Journal of Forecasting, 27(3), 635-660
• Fildes, R. & Petropoulos, F. (2015). Simple versus complex selection rules for forecasting many time series. Journal of Business Research, 68(8), 1692-1701
• Gneiting, T. & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359-378
• Hyndman, R. J. & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688
• Kang, Y., Hyndman, R. J. & Smith-Miles, K. (2017). Visualising forecasting algorithm performance using time series instance spaces. International Journal of Forecasting, 33(2), 345-358
• Makridakis, S., Hibon, M. & Moser, C. (1979). Accuracy of forecasting: An empirical investigation. Journal of the Royal Statistical Society, Series A (General), 142(2), 97-145
• Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M. et al. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1, 111-153
• Makridakis, S., Chatfield, C., Hibon, M., Lawrence, M., Mills, T. et al. (1993). The M2-competition: A real-time judgmentally based forecasting study. International Journal of Forecasting, 9(1), 5-22
• Makridakis, S. & Hibon, M. (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16(4), 451-476
• Makridakis, S., Spiliotis, E. & Assimakopoulos, V. (2018). Statistical and machine learning forecasting methods: Concerns and ways forward. PLOS ONE, 13(3), 1-26
• Montero-Manso, P., Netto, C. & Talagala, T. (2018). M4comp2018: Data from the M4-Competition. R package version 0.1.0
• Newbold, P. & Granger, C. (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society, Series A (General), 137(2), 131-165
• Nikolopoulos, K. & Petropoulos, F. (2017). Forecasting for big data: Does suboptimality matter? Computers & Operations Research (in press)
• Petropoulos, F., Makridakis, S., Assimakopoulos, V. & Nikolopoulos, K. (2014). ‘Horses for Courses’ in demand forecasting. European Journal of Operational Research, 237(1), 152-163
• Spiliotis, E., Patikos, A., Assimakopoulos, V. & Kouloumos, A. (2017). Data as a service: Providing new datasets to the forecasting community for time series analysis. 37th International Symposium on Forecasting, Cairns, Australia