# Statistical vs Deep Learning forecasting methods

Comparison of several deep learning models and ensembles against classical statistical univariate models on the 3,003 series of the M3 competition.

## Abstract

We present a reproducible experiment showing that:

1. A simple statistical ensemble outperforms most individual deep-learning models.
2. A simple statistical ensemble is 25,000 times faster and only slightly less accurate than an ensemble of deep-learning models.

In other words, the deep-learning ensemble outperforms the statistical ensemble by just 0.36 points of SMAPE. However, the DL ensemble takes more than 14 days to run and costs around USD 11,000, while the statistical ensemble takes 6 minutes to run and costs about USD 0.50.

## Background

In *Statistical, machine learning and deep learning forecasting methods: Comparisons and ways forward*, Makridakis and other prominent members of the forecasting community compare several deep learning and statistical models on all 3,003 series of the M3 competition:

> The purpose of [the] paper is to test empirically the value currently added by Deep Learning (DL) approaches in time series forecasting by comparing the accuracy of some state-of-the-art DL methods with that of popular Machine Learning (ML) and statistical ones.

The authors conclude that:

> We find that combinations of DL models perform better than most standard models, both statistical and ML, especially for the case of monthly series and long-term forecasts.

We don't think that's the full picture. By including a statistical ensemble, we show that these claims are not fully warranted, and that one should rather conclude that, for this setting at least, deep learning is rather unattractive.

## Experiment

Building upon the original design, we added to the comparison a simple combination of univariate models (Petropoulos & Svetunkov, 2020). This ensemble is formed by averaging the forecasts of four statistical models: AutoARIMA, ETS, CES and DynamicOptimizedTheta. The combination placed sixth in the M4 competition and was the simplest ensemble among its top 10 performers.

For the experiment, we use StatsForecast's implementations of AutoARIMA, ETS, CES and DynamicOptimizedTheta; a minimal sketch of the pipeline is shown below. For the DL models and ensembles, we reproduce the metrics and results reported in the mentioned paper.
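To make the setup concrete, here is a minimal sketch of the ensemble pipeline. It assumes a recent `statsforecast` release (where the four models are exposed as `AutoARIMA`, `AutoETS`, `AutoCES` and `DynamicOptimizedTheta`) and the `datasetsforecast` package for loading M3; the actual experiment code lives in `src/experiment.py` and may differ in details.

```python
from datasetsforecast.m3 import M3  # assumed helper for downloading M3
from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA,
    AutoCES,
    AutoETS,
    DynamicOptimizedTheta,
)

# Load one group of M3; the Monthly group has seasonality 12 and horizon 18.
Y_df, *_ = M3.load(directory='data', group='Monthly')
season_length, horizon = 12, 18

models = [
    AutoARIMA(season_length=season_length),
    AutoETS(season_length=season_length),
    AutoCES(season_length=season_length),
    DynamicOptimizedTheta(season_length=season_length),
]
sf = StatsForecast(models=models, freq='M', n_jobs=-1)
forecasts = sf.forecast(df=Y_df, h=horizon)

# The ensemble is a plain, unweighted average of the four model forecasts.
model_cols = [c for c in forecasts.columns if c not in ('unique_id', 'ds')]
forecasts['StatisticalEnsemble'] = forecasts[model_cols].mean(axis=1)
```

Nothing is fitted on top of the four models; the unweighted average is the whole ensemble, which is what keeps it fast and simple.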
## Results

### Accuracy: Comparison with SOTA benchmarks

Accuracy is reported in symmetric mean absolute percentage error (SMAPE); a reference implementation is sketched at the end of this subsection.

The M3 dataset has four groups of time series: Yearly, Quarterly, Monthly and Other. In the next graph, you can see the performance of all models and ensembles.

*[Figure: SMAPE of all models and ensembles, by group; included as an image in the original README.]*

In the next table, you can see the performance of the models across all four groups, as well as the average performance over all groups.

*[Table: SMAPE per group and on average, for all models; included as an image in the original README.]*
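For reference, the sketch below implements the standard SMAPE definition (in percentage points) used in the M3/M4 literature; the actual evaluation code lives in `src/evaluation.py` and may differ in details such as the handling of zero denominators.

```python
import numpy as np

def smape(y, y_hat):
    """Symmetric MAPE in percentage points: mean of 200 * |y - ŷ| / (|y| + |ŷ|)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(200.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))))

# Example: smape([100, 200], [110, 180]) ≈ 10.03
```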
### Computational Complexity: Comparison with SOTA benchmarks

Computational complexity is reported in wall-clock time, lines of code, and Relative Computational Complexity (RCC).

#### Time

Using StatsForecast on a 96-core EC2 instance (c5d.24xlarge), it takes 5.6 minutes to train, forecast and ensemble the four models for all 3,003 series of M3.

| Time (mins) | Yearly | Quarterly | Monthly | Other |
|---|---|---|---|---|
| StatsForecast ensemble | 1.10 | 1.32 | 2.38 | 1.08 |

The authors of the paper only report computational time for the monthly group, which amounts to 20,680 minutes, or 14.3 days. In comparison, the StatsForecast ensemble takes only 2.38 minutes for that group. Furthermore, the authors don't include the time spent on hyperparameter optimization. For this comparison, we take the reported 14 days of computational time; note, however, that the true computational time across all groups must be significantly higher.

#### Engineering

Running all statistical models, including data downloading, data wrangling, training, forecasting and ensembling, can be achieved in less than 150 lines of Python code. In comparison, the original repo has more than 1,000 lines of code and needs Python, R, Mongo and shell code.

#### Relative Computational Complexity

The mentioned paper uses Relative Computational Complexity (RCC) to compare models: the cost of a method expressed as a multiple of the cost of a naive forecast. To calculate the RCC of StatsForecast, we followed the same methodology and measured the time it takes to generate naive forecasts for all 3,003 series in our environment. On a c5d.24xlarge instance (96 vCPUs, 192 GB RAM), it takes 12 seconds to train and predict 3,003 instances of a seasonal naive forecast. Therefore, the RCC of the simple ensemble is 28; the arithmetic is sketched below the following table.

In the next table, you can find the RCC of the deep learning models and the ensembles.

| Method | Type | Relative Computational Complexity (RCC) |
|---|---|---|
| DeepAR | DL | 313,000 |
| Feed-Forward | DL | 47,300 |
| Transformer | DL | 47,500 |
| WaveNet | DL | 306,000 |
| Ensemble-DL | DL | 713,800 |
| Ensemble-Stats | Statistical | 28 |
| SeasonalNaive | Benchmark | 1 |
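The RCC of the statistical ensemble follows directly from the two timings above (5.6 minutes for the full ensemble, 12 seconds for the seasonal naive benchmark):

```python
# RCC = runtime of a method divided by the runtime of the naive benchmark,
# using the timings reported above for the same c5d.24xlarge instance.
ensemble_seconds = 5.6 * 60  # full ensemble on all 3,003 series: 336 s
naive_seconds = 12           # seasonal naive on the same 3,003 series
print(ensemble_seconds / naive_seconds)  # 28.0
```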
### Summary: Comparison with SOTA benchmarks

We present a summary comparison, including SMAPE, RCC, a cost proxy, and self-reported computational time.

*[Table: summary of SMAPE, RCC, cost proxy and self-reported time for all models; included as an image in the original README.]*

We observe that StatsForecast yields average SMAPE results similar to DeepAR with computational savings of 99%. Furthermore, the StatsForecast ensemble:

* Performs better than the N-BEATS model for the Yearly and Other groups.
* Has a better average performance than the individual GluonTS models.
* Performs better than all GluonTS models for the Monthly and Other groups.
* Is consistently better than the Transformer, WaveNet, and Feed-Forward models.

In conclusion, the deep-learning ensemble achieves 12.27 points of accuracy (SMAPE), with a relative computational cost of 713,800 and a proxy monetary cost of USD 11,420. The simple statistical ensemble achieves 12.63 points of accuracy, with a relative computational cost of 28 and a proxy monetary cost of USD 0.50. Therefore, the DL ensemble is only 0.36 points more accurate than the statistical ensemble, but roughly 25,000 times more expensive (713,800 / 28 ≈ 25,500).

In plain English: a deep-learning ensemble that takes more than 14 days to run and costs around USD 11,000 outperforms a statistical ensemble that takes 6 minutes to run and costs about USD 0.50 by only 0.36 points of SMAPE.

## Conclusions

For this setting, deep learning models are simply worse than a statistical ensemble. To outperform this statistical ensemble by 0.36 points of SMAPE, a complicated deep-learning ensemble is needed. That ensemble, however, takes more than two weeks to run, costs several thousand dollars, and demands many engineering hours.

In conclusion: in terms of speed, cost, simplicity and interpretability, deep learning is far behind the simple statistical ensemble. In terms of accuracy, they seem to be rather close. This conclusion may or may not hold on other datasets; however, given the a priori uncertainty of the benefits and the certainty of the costs, statistical methods should be considered the first option in daily forecasting practice.

## Unsolicited Advice

Choose your models wisely. It would be extremely expensive and borderline irresponsible to favor deep learning models in an organization before establishing solid baselines. Simpler is sometimes better: not everything that glitters is gold.

## Reproducibility

To reproduce the main results:

1. Create the environment using `conda env create -f environment.yml`.
2. Activate the environment using `conda activate m3-dl`.
3. Run the experiments using `python -m src.experiment --group [group]`, where `[group]` can be `Other`, `Monthly`, `Quarterly`, or `Yearly`.
4. Finally, evaluate the forecasts using `python -m src.evaluation`.

## References

* Hyndman, Rob J. & Khandakar, Yeasmin (2008). "Automatic Time Series Forecasting: The forecast Package for R".
* Hyndman, Rob J., et al. (2008). "Forecasting with Exponential Smoothing: The State Space Approach".
* Svetunkov, Ivan & Kourentzes, Nikolaos (2015). "Complex Exponential Smoothing". 10.13140/RG.2.1.3757.2562.
* Fiorucci, Jose A., Pellegrini, Tiago R., Louzada, Francisco, Petropoulos, Fotios & Koehler, Anne B. (2016). "Models for Optimising the Theta Method and Their Relationship to State Space Models". International Journal of Forecasting, 32(4), 1151-1161. ISSN 0169-2070.
* Petropoulos, Fotios & Svetunkov, Ivan (2020). "A Simple Combination of Univariate Models". International Journal of Forecasting, 36(1), 110-115. ISSN 0169-2070.
* Makridakis, Spyros, Spiliotis, Evangelos, Assimakopoulos, Vassilios, Semenoglou, Artemios-Anargyros, Mulder, Gary & Nikolopoulos, Konstantinos (2022). "Statistical, Machine Learning and Deep Learning Forecasting Methods: Comparisons and Ways Forward". Journal of the Operational Research Society. DOI: 10.1080/01605682.2022.2118629.