# Statistical vs Deep Learning forecasting methods

Comparison of several deep learning models and ensembles against classical statistical univariate models on the 3,003 series of the M3 competition.

## Abstract

We present a reproducible experiment showing that:

1. A simple statistical ensemble outperforms most individual deep-learning models.
2. A simple statistical ensemble is 25,000 times faster and only slightly less accurate than an ensemble of deep-learning models.

In other words, the deep-learning ensemble outperforms the statistical ensemble by just 0.36 points of SMAPE. However, the DL ensemble takes more than 14 days to run and costs around USD 11,000, while the statistical ensemble takes 6 minutes to run and costs about USD 0.50.

## Background

In *Statistical, machine learning and deep learning forecasting methods: Comparisons and ways forward*, Makridakis and other prominent members of the forecasting community compare several deep learning and statistical models on all 3,003 series of the M3 competition:

> The purpose of [the] paper is to test empirically the value currently added by Deep Learning (DL) approaches in time series forecasting by comparing the accuracy of some state-of-the-art DL methods with that of popular Machine Learning (ML) and statistical ones.

The authors conclude that:

> We find that combinations of DL models perform better than most standard models, both statistical and ML, especially for the case of monthly series and long-term forecasts.

We don't think that's the full picture. By including a statistical ensemble, we show that these claims are not fully warranted, and that one should rather conclude that, for this setting at least, deep learning is rather unattractive.

## Experiment

Building upon the original design, we added to the comparison a simple combination of univariate models (Petropoulos & Svetunkov, 2020). This ensemble is formed by averaging the forecasts of four statistical models: AutoARIMA, ETS, CES and DynamicOptimizedTheta. The combination placed sixth in the M4 competition and was the simplest ensemble among its top 10 performers.

For the experiment, we use StatsForecast's implementations of AutoARIMA, ETS, CES and DynamicOptimizedTheta; a minimal sketch of the pipeline is shown below. For the DL models and ensembles, we reproduce the metrics and results reported in the mentioned paper.
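To make the setup concrete, here is a minimal sketch of the ensemble pipeline. It assumes a recent `statsforecast` release (where the four models are exposed as `AutoARIMA`, `AutoETS`, `AutoCES` and `DynamicOptimizedTheta`) and the `datasetsforecast` package for loading M3; the actual experiment code lives in `src/experiment.py` and may differ in details.

```python
from datasetsforecast.m3 import M3  # assumed helper for downloading M3
from statsforecast import StatsForecast
from statsforecast.models import (
    AutoARIMA,
    AutoCES,
    AutoETS,
    DynamicOptimizedTheta,
)

# Load one group of M3; the Monthly group has seasonality 12 and horizon 18.
Y_df, *_ = M3.load(directory='data', group='Monthly')
season_length, horizon = 12, 18

models = [
    AutoARIMA(season_length=season_length),
    AutoETS(season_length=season_length),
    AutoCES(season_length=season_length),
    DynamicOptimizedTheta(season_length=season_length),
]
sf = StatsForecast(models=models, freq='M', n_jobs=-1)
forecasts = sf.forecast(df=Y_df, h=horizon)

# The ensemble is a plain, unweighted average of the four model forecasts.
model_cols = [c for c in forecasts.columns if c not in ('unique_id', 'ds')]
forecasts['StatisticalEnsemble'] = forecasts[model_cols].mean(axis=1)
```

Nothing is fitted on top of the four models; the unweighted average is the whole ensemble, which is what keeps it fast and simple.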
## Results

### Accuracy: Comparison with SOTA benchmarks

Accuracy is reported in symmetric mean absolute percentage error (SMAPE); a reference implementation is sketched at the end of this subsection.

The M3 dataset has four groups of time series: Yearly, Quarterly, Monthly and Other. In the next graph, you can see the performance of all models and ensembles.

*[Figure: SMAPE of all models and ensembles, by group; included as an image in the original README.]*

In the next table, you can see the performance of the models across all four groups, as well as the average performance over all groups.

*[Table: SMAPE per group and on average, for all models; included as an image in the original README.]*
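For reference, the sketch below implements the standard SMAPE definition (in percentage points) used in the M3/M4 literature; the actual evaluation code lives in `src/evaluation.py` and may differ in details such as the handling of zero denominators.

```python
import numpy as np

def smape(y, y_hat):
    """Symmetric MAPE in percentage points: mean of 200 * |y - ŷ| / (|y| + |ŷ|)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(200.0 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))))

# Example: smape([100, 200], [110, 180]) ≈ 10.03
```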
### Computational Complexity: Comparison with SOTA benchmarks

Computational complexity is reported in wall-clock time, lines of code, and Relative Computational Complexity (RCC).

#### Time

Using StatsForecast on a 96-core EC2 instance (c5d.24xlarge), it takes 5.6 minutes to train, forecast and ensemble the four models for all 3,003 series of M3.

| Time (mins) | Yearly | Quarterly | Monthly | Other |
|---|---|---|---|---|
| StatsForecast ensemble | 1.10 | 1.32 | 2.38 | 1.08 |

The authors of the paper only report computational time for the monthly group, which amounts to 20,680 minutes, or 14.3 days. In comparison, the StatsForecast ensemble takes only 2.38 minutes for that group. Furthermore, the authors don't include the time spent on hyperparameter optimization. For this comparison, we take the reported 14 days of computational time; note, however, that the true computational time across all groups must be significantly higher.

#### Engineering

Running all statistical models, including data downloading, data wrangling, training, forecasting and ensembling, can be achieved in less than 150 lines of Python code. In comparison, the original repo has more than 1,000 lines of code and needs Python, R, Mongo and shell code.

#### Relative Computational Complexity

The mentioned paper uses Relative Computational Complexity (RCC) to compare models: the cost of a method expressed as a multiple of the cost of a naive forecast. To calculate the RCC of StatsForecast, we followed the same methodology and measured the time it takes to generate naive forecasts for all 3,003 series in our environment. On a c5d.24xlarge instance (96 vCPUs, 192 GB RAM), it takes 12 seconds to train and predict 3,003 instances of a seasonal naive forecast. Therefore, the RCC of the simple ensemble is 28; the arithmetic is sketched below the following table.

In the next table, you can find the RCC of the deep learning models and the ensembles.

| Method | Type | Relative Computational Complexity (RCC) |
|---|---|---|
| DeepAR | DL | 313,000 |
| Feed-Forward | DL | 47,300 |
| Transformer | DL | 47,500 |
| WaveNet | DL | 306,000 |
| Ensemble-DL | DL | 713,800 |
| Ensemble-Stats | Statistical | 28 |
| SeasonalNaive | Benchmark | 1 |
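The RCC of the statistical ensemble follows directly from the two timings above (5.6 minutes for the full ensemble, 12 seconds for the seasonal naive benchmark):

```python
# RCC = runtime of a method divided by the runtime of the naive benchmark,
# using the timings reported above for the same c5d.24xlarge instance.
ensemble_seconds = 5.6 * 60  # full ensemble on all 3,003 series: 336 s
naive_seconds = 12           # seasonal naive on the same 3,003 series
print(ensemble_seconds / naive_seconds)  # 28.0
```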
### Summary: Comparison with SOTA benchmarks

We present a summary comparison, including SMAPE, RCC, a cost proxy, and self-reported computational time.

*[Table: summary of SMAPE, RCC, cost proxy and self-reported time for all models; included as an image in the original README.]*

We observe that StatsForecast yields average SMAPE results similar to DeepAR with computational savings of 99%. Furthermore, the StatsForecast ensemble:

* Performs better than the N-BEATS model for the Yearly and Other groups.
* Has a better average performance than the individual GluonTS models.
* Performs better than all GluonTS models for the Monthly and Other groups.
* Is consistently better than the Transformer, WaveNet, and Feed-Forward models.

In conclusion, the deep-learning ensemble achieves 12.27 points of accuracy (SMAPE), with a relative computational cost of 713,800 and a proxy monetary cost of USD 11,420. The simple statistical ensemble achieves 12.63 points of accuracy, with a relative computational cost of 28 and a proxy monetary cost of USD 0.50. Therefore, the DL ensemble is only 0.36 points more accurate than the statistical ensemble, but roughly 25,000 times more expensive (713,800 / 28 ≈ 25,500).

In plain English: a deep-learning ensemble that takes more than 14 days to run and costs around USD 11,000 outperforms a statistical ensemble that takes 6 minutes to run and costs about USD 0.50 by only 0.36 points of SMAPE.

## Conclusions

For this setting, deep learning models are simply worse than a statistical ensemble. To outperform this statistical ensemble by 0.36 points of SMAPE, a complicated deep-learning ensemble is needed. That ensemble, however, takes more than two weeks to run, costs several thousand dollars, and demands many engineering hours.

In conclusion: in terms of speed, cost, simplicity and interpretability, deep learning is far behind the simple statistical ensemble. In terms of accuracy, they seem to be rather close. This conclusion may or may not hold on other datasets; however, given the a priori uncertainty of the benefits and the certainty of the costs, statistical methods should be considered the first option in daily forecasting practice.

## Unsolicited Advice

Choose your models wisely. It would be extremely expensive and borderline irresponsible to favor deep learning models in an organization before establishing solid baselines. Simpler is sometimes better: not everything that glitters is gold.

## Reproducibility

To reproduce the main results:

1. Create the environment using `conda env create -f environment.yml`.
2. Activate the environment using `conda activate m3-dl`.
3. Run the experiments using `python -m src.experiment --group [group]`, where `[group]` can be `Other`, `Monthly`, `Quarterly`, or `Yearly`.
4. Finally, evaluate the forecasts using `python -m src.evaluation`.

## References

* Hyndman, Rob J. & Khandakar, Yeasmin (2008). "Automatic Time Series Forecasting: The forecast Package for R".
* Hyndman, Rob J., et al. (2008). "Forecasting with Exponential Smoothing: The State Space Approach".
* Svetunkov, Ivan & Kourentzes, Nikolaos (2015). "Complex Exponential Smoothing". 10.13140/RG.2.1.3757.2562.
* Fiorucci, Jose A., Pellegrini, Tiago R., Louzada, Francisco, Petropoulos, Fotios & Koehler, Anne B. (2016). "Models for Optimising the Theta Method and Their Relationship to State Space Models". International Journal of Forecasting, 32(4), 1151-1161. ISSN 0169-2070.
* Petropoulos, Fotios & Svetunkov, Ivan (2020). "A Simple Combination of Univariate Models". International Journal of Forecasting, 36(1), 110-115. ISSN 0169-2070.
* Makridakis, Spyros, Spiliotis, Evangelos, Assimakopoulos, Vassilios, Semenoglou, Artemios-Anargyros, Mulder, Gary & Nikolopoulos, Konstantinos (2022). "Statistical, Machine Learning and Deep Learning Forecasting Methods: Comparisons and Ways Forward". Journal of the Operational Research Society. DOI: 10.1080/01605682.2022.2118629.