[HN Gopher] Show HN: Exp. Smoothing is 32% more accurate and 100...
___________________________________________________________________
Show HN: Exp. Smoothing is 32% more accurate and 100x faster than
Neural-Prophet
We benchmarked on more than 55K series and show that ETS improves
MAPE and sMAPE forecast accuracy by 32% and 19%, respectively,
while requiring 104x less computation than NeuralProphet. We hope
this exercise helps the forecasting community avoid adopting yet
another overpromising and unproven forecasting method.
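For readers who want to poke at the numbers, a minimal sketch of
the two metrics and an ETS fit, assuming statsmodels'
ExponentialSmoothing and a toy monthly series (the real benchmark
code lives in the linked repo):

  import numpy as np
  from statsmodels.tsa.holtwinters import ExponentialSmoothing

  def mape(y, yhat):
      return np.mean(np.abs((y - yhat) / y)) * 100

  def smape(y, yhat):
      return np.mean(2 * np.abs(y - yhat)
                     / (np.abs(y) + np.abs(yhat))) * 100

  # Toy seasonal series; the benchmark used 55K+ real series.
  y = np.tile([112, 118, 132, 129, 121, 135,
               148, 148, 136, 119, 104, 118], 4).astype(float)
  train, test = y[:-12], y[-12:]
  fit = ExponentialSmoothing(train, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
  yhat = fit.forecast(12)
  print(mape(test, yhat), smape(test, yhat))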
Author : maxmc
Score : 115 points
Date : 2022-08-17 19:33 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ren_engineer wrote:
| What's the consensus on machine learning vs more classical
| methods for time series forecasting? I know a hybrid model won
| the M4 competition in 2018; obviously in this case classical
| still beats AI/ML.
|
| https://en.wikipedia.org/wiki/Makridakis_Competitions
| rich_sasha wrote:
| I think it depends massively on what you mean by "time series".
| If it is really an ARMA model you're looking at, then ML can
| only bring noise to the problem. If it is a complex large system
| that happens to be indexed by time, ML may well be better.
|
| AFAIK Prophet had a more modest scope than "be-all and end-all
| of TS modelling"; rather, it aimed to be a decent model for
| everything. It might indeed be excellent at that...
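|
| (A hedged illustration of the ARMA point above, using
| statsmodels on simulated data; the coefficients are arbitrary.)
|
|     import numpy as np
|     from statsmodels.tsa.arima_process import ArmaProcess
|     from statsmodels.tsa.arima.model import ARIMA
|
|     # Simulate an ARMA(1,1); lag polynomials include the
|     # leading 1, with AR signs negated, per statsmodels.
|     y = ArmaProcess(ar=[1, -0.7], ma=[1, 0.4]).generate_sample(500)
|
|     # A correctly specified classical fit is already near-optimal
|     # here; extra model capacity has nothing left to learn.
|     fit = ARIMA(y, order=(1, 0, 1)).fit()
|     print(fit.params)      # roughly recovers ar=0.7, ma=0.4
|     print(fit.forecast(5))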
| dylanjcastillo wrote:
| In the M5 competition[1], most winning solutions used LightGBM.
| So ML beat classical.
|
| Just a couple of the winning solutions used DL.
|
| [1]
| https://www.sciencedirect.com/science/article/pii/S016920702...
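|
| (For flavor, a hedged sketch of the typical M5-style recipe:
| tabularize the series with lag features and fit LightGBM. The
| column names and lags here are invented, not from the paper.)
|
|     import pandas as pd
|     from lightgbm import LGBMRegressor
|
|     # Toy daily series; "sales" is a made-up column name.
|     df = pd.DataFrame(
|         {"sales": range(100)},
|         index=pd.date_range("2020-01-01", periods=100))
|     for lag in (1, 7, 28):        # common M5-style lags
|         df[f"lag_{lag}"] = df["sales"].shift(lag)
|     df["dow"] = df.index.dayofweek
|     df = df.dropna()
|
|     X, y = df.drop(columns="sales"), df["sales"]
|     model = LGBMRegressor(n_estimators=200).fit(X[:-7], y[:-7])
|     print(model.predict(X[-7:]))  # forecast the held-out week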
| gillesjacobs wrote:
| This wouldn't pass peer review if it were a paper. Major issues:
|
| - No fair hyperparameter tuning for NeuralProphet. They mention
| multiple times that they used default hyperparams or ad-hoc
| example hyperparams (see the sketch after this list).
|
| - ETS outperforming on 3 of 4 benchmark datasets (on one of
| which they didn't finish training) is not strong evidence of
| all-round robustness. Benchmarks like SuperGLUE for NLP combine
| 10 completely different tasks with more subtasks to assess
| language model performance. And even SuperGLUE is not
| uncontroversial.
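|
| (What a minimal sweep could look like; a sketch only. I'm
| assuming NeuralProphet's documented API with "ds"/"y" frames
| and its "yhat1" one-step-ahead column; the toy data and grid
| values are invented, not the benchmark's.)
|
|     from itertools import product
|     import pandas as pd
|     from neuralprophet import NeuralProphet
|
|     # Invented stand-in data with NeuralProphet's ds/y schema.
|     ds = pd.date_range("2021-01-01", periods=400)
|     df = pd.DataFrame({"ds": ds, "y": range(400)})
|     train_df, val_df = df[:-60], df[-60:]
|
|     results = []
|     for n_lags, lr in product([0, 14, 28], [0.01, 0.1]):
|         m = NeuralProphet(n_lags=n_lags, learning_rate=lr,
|                           epochs=50)
|         m.fit(train_df, freq="D")   # assumes daily data
|         fcst = m.predict(val_df)    # val_df carries the history
|         mae = (fcst["yhat1"] - val_df["y"]).abs().mean()
|         results.append((mae, n_lags, lr))
|     print(min(results))             # best (MAE, n_lags, lr)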
| gillesjacobs wrote:
| While the results don't prove the superiority convincingly, it
| does seem that ETS is a good candidate as a first go-to in
| practical applications. "In practice, practice and theory are
| the same. In theory, they are not."
| variaga wrote:
| In _theory_ practice and theory are the same. In _practice_
| they are not.
| gillesjacobs wrote:
| HN pedantry ruins the fun of wordplay yet again.
| anon_123g987 wrote:
| In theory, his version is right. In practice, yours.
| Imnimo wrote:
| In the original Prophet paper
| (https://peerj.com/preprints/3190.pdf) they claim that Prophet
| outperforms ETS (see Figure 7, for example). And in the
| NeuralProphet paper, they claim that it outperforms Prophet (but
| do not, as far as I can see, compare directly to ETS). Here we
| see ETS outperforms NeuralProphet.
|
| Presumably this apparent non-transitivity is because of
| differences in each evaluation. If we fix the evaluation to the
| method used here, is it still the case that NeuralProphet
| outperforms Prophet (and therefore the claim that Prophet
| outperforms ETS is not correct)? Or is it that NeuralProphet does
| not outperform Prophet, but Prophet does outperform ETS?
| beernet wrote:
| As usual in ML, the appropriate solution depends on the problem
| and context.
|
| ML (particularly DL) tends to outperform "classical" statistical
| time series forecasting when the data is (strongly) nonlinear,
| high-dimensional, and large. The opposite holds as well.
|
| It is also important to note that accuracy is not the only
| relevant metric in practical applications. Explainability is of
| particular interest in time series forecasting: it is good to
| know if your sales are going to increase/decrease, but it is even
| more valuable to know which input variables are likely to account
| for that change. Hence, a "simple" model with inferior
| forecasting accuracy might be preferred to a stronger estimator
| if it can give insight not only into "what" will happen, but
| also "why".
| tomwphillips wrote:
| > ML (particularly DL) tends to outperform "classical"
| statistical time series forecasting when the data is (strongly)
| nonlinear, high-dimensional, and large.
|
| This claim about forecasting with DL comes up a lot, but I've
| seen little evidence to back it up.
|
| Personally, I've never managed to have the same success others
| apparently have with DL time series forecasting.
| beernet wrote:
| It's true simply because large ANNs have a higher capacity,
| which is great for large, nonlinear data but less so for
| small datasets or simple functions.
|
| In any case, Transformers are eating ML right now and I'm
| actually surprised there's no "GPT-3 for time series" yet.
| It's technically the same problem as language modeling (that
| is, multi-step prediction of numerics); however, there is only
| a comparatively small amount of human-generated data for
| self-supervised learning of a time series forecasting model.
| Another reason might be that the expected applications and
| potential of such a pre-trained model aren't as glamorous as
| generating language.
| time_to_smile wrote:
| > It's technically the same problem as language modeling
|
| You're thinking of modeling event sequences, which is not,
| strictly speaking, the same as time series modeling.
|
| Plenty of people do use LSTMs to model event sequences, using
| the hidden state of the model as a vector representation of the
| process's current location while walking a graph (e.g. a user's
| journey through a mobile app, or navigating links on the web).
|
| Time series is different because the ticks of timed events come
| at consistent intervals and are themselves part of the problem
| being modeled. Time series models have generally been distinct
| from sequence models.
|
| The reason there's no GPT-3 for general sequences is the lack
| of data: typically the vocabulary of events is much smaller
| than that of natural languages, and the corpus of sequences is
| much smaller too.
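|
| (A hedged PyTorch sketch of the LSTM-as-event-encoder idea
| above; all names and sizes are invented.)
|
|     import torch
|     import torch.nn as nn
|
|     class EventEncoder(nn.Module):
|         """Maps a sequence of event IDs to one vector."""
|         def __init__(self, n_events, emb=32, hidden=64):
|             super().__init__()
|             self.emb = nn.Embedding(n_events, emb)
|             self.lstm = nn.LSTM(emb, hidden, batch_first=True)
|
|         def forward(self, ids):          # (batch, seq) int64
|             _, (h_n, _) = self.lstm(self.emb(ids))
|             return h_n[-1]               # (batch, hidden)
|
|     # Two 5-step "journeys" over a 100-event vocabulary.
|     enc = EventEncoder(n_events=100)
|     print(enc(torch.randint(0, 100, (2, 5))).shape)  # [2, 64]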
| time_to_smile wrote:
| A larger problem is that time series modeling is particularly
| resistant towards black box approaches since a lot of
| information is encoded in the model itself.
|
| Take even a simple moving average model on daily observations.
| Consider stock ticker data (where there are no weekends) and
| web traffic data (where there is an observation each day). The
| stock ticker data should be smoothed with a 5-day window and
| the web traffic with a 7-day window, to help reduce the impact
| of weekly effects (which probably shouldn't exist in the stock
| market anyway).
|
| It's possible that in either of these cases you might find a
| moving average that performs better on some chosen metric, say
| 4 or 8 days. However, neither of these alternatives makes any
| sense as a window if we're trying to remove the day-of-week
| effect, and unless you can come up with a justifiable
| explanation, smoothing over arbitrary windows should be avoided.
|
| If you let a black box optimize even a simple moving average
| you would be avoiding some very essential introspection into
| what your model is actually claiming.
|
| Not to mention that we can often do more than just prediction
| with these intentional model tunings (for example, the
| day-of-week effect can be explicitly differenced from the data
| to measure exactly how much sales should increase on a
| Saturday).
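|
| (To make the window choice concrete, a small pandas sketch with
| invented data.)
|
|     import numpy as np
|     import pandas as pd
|
|     # Synthetic daily traffic with a weekend bump.
|     idx = pd.date_range("2022-01-01", periods=90, freq="D")
|     dow = np.array([0, 0, 0, 0, 0, 30, 50])
|     traffic = pd.Series(
|         100 + dow[idx.dayofweek] + np.random.randn(90),
|         index=idx)
|
|     smooth7 = traffic.rolling(7).mean()  # matches weekly cycle
|     smooth4 = traffic.rolling(4).mean()  # effect bleeds through
|
|     # Measure the day-of-week effect instead of smoothing it
|     # away:
|     print(traffic.groupby(traffic.index.dayofweek).mean() -
|           traffic.mean())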
| rich_sasha wrote:
| Hmm, wow. When I saw the headline, I assumed they used like one
| dataset or something similarly limiting.
|
| I'd need to dig out the original paper, but I would be surprised
| if the original didn't compare to basic benchmark methods. But
| from memory, I never saw such a comparison (until now).
| cercatrova wrote:
| Can someone explain this? I don't know what the context is for
| this Show HN.
| IshKebab wrote:
| They're time series prediction methods. E.g. they mention
| electricity usage forecasting - given historical data, what
| will the usage be in 1 hour?
|
| Facebook's Prophet is quite popular in the space I understand.
| No idea about the other two.
| mkl wrote:
| A minor language error: "this model does not outperform classical
| statistical methods _neither_ in accuracy _nor_ speed. " should
| say "either" and "or".
| ISV_Damocles wrote:
| https://dictionary.cambridge.org/grammar/british-grammar/nei...
| mkl wrote:
| Nothing there seems to contradict me. The problem in the
| linked page is that "neither ... nor" is used after "not",
| which makes it a double negative.
| lightedman wrote:
| The "Not-Neither-Nor" sequence is typical, even with regards to
| American English, versus British English (the Queen's English.)
| In either case, both are technically-correct.
| bo1024 wrote:
| As part of a double negative?
| kevin_thibedeau wrote:
| English is nothing if not inconsistent.
___________________________________________________________________
(page generated 2022-08-17 23:00 UTC)