[HN Gopher] Zero-Shot Forecasting: Our Search for a Time-Series ...
___________________________________________________________________
Zero-Shot Forecasting: Our Search for a Time-Series Foundation
Model
Author : tiwarinitish86
Score : 64 points
Date : 2025-06-13 05:04 UTC (17 hours ago)
(HTM) web link (www.parseable.com)
(TXT) w3m dump (www.parseable.com)
| nikhil4usinha wrote:
| Interesting, what are the use cases you're using the models
| for? I'd like to know more about that, like anomaly detection.
| parmesant wrote:
| That's actually one of the use-cases that we set out to explore
| with these models. We'll release a head-to-head comparison
| soon!
| CubsFan1060 wrote:
| That's the thing I'm most interested in out of these. Super
| curious to see what you find out.
|
| Did you or do you plan to publish any of your code or data
| sets from this?
| Debanitrkl wrote:
| Author here. We're just getting started with these
| experiments and plan to apply them to more features on our
| roadmap. Future posts will be more detailed, based on the
| feedback we received here. Once we finish implementing
| these features, we'll be happy to share the code and
| dataset.
| dragon195346 wrote:
| Great read! Really interesting to see how these foundation models
| like Chronos and Toto are starting to perform well on real-world
| observability data.
| wenc wrote:
| I wonder how this would perform on the M4 Makridakis competitions
| (time series competitions)
|
| https://github.com/Mcompetitions/M4-methods
|
| https://en.wikipedia.org/wiki/Makridakis_Competitions
|
| Makridakis' conclusion remained true for many years:
| "statistically sophisticated and complex methods do not
| necessarily provide more accurate forecasts than simpler ones."
|
| Maybe things have changed?
|
| (side: Nixtla showed a simple ensemble outperforming Chronos,
| and the Chronos team responded, but there's some back and
| forth in the comments:
| https://www.linkedin.com/pulse/extended-comparison-chronos-a...)
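|
| (For anyone who wants to try the simple-ensemble baseline
| themselves, a minimal sketch with Nixtla's statsforecast; the
| model trio here is my guess at a reasonable baseline, not the
| exact ensemble from that post:)
|
|   import pandas as pd
|   from statsforecast import StatsForecast
|   from statsforecast.models import AutoARIMA, AutoETS, SeasonalNaive
|
|   # expects long format: unique_id, ds (timestamp), y (value)
|   df = pd.read_csv("series.csv", parse_dates=["ds"])
|
|   sf = StatsForecast(models=[AutoARIMA(season_length=24),
|                              AutoETS(season_length=24),
|                              SeasonalNaive(season_length=24)],
|                      freq="h")
|   fc = sf.forecast(df=df, h=48)
|   # the "simple ensemble": average the three point forecasts
|   fc["ensemble"] = fc[["AutoARIMA", "AutoETS",
|                        "SeasonalNaive"]].mean(axis=1)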
| parmesant wrote:
| This looks like a great benchmark! We've been thinking of
| doing a better, more detailed follow-up, and this seems like
| the perfect dataset to do that with. Thanks!
| mvATM99 wrote:
| Look, I'm optimistic about time-series foundation models too,
| but this post is hard to take seriously when the test is so
| flawed:
|
| - Forward-filling short periods of missing values. Why keep
| this in when you explicitly mention this is not normal? Either
| remove it all or don't impute anything.
|
| - Claiming superiority over classic models and then not
| including any of them in the results table.
|
| - And let's not forget the cardinal sin of using MAPE as an
| evaluation metric.
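|
| (To make the MAPE point concrete, a tiny sketch with made-up
| numbers, showing how it rewards lowball forecasts and breaks
| on zeroes:)
|
|   import numpy as np
|
|   def mape(y, yhat):
|       return np.mean(np.abs((y - yhat) / y)) * 100
|
|   y = np.array([1.0, 10.0])
|   for f in [1.0, 5.5, 10.0]:  # constant forecasts
|       print(f, mape(y, np.full(2, f)))
|   # 1.0 -> 45.0, 5.5 -> 247.5, 10.0 -> 450.0
|   # the lowball forecast scores best, and any zero in y
|   # makes the metric undefined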
| parmesant wrote:
| Author here. We're trying these out for the first time for
| our use cases, so these are great points for us to improve on!
| mvATM99 wrote:
| Good to see positive reception to feedback! Sorry if my
| message came across as condescending; that was not the intent.
| I recommend reading this piece on metrics:
| https://openforecast.org/wp-content/uploads/2024/07/Svetunko...
| It's easy to grasp, yet it contains great tips.
| stevenae wrote:
| To clarify, you'd prefer rmsle?
| mvATM99 wrote:
| Short answer: I use multiple metrics, never rely on just one.
|
| Long answer: Is the metric for people with subject-matter
| knowledge? Then (Weighted) RMSSE, or the MASE alternative for
| a median forecast. WRMSSE is very nice: it can deal with
| zeroes, is scale-invariant, and is symmetrical in penalizing
| under/over-forecasting.
|
| The above metrics are completely uninterpretable to people
| outside of the forecasting sphere, though. For those cases I
| tend to just stick with raw errors; if a percentage metric is
| really necessary, then a Weighted MAPE/RMSE. The weighting is
| still graspable for most, and it doesn't explode with zeroes.
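|
| (Rough sketch of the two, if it helps; formulas from memory,
| worth double-checking against the paper linked above:)
|
|   import numpy as np
|
|   def wmape(y, yhat):
|       # total absolute error over total actuals, so isolated
|       # zeroes in y don't blow anything up
|       return np.sum(np.abs(y - yhat)) / np.sum(np.abs(y)) * 100
|
|   def mase(y, yhat, y_train, m=1):
|       # scale by the in-sample naive (lag-m) error;
|       # values below 1 beat the naive forecast
|       naive = np.mean(np.abs(y_train[m:] - y_train[:-m]))
|       return np.mean(np.abs(y - yhat)) / naive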
|
| I've also been exploring FVA (Forecast Value Added), compared
| against a second decent forecast. FVA is very intuitive, if
| your base measures are reliable at least. Aside from that, I
| always look at forecast plots. It's tedious, but they often
| tell you a lot that gets lost in the numbers.
|
| RMSLE I haven't used much. From what I've read it looks
| interesting, though more for very specific scenarios (many
| outliers, high variance, nonlinear data?).
| stevenae wrote:
| Thanks for the reply! I am outside the forecasting sphere.
|
| RMSLE gives proportional error (so, scale-invariant)
| without MAPE's systematic under-prediction bias. It does
| require all-positive values, for the logarithm step.
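|
| A quick sketch of it, for reference (the usual implementation
| uses log1p, which also keeps zeroes legal):
|
|   import numpy as np
|
|   def rmsle(y, yhat):
|       # errors are roughly log-ratios, hence the (approximate)
|       # scale invariance
|       return np.sqrt(np.mean((np.log1p(yhat) - np.log1p(y)) ** 2))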
| ted_dunning wrote:
| MAPE can also be a problem when rare excursions are what you
| want to predict and the cost of missing an event is much
| higher than that of predicting a non-event. A model that just
| predicts no change would have very low MAPE because most of
| the time nothing happens. When the event does happen, however,
| the error of predicting the status quo ante is much worse than
| small baseline errors.
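|
| (A quick simulated example of that failure mode, with made-up
| numbers:)
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   y = np.full(1000, 100.0)
|   spikes = rng.random(1000) < 0.01   # rare excursions
|   y[spikes] = 1000.0
|
|   yhat = np.full(1000, 100.0)        # "predict no change"
|   mape = np.mean(np.abs((y - yhat) / y)) * 100
|   print(mape)  # about 1% -- looks great, misses every event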
| sheepscreek wrote:
| > Our dataset consisted of Kubernetes pod metrics collected from
| a production retail checkout application.
|
| That sums it up, and it's no surprise that Datadog's Toto
| model performed exceptionally well.
|
| The results would have been much more useful had they opted
| for a heterogeneous mix of datasets. I am thinking of census
| data and statistics, financial forecasting (GDP, interest
| rates), clinical trial drop-out rates, etc. So many
| interesting problems out there.
| bitshiftfaced wrote:
| The GIFT Eval benchmark would be a good place to start:
| https://huggingface.co/spaces/Salesforce/GIFT-Eval
| fumeux_fume wrote:
| I'm a bit confused by the results table. Were these models
| tested against the same dataset? A visualization of the test
| data and forecasts would also be helpful.
| Fripplebubby wrote:
| I think the concept of a "foundation model" for time series
| is actually a bit flawed as presented in this blog post. A
| foundation model is interesting because it is capable of many
| tasks _beyond the target tasks_ it was trained on, whereas
| what the author is looking for is a time-series model that
| can make out-of-distribution predictions without re-training.
| That, in my opinion, is a problem pretty well solved by
| existing ARIMA and (especially) Prophet models. (Yes, you have
| to re-fit the model on your distribution, but this is not at
| all akin to training or fine-tuning an LLM; it's something
| you can do in seconds on a modern CPU. And yes, there are
| certain hyperparameters that may need to be selected, but
| they are actually fairly minimal.)
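|
| (For scale, "re-fit in seconds" is roughly this; a minimal
| statsmodels sketch, with the order and file name picked
| arbitrarily for illustration:)
|
|   import numpy as np
|   from statsmodels.tsa.arima.model import ARIMA
|
|   y = np.loadtxt("my_metric.txt")        # one univariate series
|   fit = ARIMA(y, order=(2, 1, 2)).fit()  # typically sub-second
|   print(fit.forecast(steps=48))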
|
| But a model making out-of-distribution predictions does not
| make it a foundation model for time series; really, that's
| just the basic task that all time-series forecasting models
| do. A more interesting question is: does an LLM architecture
| improve univariate or multivariate time-series prediction? I
| don't think the answer is yes. Depending on your domain,
| being able to use language inputs to your model may have a
| positive impact, and the best way to incorporate language
| inputs is certainly a transformer architecture, but that
| isn't what this post addresses.
| th0ma5 wrote:
| A lot of people hedge this kind of sober insight against
| their own economic interests, making all manner of
| unfalsifiable claims that these models are adequate in some
| context. It's refreshing to see the issues dealt with
| separately, and I think a lot of people miss that, in every
| case I've heard of so far, these models fall short of
| traditional methods.
| cyanydeez wrote:
| AI slop
| spmurrayzzz wrote:
| I'd be curious what the results would be with the automated
| AutoGluon fit/evals. Given the results here, I suspect a
| weighted-average ensemble would likely win out.
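|
| (Roughly, from the AutoGluon time-series API as I remember
| it; file and column names are placeholders:)
|
|   import pandas as pd
|   from autogluon.timeseries import (TimeSeriesDataFrame,
|                                     TimeSeriesPredictor)
|
|   df = pd.read_csv("metrics.csv")  # item_id, timestamp, target
|   train = TimeSeriesDataFrame.from_data_frame(
|       df, id_column="item_id", timestamp_column="timestamp")
|   predictor = TimeSeriesPredictor(prediction_length=48,
|                                   eval_metric="MASE").fit(train)
|   print(predictor.leaderboard())  # the weighted ensemble
|                                   # usually lands on top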
___________________________________________________________________
(page generated 2025-06-13 23:01 UTC)