[HN Gopher] Moirai: A time series foundation model for universal...
___________________________________________________________________
Moirai: A time series foundation model for universal forecasting
Author : throwaway888abc
Score : 181 points
Date : 2024-03-26 00:51 UTC (22 hours ago)
(HTM) web link (blog.salesforceairesearch.com)
(TXT) w3m dump (blog.salesforceairesearch.com)
| gorold wrote:
| Code is available! https://github.com/SalesforceAIResearch/uni2ts
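|
| A rough quick-start sketch (the repo id and argument names
| here may differ from the current README -- treat this as
| illustrative and check the repo for the up-to-date API):
|
|     import torch
|     from huggingface_hub import hf_hub_download
|     from uni2ts.model.moirai import MoiraiForecast
|
|     # Assumed checkpoint location and constructor arguments
|     model = MoiraiForecast.load_from_checkpoint(
|         checkpoint_path=hf_hub_download(
|             repo_id="Salesforce/moirai-R-small",
|             filename="model.ckpt",
|         ),
|         prediction_length=20,  # forecast horizon
|         context_length=200,    # history the model conditions on
|         patch_size="auto",
|         num_samples=100,       # draws from the predictive dist.
|     )
|     predictor = model.create_predictor(batch_size=32)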
| peter_l_downs wrote:
| And the documentation makes me think they did a great job
| making this easy to use. Looking forward to playing around with
| it.
|
| Edit: oh you're one of the authors -- thank you, and
| congratulations!
| fbdab103 wrote:
| Choosing beggars and all that, but the LOTSA dataset could
| really benefit from a README on HuggingFace. Even just a
| citation back to the original paper would be good.
| gorold wrote:
| That's actually a great suggestion, thanks! We're still
| working on improving the readability and usability of the
| codebase, too.
| mikeyouse wrote:
| Looks super interesting. Definitely going to play with this,
| though it took me way too long to figure out what Salesforce
| Air Search was... maybe that's a sign I should log off for
| the day.
| dcl wrote:
| What does 'any-variate' forecasting mean? Can you use this
| pre-trained model to produce forecasts when there exist useful
| covariates/features/predictors? Is this something the other TS
| foundation models can/cannot do?
| gorold wrote:
| When we deal with many different multivariate time series,
| each time series can have a different number of variates. So
| "any-variate" means that the model can take as input
| multivariate time series with an arbitrary number of variates,
| and model the interactions between them with the Transformer's
| attention mechanism. This is something that many other TS
| foundation models do not consider yet - they convert all
| multivariate time series into multiple univariate time series.
|
| Whether or not the forecasts improve as a result of the
| additional covariates is still an open question which needs to
| be studied more -- we need to build better evaluations and
| benchmarks for this.
| dcl wrote:
| Understood, thank you. There are certainly applications in
| demand sensing/demand forecasting where things like recent
| order information, recent sales, and CRM inputs are quite
| predictive of near-term outcomes, but become useless for
| longer-horizon forecasts. In my experience, when information
| like this is available, no time-series technique that is
| unable to leverage it will beat even simple regressions over
| short-term horizons.
| lukas_b wrote:
| This looks very interesting! I'm trying to understand if the
| flattening technique might work for my time series. It's
| structured as follows: at each time step t, I have an m by n
| data matrix. The value of m (rows) varies per time step; n
| stays constant and represents the features, and I want to
| predict one of the n values. (In this case, t represents a
| single day, the m rows represent the people that entered a
| store on that day, and the n columns represent various
| features of those people. I want to predict one of those
| features, given the others.) The fact that it's a time series
| matters, because I expect the relationships to change over
| time. For instance, some feature n[x] (person wears a yellow
| shirt) might be correlated with my target feature n[y] (person
| steals), but only in the summer. Would it be possible to
| flatten this too? What would that look like?
| hackerlight wrote:
| They flatten the time and variate dimensions into a single 1D
| token sequence, so it can handle an arbitrary number of
| features.
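|
| Schematically (my own toy illustration, not the actual uni2ts
| code): each (time, variate) cell becomes one token, tagged
| with a variate id so attention can tell the series apart.
|
|     import numpy as np
|
|     T, V = 4, 3                    # 4 time steps, 3 variates
|     series = np.arange(T * V).reshape(T, V)  # (T, V) series
|
|     tokens = series.T.reshape(-1)  # 1D sequence of length V*T
|     variate_id = np.repeat(np.arange(V), T)  # token's variate
|     time_id = np.tile(np.arange(T), V)       # token's time step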
| longdog wrote:
| Interesting, but I'm very skeptical. Over a dozen
| transformer-based time series foundation models have been
| released in the past year, and without fail, every one of them
| claims to be at or near SOTA. For example:
|
| - Time-LLM (https://arxiv.org/abs/2310.01728)
|
| - Lag-Llama (https://arxiv.org/abs/2310.08278)
|
| - UniTime (https://arxiv.org/abs/2310.09751)
|
| - TEMPO (https://arxiv.org/abs/2310.04948)
|
| - TimeGPT (https://arxiv.org/abs/2310.03589)
|
| - TimesFM (https://arxiv.org/html/2310.10688v2)
|
| - GPT4TS (https://arxiv.org/pdf/2308.08469.pdf)
|
| Yet not a SINGLE transformer-based model I've managed to
| successfully run has beaten gradient boosted tree models on my
| use case (economic forecasting). To be honest, I believe these
| foundation models are all vastly overfit. There are basically
| only two benchmark sets that are ever used in time series (the
| Monash set and the M-competition set), so it'd be easy to
| overtune a model just to perform well on these.
|
| I would love to see someone build a broader set of varied
| benchmarks and have an independent third party run the
| evaluations, as with the LLM leaderboards. Otherwise I assume
| all published benchmarks are 100% meaningless and gamed.
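|
| For reference, the baseline that keeps winning for me is
| nothing exotic -- lagged values into a gradient boosted tree.
| A sketch (file name and lag choices are placeholders):
|
|     import pandas as pd
|     import lightgbm as lgb
|
|     df = pd.read_csv("my_series.csv")  # placeholder 'y' column
|     for lag in (1, 2, 3, 6, 12):       # lagged targets as features
|         df[f"lag_{lag}"] = df["y"].shift(lag)
|     df = df.dropna()
|
|     X, y = df.drop(columns="y"), df["y"]
|     cut = int(len(df) * 0.8)           # time-ordered split
|     model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
|     model.fit(X.iloc[:cut], y.iloc[:cut])
|     preds = model.predict(X.iloc[cut:])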
| dcl wrote:
| Why would you expect anything to work well for economic
| forecasting :p
| donbreo wrote:
| Jamie pull up the article that proves none of the published
| models work well with economic forecasting
| rokkitmensch wrote:
| I'm so sad. This hilarious comment is languishing in the
| doldrums.
| idiotsecant wrote:
| Not reddit.
| ImHereToVote wrote:
| There is always Gary Stevenson's economics model. Works
| without fail.
| hackerlight wrote:
| Neural nets are known to struggle with tabular data. Have you
| tried fine-tuning, or attaching a decoder somewhere that you
| train on your task? Zero-shot inference might be asking too
| much.
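|
| Roughly what I mean, as a generic PyTorch sketch (a stand-in
| encoder, not Moirai's actual API): freeze the pretrained
| backbone and train only a small head on your own data.
|
|     import torch
|     import torch.nn as nn
|
|     layer = nn.TransformerEncoderLayer(d_model=64, nhead=4,
|                                        batch_first=True)
|     backbone = nn.TransformerEncoder(layer, num_layers=2)
|     for p in backbone.parameters():
|         p.requires_grad = False    # freeze the foundation model
|
|     head = nn.Linear(64, 1)        # small trainable decoder
|     opt = torch.optim.Adam(head.parameters(), lr=1e-3)
|
|     x = torch.randn(32, 100, 64)   # (batch, time, feat) toy data
|     y = torch.randn(32, 1)
|     emb = backbone(x).mean(dim=1)  # pooled sequence embedding
|     loss = nn.functional.mse_loss(head(emb), y)
|     loss.backward()
|     opt.step()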
| boredemployee wrote:
| >> Neural nets are known to struggle with tabular data.
|
| Not disagreeing with you, and I'm not a specialist, but it's
| funny that a lot of papers seem to claim exactly the opposite.
| hackerlight wrote:
| What paper says the opposite? This is what I can find:
|
| https://arxiv.org/abs/2207.08815
|
| https://arxiv.org/abs/2305.02997
| logicchains wrote:
| Pretty much any real-world time series prediction task is
| going to involve more data than just the time series itself,
| and some of that data will probably be tabular, so it's no
| surprise gradient boosted trees perform better.
| Tarq0n wrote:
| Honestly, the best part of this paper is that they've put
| together a large new set of time series for benchmarking.
| tudorw wrote:
| https://facebook.github.io/prophet/
|
| "Prophet is a procedure for forecasting time series data based
| on an additive model where non-linear trends are fit with
| yearly, weekly, and daily seasonality, plus holiday effects. It
| works best with time series that have strong seasonal effects
| and several seasons of historical data. Prophet is robust to
| missing data and shifts in the trend, and typically handles
| outliers well."
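|
| Basic usage, for the curious (the CSV path is a placeholder;
| Prophet expects a dataframe with 'ds' and 'y' columns):
|
|     import pandas as pd
|     from prophet import Prophet
|
|     df = pd.read_csv("example.csv")  # placeholder data
|     m = Prophet()
|     m.fit(df)
|     future = m.make_future_dataframe(periods=30)  # 30 days ahead
|     forecast = m.predict(future)
|     cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
|     print(forecast[cols].tail())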
| refulgentis wrote:
| ?
| wenc wrote:
| They should sign up for the next Makridakis forecasting
| competition.
|
| https://en.wikipedia.org/wiki/Makridakis_Competitions
|
| Makridakis and Hibon reached the sad conclusion that
| "statistically sophisticated and complex methods do not
| necessarily provide more accurate forecasts than simpler ones."
| hcarlens wrote:
| That was true in the first Makridakis competition ("M1") in
| 1982, and possibly until M4 in 2018, but both M5 and M6 were
| won by what would generally be considered relatively
| sophisticated methods (e.g. LightGBM).
|
| The Wikipedia article doesn't have that much detail on M5 or
| M6, but the M5 papers are in the International Journal of
| Forecasting[1] and M6 should be published later this year
| (there's already a preprint on arxiv [2]).
|
| I recently spent some time looking into the history and results
| of the M competitions and had a chance to speak to Professor
| Makridakis about them, as well as the winners of each of the M6
| competition tracks [3]. While the methods have become more
| sophisticated, some conclusions from M1 still seem to hold: in
| particular, that there is no overall "best" method, and that
| the winning method tends to be different for different types of
| data, time horizons, and evaluation metrics.
|
| [1]: https://www.sciencedirect.com/science/article/pii/S016920702...
| [2]: https://arxiv.org/abs/2310.13357
| [3]: https://mlcontests.com/state-of-competitive-machine-learning...
| vermorel wrote:
| Our basic low-dimensional parametric model landed No. 1 at the
| SKU level in the M5; see my lecture
| https://www.lokad.com/tv/2022/1/5/no1-at-the-sku-level-in-th...
| (more references at the bottom)
| hcarlens wrote:
| Interesting, thanks for sharing!
| wenc wrote:
| A recent thread on Amazon's new Chronos forecasting model
| showed that an ensemble of simple models outperformed it (a
| highly parametrized transformer model) on the M-competition
| datasets.
|
| https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...
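|
| The "ensemble of simple models" there is roughly this pattern
| (a sketch using Nixtla's statsforecast; column and argument
| names may differ across versions):
|
|     import pandas as pd
|     from statsforecast import StatsForecast
|     from statsforecast.models import (AutoARIMA, AutoETS,
|                                       SeasonalNaive)
|
|     # placeholder file; needs columns: unique_id, ds, y
|     df = pd.read_csv("series.csv")
|     models = [AutoARIMA(), AutoETS(),
|               SeasonalNaive(season_length=7)]
|     sf = StatsForecast(models=models, freq="D")
|     fcst = sf.forecast(df=df, h=28)  # one column per model
|     cols = ["AutoARIMA", "AutoETS", "SeasonalNaive"]
|     fcst["ensemble"] = fcst[cols].median(axis=1)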
| thelastbender12 wrote:
| I'm curious where universal forecasting models are most
| useful. It's technically fascinating, but forecasting seems
| like a domain where you'd want interpretable modeling - you
| use it for big-value problems and it significantly affects
| your actions/policy. So shouldn't the tradeoff between
| performance and model simplicity lean towards the latter?
| jdowner wrote:
| So I am not alone! There seem to be so few people who hold
| this view these days.
| paul80808 wrote:
| Same for my shop - we manage a large pool of cost driven by
| partially forecastable factors, and we've repeatedly rejected
| methods purely on explainability grounds. Our accountability
| requirements do not allow us to point the finger at an LLM if
| we get it wrong.
| melondonkey wrote:
| I know. Here I am modeling my data generating process like a
| chump.
| laylower wrote:
| Show us how it performs against other models in the M3, M4 and
| M5 competitions.
|
| These are the gold standard of forecasting benchmarks.
|
| Moirai refers to the Fates
| [https://en.wikipedia.org/wiki/Moirai] in Greek mythology.
| tsurba wrote:
| Very cool that the dataset and model weights are open right
| away! This paper also doesn't have a bunch of weird
| architectural choices pulled out of nowhere, unlike other
| recent TS foundation models. Looks like it will actually be
| useful, thank you! Maybe I will actually get to do
| representation learning for TS during my PhD.
|
| As a sidenote/rant, it would be nice if all _supervised_ TS
| benchmarks included "DLinear + RevIN" as the standard
| baseline, as in my experiments it tends to get about the same
| performance as all the other new SOTA forecasting models. Most
| papers compare to the linear model without RevIN while using
| it themselves, and only beat the baseline because of that :)
| And in any case, supervised training of transformers from
| scratch on datasets with fewer than 1M points is just stupid
| (that's less raw data than a single image). Less than 1B is
| still at least mildly stupid.
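|
| For scale, that baseline is tiny. A simplified sketch of
| DLinear with RevIN (the originals pad the moving average by
| replicating the series ends and add a learnable affine to the
| normalization):
|
|     import torch
|     import torch.nn as nn
|
|     class RevIN(nn.Module):
|         """Reversible instance norm (simplified)."""
|         def normalize(self, x):   # x: (batch, time, channels)
|             self.mu = x.mean(1, keepdim=True)
|             self.sd = x.std(1, keepdim=True) + 1e-5
|             return (x - self.mu) / self.sd
|         def denormalize(self, y):
|             return y * self.sd + self.mu
|
|     class DLinear(nn.Module):
|         """Trend/seasonal decomposition + two linear maps."""
|         def __init__(self, ctx, horizon, kernel=25):
|             super().__init__()
|             self.avg = nn.AvgPool1d(kernel, stride=1,
|                                     padding=kernel // 2,
|                                     count_include_pad=False)
|             self.trend = nn.Linear(ctx, horizon)
|             self.season = nn.Linear(ctx, horizon)
|             self.revin = RevIN()
|         def forward(self, x):     # x: (batch, time, channels)
|             x = self.revin.normalize(x).transpose(1, 2)
|             t = self.avg(x)       # trend via moving average
|             y = self.trend(t) + self.season(x - t)
|             return self.revin.denormalize(y.transpose(1, 2))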
|
| Here, of course, the angle is zero-shot, so it's somewhat
| excused from this, but it would still be interesting to see
| whether it can beat that supervised model combination.
| pbronez wrote:
| The post references this paper on time series transformers.
| It's the first time I've seen someone apply transformers to
| time series specifically. Very curious how well this might
| work for low-frequency events.
|
| https://arxiv.org/abs/2402.02592
| melondonkey wrote:
| One detail I don't really understand is the low-variance
| normal component of the target mixture. I'd be curious to see
| from the weights how often that component gets used.
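|
| My reading, as a toy sketch (not the paper's exact
| parameterization): a near-zero-variance normal lets the
| mixture put almost a point mass on particular values, e.g.
| constant or repeated readings.
|
|     import torch
|     import torch.distributions as D
|
|     x = torch.tensor([0.1, 2.5, 1.0])  # toy observations
|     comps = [
|         D.StudentT(torch.tensor(3.0)),          # heavy tails
|         D.LogNormal(torch.tensor(0.0), torch.tensor(1.0)),
|         D.Normal(torch.tensor(0.0), torch.tensor(1e-3)),
|     ]
|     logw = torch.log(torch.tensor([0.5, 0.3, 0.2]))
|     lp = torch.stack([c.log_prob(x) for c in comps])  # (K, N)
|     mix_ll = torch.logsumexp(logw.unsqueeze(1) + lp, dim=0)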
| magundu wrote:
| Anyone tried this for Prometheus metrics?
___________________________________________________________________
(page generated 2024-03-26 23:02 UTC)