[HN Gopher] Moirai: A time series foundation model for universal...
       ___________________________________________________________________
        
       Moirai: A time series foundation model for universal forecasting
        
       Author : throwaway888abc
       Score  : 181 points
       Date   : 2024-03-26 00:51 UTC (22 hours ago)
        
 (HTM) web link (blog.salesforceairesearch.com)
 (TXT) w3m dump (blog.salesforceairesearch.com)
        
       | gorold wrote:
       | Code is available! https://github.com/SalesforceAIResearch/uni2ts
        
         | peter_l_downs wrote:
         | And the documentation makes me think they did a great job
         | making this easy to use. Looking forward to playing around with
         | it.
         | 
         | Edit: oh you're one of the authors -- thank you, and
         | congratulations!
        
         | fbdab103 wrote:
          | Choosing beggars and all that, but the LOTSA dataset could
          | really benefit from a README on HuggingFace. Even just a
          | citation back to the original paper would be good.
        
           | gorold wrote:
           | That's actually a great suggestion, thanks! We're also still
           | working on improving the readability/usability of the
           | codebase too
        
       | mikeyouse wrote:
        | Looks super interesting. Definitely going to play with this,
        | though it took me way too long to figure out what Salesforce
        | Air Search was.. maybe that's a sign I should log off for the
        | day.
        
       | dcl wrote:
       | What does 'any-variate' forecasting mean? Can you use this pre-
       | trained model to produce forecasts when there exists useful
       | covariates/features/predictors? Is this something the other TS
       | foundation models can/cannot do?
        
         | gorold wrote:
         | When we deal with many different multivariate time series, each
         | time series can have a different number of variates. So "any-
         | variate" means that the model is able to take as inputs
         | multivariate time series with arbitrary number of variates, and
         | model the interactions with the Transformer's attention
         | mechanism. This is something that many other TS foundation
         | models do not consider yet - they convert all multivariate time
         | series into multiple univariate time series.
         | 
          | Whether or not the forecasts improve as a result of the
         | additional covariates is still an open question which needs to
         | be studied more -- we need to build better evaluations and
         | benchmarks for this.
        
           | dcl wrote:
           | Understood, thank you. There are certainly applications in
           | demand sensing/demand forecasting where things like recent
           | order information, recent sales, CRM inputs are quite
           | predictive of near-term outcomes, but become useless for
           | longer horizon forecasts. In my experience, when information
           | like this is available, no time-series technique that is
           | unable to leverage this information would beat even simple
           | regressions for short term horizon forecasts.
        
           | lukas_b wrote:
            | This looks very interesting! I'm trying to understand if
            | the flattening technique might work for my time series.
            | It's structured as follows: at each time step t, I have an
            | m by n data matrix. The value of m (rows) varies per time
            | step; n stays constant and represents the features, and I
            | want to predict one of the n values. (In this case, t
            | represents a single day, m (rows) represents the people
            | that entered a store on that day, and n (cols) represents
            | various features of those people. I want to predict one of
            | those features, given the others.) The fact that it's a
            | time series matters, because I expect the relationship to
            | change over time. For instance, some feature n[x] (person
            | wears a yellow shirt) might be correlated with my target
            | feature n[y] (person steals), but only in the summer.
            | Would it be possible to flatten this too? What would that
            | look like?
        
         | hackerlight wrote:
         | They flatten the time and variate dimensions into a single 1D
         | vector. So it can handle arbitrary numbers of features.
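A toy sketch of what that flattening could look like (illustrative only, assuming numpy; this is not the actual uni2ts implementation):

```python
import numpy as np

# Toy multivariate series: T time steps, V variates (V can differ per series).
T, V = 4, 3
series = np.arange(T * V, dtype=float).reshape(T, V)  # shape (T, V)

# Flatten (T, V) -> a single 1D sequence of T*V tokens, so one Transformer
# can attend across both time and variates regardless of V.
tokens = series.reshape(-1)  # shape (T*V,)

# Each token keeps indices telling the model which time step and which
# variate it came from (analogous to positional / variate encodings).
time_id = np.repeat(np.arange(T), V)    # [0,0,0,1,1,1,...]
variate_id = np.tile(np.arange(V), T)   # [0,1,2,0,1,2,...]
```

Because the sequence length is simply T*V, a series with 2 variates and one with 20 variates both become plain 1D token sequences.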
        
       | longdog wrote:
        | Interesting, but I'm very skeptical. There are over a dozen
        | transformer-based foundation time series models released in
        | the past year, and without fail, every one of them claims to
        | be at or near SOTA. For example:
       | 
       | - Time-LLM (https://arxiv.org/abs/2310.01728)
       | 
       | - Lag-Llama (https://arxiv.org/abs/2310.08278)
       | 
       | - UniTime (https://arxiv.org/abs/2310.09751)
       | 
       | - TEMPO (https://arxiv.org/abs/2310.04948)
       | 
       | - TimeGPT (https://arxiv.org/abs/2310.03589)
       | 
       | - TimesFM (https://arxiv.org/html/2310.10688v2)
       | 
       | - GPT4TS (https://arxiv.org/pdf/2308.08469.pdf)
       | 
       | Yet not a SINGLE transformer-based model I've managed to
       | successfully run has beaten gradient boosted tree models on my
       | use case (economic forecasting). To be honest I believe these
       | foundational models are all vastly overfit. There's basically
       | only 2 benchmarking sets that are ever used in time series (the
       | Monash set and the M-competition set), so it'd be easy to
       | overtune a model just to perform well on these.
       | 
       | I would love to see someone make a broader set of varied
       | benchmarks and have an independent third party do these
       | evaluations like with LLM leaderboards. Otherwise I assume all
       | published benchmarks are 100% meaningless and gamed.
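For reference, the usual way to set up the gradient-boosted-tree baseline the commenter mentions is to turn the series into a supervised table of lagged features; a minimal numpy sketch (the resulting rows can then be fed to any GBT library such as LightGBM or XGBoost):

```python
import numpy as np

def make_lag_features(y, n_lags):
    """Turn a 1D series into a supervised (X, target) pair using lagged
    values as features -- the standard setup for tree-based forecasting."""
    X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]
    return X, target

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X, target = make_lag_features(y, n_lags=3)
# Each X row is [y[t-3], y[t-2], y[t-1]]; the target is y[t].
```

Exogenous covariates (e.g. economic indicators) can be appended as extra columns of X, which is one reason trees remain hard to beat on tabular-ish forecasting problems.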
        
         | dcl wrote:
         | Why would you expect anything to work well for economic
         | forecasting :p
        
           | donbreo wrote:
           | Jamie pull up the article that proves none of the published
           | models work well with economic forecasting
        
             | rokkitmensch wrote:
             | I'm so sad. This hilarious comment is languishing in the
             | doldrums.
        
               | idiotsecant wrote:
               | Not reddit.
        
             | ImHereToVote wrote:
            | There is always Gary Stevenson's economics model. Works
            | without fail.
        
         | hackerlight wrote:
         | Neural nets are known to struggle with tabular data. Have you
         | tried fine tuning or attaching a decoder somewhere that you
         | train on your task? Zero-shot inference might be asking for too
         | much.
        
           | boredemployee wrote:
           | >> Neural nets are known to struggle with tabular data.
           | 
           | Not disagreeing with you, and I'm not a specialist, but it's
           | funny that lot of papers seem to claim exactly the opposite.
        
             | hackerlight wrote:
             | What paper says the opposite? This is what I can find:
             | 
             | https://arxiv.org/abs/2207.08815
             | 
             | https://arxiv.org/abs/2305.02997
        
         | logicchains wrote:
         | Pretty much any real-world time series prediction task is going
         | to involve more data than just the time series itself, and some
          | of this data will probably be tabular, so it's no surprise
          | that gradient boosted trees perform better.
        
         | Tarq0n wrote:
         | Honestly the best part of this paper is they've put together a
         | large new set of time series for benchmarking.
        
         | tudorw wrote:
         | https://facebook.github.io/prophet/
         | 
         | "Prophet is a procedure for forecasting time series data based
         | on an additive model where non-linear trends are fit with
         | yearly, weekly, and daily seasonality, plus holiday effects. It
         | works best with time series that have strong seasonal effects
         | and several seasons of historical data. Prophet is robust to
         | missing data and shifts in the trend, and typically handles
         | outliers well."
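A bare-bones illustration of the additive idea in that description (a linear trend plus one Fourier seasonal pair fit by least squares; Prophet itself adds changepoints, holiday effects, and uncertainty intervals on top):

```python
import numpy as np

# Fit y(t) ~ trend(t) + seasonality(t) by least squares: a stripped-down
# version of the additive decomposition Prophet's blurb refers to.
period = 7  # e.g. weekly seasonality on daily data
t = np.arange(140, dtype=float)
rng = np.random.default_rng(0)
y = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.1, t.size)

# Design matrix: intercept, linear trend, one Fourier pair for the season.
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / period),
    np.cos(2 * np.pi * t / period),
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef  # recovered trend + seasonal fit
```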
        
           | refulgentis wrote:
           | ?
        
       | wenc wrote:
       | They should sign up for the next Makridakis forecasting
       | competition.
       | 
       | https://en.wikipedia.org/wiki/Makridakis_Competitions
       | 
       | Makridakis and Hibon reached the sad conclusion that
       | "statistically sophisticated and complex methods do not
       | necessarily provide more accurate forecasts than simpler ones."
        
         | hcarlens wrote:
         | That was true in the first Makridakis competition ("M1") in
         | 1982, and possibly until M4 in 2018, but both M5 and M6 were
         | won by what would generally be considered relatively
         | sophisticated methods (e.g. LightGBM).
         | 
         | The Wikipedia article doesn't have that much detail on M5 or
         | M6, but the M5 papers are in the International Journal of
         | Forecasting[1] and M6 should be published later this year
         | (there's already a preprint on arxiv [2]).
         | 
         | I recently spent some time looking into the history and results
         | of the M competitions and had a chance to speak to Professor
         | Makridakis about them, as well as the winners of each of the M6
         | competition tracks [3]. While the methods have become more
         | sophisticated, some conclusions from M1 still seem to hold: in
         | particular, that there is no overall "best" method, and that
         | the winning method tends to be different for different types of
         | data, time horizons, and evaluation metrics.
         | 
         | [1]:
         | https://www.sciencedirect.com/science/article/pii/S016920702...
         | [2]: https://arxiv.org/abs/2310.13357 [3]:
         | https://mlcontests.com/state-of-competitive-machine-learning...
        
           | vermorel wrote:
           | Our basic low-dimensional parametric model landed No1 at the
           | SKU level at the M5, see my lecture
           | https://www.lokad.com/tv/2022/1/5/no1-at-the-sku-level-in-
           | th... (more references at the bottom)
        
             | hcarlens wrote:
             | Interesting, thanks for sharing!
        
           | wenc wrote:
           | A recent thread on Amazon's new Chronos forecasting model
            | showed that an ensemble of simple models outperformed it
            | (a highly parametrized transformer model) on the M
            | competition datasets.
           | 
           | https://github.com/Nixtla/nixtla/tree/main/experiments/amazo.
           | ..
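For context, "an ensemble of simple models" typically means averaging classical baselines; a minimal sketch (the exact component models used in the linked experiment may differ):

```python
import numpy as np

def ensemble_forecast(y, horizon, season):
    """Average three classical baselines: naive (repeat last value),
    seasonal naive (repeat the value one season ago), and the
    historical mean."""
    naive = np.full(horizon, y[-1])
    seasonal_naive = np.array([y[-season + (h % season)] for h in range(horizon)])
    hist_mean = np.full(horizon, y.mean())
    return (naive + seasonal_naive + hist_mean) / 3.0

y = np.array([10.0, 20.0, 30.0, 40.0])
fc = ensemble_forecast(y, horizon=2, season=2)
```

Baselines like these are nearly free to compute, which is what makes the comparison against large pretrained models so pointed.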
        
       | thelastbender12 wrote:
       | I'm curious where universal forecasting models are most useful.
       | It is technically fascinating but forecasting specifically seems
       | like a domain where you'd want interpretable modeling - you use
       | it for big-value problems and it significantly affects your
       | action/policy. So, the tradeoff between performance and model
       | simplicity should lean towards the latter?
        
         | jdowner wrote:
         | So I am not alone! There seem so few people who hold this view
         | these days.
        
           | paul80808 wrote:
           | Same for my shop - we manage a large pool of cost driven by
            | partially forecastable factors; we've repeatedly rejected
           | methods purely on explainability grounds. Our accountability
           | requirements do not allow us to point the finger at an LLM if
           | we get it wrong.
        
           | melondonkey wrote:
           | I know. Here I am modeling my data generating process like a
           | chump.
        
       | laylower wrote:
        | Show us how it performs against other models in the M3, M4 and
        | M5 competitions.
        | 
        | These competitions are the gold standard for evaluating
        | forecasting tools.
       | 
       | Moirai stands for fates [https://en.wikipedia.org/wiki/Moirai] in
       | Greek mythology
        
       | tsurba wrote:
       | Very cool that the dataset and model weights are open right away!
       | This paper also doesn't have a bunch of weird architectural
       | choices pulled out of nowhere like the other TS foundation models
       | recently. Looks like it will actually be useful, thank you! Maybe
       | I will actually get to do representation learning for TS during
       | my PhD.
       | 
        | As a sidenote/rant, it would be nice if all _supervised_ TS
        | benchmarks included "DLinear + RevIN" as the standard baseline,
       | as in my experiments it tends to get about the same performance
       | as all other new SOTA forecasting models. Most papers compare to
       | the linear model without RevIN while they themselves use it, and
       | only beat it because of that :) And in any case supervised
       | training of transformers from scratch on datasets having less
       | than 1M points is just stupid (so less raw data than a single
       | image?). Less than 1B is still at least mildly stupid.
       | 
        | Here of course the angle is zero-shot, so it's somewhat
        | excused from this, but it would still be interesting to see
        | whether it can beat that supervised model combination.
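For readers unfamiliar with RevIN: it wraps any forecaster in a reversible per-instance normalization. A minimal sketch with a naive stand-in model (the published RevIN also learns an affine transform, omitted here):

```python
import numpy as np

def revin_forecast(window, model, eps=1e-5):
    """Normalize each input window by its own mean/std, forecast in the
    normalized space, then reverse the normalization on the output."""
    mu, sigma = window.mean(), window.std() + eps
    z = (window - mu) / sigma   # normalize the instance
    z_pred = model(z)           # any forecaster, applied in normalized space
    return z_pred * sigma + mu  # undo the normalization

# Stand-in model: repeat the last normalized value over a 3-step horizon.
naive = lambda z: np.full(3, z[-1])

window = np.array([100.0, 102.0, 104.0, 106.0])
pred = revin_forecast(window, naive)
```

The point of the wrapper is that level shifts between train and test windows are absorbed by the per-window statistics, which is much of why the combination is such a strong baseline.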
        
       | pbronez wrote:
       | References this paper on Time Series transformers. First I've
       | seen someone apply transformers to time series specifically. Very
       | curious how well this might work for low-frequency events.
       | 
       | https://arxiv.org/abs/2402.02592
        
       | melondonkey wrote:
       | One detail I don't really understand is the low-variance normal
       | component of the target mixture. Would be curious to see from the
       | weights how often that was used
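One way to probe that empirically is via each component's posterior responsibility for observed values; a toy sketch with a two-component Gaussian mixture standing in for the model's output head (the actual component families in the paper differ):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

weights = np.array([0.8, 0.2])   # mixture weights
mus = np.array([0.0, 0.0])
sigmas = np.array([1.0, 1e-2])   # second = the low-variance component

x = 0.001                        # an observation near the narrow spike
dens = weights * normal_pdf(x, mus, sigmas)
resp = dens / dens.sum()         # posterior responsibility per component
# resp[1] close to 1 => the low-variance component "explains" this value.
```

Averaging these responsibilities over a dataset would show how often the low-variance component actually fires.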
        
       | magundu wrote:
        | Anyone tried this for Prometheus metrics?
        
       ___________________________________________________________________
       (page generated 2024-03-26 23:02 UTC)