[HN Gopher] XLSTMTime: Long-Term Time Series Forecasting with xLSTM
___________________________________________________________________
XLSTMTime: Long-Term Time Series Forecasting with xLSTM
Author : beefman
Score : 217 points
Date : 2024-07-16 17:14 UTC (1 day ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| carbocation wrote:
| > In recent years, transformer-based models have gained
| prominence in multivariate long-term time series forecasting
|
| Prominence, yes. But are they generally better than non-deep
| learning models? My understanding was that this is not the case,
| but I don't follow this field closely.
| Pandabob wrote:
| While I don't have firsthand experience with these models, I
| recently discussed this topic with a friend who has used tree-
| based models like XGBoost for time series analysis. They noted
| that transformer-based architectures tend to yield decent
| performance on time series tasks with relatively little effort
| compared to tree models.
|
| From what I understood, tree-based models can usually
| outperform transformers when given sufficient parameter tuning.
| However, models like TimeGPT offer decent performance without
| extensive tuning, making them an attractive option for quicker
| implementations.
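|
| For concreteness: the usual way tree models get applied to
| forecasting is to turn it into supervised regression over lag
| features. A rough sketch with XGBoost's Python API (window size
| and hyperparameters are illustrative, not tuned):
|
|     # One-step-ahead forecasting as regression on lag features
|     import numpy as np
|     from xgboost import XGBRegressor
|
|     def make_lag_features(y, n_lags=24):
|         # Rows: [y[t-n_lags], ..., y[t-1]] -> target y[t]
|         X = np.stack([y[i:i + n_lags]
|                       for i in range(len(y) - n_lags)])
|         return X, y[n_lags:]
|
|     y = np.sin(np.arange(1000) * 2 * np.pi / 24) \
|         + 0.1 * np.random.randn(1000)
|     X, target = make_lag_features(y)
|     model = XGBRegressor(n_estimators=300, max_depth=4,
|                          learning_rate=0.05)
|     model.fit(X[:-100], target[:-100])
|     preds = model.predict(X[-100:])  # held-out 1-step forecasts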
| techwizrd wrote:
| In my aviation safety work, deep learning outperforms
| traditional non-DL models for multivariate time-series
| forecasting. Among deep learning models, I've seen wide
| variance in performance between transformers, Bi-LSTMs, regular
| MLPs, VAEs, and so on.
| theLiminator wrote:
| What's your go-to model that generally performs well with
| little tuning?
| techwizrd wrote:
| If you have short time-series with low variance, noise and
| outliers, strong prior knowledge, or limited resources to
| train and maintain a model, I would stick with simpler
| traditional models.
|
| If DL is a good fit for your use-case, then I tend to like
| transformers or combining CNNs with recurrent models (e.g.,
| BiGRU, GRU, BiLSTM, LSTM) and optional attention.
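|
| A minimal sketch of that CNN-plus-recurrent pattern in PyTorch
| (layer sizes are illustrative, and I've left the optional
| attention out):
|
|     import torch
|     import torch.nn as nn
|
|     class ConvBiLSTM(nn.Module):
|         def __init__(self, n_features, horizon, hidden=64):
|             super().__init__()
|             # Conv1d extracts local patterns along the time axis
|             self.conv = nn.Conv1d(n_features, hidden,
|                                   kernel_size=3, padding=1)
|             self.lstm = nn.LSTM(hidden, hidden, batch_first=True,
|                                 bidirectional=True)
|             self.head = nn.Linear(2 * hidden, horizon)
|
|         def forward(self, x):  # x: (batch, time, n_features)
|             z = self.conv(x.transpose(1, 2))   # to channels-first
|             z = torch.relu(z).transpose(1, 2)  # back to time-major
|             out, _ = self.lstm(z)              # (batch, time, 2*hidden)
|             return self.head(out[:, -1])       # forecast from last step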
| montereynack wrote:
| Seconding the other question, would be curious to know
| ramon156 wrote:
| Now take into account that it has to be lightweight, and DL
| falls short
| nerdponx wrote:
| What are you doing in aviation safety that requires time
| series modeling? Weather?
| all2 wrote:
| My best guess would be accident occurrence prediction.
| dongobread wrote:
| From experience in payments/spending forecasting, I've found
| that deep learning generally underperforms gradient-boosted tree
| models. Deep learning models tend to be good at learning
| seasonality but do not handle complex trends or shocks very
| well. Economic/financial data tends to have straightforward
| seasonality with complex trends, so deep learning tends to do
| quite poorly.
|
| I do agree with this paper - all of the good deep learning time
| series architectures I've tried are simple extensions of MLPs
| or RNNs (e.g. DeepAR or N-BEATS). The transformer-based
| architectures I've used have been absolutely awful, especially
| the endless stream of transformer-based "foundational models"
| that are coming out these days.
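|
| For reference, the core of N-BEATS really is just a residual
| MLP; a rough sketch of one block (PyTorch, sizes illustrative):
|
|     import torch.nn as nn
|
|     class NBeatsBlock(nn.Module):
|         def __init__(self, lookback, horizon, width=256):
|             super().__init__()
|             self.mlp = nn.Sequential(
|                 nn.Linear(lookback, width), nn.ReLU(),
|                 nn.Linear(width, width), nn.ReLU())
|             self.backcast = nn.Linear(width, lookback)
|             self.forecast = nn.Linear(width, horizon)
|
|         def forward(self, x):  # x: (batch, lookback)
|             h = self.mlp(x)
|             # Residual: subtract the backcast, emit a forecast;
|             # blocks are stacked and their forecasts summed.
|             return x - self.backcast(h), self.forecast(h)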
| sigmoid10 wrote:
| Transformers are just MLPs with extra steps. So in theory
| they should be just as powerful. The problem with
| transformers is simultaneously their big advantage: They
| scale extremely well with larger networks and more training
| data. Better than any other architecture out there. So if
| you had enormous datasets and unlimited compute budget, you
| could probably do amazing things in this regard as well. But
| if you're just a mortal data scientist without extra funding,
| you will be better off with more traditional approaches.
| dongobread wrote:
| I think what you say is true when comparing transformers to
| CNNs/RNNs, but not to MLPs.
|
| Transformers, RNNs, and CNNs are all techniques to reduce
| parameter count compared to a pure-MLP model. If you took a
| transformer model and replaced each self-attention layer
| with a linear layer plus activation, you'd have a pure MLP
| model that can capture every relationship the transformer
| does, and more besides, at the cost of far more parameters.
| MLPs are more powerful/scalable but transformers are more
| efficient.
|
| Compared to MLPs, transformers save on parameter count by
| skimping on the number of parameters devoted to modeling
| the relationship between tokens. This works in language
| modeling, where relationships between tokens aren't _that_
| important - you can jumble up the words in this sentence
| and it still mostly makes sense. This doesn't work in time
| series, where relationships between tokens (timesteps) are
| the most important thing of all. The LTSF paper linked in
| the OP paper also mentions this same problem:
| https://arxiv.org/pdf/2205.13504 (see section 1)
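|
| To make the parameter-count point concrete, here is a toy
| comparison (PyTorch; the sequence length and width are made
| up). Attention mixes tokens with ~4*d^2 weights shared
| across positions; the equivalent unconstrained MLP mixer
| needs (seq_len*d)^2:
|
|     import torch.nn as nn
|
|     seq_len, d = 96, 64
|     attn_mixer = nn.MultiheadAttention(d, num_heads=4)
|     mlp_mixer = nn.Sequential(  # mixes all tokens at once
|         nn.Flatten(),
|         nn.Linear(seq_len * d, seq_len * d), nn.ReLU(),
|         nn.Unflatten(1, (seq_len, d)))
|
|     n_attn = sum(p.numel() for p in attn_mixer.parameters())
|     n_mlp = sum(p.numel() for p in mlp_mixer.parameters())
|     print(n_attn, n_mlp)  # ~17K vs ~37.7M parameters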
| immibis wrote:
| Transformers reduce the number of relationships between
| tokens that must be learned, too. An MLP has to
| separately learn all possible relationships between token
| 1 and 2, and 2 and 3, and 3 and 4. A transformer can
| learn relationships between specific values regardless of
| position.
| newrotik wrote:
| Though I agree with the idea that MLPs are theoretically
| more "capable" than transformers, I think seeing
| transformers just as a parameter-reduction technique is
| excessively reductive.
|
| Many have tried to build deep and large MLPs for a long
| time, but at some point adding more parameters wouldn't
| increase models' performance.
|
| In contrast, transformers became so popular because their
| modelling power just kept scaling with more and more data
| and more and more parameters. It seems like the
| 'restriction' imposed on transformers (the attention
| structure) is a very good functional form for modelling
| language (and, more and more, some tasks in vision and
| audio).
|
| They did not become popular because they were modest with
| respect to the parameters used.
| sigmoid10 wrote:
| >Compared to MLPs, transformers save on parameter count
| by skimping on the number of parameters
|
| That is only correct if you look at models with equal
| parameter count from a purely theoretical perspective. In
| practice, it is possible to train transformers to orders
| of magnitude bigger scales than MLPs because they are so
| much more efficient. That's why I said a modern
| transformer will easily beat these comparatively puny
| MLPs, but only in cases where data and compute budgets
| allow it. That is not even a question. If you look at
| recent time series forecasting leaderboards, you'll almost
| always see transformers at or near the top:
| https://github.com/thuml/Time-Series-Library
| rjurney wrote:
| They aren't so hot, but recent efforts at transfer learning
| were promising.
| svnt wrote:
| The paper says this in the next paragraph. xLSTMTime is not
| transformer-based either.
| Dowwie wrote:
| It's marketed as a forecasting tool, so is it not applicable
| to event classification in time series?
| RamblingCTO wrote:
| I'd say that's kind of a different task. I'm not a pro in this,
| but you could maybe treat it as a multi-variate forecast
| problem where the targets are probabilities per event if n is
| really small?
| jimmySixDOF wrote:
| Yes, I would be interested in whether this (or any
| transformer/LLM-based approach) improves anomaly detection,
| for example.
| spmurrayzzz wrote:
| I can't speak for all use cases, but I've done a great deal
| of work in the space of using deep learning approaches for
| anomaly detection in network device telemetry. In particular
| with high resolution univariate time series of latency
| measurements, we saw success using convolutional autoencoders
| and GANs. These methods lean on reconstruction loss rather
| than forecasting, but are still effective.
|
| There is some prior art for this that we leaned on [1][2].
|
| RE: transformers -- I did some early experimentation with
| Temporal Fusion Transformers [3] which worked pretty well for
| forecasting compared to other deep learning methods, but
| rarely did I see it outperform standard baselines (like
| ARIMA) in our datasets.
|
| [1] https://www.mdpi.com/2076-3417/12/23/12472
|
| [2] https://arxiv.org/abs/2009.07769
|
| [3] https://arxiv.org/abs/1912.09363
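|
| To sketch the reconstruction-loss idea (PyTorch; channel
| counts and the window length are illustrative, not what we
| ran in production): train a convolutional autoencoder on
| normal windows, then flag windows whose reconstruction error
| is unusually high.
|
|     import torch
|     import torch.nn as nn
|
|     class ConvAE(nn.Module):
|         def __init__(self):
|             super().__init__()
|             self.enc = nn.Sequential(
|                 nn.Conv1d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
|                 nn.Conv1d(16, 8, 5, stride=2, padding=2), nn.ReLU())
|             self.dec = nn.Sequential(
|                 nn.ConvTranspose1d(8, 16, 5, stride=2, padding=2,
|                                    output_padding=1), nn.ReLU(),
|                 nn.ConvTranspose1d(16, 1, 5, stride=2, padding=2,
|                                    output_padding=1))
|
|         def forward(self, x):  # x: (batch, 1, window)
|             return self.dec(self.enc(x))
|
|     def anomaly_score(model, x):
|         # Mean squared reconstruction error per window; compare
|         # against a threshold fit on held-out normal data.
|         return ((model(x) - x) ** 2).mean(dim=(1, 2))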
| greatpostman wrote:
| The best deep learning time series models are closed source
| inside hedge funds.
| 3abiton wrote:
| I think hedge funds, at least the advanced ones, definitely
| don't use time series modelling anymore. That's quite
| outdated nowadays.
| max_ wrote:
| What do you suspect they are using?
| meowkit wrote:
| They pull data from all kinds of things now.
|
| For example, satellite imagery of trucking activity
| correlated to specific companies or industries.
|
| It's all signal processing at some level, but directly
| modeling the time series of price or other asset metrics
| doesn't have the alpha it may have had decades ago.
| greatpostman wrote:
| Alternative data is passed into time series models. They
| are features.
|
| You don't know as much about this as you think
| myhf wrote:
| 👆
| nextos wrote:
| Some funds that tried to recruit me were really interested
| in classical generative models (ARMA, GARCH, HMMs with
| heavy-tailed emissions, etc.) extended with deep components
| to make them more flexible. Pyro and Kevin Murphy's ProbML
| vol II are a good starting point to learn more about these
| topics.
|
| The key is to understand that in some of these problems,
| data is relatively scarce, and it is really important to
| quantify uncertainty.
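|
| As a flavor of what "classical plus deep/flexible" looks like
| in Pyro, here is a sketch of an AR(1) with heavy-tailed
| Student-t emissions (priors and names are illustrative):
|
|     import torch
|     import pyro
|     import pyro.distributions as dist
|
|     def ar1_student_t(y):
|         # y: 1-D torch tensor of observations
|         phi = pyro.sample("phi", dist.Uniform(-1.0, 1.0))
|         sigma = pyro.sample("sigma", dist.HalfNormal(1.0))
|         nu = pyro.sample("nu", dist.Gamma(2.0, 0.1))  # tails
|         for t in range(1, len(y)):
|             # A "deep" extension would replace phi * y[t-1]
|             # with the output of a small neural network.
|             pyro.sample(f"y_{t}",
|                         dist.StudentT(nu, phi * y[t - 1], sigma),
|                         obs=y[t])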
| rjurney wrote:
| There are many ways of approaching quantitative trading and
| many people do employ time series analysis, especially for
| high frequency trading.
| fermisea wrote:
| Most of the hard work is actually feature construction rather
| than monolithic models. And AFAIK gradient boosting still
| rules the world.
| energy123 wrote:
| There is no such thing as a generally best model, per the no
| free lunch theorem. What works in hedge funds will do poorly
| in areas that call for different inductive biases, because
| they have different amounts and kinds of data.
| thedudeabides5 wrote:
| Can't wait for someone to lose all their money trying to
| predict stocks with this thing
| nyanpasu64 wrote:
| I misread this as XSLT :')
| selimnairb wrote:
| Same. Am I old?
| ThomasBHickey wrote:
| Me too (and yes, I'm old)
| mikepurvis wrote:
| 100% clicked thinking I was getting into an article about XML
| and wondering how interesting that was in 2024. Simultaneously
| disappointed and pleased.
| antod wrote:
| Yup. And it's about transforms too.
| optimalsolver wrote:
| Reminder: If someone's time series forecasting method worked,
| they wouldn't be publishing it.
| dongobread wrote:
| They definitely would, and do; the vast majority of time
| series work is not about asset prices or beating the stock
| market.
| musleh2 wrote:
| The Transformer itself, despite being one of the most
| successful models in AI history, was still published.
| logicchains wrote:
| It's a sequence model, not a time-series model. All time
| series are sequences but not all sequences are time series.
| dlojudice wrote:
| Is this somehow related to the Google weather prediction model
| using AI [1]?
|
| https://deepmind.google/discover/blog/graphcast-ai-model-for...
| scellus wrote:
| No, Graphcast is a graph transformer trained on ERA5 weather
| reconstructions of the atmosphere, not a general time series
| prediction model. Incidentally, it outperforms all traditional
| global point forecasts (non-ensembles), at least at predicting
| large-scale global patterns (Z500 and such, at lead times of
| 3-10 days or so). ECMWF has AIFS, a derivative of GraphCast;
| they'll probably get it, or something similar, into production
| in a couple of years.
| wafngar wrote:
| AIFS is transformer-based (GraphCast is a pure GNN), so it's
| a different architecture, and it is already running
| operationally; see:
|
| https://www.ecmwf.int/en/about/media-centre/aifs-
| blog/2024/i...
| brcmthrowaway wrote:
| Wow, is there a way to apply this to financial trading?
| musleh2 wrote:
| If you have a financial dataset, I can try it for you
| localfirst wrote:
| Time series forecasting works best in deterministic domains.
| None of the published LLM/AI/deep/machine learning techniques
| do well in the stock market. Absolutely none. We've tried them
| all.
| dkga wrote:
| A part of my work is literally building nowcasting and other
| types of prediction models in economics (inflation, GDP etc) and
| finance (market liquidity, etc). I haven't yet had a chance to
| read the paper, but overall the tone of "transformers are great
| for what they do, but LSTM-type models are still very valuable"
| completely resonates with me.
| uoaei wrote:
| Have you had the chance to apply Mamba to your work at all?
| Thoughts?
| _0ffh wrote:
| Too bad the dataset link in the paper isn't working. I hope
| that'll get amended.
___________________________________________________________________
(page generated 2024-07-17 23:09 UTC)