[HN Gopher] XLSTMTime: Long-Term Time Series Forecasting with xLSTM
___________________________________________________________________
XLSTMTime: Long-Term Time Series Forecasting with xLSTM
Author : beefman
Score : 120 points
Date : 2024-07-16 17:14 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| carbocation wrote:
| > In recent years, transformer-based models have gained
| prominence in multivariate long-term time series forecasting
|
| Prominence, yes. But are they generally better than non-deep
| learning models? My understanding was that this is not the case,
| but I don't follow this field closely.
| Pandabob wrote:
| While I don't have firsthand experience with these models, I
| recently discussed this topic with a friend who has used tree-
| based models like XGBoost for time series analysis. They noted
| that transformer-based architectures tend to yield decent
| performance on time series tasks with relatively little effort
| compared to tree models.
|
| From what I understood, tree-based models can usually
| outperform transformers when given sufficient parameter tuning.
| However, models like TimeGPT offer decent performance without
| extensive tuning, making them an attractive option for quicker
| implementations.
| techwizrd wrote:
| In my aviation safety work, deep learning outperforms
| traditional non-DL models for multivariate time-series
| forecasting. Among deep learning models, I've seen wide
| variance in performance across transformers, Bi-LSTMs, regular
| MLPs, VAEs, and so on.
| theLiminator wrote:
| What's your go-to model that generally performs well with
| little tuning?
| techwizrd wrote:
| If you have short time series with low variance, little noise
| and few outliers, strong prior knowledge, or limited resources
| to train and maintain a model, I would stick with simpler
| traditional models.
|
| If DL is a good fit for your use-case, then I tend to like
| transformers or combining CNNs with recurrent models (e.g.,
| BiGRU, GRU, BiLSTM, LSTM) and optional attention.
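|
| As a rough, illustrative PyTorch sketch of that kind of stack
| (layer sizes, class name, and the attention pooling are
| placeholders, not a recommendation):
|
|     import torch
|     import torch.nn as nn
|
|     class ConvRecurrentAttn(nn.Module):
|         """Conv1d front-end -> BiGRU -> attention pooling -> head."""
|         def __init__(self, n_features, hidden=64, horizon=1):
|             super().__init__()
|             self.conv = nn.Conv1d(n_features, hidden,
|                                   kernel_size=3, padding=1)
|             self.gru = nn.GRU(hidden, hidden, batch_first=True,
|                               bidirectional=True)
|             self.attn = nn.Linear(2 * hidden, 1)  # score each timestep
|             self.head = nn.Linear(2 * hidden, horizon)
|
|         def forward(self, x):  # x: (batch, time, features)
|             h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
|             h, _ = self.gru(h)                      # (batch, time, 2*hidden)
|             w = torch.softmax(self.attn(h), dim=1)  # weights over time
|             ctx = (w * h).sum(dim=1)                # weighted context vector
|             return self.head(ctx)                   # point forecast(s)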
| montereynack wrote:
| Seconding the other question, would be curious to know
| ramon156 wrote:
| Now take into account that it has to be lightweight and DL
| falls short.
| nerdponx wrote:
| What are you doing in aviation safety that requires time
| series modeling? Weather?
| dongobread wrote:
| From experience in payments/spending forecasting, I've found
| that deep learning generally underperforms gradient-boosted tree
| models. Deep learning models tend to be good at learning
| seasonality but do not handle complex trends or shocks very
| well. Economic/financial data tends to have straightforward
| seasonality with complex trends, so deep learning tends to do
| quite poorly.
|
| I do agree with this paper - all of the good deep learning time
| series architectures I've tried are simple extensions of MLPs
| or RNNs (e.g. DeepAR or N-BEATS). The transformer-based
| architectures I've used have been absolutely awful, especially
| the endless stream of transformer-based "foundational models"
| that are coming out these days.
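|
| For context, the N-BEATS idea is roughly an MLP block that
| emits a "backcast" and a "forecast", with blocks stacked on the
| residuals of previous backcasts; a loose PyTorch sketch (names
| and sizes are illustrative):
|
|     import torch.nn as nn
|
|     class NBeatsStyleBlock(nn.Module):
|         """One block: an MLP emitting a backcast and a forecast.
|         Stack blocks; each sees the residual x - backcast, and
|         the per-block forecasts are summed."""
|         def __init__(self, lookback, horizon, width=256):
|             super().__init__()
|             self.mlp = nn.Sequential(
|                 nn.Linear(lookback, width), nn.ReLU(),
|                 nn.Linear(width, width), nn.ReLU(),
|             )
|             self.backcast = nn.Linear(width, lookback)
|             self.forecast = nn.Linear(width, horizon)
|
|         def forward(self, x):  # x: (batch, lookback)
|             h = self.mlp(x)
|             return x - self.backcast(h), self.forecast(h)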
| sigmoid10 wrote:
| Transformers are just MLPs with extra steps. So in theory
| they should be just as powerful. The problem with
| transformers is simultaneously their big advantage: They
| scale extremely well with larger networks and more training
| data - better than any other architecture out there. So if
| you had enormous datasets and unlimited compute budget, you
| could probably do amazing things in this regard as well. But
| if you're just a mortal data scientist without extra funding,
| you will be better off with more traditional approaches.
| dongobread wrote:
| I think what you say is true when comparing transformers to
| CNNs/RNNs, but not to MLPs.
|
| Transformers, RNNs, and CNNs are all techniques to reduce
| parameter count compared to a pure-MLP model. If you took a
| transformer model and replaced each self-attention layer
| with a linear layer+activation function, you'd have a pure
| MLP model that can model every relationship the transformer
| does, but can model more possible relationships as well
| (but at the cost of tons more parameters). MLPs are more
| powerful/scalable but transformers are more efficient.
|
| Compared to MLPs, transformers save on parameter count by
| skimping on the number of parameters devoted to modeling
| the relationship between tokens. This works in language
| modeling, where relationships between tokens aren't _that_
| important - you can jumble up the words in this sentence
| and it still mostly makes sense. This doesn't work in time
| series, where relationships between tokens (timesteps) are
| the most important thing of all. The LTSF paper linked in
| the OP paper also mentions this same problem:
| https://arxiv.org/pdf/2205.13504 (see section 1)
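|
| A loose illustration of that swap (class names and dimensions
| made up): the attention block mixes tokens with data-dependent
| weights, while the pure-MLP version uses a fixed learned mixing
| matrix over the time axis, at the cost of parameters that grow
| with sequence length.
|
|     import torch.nn as nn
|
|     class AttnBlock(nn.Module):
|         def __init__(self, d):
|             super().__init__()
|             self.attn = nn.MultiheadAttention(d, num_heads=1,
|                                               batch_first=True)
|             self.ff = nn.Sequential(nn.Linear(d, d), nn.ReLU())
|
|         def forward(self, x):          # x: (batch, tokens, d)
|             a, _ = self.attn(x, x, x)  # data-dependent token mixing
|             return self.ff(x + a)
|
|     class PureMlpBlock(nn.Module):
|         def __init__(self, n_tokens, d):
|             super().__init__()
|             # token mixing as a fixed learned matrix over the
|             # sequence axis (MLP-Mixer style)
|             self.mix = nn.Sequential(nn.Linear(n_tokens, n_tokens),
|                                      nn.ReLU())
|             self.ff = nn.Sequential(nn.Linear(d, d), nn.ReLU())
|
|         def forward(self, x):          # x: (batch, tokens, d)
|             a = self.mix(x.transpose(1, 2)).transpose(1, 2)
|             return self.ff(x + a)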
| rjurney wrote:
| They aren't so hot, but recent efforts at transfer learning
| were promising.
| svnt wrote:
| The paper says this in the next paragraph. xLSTMTime is not
| transformer-based either.
| Dowwie wrote:
| It's marketed as a forecasting tool, so is this not applicable
| to event classification in time series?
| RamblingCTO wrote:
| I'd say that's kind of a different task. I'm not a pro in this,
| but you could maybe treat it as a multi-variate forecast
| problem where the targets are probabilities per event if n is
| really small?
| jimmySixDOF wrote:
| Yes, I would be interested where this (and any Transformer/LLM
| based approach) is improving anomaly detection for example.
| greatpostman wrote:
| The best deep learning time series models are closed source
| inside hedge funds.
| 3abiton wrote:
| I think hedge funds, at least the advanced ones, definitely
| don't use time series modelling anymore. That's quite outdated
| nowadays.
| max_ wrote:
| What do you suspect they are using?
| meowkit wrote:
| They pull data from all kinds of things now.
|
| For example, satellite imagery of trucking activity
| correlated to specific companies or industries.
|
| It's all signal processing at some level, but directly
| modeling the time series of price or other asset metrics
| doesn't have the alpha it may have had decades ago.
| greatpostman wrote:
| Alternative data is passed into time series models. They
| are features.
|
| You don't know as much about this as you think
| myhf wrote:
| 👆
| nextos wrote:
| Some funds that tried to recruit me were really interested
| in classical generative models (ARMA, GARCH, HMMs with
| heavy-tailed emissions, etc.) extended with deep components
| to make them more flexible. Pyro and Kevin Murphy's ProbML
| vol II are a good starting point to learn more about these
| topics.
|
| The key is to understand that in some of these problems,
| data is relatively scarce, and it is really important to
| quantify uncertainty.
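|
| A minimal Pyro sketch of that flavor - a heavy-tailed AR(1)
| whose mean function could be swapped for a small network to add
| the "deep" component (everything here is illustrative):
|
|     import pyro
|     import pyro.distributions as dist
|
|     def ar1_student_t(y):  # y: 1-D torch tensor of observations
|         phi = pyro.sample("phi", dist.Normal(0.0, 0.5))
|         sigma = pyro.sample("sigma", dist.LogNormal(0.0, 0.5))
|         nu = pyro.sample("nu", dist.Gamma(2.0, 0.1))
|         for t in range(1, len(y)):
|             # replace phi * y[t-1] with a small nn for flexibility
|             loc = phi * y[t - 1]
|             pyro.sample(f"y_{t}",
|                         dist.StudentT(nu, loc, sigma), obs=y[t])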
| rjurney wrote:
| There are many ways of approaching quantitative trading and
| many people do employ time series analysis, especially for
| high frequency trading.
| fermisea wrote:
| Most of the hard work is actually feature construction rather
| than monolithic models. And afaik gradient boosting still rules
| the world
| thedudeabides5 wrote:
| Can't wait for someone to lose all their money trying to
| predict stocks with this thing.
| nyanpasu64 wrote:
| I misread this as XSLT :')
| selimnairb wrote:
| Same. I am old?
| optimalsolver wrote:
| Reminder: If someone's time series forecasting method worked,
| they wouldn't be publishing it.
| dongobread wrote:
| They definitely would, and do; the vast majority of time series
| work is not about asset prices or beating the stock market.
| dlojudice wrote:
| Is this somehow related to the Google weather prediction model
| using AI [1]?
|
| https://deepmind.google/discover/blog/graphcast-ai-model-for...
| brcmthrowaway wrote:
| Wow, is there a way to apply this to financial trading?
___________________________________________________________________
(page generated 2024-07-16 23:00 UTC)