[HN Gopher] XLSTMTime: Long-Term Time Series Forecasting with xLSTM
       ___________________________________________________________________
        
       XLSTMTime: Long-Term Time Series Forecasting with xLSTM
        
       Author : beefman
       Score  : 120 points
       Date   : 2024-07-16 17:14 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | carbocation wrote:
       | > In recent years, transformer-based models have gained
       | prominence in multivariate long-term time series forecasting
       | 
       | Prominence, yes. But are they generally better than non-deep
       | learning models? My understanding was that this is not the case,
       | but I don't follow this field closely.
        
         | Pandabob wrote:
         | While I don't have firsthand experience with these models, I
         | recently discussed this topic with a friend who has used tree-
         | based models like XGBoost for time series analysis. They noted
         | that transformer-based architectures tend to yield decent
         | performance on time series tasks with relatively little effort
         | compared to tree models.
         | 
         | From what I understood, tree-based models can usually
         | outperform transformers when given sufficient parameter tuning.
         | However, models like TimeGPT offer decent performance without
         | extensive tuning, making them an attractive option for quicker
         | implementations.
        
         | techwizrd wrote:
         | In my aviation safety work, deep learning outperforms
         | traditional non-DL models for multivariate time-series
          | forecasting. Among deep learning models, I've seen wide
          | variance in performance across transformers, Bi-LSTMs,
          | regular MLPs, VAEs, and so on.
        
           | theLiminator wrote:
           | What's your go-to model that generally performs well with
           | little tuning?
        
             | techwizrd wrote:
             | If you have short time-series with low variance, noise and
             | outliers, strong prior knowledge, or limited resources to
             | train and maintain a model, I would stick with simpler
             | traditional models.
             | 
             | If DL is a good fit for your use-case, then I tend to like
             | transformers or combining CNNs with recurrent models (e.g.,
             | BiGRU, GRU, BiLSTM, LSTM) and optional attention.
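A minimal untrained forward pass sketching the shape of the "CNN + recurrent + optional attention" recipe the parent describes: a 1-D convolution extracts local patterns, a recurrent pass models temporal dynamics, and attention pooling weights the hidden states before the forecast head. This is a toy NumPy illustration with made-up dimensions and a plain tanh RNN standing in for the BiGRU/BiLSTM cells a real model would use; it is not the commenter's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    # x: (T, d_in); kernels: (k, d_in, d_out) -> (T-k+1, d_out)
    k = kernels.shape[0]
    out = np.stack([
        np.einsum('kd,kdo->o', x[t:t + k], kernels)
        for t in range(x.shape[0] - k + 1)
    ])
    return np.tanh(out)

def rnn(x, Wx, Wh):
    # plain tanh RNN over (T, d_in) -> hidden states (T, d_h)
    h = np.zeros(Wh.shape[0])
    hs = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh)
        hs.append(h)
    return np.stack(hs)

def attention_pool(hs, w):
    # score each hidden state, softmax, weighted sum (attention pooling)
    scores = hs @ w
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ hs

T, d_in, d_conv, d_h, k = 48, 3, 8, 16, 5
x = rng.normal(size=(T, d_in))                 # one multivariate series
kernels = rng.normal(size=(k, d_in, d_conv)) * 0.1
Wx = rng.normal(size=(d_conv, d_h)) * 0.1
Wh = rng.normal(size=(d_h, d_h)) * 0.1
w_att = rng.normal(size=d_h)
W_out = rng.normal(size=(d_h, 1)) * 0.1

feats = conv1d(x, kernels)       # local patterns
hs = rnn(feats, Wx, Wh)          # temporal dynamics
ctx = attention_pool(hs, w_att)  # focus on informative timesteps
forecast = ctx @ W_out           # next-step prediction, shape (1,)
```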
        
           | montereynack wrote:
           | Seconding the other question, would be curious to know
        
           | ramon156 wrote:
            | Now take into account that it has to be lightweight, and DL
            | falls short
        
           | nerdponx wrote:
           | What are you doing in aviation safety that requires time
           | series modeling? Weather?
        
         | dongobread wrote:
         | From experience in payments/spending forecasting, I've found
          | that deep learning generally underperforms gradient-boosted tree
         | models. Deep learning models tend to be good at learning
         | seasonality but do not handle complex trends or shocks very
         | well. Economic/financial data tends to have straightforward
         | seasonality with complex trends, so deep learning tends to do
         | quite poorly.
         | 
         | I do agree with this paper - all of the good deep learning time
         | series architectures I've tried are simple extensions of MLPs
         | or RNNs (e.g. DeepAR or N-BEATS). The transformer-based
         | architectures I've used have been absolutely awful, especially
         | the endless stream of transformer-based "foundational models"
         | that are coming out these days.
        
           | sigmoid10 wrote:
           | Transformers are just MLPs with extra steps. So in theory
           | they should be just as powerful. The problem with
           | transformers is simultaneously their big advantage: They
           | scale extremely well with larger networks and more training
           | data. Better so than any other architecture out there. So if
           | you had enormous datasets and unlimited compute budget, you
           | could probably do amazing things in this regard as well. But
           | if you're just a mortal data scientist without extra funding,
           | you will be better off with more traditional approaches.
        
             | dongobread wrote:
             | I think what you say is true when comparing transformers to
             | CNNs/RNNs, but not to MLPs.
             | 
             | Transformers, RNNs, and CNNs are all techniques to reduce
             | parameter count compared to a pure-MLP model. If you took a
             | transformer model and replaced each self-attention layer
             | with a linear layer+activation function, you'd have a pure
             | MLP model that can model every relationship the transformer
             | does, but can model more possible relationships as well
             | (but at the cost of tons more parameters). MLPs are more
             | powerful/scalable but transformers are more efficient.
             | 
             | Compared to MLPs, transformers save on parameter count by
             | skimping on the number of parameters devoted to modeling
             | the relationship between tokens. This works in language
              | modeling, where relationships between tokens aren't _that_
              | important - you can jumble up the words in this sentence
              | and it still mostly makes sense. This doesn't work in time
              | series, where relationships between tokens (timesteps) are
              | the most important thing of all. The LTSF paper linked in
             | the OP paper also mentions this same problem:
             | https://arxiv.org/pdf/2205.13504 (see section 1)
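The parameter-count tradeoff the parent describes can be made concrete with a back-of-the-envelope comparison: a self-attention block reuses the same projection matrices at every position, so its size is independent of sequence length, while a dense MLP layer that mixes the whole flattened sequence grows quadratically with it. The numbers below are illustrative only (biases and multi-head bookkeeping omitted).

```python
def attn_params(d):
    # Q, K, V and output projections: four d x d matrices,
    # shared across all positions in the sequence
    return 4 * d * d

def dense_mixing_params(T, d):
    # one linear layer from the flattened (T*d) input to a
    # (T*d) output: every timestep-feature pair gets its own weight
    return (T * d) ** 2

d = 64
for T in (32, 128, 512):
    print(T, attn_params(d), dense_mixing_params(T, d))
# attention stays at 16,384 params; the dense mixer explodes with T
```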
        
         | rjurney wrote:
         | They aren't so hot, but recent efforts at transfer learning
         | were promising.
        
         | svnt wrote:
         | The paper says this in the next paragraph. xLSTMTime is not
         | transformer-based either.
        
       | Dowwie wrote:
        | Marketed as a forecasting tool, so is this not applicable to
       | event classification in time series?
        
         | RamblingCTO wrote:
         | I'd say that's kind of a different task. I'm not a pro in this,
         | but you could maybe treat it as a multi-variate forecast
         | problem where the targets are probabilities per event if n is
         | really small?
        
         | jimmySixDOF wrote:
         | Yes, I would be interested where this (and any Transformer/LLM
         | based approach) is improving anomaly detection for example.
        
       | greatpostman wrote:
       | The best deep learning time series models are closed source
       | inside hedge funds.
        
         | 3abiton wrote:
          | I think hedge funds, at least the advanced ones, definitely
          | don't use time series modelling anymore. That's quite outdated
          | nowadays.
        
           | max_ wrote:
           | What do you suspect they are using?
        
             | meowkit wrote:
             | They pull data from all kinds of things now.
             | 
             | For example, satellite imagery of trucking activity
             | correlated to specific companies or industries.
             | 
              | It's all signal processing at some level, but directly
             | modeling the time series of price or other asset metrics
             | doesn't have the alpha it may have had decades ago.
        
               | greatpostman wrote:
               | Alternative data is passed into time series models. They
               | are features.
               | 
               | You don't know as much about this as you think
        
               | myhf wrote:
                | ☝️
        
             | nextos wrote:
             | Some funds that tried to recruit me were really interested
             | in classical generative models (ARMA, GARCH, HMMs with
             | heavy-tailed emissions, etc.) extended with deep components
             | to make them more flexible. Pyro and Kevin Murphy's ProbML
             | vol II are a good starting point to learn more about these
             | topics.
             | 
             | The key is to understand that in some of these problems,
             | data is relatively scarce, and it is really important to
             | quantify uncertainty.
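A minimal example of the point about scarce data and uncertainty quantification, using the simplest classical model in the family the parent names: an AR(1) fit by conditional maximum likelihood, which reduces to least-squares, with a Gaussian predictive interval. The deep extensions mentioned (Pyro-style models) swap pieces of this skeleton for neural components but keep the probabilistic structure; all numbers here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) series: y_t = phi * y_{t-1} + eps, eps ~ N(0, sigma^2)
phi_true, sigma_true, n = 0.8, 0.5, 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal(0.0, sigma_true)

# Conditional MLE for AR(1) = least-squares regression of y_t on y_{t-1}
x_prev, x_next = y[:-1], y[1:]
phi_hat = (x_prev @ x_next) / (x_prev @ x_prev)
resid = x_next - phi_hat * x_prev
sigma_hat = resid.std()

# One-step-ahead forecast with a ~95% Gaussian predictive interval:
# the interval, not just the point forecast, is the deliverable
mean = phi_hat * y[-1]
lo, hi = mean - 1.96 * sigma_hat, mean + 1.96 * sigma_hat
print(f"phi_hat={phi_hat:.2f} interval=({lo:.2f}, {hi:.2f})")
```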
        
           | rjurney wrote:
           | There are many ways of approaching quantitative trading and
           | many people do employ time series analysis, especially for
           | high frequency trading.
        
         | fermisea wrote:
         | Most of the hard work is actually feature construction rather
         | than monolithic models. And afaik gradient boosting still rules
         | the world
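A sketch of what "feature construction" typically means for tree models on time series: a 1-D series is turned into a tabular design matrix of lagged values and rolling statistics, since gradient-boosted trees have no built-in notion of time. The lag and window choices below are arbitrary placeholders, and the boosting model itself (XGBoost or similar) is omitted.

```python
import numpy as np

def make_features(y, lags=(1, 2, 7), roll=7):
    # Build one feature row per forecastable timestep: a few lagged
    # values plus a rolling mean, with y[t] as the regression target.
    start = max(max(lags), roll)
    rows, targets = [], []
    for t in range(start, len(y)):
        feats = [y[t - L] for L in lags]        # lag features
        feats.append(y[t - roll:t].mean())      # rolling-mean feature
        rows.append(feats)
        targets.append(y[t])
    return np.array(rows), np.array(targets)

y = np.sin(np.arange(60) / 3.0)  # toy seasonal-ish series
X, target = make_features(y)
print(X.shape, target.shape)     # one row per forecastable timestep
```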
        
       | thedudeabides5 wrote:
        | Can't wait for someone to lose all their money trying to predict
       | stocks with this thing
        
       | nyanpasu64 wrote:
       | I misread this as XSLT :')
        
         | selimnairb wrote:
          | Same. Am I old?
        
       | optimalsolver wrote:
       | Reminder: If someone's time series forecasting method worked,
       | they wouldn't be publishing it.
        
         | dongobread wrote:
         | They definitely would and do, the vast majority of time series
         | work is not about asset prices or beating the stock market
        
       | dlojudice wrote:
       | Is this somehow related to the Google weather prediction model
       | using AI [1]?
       | 
       | https://deepmind.google/discover/blog/graphcast-ai-model-for...
        
       | brcmthrowaway wrote:
       | Wow, is there a way to apply this to financial trading?
        
       ___________________________________________________________________
       (page generated 2024-07-16 23:00 UTC)