[HN Gopher] Chronos: Learning the Language of Time Series
       ___________________________________________________________________
        
       Chronos: Learning the Language of Time Series
        
       Author : Anon84
       Score  : 185 points
       Date   : 2024-03-22 03:25 UTC (19 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | roca wrote:
       | Doesn't cite TimesFM for some reason. Maybe the latter was
       | published after this paper went camera-ready?
       | https://blog.research.google/2024/02/a-decoder-only-foundati...
        
         | ghc wrote:
          | Because these approaches are likely derived from papers
          | published 3-5 years ago. At this point neither TimesFM nor
          | Chronos is particularly novel. I've had similar models in
          | production for complex time series for 18 months now.
        
       | izyda wrote:
        | I do not have a horse in this race, but it is interesting to see
        | open-source comparisons to traditional time series strategies:
       | https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...
       | 
        | In general, the M-Competitions
        | (https://forecasters.org/resources/time-series-data/), the
        | Olympics of time series forecasting, have proven frustrating for
        | ML methods... linear models do shockingly well, and the ML models
        | that have won generally seem to be variants of older tree-based
        | methods (e.g., LightGBM is a favorite).
       | 
       | Will be interesting to see whether the Transformer architecture
       | ends up making real progress here.
        
         | one_buggy_boi wrote:
          | Are these models high risk because of their lack of
          | interpretability? Specialized models like temporal fusion
          | transformers attempt to solve this, but in practice I'm seeing
          | folks torn apart when defending transformers against model risk
          | committees within organizations that are mature enough to have
          | them.
        
           | tomrod wrote:
            | Interpretability is just one pillar to satisfy in AI
            | governance. You can build submodels to assist with
            | interpreting a black-box main prediction model.
        
         | rdedev wrote:
          | Is there a way to directly train transformer models to output
          | embeddings that could help tree-based models downstream? For
          | tabular data, tree-based models seem to be the best, but I feel
          | like foundation models could help them in some way.
        
         | wenc wrote:
          | They are comparing a non-ensembled transformer model with an
          | ensemble of simple linear models. It's not surprising that an
          | ensemble of linear time series models does well, since
          | ensembles optimize the bias-variance trade-off.
         | 
         | Transformer/ML models by themselves have a tendency to overfit
         | past patterns. They pick up more signal in the patterns, but
         | they also pick up spurious patterns. They're low bias but high
         | variance.
         | 
         | It would be more interesting to compare an ensemble of
         | transformer models with an ensemble of linear models to see
         | which is more accurate.
         | 
         | (that said, it's pretty impressive that an ensemble of simple
         | linear models can beat a large scale transformer model -- this
         | tells me the domain being forecast has a high degree of
         | variance, which transformer models by themselves don't do well
         | on.)
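          | 
          | A minimal numpy sketch of the variance argument (illustrative
          | only, not from the paper): averaging many independent noisy
          | predictors shrinks error variance roughly as 1/n.
          | 
          |       import numpy as np
          | 
          |       rng = np.random.default_rng(0)
          |       truth = np.sin(np.linspace(0, 6, 200))  # underlying signal
          |       # a low-bias, high-variance "model": truth plus noise
          |       forecast = lambda: truth + rng.normal(0, 0.5, 200)
          | 
          |       single = forecast()
          |       ensemble = np.mean([forecast() for _ in range(25)], axis=0)
          | 
          |       print(np.mean((single - truth) ** 2))    # ~0.25
          |       print(np.mean((ensemble - truth) ** 2))  # ~0.01, about 1/25th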
        
           | gradascent wrote:
            | FYI, I think you have bias and variance the wrong way
            | around. Overfitting indicates high variance.
        
             | wenc wrote:
             | Thank you for catching that. Corrected.
        
           | hackerlight wrote:
           | > ensemble of transformer models
           | 
           | Isn't that just dropout?
        
             | mikkom wrote:
             | No. Why do you think so?
        
       | hbcondo714 wrote:
       | Chronos is probably overkill for what I am looking to do with
       | time series data. I just did an Ask HN on time series[0] but
       | unfortunately didn't get the replies I was hoping for. Maybe this
       | thread can get the bump I need:
       | 
        |  _I inherited a large time series JSON dataset in 2024. I've
        | been successful in using the Observable Framework[1] by writing a
        | Rust (rust-script) data loader[2] to parse and plot simple line
        | charts[3] to visually see the data. There are hundreds of graphs
        | over years of data so I would like to identify which graphs I
        | should be paying attention to. My initial thought is to calculate
        | metrics on each graph such as:
        | 
        |       - Variability: how "spread out" are the data points from
        |         one another?
        |       - Trend: direction of the data path, up or down?
        |       - Slope: are the data points increasing or decreasing?
        |       - Level: where are the data points on the vertical axis?
       | 
       | What libraries, AI, databases, etc... would you recommend that
       | would allow me to calculate these values? I am no data scientist
       | and don't need forecasting but overall, I just want a dashboard
       | that shows the most "important" graphs._
       | 
       | [1] https://observablehq.com/framework/
       | 
       | [2] https://observablehq.com/framework/loaders
       | 
       | [3] https://observablehq.com/@observablehq/plot-simple-line-
       | char...
       | 
       | edit: the x-axis is Time while the y-axis can be values such as
       | duration, frequency, intervals
       | 
       | [0] https://news.ycombinator.com/item?id=39763246
        
         | bhy wrote:
          | When you ask what data you should be paying attention to, the
          | answer depends on your objective. Do you want to predict
          | something? Identify anomalies? In the end, what matters is
          | understanding the meaning and relations of these data, rather
          | than throwing them into some ML framework and hoping to get
          | something out.
        
           | hbcondo714 wrote:
           | Prediction and anomalies are not objectives but of the 4
           | listed, I would say the primary objective is identifying a
           | trend in the data to know whether the data is moving in a
           | specific direction--increasing or decreasing in value.
           | 
            | I already added linear regression marks that draw linear
            | regression lines with confidence bands[1] to my Observable
            | plots, but they do not give me a "value", so I need to
            | manually look at the graphs and read the red line.
           | 
           | [1] https://observablehq.com/plot/marks/linear-regression
        
         | notagoodidea wrote:
          | I've always worked in R for time series analysis. This cookbook
          | has everything you would need to plan a time series
          | analysis [0], and this book provides a strong base and
          | understanding while focusing on forecasting [1]. Have fun!
         | 
         | [0] https://rc2e.com/timeseriesanalysis [1]
         | https://otexts.com/fpp2/
        
           | sampo wrote:
           | > https://otexts.com/fpp2/
           | 
           | Third edition: https://otexts.com/fpp3/
        
             | ideamotor wrote:
             | Agree, great resource.
        
         | Galanwe wrote:
         | Doesn't look like you need anything fancy here.
         | 
          | Load your time series into a dataframe, and:
         | 
         | > - Variability: how "spread out" are the data points from one
         | another?
         | 
         | So basically df.std(), with rolling variants for short term /
         | long term.
         | 
         | > - Trend: direction of data path, up or down? - Slope: are the
         | data points increasing or decreasing?
         | 
          | Just do a simple rolling linear regression of your data points
          | against time.
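          | 
          | A minimal pandas sketch of all four metrics (df is an assumed
          | DataFrame with one column per series, rows ordered by time):
          | 
          |       import numpy as np
          |       import pandas as pd
          | 
          |       variability = df.std()                     # spread per series
          |       rolling_std = df.rolling(window=30).std()  # short-term variant
          | 
          |       def slope(col: pd.Series) -> float:
          |           """Least-squares slope of a series against its index."""
          |           y = col.dropna().to_numpy()
          |           return np.polyfit(np.arange(len(y)), y, 1)[0]
          | 
          |       trend = df.apply(slope)  # sign = direction, size = steepness
          |       level = df.mean()        # where each series sits on the y-axis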
        
       | jethkl wrote:
        | It's great to see research in this field; I know there is
        | opportunity here, and I hope to someday benefit from progress.
        | But I skimmed the paper, and it doesn't appear to solve a problem
        | that I have. From a practical standpoint, what I want from a
        | time series tool includes:
        | 
        |       1) a small set of simple levers that I can review and tune
        |       2) short training time for any input set of size O(10k) to
        |          O(100k) (this covers seconds/day, minutes/week,
        |          hours/year)
        |       3) train + forecast that runs fine on CPUs -- not GPUs --
        |          with low memory overhead
        |       4) decent out-of-the-box performance that basically passes
        |          the sniff test
        |       5) a simple way to include regressors
        | 
        | I've enough experience to have learned to be wary of fully
        | automated tuning, benchmark performance metrics, elaborate
        | models, etc.
        
       | meow_cat wrote:
       | Maybe I'm missing something obvious, but what is the idea behind
       | quantizing and tokenizing time series? We tokenize text because
       | text isn't numbers. In the case of time series, we're... turning
        | numbers into less precise numbers? The benefit of scaling and
        | centering is trivial and I guess all time series ML does it, but
        | I don't see why we need a token after that.
        
         | matrix2596 wrote:
         | I'm building upon insights from this paper
         | (https://arxiv.org/pdf/2403.03950.pdf) and believe that
         | classification can sometimes outperform regression, even when
         | dealing with continuous output values. This is particularly
          | true in scenarios where the output is noisy and may assume
          | various values (multimodal). By treating the problem as
         | classification over discrete bins, we can obtain an approximate
         | distribution over these bins, rather than settling for a
         | single, averaged value as regression would yield. This approach
         | not only facilitates sampling but may also lead to more
         | favorable loss landscapes. The linked paper in this comment
         | provides more details of this idea.
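          | 
          | An illustrative numpy sketch of why the averaged value can
          | mislead on a multimodal target (toy data, not the linked
          | paper's setup):
          | 
          |       import numpy as np
          | 
          |       rng = np.random.default_rng(0)
          |       # target concentrated around two distinct modes
          |       y = np.concatenate([rng.normal(-2, 0.3, 500),
          |                           rng.normal(+2, 0.3, 500)])
          | 
          |       print(y.mean())  # ~0.0: the averaged answer, where no data lives
          | 
          |       # classification over discrete bins keeps both modes
          |       edges = np.linspace(-4, 4, 41)   # 40 equal-width bins
          |       counts, _ = np.histogram(y, edges)
          |       probs = counts / counts.sum()
          |       centers = (edges[:-1] + edges[1:]) / 2
          |       print(centers[probs.argmax()])   # lands near one of the modes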
        
           | lamename wrote:
            | Isn't it a given that classification would "outperform"
            | regression, assuming n_classes <
            | n_possible_continuous_labels? Turning a regression problem
            | into a classification problem bins the data and offers more
            | examples per label, simplifying the problem, with a trade-off
            | in what granularity you can predict.
           | 
           | (It depends on what you mean by "outperform" since metrics
           | for classification and regression aren't always comparable,
           | but I think I'm following the meaning of your comment
           | overall)
        
         | dist-epoch wrote:
         | Tokenisation turns a continuous signal into a normalized
         | discrete vocabulary: stock "went up a lot", "went up a little",
         | "stayed flat". This smooths out noise and simplifies matching
         | up similar but not identical signals.
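          | 
          | A toy version of that idea (illustrative; the bins here are
          | hand-picked, whereas Chronos itself scales the series and
          | quantizes values into uniform bins):
          | 
          |       import numpy as np
          | 
          |       prices = np.array([100.0, 101.5, 101.4, 99.0, 99.1, 102.2])
          |       returns = np.diff(prices) / prices[:-1]
          | 
          |       edges = [-0.02, -0.005, 0.005, 0.02]  # hand-picked boundaries
          |       vocab = ["down_big", "down", "flat", "up", "up_big"]
          |       tokens = [vocab[i] for i in np.digitize(returns, edges)]
          |       print(tokens)  # ['up', 'flat', 'down_big', 'flat', 'up_big']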
         | 
         | > We tokenize text because text isn't numbers.
         | 
          | Text is actually numbers. People tried inputting UTF-8 bytes
          | directly into transformers, but it doesn't work that well.
          | Karpathy explains why:
         | 
         | https://www.youtube.com/watch?v=zduSFxRajkE
        
           | lamename wrote:
           | Interesting. Can you explain how this is superior and/or
           | different from traditional DSP filters or other non-
           | tokenization tricks in the signal processing field?
        
             | dist-epoch wrote:
             | Traditional DSP filters still output a continuous signal.
             | And it's a well-explored domain, hard to imagine any low-
             | hanging fruit there.
             | 
             | My intuition is the following: transformers work really
             | well for text, so we could try turning a time series into a
             | "story" (limited vocabulary) and see what happens.
        
               | lamename wrote:
               | Like this or something different?
               | 
               | https://github.com/gzerveas/mvts_transformer
        
           | prlin wrote:
           | > Text is actually numbers
           | 
           | Text can be represented by numbers but they aren't the same
           | datatype. They don't support the same operations (addition,
           | subtraction, multiplication, etc).
        
         | spyder wrote:
          | I think it could also have a connection with symbolic AI: the
          | discrete tokens could be the symbols that many believe are
          | useful or necessary for reasoning. Tokenization also helps
          | compression, reducing memory requirements through quantization
          | and small integer representations.
         | 
         | https://en.wikipedia.org/wiki/Neuro-symbolic_AI
        
         | intalentive wrote:
         | My guess is that it enforces a kind of sparsity constraint.
        
         | 555watch wrote:
          | My primitive understanding is that we approximate a Markovian
          | approach, indirectly modeling the transition probabilities
          | between tokens.
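          | 
          | A crude way to see the flavor of that (a transformer conditions
          | on the whole context, not just the last token, but one-step
          | estimates from bigram counts give the idea):
          | 
          |       from collections import Counter
          | 
          |       tokens = ["flat", "up", "up", "down", "flat", "up"]  # toy
          |       pair_counts = Counter(zip(tokens, tokens[1:]))
          |       state_counts = Counter(tokens[:-1])
          | 
          |       # P(next | current) estimated from bigram counts
          |       p = {(a, b): n / state_counts[a]
          |            for (a, b), n in pair_counts.items()}
          |       print(p[("up", "up")])  # 0.5 in this toy sequence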
        
       | sampo wrote:
        | Amazon's older time series forecasting system, DeepAR, has
        | supported external regressors since 2018 [1]. In this new
        | Chronos paper, I didn't find any mention of external regressors.
       | 
       | [1] https://aws.amazon.com/blogs/machine-learning/amazon-
       | sagemak...
        
         | kfor wrote:
          | They do mention covariates in section 6.1 - specifically that
          | this method doesn't support them, along with ideas on how it
          | could in the future, such as via stacking:
         | 
         | > In this work, we have focused on univariate time series
         | forecasting since it constitutes the most common of real-world
         | time series use-cases. Nevertheless, practical forecasting
         | tasks often involve additional information that must be taken
         | into account. One example involves covariates, that can be
         | either time-independent (e.g., color of the product) or time-
         | varying (e.g., on which days the product is on sale). Another
         | closely related problem is multivariate forecasting, where
         | historic values of one time series (e.g., interest rates) can
         | influence the forecast for another time series (e.g., housing
         | prices). The number of covariates or multivariate dimensions
         | can vary greatly across tasks, which makes it challenging to
         | train a single model that can handle all possible combinations.
         | A possible solution may involve training task-specific adaptors
         | that inject the covariates into the pretrained forecasting
         | model (Rahman et al., 2020). As another option, we can build
         | stacking ensembles (Ting & Witten, 1997) of Chronos and other
         | light-weight models that excel at handling covariates such as
         | LightGBM (Ke et al., 2017).
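          | 
          | A rough sketch of that stacking option (everything here is
          | hypothetical: the file names and the shape of the base
          | forecast stand in for whatever Chronos actually produces):
          | 
          |       import lightgbm as lgb
          |       import numpy as np
          | 
          |       base = np.load("chronos_forecast.npy")   # shape (n,)
          |       covariates = np.load("covariates.npy")   # shape (n, k)
          |       y = np.load("actuals.npy")               # shape (n,)
          | 
          |       # let LightGBM learn how covariates should adjust
          |       # the pretrained model's forecast
          |       X = np.column_stack([base, covariates])
          |       model = lgb.LGBMRegressor(n_estimators=200)
          |       model.fit(X, y)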
        
           | sampo wrote:
           | Ah. Thank you. The same concept goes under different names,
           | so one needs to search for all of "exogenous variables",
           | "external regressors", "external factors" and "covariates".
        
       | amelius wrote:
       | What types of model are the algo-traders using these days?
        
         | mikkom wrote:
         | Do you really think the profitable algo traders are going to
         | tell you that :-)
        
           | amelius wrote:
           | Why not? Sharing information moves the field forward.
        
             | jebarker wrote:
             | Profitable algorithmic traders are not in the business of
             | moving the field forward. They're in the business of making
             | profits.
        
             | JustFinishedBSG wrote:
              | What field? They aren't curing cancer; it serves no purpose
              | to advance the "field".
        
             | gkbrk wrote:
              | You make money if you have useful data others don't have,
              | or if you have better algorithms that others aren't using.
             | 
             | When these become publicly known and used, your system
             | doesn't work any more because the prices now include
             | whatever signal you had for yourself before.
        
               | Galanwe wrote:
               | It's a bit more subtle than that, because there are
               | feedback loops in the system. When a signal or factor
               | spreads, it does so at multiple time horizons.
               | 
                | E.g., if I have a good signal for predicting at a 1-day
                | horizon, then it is in my interest to have many people
                | trading it at horizons > 1 day, as they will push the
                | price in my direction.
        
       | 19h wrote:
        | We're using HTMs for time series in our quant algorithms and
        | they're performing pretty well; it's a shame that the approach is
        | mostly ignored by ML scientists.
        
         | lucidrains wrote:
          | Oh interesting, Jeff Hawkins' HTM?
        
         | brcmthrowaway wrote:
         | are you hiring
        
       | melondonkey wrote:
        | As a practitioner, the most impactful library for time series has
        | been brms, which basically gives you syntactic sugar for creating
        | statistical models in Stan. It checks all the boxes, including
        | probabilistic forecasts and multiple likelihood families and link
        | functions: Wiener, gamma, Gaussian, Student's t, binomial,
        | zero-inflated, and hurdle models. It also has autoregressive
        | terms and ordinal predictors, and you actually learn something
        | from your data.
       | 
        | I find a lot of these ML and DL libraries to be harder to
        | troubleshoot beyond blind hyperparameter tuning, whereas with
        | stats I can tweak the model, modify the likelihood, etc. There
        | are also a lot of high-value problems that have few data points,
        | and these libraries tend to want at least daily data.
        
         | bethekind wrote:
         | Could you expand on what you mean by "practitioner?"
         | 
          | Also a follow-up question: with TimeGPT and Chronos advertised
          | as "foundational time series models", do you think they have
          | any value?
        
       | nyrikki wrote:
        | It may not be known yet, and this project seems to be targeted at
        | Gaussian distributions, but wouldn't the simplicity bias reduce
        | sensitivity? I mean, attention in transformers works so well in
        | part because OOD data is typically close enough.
       | 
       | Probably just my own bias because it seems everything I deal with
       | is at least MArP and anomalies are important to my use case.
       | 
       | I can see where this is useful for others, even Amazon suggests
       | ARIMA or ETS if you don't have hundreds of related streams.
       | 
       | Is this more targeted at people who want more smoothing?
       | 
       | Or am I just missing something?
        
       | BrokrnAlgorithm wrote:
        | Coming from finance, I always wonder how and if these large pre-
        | trained models are usable on any financial time series. I see the
        | appeal of pre-trained models in areas where there is clearly a
        | stationary pattern, even if it's very hidden (e.g., industrial or
        | biological metrics). But given the inherently low signal-to-noise
        | ratio and how extremely non-stationary and chaotic financial
        | data processes tend to be, I struggle to see the use of pre-
        | trained foundation models.
        
         | bethekind wrote:
          | I played around with the TimeGPT beta at predicting the S&P 500
          | index's performance for the next day (not multivariate time
          | series, as I couldn't figure out how to get that set up), and
          | trying to use the confidence intervals it generated to buy
          | options was useless at best.
         | 
          | I can see Chronos working a bit better, as it tries to convert
          | trends and pieces of time series into tokens, like GPT does
          | for phrases.
         | 
          | E.g., a stock goes down terribly, then dead-cat bounces. This
          | is common.
          | 
          | A stock goes up, hits resistance due to existing sell orders,
          | comes down.
          | 
          | A stock is on a stable upward trend, continues the upward
          | trend.
         | 
          | If I can verbalize these usual patterns, it's likely Chronos
          | can also pick up on them.
         | 
          | Once again, data quality trumps all for LLMs, so performance
          | might vary. If you read the paper, they point out a few
          | situations where the model is unable to learn a trend, e.g.,
          | when the context time series isn't long enough.
        
         | intalentive wrote:
         | Imitation learning of discretionary traders who rely on a
         | mixture of rules and intuition.
        
         | andoando wrote:
         | Stock prices change continuously based on the current price and
         | future events that have not happened. I don't think they are at
         | all predictable.
        
       | amai wrote:
        | I doubt the differences in performance between all the "neural"
        | models are statistically significant. It strikes me as odd that a
        | model like TFT can be the worst of the "neural" models in one
        | benchmark and at the same time be the best in another benchmark.
        | Also, what is the point of Benchmark I? "It comprises 15
        | datasets that were also part of the training data of Chronos
        | models". That is not forecasting. That is just
        | remembering/overfitting these time series.
        
       ___________________________________________________________________
       (page generated 2024-03-22 23:01 UTC)