[HN Gopher] Chronos: Learning the Language of Time Series
___________________________________________________________________
Chronos: Learning the Language of Time Series
Author : Anon84
Score : 185 points
Date : 2024-03-22 03:25 UTC (19 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| roca wrote:
| Doesn't cite TimesFM for some reason. Maybe the latter was
| published after this paper went camera-ready?
| https://blog.research.google/2024/02/a-decoder-only-foundati...
| ghc wrote:
| Because these approaches are likely derived from papers
| published 3-5 years ago. At this point neither TimesFM nor
| Chronos is particularly novel. I've had similar models in
| production for complex time series for 18 months now.
| izyda wrote:
| I do not have a horse in the race, but it is interesting to see
| open source comparisons to traditional time series strategies:
| https://github.com/Nixtla/nixtla/tree/main/experiments/amazo...
|
| In general, the M-Competitions
| (https://forecasters.org/resources/time-series-data/), the
| Olympics of time series forecasting, have proven frustrating for
| ML methods... linear models do shockingly well, and the ML models
| that have won generally seem to be variants of older tree-based
| methods (e.g., LightGBM is a favorite).
|
| Will be interesting to see whether the Transformer architecture
| ends up making real progress here.
| one_buggy_boi wrote:
| Are these models high risk because of their lack of
| interpretability? Specialized models like temporal fusion
| transformers attempt to solve this but in practice I'm seeing
| folks torn apart when defending transformers against model risk
| committees within organizations that are mature enough to have
| them.
| tomrod wrote:
| Interpretability is just one pillar to satisfy in AI
| governance. You have to build submodels to assist with
| interpreting black-box main prediction models.
| rdedev wrote:
| Is there a way to directly train transformer models to output
| embeddings that could help tree-based models downstream? For
| tabular data, tree-based models seem to be the best, but I feel
| like foundation models could help them in some way.
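|
| For concreteness, a sketch of the kind of pipeline I mean
| (hypothetical names; assumes a frozen pretrained model that
| exposes per-step hidden states):
|
|     # Sketch: frozen transformer as feature extractor for a
|     # downstream gradient-boosted tree model (names hypothetical).
|     import numpy as np
|     import torch
|     from sklearn.ensemble import GradientBoostingRegressor
|
|     def embed(model: torch.nn.Module, series: np.ndarray) -> np.ndarray:
|         """Mean-pool the model's hidden states into one feature vector."""
|         with torch.no_grad():
|             x = torch.tensor(series, dtype=torch.float32).unsqueeze(0)
|             hidden = model(x)  # assumed to return (1, seq_len, d_model)
|             return hidden.mean(dim=1).squeeze(0).numpy()
|
|     # features = np.stack([embed(frozen_model, s) for s in all_series])
|     # GradientBoostingRegressor().fit(features, targets)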
| wenc wrote:
| They are comparing a non-ensembled transformer model with an
| ensemble of simple linear models. It's not surprising that an
| ensemble of simple linear time series models does well, since
| ensembles trade a little bias for a large reduction in variance.
|
| Transformer/ML models by themselves have a tendency to overfit
| past patterns. They pick up more signal in the patterns, but
| they also pick up spurious patterns. They're low bias but high
| variance.
|
| It would be more interesting to compare an ensemble of
| transformer models with an ensemble of linear models to see
| which is more accurate.
|
| (That said, it's pretty impressive that an ensemble of simple
| linear models can beat a large-scale transformer model -- this
| tells me the domain being forecast has a high degree of
| variance, which transformer models by themselves don't handle
| well.)
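|
| (A minimal sketch of that baseline, assuming scikit-learn: an
| ensemble of linear models is only a few lines, and the averaging
| is exactly what buys the variance reduction.)
|
|     # Sketch: bag many low-bias linear fits; averaging them cuts
|     # the variance, which is why simple ensembles are hard to beat.
|     from sklearn.ensemble import BaggingRegressor
|     from sklearn.linear_model import LinearRegression
|
|     ensemble = BaggingRegressor(
|         estimator=LinearRegression(),  # low-bias base learner
|         n_estimators=50,               # more members, less variance
|     )
|     # ensemble.fit(X_train, y_train); ensemble.predict(X_test)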
| gradascent wrote:
| fyi I think you have bias and variance the wrong way around.
| Over-fitting indicates high variance
| wenc wrote:
| Thank you for catching that. Corrected.
| hackerlight wrote:
| > ensemble of transformer models
|
| Isn't that just dropout?
| mikkom wrote:
| No. Why do you think so?
| hbcondo714 wrote:
| Chronos is probably overkill for what I am looking to do with
| time series data. I just did an Ask HN on time series[0] but
| unfortunately didn't get the replies I was hoping for. Maybe this
| thread can get the bump I need:
|
| _I inherited a large time series JSON dataset in 2024. I've
| been successful in using the Observable Framework[1] by writing a
| Rust (rust-script) data loader[2] to parse and plot simple line
| charts[3] to visually see the data. There are hundreds of graphs
| over years of data, so I would like to identify which graphs I
| should be paying attention to. My initial thought is to calculate
| metrics on each graph such as:
|
| - Variability: how "spread out" are the data points from one
| another?
| - Trend: direction of the data path, up or down?
| - Slope: are the data points increasing or decreasing?
| - Level: where are the data points on the vertical axis?
|
| What libraries, AI, databases, etc. would you recommend that
| would allow me to calculate these values? I am no data scientist
| and don't need forecasting, but overall I just want a dashboard
| that shows the most "important" graphs._
|
| [1] https://observablehq.com/framework/
|
| [2] https://observablehq.com/framework/loaders
|
| [3] https://observablehq.com/@observablehq/plot-simple-line-
| char...
|
| edit: the x-axis is Time while the y-axis can be values such as
| duration, frequency, intervals
|
| [0] https://news.ycombinator.com/item?id=39763246
| bhy wrote:
| When you ask what data you should be paying attention to, the
| answer depends on your objective. Do you want to predict
| something? Identify anomalies? In the end, what matters is
| understanding the meaning and relations of these data, rather
| than throwing them into some ML framework and hoping to get
| something out.
| hbcondo714 wrote:
| Prediction and anomalies are not my objectives; of the 4
| metrics listed, I would say the primary one is identifying a
| trend in the data, to know whether the data is moving in a
| specific direction--increasing or decreasing in value.
|
| I already added linear regression marks that draw linear
| regression lines with confidence bands[1] to my Observable
| plots, but they do not give me a "value", so I need to manually
| look at the graphs and read the red line.
|
| [1] https://observablehq.com/plot/marks/linear-regression
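|
| Ideally I'd get a single number per graph. Something like this
| numpy sketch is what I mean (hypothetical; assumes each graph
| is an array of y-values):
|
|     # Sketch: one slope number per graph instead of eyeballing
|     # the regression line.
|     import numpy as np
|
|     def trend_slope(y: np.ndarray) -> float:
|         """Least-squares slope of y against its index (units/step)."""
|         x = np.arange(len(y))
|         slope, _intercept = np.polyfit(x, y, deg=1)
|         return slope
|
|     # rank graphs by |slope| to surface the fastest movers:
|     # ranked = sorted(graphs.items(),
|     #                 key=lambda kv: abs(trend_slope(kv[1])),
|     #                 reverse=True)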
| notagoodidea wrote:
| I have always worked in R for time series analysis. This
| cookbook has everything you would need to plan a time series
| analysis [0], and this book provides a strong base and
| understanding while being focused on forecasting [1]. Have fun!
|
| [0] https://rc2e.com/timeseriesanalysis
|
| [1] https://otexts.com/fpp2/
| sampo wrote:
| > https://otexts.com/fpp2/
|
| Third edition: https://otexts.com/fpp3/
| ideamotor wrote:
| Agree, great resource.
| Galanwe wrote:
| Doesn't look like you need anything fancy here.
|
| Load your time series into a dataframe, and:
|
| > - Variability: how "spread out" are the data points from one
| another?
|
| So basically df.std(), with rolling variants for short term /
| long term.
|
| > - Trend: direction of the data path, up or down?
| > - Slope: are the data points increasing or decreasing?
|
| Just do a simple rolling linear regression of your data points
| against time.
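|
| Something along these lines (a pandas sketch; window sizes are
| illustrative):
|
|     # Sketch: rolling spread and rolling slope from one Series.
|     import numpy as np
|     import pandas as pd
|
|     def rolling_stats(s: pd.Series, window: int = 30) -> pd.DataFrame:
|         x = np.arange(window)
|         return pd.DataFrame({
|             # variability, short and long term
|             "spread_short": s.rolling(window).std(),
|             "spread_long": s.rolling(window * 4).std(),
|             # slope of a least-squares line fit over each window
|             "slope": s.rolling(window).apply(
|                 lambda y: np.polyfit(x, y, deg=1)[0], raw=True),
|         })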
| jethkl wrote:
| It's great to see research in this field; I know there is
| opportunity here, and I hope to someday benefit from progress.
| But I skimmed the paper, and it doesn't appear to solve a
| problem that I have. From a practical standpoint, what I want
| from a time series tool includes:
|
| 1) a small set of simple levers that I can review and tune
|
| 2) short training time for any input set of size O(10k) to
| O(100k) (this covers seconds/day, minutes/week, hours/year)
|
| 3) a train + forecast process that runs fine on CPUs -- not
| GPUs -- with low memory overhead
|
| 4) decent out-of-the-box performance that basically passes the
| sniff test
|
| 5) a simple way to include regressors.
|
| I've enough experience to have learned to be wary of fully
| automated tuning, benchmark performance metrics, elaborate
| models, etc.
| meow_cat wrote:
| Maybe I'm missing something obvious, but what is the idea behind
| quantizing and tokenizing time series? We tokenize text because
| text isn't numbers. In the case of time series, we're... turning
| numbers into less precise numbers? The benefit of scaling and
| centering is trivial, and I guess all time series ML does it, but
| I don't see why we need a token after that.
| matrix2596 wrote:
| I'm building upon insights from this paper
| (https://arxiv.org/pdf/2403.03950.pdf) and believe that
| classification can sometimes outperform regression, even when
| dealing with continuous output values. This is particularly
| true in scenarios where the output is noisy and may assume
| various values (multi-modal). By treating the problem as
| classification over discrete bins, we can obtain an approximate
| distribution over these bins, rather than settling for a
| single, averaged value as regression would yield. This approach
| not only facilitates sampling but may also lead to more
| favorable loss landscapes. The linked paper in this comment
| provides more details of this idea.
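|
| A minimal sketch of the binning idea (bin count arbitrary):
|
|     # Sketch: regression-as-classification over discrete bins, so
|     # the model can express a multi-modal distribution rather than
|     # a single averaged value.
|     import numpy as np
|
|     y_train = np.random.default_rng(0).normal(size=1000)  # placeholder
|     n_bins = 100
|     edges = np.quantile(y_train, np.linspace(0, 1, n_bins + 1))
|     labels = np.digitize(y_train, edges[1:-1])  # ids in [0, n_bins)
|     centers = (edges[:-1] + edges[1:]) / 2
|     # Train any classifier on (features, labels); predict_proba then
|     # yields an approximate distribution over `centers`, preserving
|     # multi-modality that squared-error regression would average away.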
| lamename wrote:
| Isn't it a given that classification would "outperform"
| regression, assuming n_classes <
| n_possible_continuous_labels? Turning a regression problem
| into a classification problem bins the data and offers more
| examples per label, simplifying the problem, with a trade-off
| in the granularity you can predict.
|
| (It depends on what you mean by "outperform" since metrics
| for classification and regression aren't always comparable,
| but I think I'm following the meaning of your comment
| overall)
| dist-epoch wrote:
| Tokenisation turns a continuous signal into a normalized
| discrete vocabulary: stock "went up a lot", "went up a little",
| "stayed flat". This smooths out noise and simplifies matching
| up similar but not identical signals.
|
| > We tokenize text because text isn't numbers.
|
| Text is actually numbers. People tried inputting UTF-8 directly
| into transformers, but it doesn't work that well. Karpathy
| explains why:
|
| https://www.youtube.com/watch?v=zduSFxRajkE
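|
| A toy sketch of the idea (not the paper's exact scheme; the bin
| boundaries here are arbitrary):
|
|     # Sketch: turn a real-valued series into discrete tokens via
|     # mean scaling + uniform binning.
|     import numpy as np
|
|     def tokenize(series: np.ndarray, vocab: int = 256) -> np.ndarray:
|         scaled = series / np.mean(np.abs(series))  # mean scaling
|         edges = np.linspace(-5, 5, vocab - 1)      # fixed uniform bins
|         return np.digitize(scaled, edges)          # ids in [0, vocab)
|
|     tokens = tokenize(np.sin(np.linspace(0, 10, 200)))
|     # "went up a lot" / "went up a little" / "stayed flat" are now
|     # just different regions of this shared vocabulary.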
| lamename wrote:
| Interesting. Can you explain how this is superior and/or
| different from traditional DSP filters or other non-
| tokenization tricks in the signal processing field?
| dist-epoch wrote:
| Traditional DSP filters still output a continuous signal.
| And it's a well-explored domain, hard to imagine any low-
| hanging fruit there.
|
| My intuition is the following: transformers work really
| well for text, so we could try turning a time series into a
| "story" (limited vocabulary) and see what happens.
| lamename wrote:
| Like this or something different?
|
| https://github.com/gzerveas/mvts_transformer
| prlin wrote:
| > Text is actually numbers
|
| Text can be represented by numbers but they aren't the same
| datatype. They don't support the same operations (addition,
| subtraction, multiplication, etc).
| spyder wrote:
| I think it could also have a connection with symbolic AI: the
| discrete tokens could be the symbols that many believe are
| useful or necessary for reasoning. Tokens are also useful for
| compression, reducing memory requirements via quantization and
| small integer representations.
|
| https://en.wikipedia.org/wiki/Neuro-symbolic_AI
| intalentive wrote:
| My guess is that it enforces a kind of sparsity constraint.
| 555watch wrote:
| My primitive understanding is that we approximate a Markovian
| approach and indirectly model the transition probabilities just
| by working through tokens.
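|
| (Roughly: count token-to-token transitions. A toy sketch:)
|
|     # Toy sketch: empirical transition probabilities over tokens.
|     import numpy as np
|
|     def transition_matrix(tokens: np.ndarray, vocab: int) -> np.ndarray:
|         counts = np.zeros((vocab, vocab))
|         for a, b in zip(tokens[:-1], tokens[1:]):
|             counts[a, b] += 1
|         rows = counts.sum(axis=1, keepdims=True)
|         return counts / np.where(rows == 0, 1, rows)  # row-normalize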
| sampo wrote:
| Amazon's older time series forecasting system, DeepAR, has
| supported external regressors since 2018 [1]. In this new
| Chronos paper, I didn't find any mention of external regressors.
|
| [1] https://aws.amazon.com/blogs/machine-learning/amazon-
| sagemak...
| kfor wrote:
| They do mention covariates in section 6.1 -- specifically, that
| this method doesn't support them, with ideas on how it could in
| the future, such as via stacking:
|
| > In this work, we have focused on univariate time series
| forecasting since it constitutes the most common of real-world
| time series use-cases. Nevertheless, practical forecasting
| tasks often involve additional information that must be taken
| into account. One example involves covariates, that can be
| either time-independent (e.g., color of the product) or time-
| varying (e.g., on which days the product is on sale). Another
| closely related problem is multivariate forecasting, where
| historic values of one time series (e.g., interest rates) can
| influence the forecast for another time series (e.g., housing
| prices). The number of covariates or multivariate dimensions
| can vary greatly across tasks, which makes it challenging to
| train a single model that can handle all possible combinations.
| A possible solution may involve training task-specific adaptors
| that inject the covariates into the pretrained forecasting
| model (Rahman et al., 2020). As another option, we can build
| stacking ensembles (Ting & Witten, 1997) of Chronos and other
| light-weight models that excel at handling covariates such as
| LightGBM (Ke et al., 2017).
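|
| A hypothetical sketch of that stacking option (synthetic data;
| `chronos_pred` stands in for whatever the pretrained model
| emits):
|
|     # Sketch: let LightGBM correct the univariate forecast using
|     # covariates the foundation model never sees.
|     import lightgbm as lgb
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     chronos_pred = rng.normal(size=500)     # stand-in forecasts
|     covariates = rng.normal(size=(500, 3))  # e.g. price, promo, weekday
|     y = chronos_pred + covariates[:, 0] + rng.normal(0, 0.1, 500)
|
|     X = np.column_stack([chronos_pred, covariates])
|     stacker = lgb.LGBMRegressor(n_estimators=200)
|     stacker.fit(X, y)  # learns how covariates should adjust the forecast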
| sampo wrote:
| Ah. Thank you. The same concept goes under different names,
| so one needs to search for all of "exogenous variables",
| "external regressors", "external factors" and "covariates".
| amelius wrote:
| What types of model are the algo-traders using these days?
| mikkom wrote:
| Do you really think the profitable algo traders are going to
| tell you that :-)
| amelius wrote:
| Why not? Sharing information moves the field forward.
| jebarker wrote:
| Profitable algorithmic traders are not in the business of
| moving the field forward. They're in the business of making
| profits.
| JustFinishedBSG wrote:
| What field? They aren't curing cancer; it serves no purpose
| to advance the "field".
| gkbrk wrote:
| You make money if you have useful data others don't have,
| or if you have better algorithms that others aren't using.
|
| When these become publicly known and used, your system
| doesn't work any more, because the prices now include
| whatever signal you had to yourself before.
| Galanwe wrote:
| It's a bit more subtle than that, because there are
| feedback loops in the system. When a signal or factor
| spreads, it does so at multiple time horizons.
|
| e.g. If I have a signal that predicts well at a 1-day horizon,
| then it is in my interest to have many people trading it
| at horizons > 1 day, as they will push the price in my
| direction.
| 19h wrote:
| We're using HTMs for time series in our quant algorithms and
| they're performing pretty well; it's a shame that they're mostly
| ignored by ML scientists.
| lucidrains wrote:
| oh interesting, Jeff Hawkins' HTM?
| brcmthrowaway wrote:
| are you hiring
| melondonkey wrote:
| As a practitioner, the most impactful library for time series
| has been brms, which basically gives you syntactic sugar for
| creating statistical models in Stan. It checks all the boxes,
| including probabilistic forecasts and multiple likelihood
| families: Wiener, gamma, Gaussian, Student's t, binomial,
| zero-inflated and hurdle models. It also has auto-regressive
| and ordinal predictors, and you actually learn something from
| your data.
|
| I find a lot of these ML and DL libraries to be harder to
| troubleshoot beyond blind hyperparameter tuning, whereas with
| stats I can tweak the model, modify the likelihood, etc. There
| are also a lot of high-value problems that have few data points;
| these libraries tend to want at least daily data.
| bethekind wrote:
| Could you expand on what you mean by "practitioner?"
|
| Also, a follow-up question: with timeGPT and Chronos advertised
| as "foundational time series models", do you think they have
| any value?
| nyrikki wrote:
| It may not be known yet, and this project seems to be targeted
| at Gaussian distributions, but wouldn't the simplicity bias
| reduce sensitivity? I mean, attention in transformers works so
| well in part because OOD data is typically close enough.
|
| Probably just my own bias because it seems everything I deal with
| is at least MArP and anomalies are important to my use case.
|
| I can see where this is useful for others, even Amazon suggests
| ARIMA or ETS if you don't have hundreds of related streams.
|
| Is this more targeted at people who want more smoothing?
|
| Or am I just missing something?
| BrokrnAlgorithm wrote:
| Coming from finance, I always wonder how and if these large pre-
| trained models are usable on any financial time series. I see
| the appeal of pre-trained models in areas where there is clearly
| a stationary pattern, even if it's very hidden (e.g. industrial
| or biological metrics). But given the inherently low signal-to-
| noise ratio and how extremely non-stationary and chaotic
| financial data processes tend to be, I struggle to see the use
| of pre-trained foundation models.
| bethekind wrote:
| I played around with the timeGPT beta on predicting the S&P 500
| index's next-day performance (not multivariate time series, as
| I couldn't figure out how to set that up), and trying to use
| the confidence intervals it generated to buy options was
| useless at best.
|
| I can see Chronos working a bit better, as it tries to convert
| trends and pieces of time series into tokens, like GPT does
| for phrases.
|
| E.g.: Stock goes down terribly, then dead-cat bounces. This is
| common.
|
| Stock goes up, hits resistance due to existing sell orders,
| comes down.
|
| Stock is on a stable upward trend, continues the upward trend.
|
| If I can verbalize these usual patterns, it's likely Chronos
| can also pick up on them.
|
| Once again, quality of data trumps all for LLMs, so performance
| might vary. If you read the paper, they point out a few
| situations where the LLM is unable to learn a trend, e.g. when
| the prompting time series isn't long enough.
| intalentive wrote:
| Imitation learning of discretionary traders who rely on a
| mixture of rules and intuition.
| andoando wrote:
| Stock prices change continuously based on the current price and
| future events that have not happened. I don't think they are at
| all predictable.
| amai wrote:
| I doubt the differences in performance between all the "neural"
| models are statistically significant. It strikes me as odd that
| a model like TFT can be the worst of the "neural" models in one
| benchmark and at the same time be the best in another benchmark.
| Also, what is the point of Benchmark I? "It comprises 15
| datasets that were also part of the training data of Chronos
| models". That is not forecasting. That is just
| remembering/overfitting those time series.
___________________________________________________________________
(page generated 2024-03-22 23:01 UTC)