[HN Gopher] OpenTSLM: Language models that understand time series
___________________________________________________________________
OpenTSLM: Language models that understand time series
Paper: https://www.opentslm.com/OpenTSLM-whitepaper.pdf Repo:
https://github.com/StanfordBDHG/OpenTSLM Foundation models excel
at text, images, audio, and video, but lack temporal reasoning
capabilities over time-series data streams that run the real world:
vitals, prices, telemetry, grid loads, clickstreams, machine logs,
business processes. Time Series Language Models (TSLMs) are open
foundation models that support time series as a native modality
alongside text, letting users ask questions and get explanations
and recommendations, all in natural language. The OpenTSLM White
Paper
released today demonstrates state-of-the-art temporal reasoning
performance. Unlike prior approaches, the cross-attention
architecture remains computationally viable as time-series length
grows.
The results: - Sleep staging: 4.4x accuracy with a model 200x
smaller (~880x efficiency) - Activity recognition: ~6x accuracy
with 200x smaller (~1,000x efficiency) - ECG interpretation: ~2x
accuracy with 200x smaller (~400x efficiency) -- first model to
process 12-lead ECG signals and text simultaneously with chain-of-
thought reasoning validated by cardiologists. For the first time,
foundation models can handle multiple time-series streams of
varying lengths concurrently, integrate them with textual context,
and produce _interpretable_ explanations (verified by domain
experts, clinicians). This work is the result of a growing
collaboration between researchers from Stanford, ETH Zurich, UIUC,
University of St. Gallen, University of Washington, Google, and
Amazon. It points to the next foundation model frontier: temporal
intelligence that unlocks proactive healthcare, adaptive robotics,
resilient infrastructure, and new forms of human-AI collaboration.
Author : rjakob
Score : 171 points
Date : 2025-10-01 17:25 UTC (5 hours ago)
(HTM) web link (www.opentslm.com)
(TXT) w3m dump (www.opentslm.com)
| let_tim_cook_ wrote:
| "Stanford Repo Released Sep 31, 2025" Seems like something
| sampled from a distribution with non-zero probability that the
| day after Sep 30, 2025 would be the 31st....
| rjakob wrote:
| Thanks for the note. Ironically, the post is about models built
| to understand time.
| lomase wrote:
| They fixed it already.
| esafak wrote:
| Wouldn't it be better to have the model write a script that calls
| a TS library and give it access to an interpreter to run it?
| That's how a human would do it. I'm not convinced of the need to
| bake this into the model. What can you do with native TS
| capability that you can't by tool calling?
| ForHackernews wrote:
| Does it actually have a concept of time? Does it understand
| causality?
| esafak wrote:
| There are papers on that, such as
| https://arxiv.org/abs/2410.15319. Time series modeling will
| not bring about an understanding of causality except in a
| weak sense https://en.wikipedia.org/wiki/Granger_causality.
| To truly connect a cause and effect you need a graphical
| model. And automated causal discovery, the hardest part of
| which is proposing the nodes of the graph, is a nascent
| field.
| sync wrote:
| Anthropic is encouraging the "have the model write a script"
| technique as well, buried in their latest announcement on
| Claude Agent SDK, this stuck with me:
|
| > The Claude Agent SDK excels at code generation--and for good
| reason. Code is precise, composable, and infinitely reusable,
| making it an ideal output for agents that need to perform
| complex operations reliably.
|
| > When building agents, consider: which tasks would benefit
| from being expressed as code? Often, the answer unlocks
| significant capabilities.
|
| https://www.anthropic.com/engineering/building-agents-with-t...
| RealLast wrote:
| I think you missed the point. Would you call an image analysis
| library to describe an image or reason over a sequence of
| images? Check out some of the plots in the paper to see what
| these models can do.
| esafak wrote:
| I would if the image analysis library was backed by a VLM. I
| have not fully read the paper, but couldn't figure 6 have
| been done by an LLM writing a script that calls libraries for
| time series feature extraction and writing a hypothesis test
| or whatever? They will do the heavy lifting and return a
| likelihood ratio or some statistic that is interpretable to
| an LLM.
| brandonb wrote:
| This is very cool! From the paper, this technique seems to work
| well for question answering in time-series.
|
| In medical AI, IMO, the most exciting work is detecting disease
| signals too subtle for humans--for example, estimating ejection
| fraction from an ECG (which cardiologists can't do, but
| algorithms can, and they have been tested in RCTs:
| https://www.nature.com/articles/s41591-021-01335-4 ).
|
| Since OpenTSLM tokenizes time-series into an LLM embedding space,
| would that process prevent capturing such subtle signals? Or
| could the approach be extended to handle that use case?
| RealLast wrote:
| OpenTSLM models are _exactly_ made to capture these subtle
| signals. That was one of the original motivations. The model
| integrates the raw time series data via cross attention, with
| concrete time series representations learned by a raw time
| series encoder.
| brandonb wrote:
| Can you explain how? If I'm understanding the paper right,
| the timeseries encoding is a Conv1D and the cross-attention
| layer is constrained to output the token space of a pre-
| trained LLM. My naive expectation is these constraints would
| make the model less expressive / fine-tunable to pick up on
| these types of subtle signals.
|
| But obviously ML is an empirical field, so if you found that
| a constrained architecture worked well in practice, that's an
| interesting result in its own right.
| RealLast wrote:
| Sure! There is more after the 1D conv: another transformer
| architecture encodes further features of the time series. The
| LLM can then basically query this encoder for information, and
| is also able to capture more subtle patterns. In a way it's
| similar to how some vision language models work.
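A toy numpy sketch of the flow described here (not the OpenTSLM implementation; the sizes and the random "encoder" weights are stand-ins for the learned Conv1D + transformer stack): patch the raw series, encode it into states, and let text-side queries cross-attend over those states.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_model, n_text, patch = 256, 16, 4, 8

# Stand-in "encoder": non-overlapping Conv1D-style patches, then a
# random linear map plays the role of the learned encoder that
# produces time-series states.
series = rng.standard_normal(T)
patches = series.reshape(T // patch, patch)        # (32, 8)
W_enc = rng.standard_normal((patch, d_model))
ts_states = np.tanh(patches @ W_enc)               # (32, 16)

# Cross-attention: text-token queries read from the time-series
# states, which is how the LLM "queries the encoder" for patterns.
queries = rng.standard_normal((n_text, d_model))
scores = queries @ ts_states.T / np.sqrt(d_model)  # (4, 32)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)      # rows sum to 1
attended = weights @ ts_states                     # (4, 16): one fused
                                                   # vector per text token
```

The point of the shape bookkeeping: the text side never sees raw samples, only the encoder's states, so subtle structure survives as long as the encoder captures it.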
| t_mann wrote:
| > Read the White Paper
|
| > A universal TSLM will power proactive healthcare, adaptive
| robotics, resilient infrastructure, and new forms of human-AI
| collaboration.
|
| > scientists, engineers, and builders from ETH, Stanford,
| Harvard, Cambridge, TUM, CDTM, Google, Meta, AWS, and beyond
|
| What's with all this fuss? Why not just upload your paper to
| arxiv? Time series models are interesting enough, but from the
| abstract it's not even clear whether they are using transformers
| or a recurrent architecture like xLSTM - arguably a more
| intuitive choice for time series - or something else. This
| website is barely distinguishable from a crypto/DeFi pitch.
| RealLast wrote:
| The full paper is on the website. The arXiv release of the
| exact same paper is pending. Click the "Read the White Paper"
| button to get the full paper.
| t_mann wrote:
| [flagged]
| dang wrote:
| Please don't treat people in a hostile fashion when
| discussing their work on HN. That's the opposite of the
| kind of community we want here.
|
| https://news.ycombinator.com/newsguidelines.html
| dschaurecker wrote:
| Very cool!
| amelius wrote:
| If you view a byte sequence as a time series then I suppose this
| could be a good file compression algorithm.
| lacoolj wrote:
| Like hitting a thumb tack with a sledge hammer
| amelius wrote:
| It works.
| Animats wrote:
| The underlying work is something called "Flamingo".[1] This is a
| system for understanding interleaved text and images in sequence.
| So it can process two "modalities" that are both sequential. This
| new work seems to put some kind of time token in one "modality"
| channel, leading to more awareness of time.
|
| (The web site is too cute. Applying a left to right gradient on
| text is a bit much.)
|
| [1] https://arxiv.org/pdf/2204.14198
| llmslave wrote:
| Guaranteed there are hedge funds with language models that can
| predict time series. A lot of really good time series research
| has never been published, and is locked in some guy's head, a
| guy who lives in a 20 million dollar apartment in NYC
| senorrib wrote:
| I doubt those are language models.
| RealLast wrote:
| Check it out, they are completely based on Llama and Gemma,
| outputting text. Models are open-source.
| reactordev wrote:
| Can confirm, kdb+ exists... and you'll probably never be able
| to get your hands on it. There are lots of models that use it.
| And they are indeed locked inside some guy's head high up in
| the towers of midtown.
| IAmGraydon wrote:
| kdb+ is no secret, but what does this have to do with
| anything? It's just a database optimized for time series data
| and has nothing to do with AI. It's widely used in the
| financial business and even for non-financial things like
| Formula-1 race analysis.
| reactordev wrote:
| Cool. You missed the part where I said there are models
| using that. Those models are shhhhhhh...
|
| PyTorch is no secret either yet...
|
| The point I'm making is that there are models, based on
| database stream data, that you'll never get access to even if
| you had $100M.
| 1980phipsi wrote:
| One of the difficulties with these models would be backtesting
| investment strategies. You always need to make sure that you
| are only using data that would have been available at the time
| to avoid look-ahead bias.
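One standard guard against look-ahead bias is point-in-time data: each record carries the timestamp at which it became *available*, and the backtest filters on that rather than on the event time. A toy sketch with a hypothetical data layout:

```python
from datetime import datetime

# Each value carries the time it became available, which may lag
# the event it describes.
records = [
    {"value": 101.0, "available_at": datetime(2025, 1, 2, 16, 0)},
    {"value": 99.5,  "available_at": datetime(2025, 1, 3, 16, 0)},
    # Restated figure published a week later: using it any earlier
    # would be look-ahead bias.
    {"value": 97.0,  "available_at": datetime(2025, 1, 10, 9, 0)},
]

def visible(records, as_of):
    """Return only values actually available at decision time."""
    return [r["value"] for r in records if r["available_at"] < as_of]

print(visible(records, datetime(2025, 1, 4)))  # [101.0, 99.5]
```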
| fmbb wrote:
| Why would they use LLM for this?
| wordpad wrote:
| The emergent behavior of LLMs being amazing at accurately
| predicting tokens in previously unseen conditions might be
| more powerful than more rigorous machine learning
| extrapolations.
|
| Especially when you throw noisy subjective context at it.
| mikepurvis wrote:
| The "prediction" in this case is I think some approximation
| of "ingest today's news and social media buzz as it's
| happening and predict what the financial news tomorrow
| morning will be."
| observationist wrote:
| Predicting the future is valuable. If a model can apply the
| same underlying world model that it uses to accurately
| predict OHLC series as it does to produce English language,
| then you can interrogate and expand on that underlying world
| model in complex and very useful ways. Prompting it lets you
| describe a scenario, or uncover hidden influences that
| wouldn't be apparent from a simple accurate prediction.
| Things like that allow sophistication in the tools - instead
| of an accurate chart with all sorts of complex indicators,
| you can get English explication and variations on scenarios.
|
| You can't tell a numbers only model "ok, with this data, but
| now you know all the tomatoes in the world have gone rotten
| and the market doesn't know it yet, what's the best move?"
| You can use an LLM like that, however, and with RL, which
| allows you to branch and layer strategies dependent on dynamic
| conditions and private data, for arbitrary outcomes.
| Deploy such a model at scale and run tens of thousands of
| simulations, iterating through different scenarios, and you
| can start to apply confidence metrics and complex multiple-
| degree-of-separation strategies to exploit arbitrage
| opportunities.
|
| Any one of the big labs could do something like this,
| including modeling people, demographic samples, distributions
| of psychological profiles, cultural and current events, and
| they'd have a manipulation engine to tell them exactly who,
| when, and where to invest, candidates to support, messages to
| push and publish.
|
| The fundamental measures of intelligence are how far into the
| future a system can predict across which domains. The broader
| the domains and farther into the future, the more
| intelligence, and things like this push the boundaries.
|
| We should probably get around to doing a digital bill of
| rights, but I suspect it's too late already anyway, and we're
| full steam ahead to snow crash territory.
| mikert89 wrote:
| Automated hypothesis testing in the form of a search for
| alpha in the market is certainly being used right now. An
| LLM can ask new questions about correlations between
| assets, and run statistical tests on those correlations, in
| ways that previously was only possible by employing a phd
| statistician
| constantcrying wrote:
| This isn't (just) time series forecasting, it is about
| interacting with time series data through natural language.
| fogzen wrote:
| When I worked at an ML hedge fund 6 years ago, t-SNE performed
| the best and momentum was the feature that best predicted stock
| movements.
|
| The actual algorithms for predicting price movement were fairly
| simplistic, most work was around strategies for dealing with
| overfitting and how to execute the trades. Accuracy was around
| 51-55% (a bit better than coin toss) so it was a big challenge
| to actually execute the trades and still make a profit after
| fees and other nonsense. Finding alpha is what ML is used for
| but that's just the first step.
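A back-of-envelope calculation with illustrative numbers (not the fund's actual figures) shows why fees dominate at accuracies just above a coin toss:

```python
# Illustrative numbers only: a 53% hit rate on symmetric 1% moves,
# minus 5 bps round-trip fees.
p, move, fee = 0.53, 0.01, 0.0005
edge_per_trade = p * move - (1 - p) * move - fee
# 0.53*0.01 - 0.47*0.01 - 0.0005 = 0.0001 -> about 1 bp per trade,
# so execution quality and fees decide whether the edge survives.
```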
| cwmoore wrote:
| My experience as well; seemed more accurate while prices were
| rising.
| ttul wrote:
| This makes intuitive sense to me, because the system you are
| modeling is wide open and you're competing against others who
| have the same information. Achieving much more than 51%
| accuracy would be extraordinary. But if you get 51%
| consistently over time, with leverage, you can make a good
| amount of money.
| pks016 wrote:
| Looks promising! I'll try it once I get home today.
|
| I work with a large number of audio time series data (not words
| and all have subtle variation). It would be interesting to see
| how it compares to traditional statistical methods.
| pdntspa wrote:
| OF COURSE the good stuff is proprietary....
| zubairov wrote:
| This is very cool! Amazing work guys!
| iLoveOncall wrote:
You don't need specially trained LLMs for this. My team has been
successfully using Claude 3.5 for a year to analyze huge time
series data sets (close to the max context window), without
anything special beyond a prompt describing the task at hand.
| nowittyusername wrote:
| I agree, LLMs are capable of doing this right out of the box
| if you provide grounding data like the current time and a few
| other things in the system prompt. It's really odd that this
| is getting any attention.
| RealLast wrote:
| You guys are so funny, when papers like these exist:
| https://arxiv.org/abs/2404.11757
|
| Numerous studies, INCLUDING the OpenTSLM paper, have PROVEN
| they are NOT able to do this out of the box. Did you even
| check out the results at all? They literally compare OpenTSLM
| against standard text-only baselines. Gemma3-270M performs
| better than GPT-4o using tokenized time series alone. Thus, I
| guess you guys are being ironic.
| iLoveOncall wrote:
| An experiment is not a proof.
|
| If this is the level of one of the contributors to the
| OpenTSLM paper (which you very obviously are), no wonder
| due diligence wasn't done properly.
| dang wrote:
| I understand how annoying it is when people post shallow
| dismissals of your work on the internet, but please don't
| give in to the annoyance when replying. It makes the thread
| worse, and it's against the HN guidelines:
| https://news.ycombinator.com/newsguidelines.html.
|
| I don't know if this is your work or not, but I appreciate
| your wanting to defend it...we just need you to do that in
| a way that doesn't attack others, no matter how wrong they
| are or you feel they are. Easier said than done of course,
| but we're all working on it together.
| NwtnsMthd wrote:
| This sounds very interesting, would you be able to share a
| little more about your process? What works and what doesn't?
| iLoveOncall wrote:
| Unfortunately not really, but we've found (and used in
| production for a year) that Claude 3.5 is perfectly capable
| of identifying anomalies or other points of interest in very
| large sets of time series data.
|
| Think of 100-200K worth of tokens formatted like this:
|
| <Entity1>-<Entity2> <Dimension> <ISO 8601 time> <value>
|
| <Entity1>-<Entity2> <Dimension> <ISO 8601 time +1> <value>
|
| <Entity1>-<Entity2> <Dimension> <ISO 8601 time +2> <value>
|
| <Entity1>-<Entity2> <Dimension2> <ISO 8601 time> <value>
|
| <Entity1>-<Entity2> <Dimension2> <ISO 8601 time +1> <value>
|
| The only pre-filtering we do is to eliminate "obviously
| non-relevant" data, such as series where the value is
| completely flat the whole time, and this was done to fit more
| data into the context, not because Claude struggled with it
| (it doesn't).
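A small sketch of producing that line format (the entity, dimension, and series names here are made up), including the flat-series pre-filter mentioned in the comment:

```python
from datetime import datetime, timedelta

def format_series(pair, dimension, start, values, step_minutes=1):
    """One prompt line per observation, shaped like:
    <Entity1>-<Entity2> <Dimension> <ISO 8601 time> <value>"""
    return [
        f"{pair} {dimension} "
        f"{(start + timedelta(minutes=step_minutes * i)).strftime('%Y-%m-%dT%H:%M:%SZ')} "
        f"{v}"
        for i, v in enumerate(values)
    ]

def is_flat(values):
    """Pre-filter for 'obviously non-relevant' all-flat series."""
    return len(set(values)) <= 1

start = datetime(2025, 10, 1, 17, 0)
series = {"latency_ms": [12, 14, 90], "error_rate": [0, 0, 0]}
prompt = [line
          for dim, vals in series.items()
          if not is_flat(vals)
          for line in format_series("svcA-svcB", dim, start, vals)]
```

The flat `error_rate` series never reaches the context, leaving room for more informative data.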
| ghc wrote:
| > Few studies use cross-attention to integrate time series into
| LLMs
|
| I mean, sure, but why would you need a study for that? There's
| plenty of prior work using cross-attention to integrate time
| series dynamics into non-LLM transformer models, right? Or maybe
| I'm assuming that integrating a time series embedding with an LLM
| is easier than it is.
|
| Looking at the repo, the training data seems extremely health-
| focused. I guess I would have to tune the model with my own
| datasets if I want it to answer questions about multi-source
| sensor data?
| orbifold wrote:
This is a terrible idea and direction, but it will not stop
people from pursuing it, and as soon as they have a critical
mass of people reviewing each other it will go on for quite a
while. Transformers for time series are one of those things that
seem to make sense but don't really.
| EGreg wrote:
| Can you elaborate as to why, actually? What specifically makes
| this the case
| posidoli wrote:
| That is outstanding work and will revolutionize the approaches in
| this topic!
| Y_Y wrote:
| Bad bot
| yawnxyz wrote:
would be cool to use this to predict series of passages for
directed evolution, e.g. the Appelmans protocol or similar, in
phage/host interactions
| copypaper wrote:
| I understand this provides a way to interact with ts data via
| natural language, but is there any benefit to this over tool
| calling to a library that uses signal processing and/or rule
| based algos (or using machine learning if the data is
| noisy/variable)?
|
| For example, you ask an off-the-shelf LLM to analyze your ECG
| data. The LLM uses a tool to call out to your ECG ts analysis
| library. The library iterates over the data and finds stats & ECG
| events. It returns something like "Average heart rate: 60bpm,
| AFib detected at <time>, etc...". The LLM has all the info it
| needs to give an accurate analysis at a fraction of computational
| cost.
|
| On top of that, this requires a large annotated dataset and a
| pre-trained model. And correct me if I'm wrong, but I don't think
| it's possible to have a "general" model that could handle
| arbitrary time series data. I.e. a model that is trained on ECG
| data would not be compatible with stock market data. And there
| isn't a way to have a model that understands both stock market
| data and ECG data.
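The tool-calling workflow described above could be wired up roughly like this (toy code; the "library" is a naive peak counter standing in for a real signal-processing package, and the tool names are made up):

```python
def ecg_summary(samples, fs=250):
    """Toy stand-in for an ECG analysis library: count naive peaks
    to estimate average heart rate. A real library would also flag
    AFib and other events."""
    peaks = [i for i in range(1, len(samples) - 1)
             if samples[i] > 0.5
             and samples[i] > samples[i - 1]
             and samples[i] > samples[i + 1]]
    duration_min = len(samples) / fs / 60
    bpm = len(peaks) / duration_min if duration_min else 0.0
    return f"Average heart rate: {bpm:.0f} bpm over {duration_min:.1f} min"

# The dispatch table an LLM's tool call would hit: only the compact
# summary string ever enters the model's context.
TOOLS = {"analyze_ecg": ecg_summary}

def handle_tool_call(name, **kwargs):
    return TOOLS[name](**kwargs)

# 60 s of synthetic signal at 250 Hz with one spike per second.
fs = 250
samples = [1.0 if i % fs == 0 else 0.0 for i in range(fs * 60)]
summary = handle_tool_call("analyze_ecg", samples=samples, fs=fs)
```

This is the cheap path the comment argues for: the expensive per-sample work happens in ordinary code, and the LLM only reasons over a one-line summary.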
___________________________________________________________________
(page generated 2025-10-01 23:00 UTC)