[HN Gopher] OpenTSLM: Language models that understand time series
       ___________________________________________________________________
        
       OpenTSLM: Language models that understand time series
        
       Paper: https://www.opentslm.com/OpenTSLM-whitepaper.pdf  Repo:
       https://github.com/StanfordBDHG/OpenTSLM  Foundation models excel
       at text, images, audio, and video, but lack temporal reasoning
       capabilities over time-series data streams that run the real world:
       vitals, prices, telemetry, grid loads, clickstreams, machine logs,
        business processes.  Time Series Language Models (TSLMs) are open
        foundation models that support time series as a native modality
        alongside text, letting users ask questions and get explanations
        and recommendations, all in natural language.  The OpenTSLM White
        Paper released today demonstrates state-of-the-art temporal
        reasoning performance. Unlike prior approaches, its cross-
        attention architecture scales to long time series, remaining
        computationally viable at scale.  The results:
         
        - Sleep staging: 4.4x accuracy with a model 200x smaller (~880x
          efficiency)
        - Activity recognition: ~6x accuracy with a model 200x smaller
          (~1,000x efficiency)
        - ECG interpretation: ~2x accuracy with a model 200x smaller
          (~400x efficiency); the first model to process 12-lead ECG
          signals and text simultaneously, with chain-of-thought
          reasoning validated by cardiologists.
         
        For the first time,
       foundation models can handle multiple time-series streams of
       varying lengths concurrently, integrate them with textual context,
        and produce _interpretable_ explanations (verified by domain
        experts such as clinicians).  This work is the result of a growing
       collaboration between researchers from Stanford, ETH Zurich, UIUC,
       University of St. Gallen, University of Washington, Google, and
       Amazon.  It points to the next foundation model frontier: temporal
       intelligence that unlocks proactive healthcare, adaptive robotics,
       resilient infrastructure, and new forms of human-AI collaboration.
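         
        The architectural gist, as a minimal PyTorch sketch (names and
        dimensions are illustrative, not the actual OpenTSLM code): a
        convolutional encoder turns the raw series into tokens, and the
        LLM's hidden states attend to those tokens via cross-attention,
        so a long recording never has to be squeezed into the text
        prompt.
         
            # Hypothetical sketch, not the OpenTSLM implementation.
            import torch.nn as nn
            
            class TimeSeriesEncoder(nn.Module):
                def __init__(self, d_ts=256, n_layers=4):
                    super().__init__()
                    # 1D conv patches the raw series into embeddings
                    self.patch = nn.Conv1d(1, d_ts, kernel_size=16, stride=8)
                    layer = nn.TransformerEncoderLayer(
                        d_ts, nhead=8, batch_first=True)
                    self.encoder = nn.TransformerEncoder(layer, n_layers)
            
                def forward(self, x):  # x: (batch, length)
                    z = self.patch(x.unsqueeze(1)).transpose(1, 2)
                    return self.encoder(z)  # (batch, ts_tokens, d_ts)
            
            class CrossAttentionFusion(nn.Module):
                """Text hidden states query the time-series tokens."""
                def __init__(self, d_llm=2048, d_ts=256):
                    super().__init__()
                    self.proj = nn.Linear(d_ts, d_llm)
                    self.attn = nn.MultiheadAttention(
                        d_llm, num_heads=8, batch_first=True)
            
                def forward(self, h_llm, ts_tokens):
                    kv = self.proj(ts_tokens)
                    out, _ = self.attn(h_llm, kv, kv)
                    return h_llm + out  # residual keeps the text stream
         
        Because the series enters as cross-attention keys/values rather
        than as prompt tokens, the LLM's context length grows with the
        text only, which is presumably what "remaining viable at scale"
        refers to.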
        
       Author : rjakob
       Score  : 171 points
       Date   : 2025-10-01 17:25 UTC (5 hours ago)
        
 (HTM) web link (www.opentslm.com)
 (TXT) w3m dump (www.opentslm.com)
        
       | let_tim_cook_ wrote:
       | "Stanford Repo Released Sep 31, 2025" Seems like something
       | sampled from a distribution with non-zero probability that the
       | day after Sep 30, 2025 would is the 31st....
        
         | rjakob wrote:
         | Thanks for the note. Ironically, the post is about models built
         | to understand time.
        
         | lomase wrote:
         | They fixed it already.
        
       | esafak wrote:
       | Wouldn't it be better to have the model write a script that calls
       | a TS library and give it access to an interpreter to run it?
       | That's how a human would do it. I'm not convinced of the need to
       | bake this into the model. What can you do with native TS
       | capability that you can't by tool calling?
        
         | ForHackernews wrote:
         | Does it actually have a concept of time? Does it understand
         | causality?
        
           | esafak wrote:
           | There are papers on that, such as
           | https://arxiv.org/abs/2410.15319. Time series modeling will
           | not bring about an understanding of causality except in a
           | weak sense https://en.wikipedia.org/wiki/Granger_causality.
           | To truly connect a cause and effect you need a graphical
           | model. And automated causal discovery, the hardest part of
           | which is proposing the nodes of the graph, is a nascent
           | field.
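            | 
            | A minimal sketch of such a (weak) Granger test on synthetic
            | data, with statsmodels doing the regressions:
            | 
            |     # Does x's past improve predictions of y beyond y's
            |     # own past? That is all Granger causality asks.
            |     import numpy as np
            |     from statsmodels.tsa.stattools import grangercausalitytests
            |     
            |     rng = np.random.default_rng(0)
            |     x = rng.normal(size=500)
            |     y = np.roll(x, 2) + 0.5 * rng.normal(size=500)  # y lags x
            |     
            |     # column order: [effect, candidate cause]; tests lags 1..4
            |     grangercausalitytests(np.column_stack([y, x]), maxlag=4)
            | 
            | A significant F-test here still only means predictability,
            | not mechanism.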
        
         | sync wrote:
         | Anthropic is encouraging the "have the model write a script"
         | technique as well, buried in their latest announcement on
         | Claude Agent SDK, this stuck with me:
         | 
         | > The Claude Agent SDK excels at code generation--and for good
         | reason. Code is precise, composable, and infinitely reusable,
         | making it an ideal output for agents that need to perform
         | complex operations reliably.
         | 
         | > When building agents, consider: which tasks would benefit
         | from being expressed as code? Often, the answer unlocks
         | significant capabilities.
         | 
         | https://www.anthropic.com/engineering/building-agents-with-t...
        
         | RealLast wrote:
         | I think you missed the point. Would you call an image analysis
         | library to describe an image or reason over a sequence of
         | images? Check out some of the plots in the paper to see what
         | these models can do.
        
           | esafak wrote:
            | I would if the image analysis library were backed by a VLM. I
           | have not fully read the paper, but couldn't figure 6 have
           | been done by an LLM writing a script that calls libraries for
           | time series feature extraction and writing a hypothesis test
           | or whatever? They will do the heavy lifting and return a
           | likelihood ratio or some statistic that is interpretable to
           | an LLM.
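            | 
            | Something like this hypothetical sketch, with scipy doing
            | the heavy lifting and only a readable statistic going back
            | to the LLM (synthetic data stands in for the real series):
            | 
            |     import numpy as np
            |     from scipy import signal, stats
            |     
            |     rng = np.random.default_rng(1)
            |     before = rng.normal(0.0, 1.0, size=300)
            |     after = rng.normal(0.8, 1.0, size=300)  # shifted regime
            |     
            |     # hypothesis test: did the level change between windows?
            |     t, p = stats.ttest_ind(before, after, equal_var=False)
            |     peaks, _ = signal.find_peaks(after, height=2.0)
            |     print(f"Welch t={t:.2f}, p={p:.1e}, "
            |           f"{len(peaks)} excursions above 2.0")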
        
       | brandonb wrote:
       | This is very cool! From the paper, this technique seems to work
        | well for question answering over time-series data.
       | 
        | In medical AI, IMO, the most exciting work is detecting disease
        | signals too subtle for humans--for example, estimating ejection
        | fraction from an ECG (which cardiologists can't do, but
        | algorithms can, and have been tested in RCTs:
        | https://www.nature.com/articles/s41591-021-01335-4).
       | 
       | Since OpenTSLM tokenizes time-series into an LLM embedding space,
       | would that process prevent capturing such subtle signals? Or
       | could the approach be extended to handle that use case?
        
         | RealLast wrote:
          | OpenTSLM models are designed _exactly_ to capture these
          | subtle signals. That was one of the original motivations.
          | The model integrates the raw time series data via cross-
          | attention, with concrete time series representations learned
          | by a raw time series encoder.
        
           | brandonb wrote:
           | Can you explain how? If I'm understanding the paper right,
           | the timeseries encoding is a Conv1D and the cross-attention
           | layer is constrained to output the token space of a pre-
           | trained LLM. My naive expectation is these constraints would
           | make the model less expressive / fine-tunable to pick up on
           | these types of subtle signals.
           | 
           | But obviously ML is an empirical field, so if you found that
           | a constrained architecture worked well in practice, that's an
           | interesting result in its own right.
        
             | RealLast wrote:
              | Sure! There is more after the 1D conv: another
              | transformer architecture that encodes further features of
              | the time series. The LLM can then basically query this
              | encoder for information, which also lets it capture more
              | subtle patterns. In a way it's similar to how some vision
              | language models work.
        
       | t_mann wrote:
       | > Read the White Paper
       | 
       | > A universal TSLM will power proactive healthcare, adaptive
       | robotics, resilient infrastructure, and new forms of human-AI
       | collaboration.
       | 
       | > scientists, engineers, and builders from ETH, Stanford,
       | Harvard, Cambridge, TUM, CDTM, Google, Meta, AWS, and beyond
       | 
       | What's with all this fuss? Why not just upload your paper to
       | arxiv? Time series models are interesting enough, but from the
       | abstract it's not even clear whether they are using transformers
       | or a recurrent architecture like xLSTM - arguably a more
       | intuitive choice for time series - or something else. This
       | website is barely distinguishable from a crypto/DeFi pitch.
        
         | RealLast wrote:
          | The full paper is on the website; the arXiv release of the
          | exact same paper is pending. Click the "Read the White
          | Paper" button to get the full paper.
        
           | t_mann wrote:
           | [flagged]
        
             | dang wrote:
             | Please don't treat people in a hostile fashion when
             | discussing their work on HN. That's the opposite of the
             | kind of community we want here.
             | 
             | https://news.ycombinator.com/newsguidelines.html
        
       | dschaurecker wrote:
       | Very cool!
        
       | amelius wrote:
       | If you view a byte sequence as a time series then I suppose this
       | could be a good file compression algorithm.
        
         | lacoolj wrote:
          | Like hitting a thumbtack with a sledgehammer
        
           | amelius wrote:
           | It works.
        
       | Animats wrote:
       | The underlying work is something called "Flamingo".[1] This is a
       | system for understanding interleaved text and images in sequence.
       | So it can process two "modalities" that are both sequential. This
       | new work seems to put some kind of time token in one "modality"
       | channel, leading to more awareness of time.
       | 
       | (The web site is too cute. Applying a left to right gradient on
       | text is a bit much.)
       | 
       | [1] https://arxiv.org/pdf/2204.14198
        
       | llmslave wrote:
        | Guaranteed there are hedge funds with language models that can
        | predict time series. A lot of really good time series research
        | has never been published, and is locked in the head of some
        | guy who lives in a 20 million dollar apartment in NYC.
        
         | senorrib wrote:
         | I doubt those are language models.
        
           | RealLast wrote:
           | Check it out, they are completely based on Llama and Gemma,
           | outputting text. Models are open-source.
        
         | reactordev wrote:
          | Can confirm: kdb+ exists... and you'll probably never be
          | able to get your hands on it. There are lots of models that
          | use it. And they are indeed locked inside some guy's head
          | high up in the towers of midtown.
        
           | IAmGraydon wrote:
           | KBD+ is no secret, but what does this have to do with
           | anything? It's just a database optimized for time series data
           | and has nothing to do with AI. It's widely used in the
           | financial business and even for non-financial things like
           | Formula-1 race analysis.
        
             | reactordev wrote:
              | Cool. You missed the part where I said there are models
              | using it. Those models are shhhhhhh...
              | 
              | PyTorch is no secret either, yet...
              | 
              | The point I'm making is that there are models, built on
              | database stream data, that you'll never get access to
              | even if you had $100M.
        
         | 1980phipsi wrote:
         | One of the difficulties with these models would be backtesting
         | investment strategies. You always need to make sure that you
         | are only using data that would have been available at the time
         | to avoid look-ahead bias.
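          | 
          | A toy pandas sketch of the discipline (not a real
          | backtester): every decision at time t may only see rows
          | stamped strictly before t.
          | 
          |     import pandas as pd
          |     
          |     prices = pd.Series(
          |         [100, 101, 99, 103],
          |         index=pd.date_range("2025-01-01", periods=4))
          |     
          |     for t in prices.index[2:]:
          |         visible = prices.loc[:t - pd.Timedelta(days=1)]
          |         # fit/score only on `visible`; touching prices.loc[t:]
          |         # here would be look-ahead bias
          |         momentum = visible.pct_change().iloc[-1]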
        
         | fmbb wrote:
         | Why would they use LLM for this?
        
           | wordpad wrote:
            | The emergent ability of LLMs to accurately predict tokens
            | under previously unseen conditions might be more powerful
            | than more rigorous machine-learning extrapolations.
            | 
            | Especially when you throw noisy, subjective context at
            | them.
        
             | mikepurvis wrote:
             | The "prediction" in this case is I think some approximation
             | of "ingest today's news and social media buzz as it's
             | happening and predict what the financial news tomorrow
             | morning will be."
        
           | observationist wrote:
            | Predicting the future is valuable. If a model can apply the
            | same underlying world model that it uses to accurately
            | predict OHLC series as it does to produce English language,
            | then you can interrogate and expand on that underlying
            | world model in complex and very useful ways. You can prompt
            | it to describe a scenario, or to uncover hidden influences
            | that wouldn't be apparent from a simple accurate
            | prediction. Things like that allow sophistication in the
            | tools: instead of an accurate chart with all sorts of
            | complex indicators, you can get English explication and
            | variations on scenarios.
           | 
            | You can't tell a numbers-only model "ok, with this data,
            | but now you know all the tomatoes in the world have gone
            | rotten and the market doesn't know it yet, what's the best
            | move?" You can use an LLM like that, however, and with RL,
            | which allows you to branch and layer strategies dependent
            | on dynamic conditions and private data, for arbitrary
            | outcomes.
           | Deploy such a model at scale and run tens of thousands of
           | simulations, iterating through different scenarios, and you
           | can start to apply confidence metrics and complex multiple-
           | degree-of-separation strategies to exploit arbitrage
           | opportunities.
           | 
           | Any one of the big labs could do something like this,
           | including modeling people, demographic samples, distributions
           | of psychological profiles, cultural and current events, and
           | they'd have a manipulation engine to tell them exactly who,
           | when, and where to invest, candidates to support, messages to
           | push and publish.
           | 
            | The fundamental measures of intelligence are how far into
            | the future a system can predict, and across which domains.
            | The broader the domains and the farther into the future,
            | the more intelligent the system, and things like this push
            | the boundaries.
           | 
           | We should probably get around to doing a digital bill of
           | rights, but I suspect it's too late already anyway, and we're
           | full steam ahead to snow crash territory.
        
             | mikert89 wrote:
              | Automated hypothesis testing, in the form of a search
              | for alpha in the market, is certainly being used right
              | now. An LLM can ask new questions about correlations
              | between assets, and run statistical tests on those
              | correlations, in ways that were previously only possible
              | by employing a PhD statistician.
        
         | constantcrying wrote:
         | This isn't (just) time series forecasting, it is about
         | interacting with time series data through natural language.
        
         | fogzen wrote:
         | When I worked at an ML hedge fund 6 years ago, t-SNE performed
         | the best and momentum was the feature that best predicted stock
         | movements.
         | 
         | The actual algorithms for predicting price movement were fairly
         | simplistic, most work was around strategies for dealing with
         | overfitting and how to execute the trades. Accuracy was around
         | 51-55% (a bit better than coin toss) so it was a big challenge
         | to actually execute the trades and still make a profit after
         | fees and other nonsense. Finding alpha is what ML is used for
         | but that's just the first step.
        
           | cwmoore wrote:
           | My experience as well; seemed more accurate while prices were
           | rising.
        
           | ttul wrote:
           | This makes intuitive sense to me, because the system you are
           | modeling is wide open and you're competing against others who
           | have the same information. Achieving much more than 51%
           | accuracy would be extraordinary. But if you get 51%
           | consistently over time, with leverage, you can make a good
           | amount of money.
        
       | pks016 wrote:
       | Looks promising! I'll try it once I get home today.
       | 
        | I work with large amounts of audio time series data (not
        | words, and all with subtle variation). It would be interesting
        | to see how it compares to traditional statistical methods.
        
       | pdntspa wrote:
       | OF COURSE the good stuff is proprietary....
        
       | zubairov wrote:
       | This is very cool! Amazing work guys!
        
       | iLoveOncall wrote:
        | You don't need specially trained LLMs for this. My team has
        | been successfully using Claude 3.5 for a year to analyze huge
        | time series data sets (close to the max context window),
        | without anything special beyond a prompt describing the task
        | at hand.
        
         | nowittyusername wrote:
         | I agree, LLM's are capable of doing this right out of the box
         | if you provide it grounding data like current time and a few
         | other things in the system prompt. Its really odd that this is
         | getting any attention.
        
           | RealLast wrote:
           | You guys are so funny, when papers like these exist:
           | https://arxiv.org/abs/2404.11757
           | 
           | Numerous research, INCLUDING the OpenTSLM paper has PROVEN
           | they are NOT able to do this out of the box. Did you even
           | check out the results at all? They literally compare OpenTSLM
           | against standard text only baselines. Gemma3-270M performs
           | better than GPT-4o using tokenized time series alone. Thus, I
           | guess you guys are being ironic.
        
             | iLoveOncall wrote:
             | An experiment is not a proof.
             | 
             | If this is the level of one of the contributors to the
             | OpenTSLM paper (which you very obviously are), no wonder
             | due diligence wasn't done properly.
        
             | dang wrote:
             | I understand how annoying it is when people post shallow
             | dismissals of your work on the internet, but please don't
             | give in to the annoyance when replying. It makes the thread
             | worse, and it's against the HN guidelines:
             | https://news.ycombinator.com/newsguidelines.html.
             | 
             | I don't know if this is your work or not, but I appreciate
             | your wanting to defend it...we just need you to do that in
             | a way that doesn't attack others, no matter how wrong they
             | are or you feel they are. Easier said than done of course,
             | but we're all working on it together.
        
         | NwtnsMthd wrote:
         | This sounds very interesting, would you be able to share a
         | little more about your process? What works and what doesn't?
        
           | iLoveOncall wrote:
            | Unfortunately not really, but we've found (and used in
            | production for a year) that Claude 3.5 is perfectly capable
            | of identifying anomalies or other points of interest in
            | very large sets of time series data.
           | 
           | Think of 100-200K worth of tokens formatted like this:
           | 
            |     <Entity1>-<Entity2> <Dimension> <ISO 8601 time> <value>
            |     <Entity1>-<Entity2> <Dimension> <ISO 8601 time +1> <value>
            |     <Entity1>-<Entity2> <Dimension> <ISO 8601 time +2> <value>
            |     <Entity1>-<Entity2> <Dimension2> <ISO 8601 time> <value>
            |     <Entity1>-<Entity2> <Dimension2> <ISO 8601 time +1> <value>
           | 
            | The only pre-filtering we do is to eliminate "obviously
            | non-relevant" data, such as series where the value is
            | completely flat the whole time, but this was done to fit
            | more data into the context, not because Claude struggled
            | with it (it doesn't).
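            | 
            | Building that prompt is then just string formatting over
            | the raw records; a hypothetical sketch (entity and
            | dimension names made up):
            | 
            |     from datetime import datetime, timedelta, timezone
            |     
            |     t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
            |     rows = [("svcA-svcB", "latency_ms",
            |              t0 + timedelta(minutes=i), 40 + i)
            |             for i in range(3)]
            |     
            |     # each line: <Entity1>-<Entity2> <Dimension> <ISO 8601> <value>
            |     prompt = "\n".join(
            |         f"{pair} {dim} {ts.isoformat()} {val}"
            |         for pair, dim, ts, val in rows)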
        
       | ghc wrote:
       | > Few studies use cross-attention to integrate time series into
       | LLMs
       | 
       | I mean, sure, but why would you need a study for that? There's
       | plenty of prior work using cross-attention to integrate time
       | series dynamics into non-LLM transformer models, right? Or maybe
       | I'm assuming that integrating a time series embedding with an LLM
       | is easier than it is.
       | 
       | Looking at the repo, the training data seems extremely health-
       | focused. I guess I would have to tune the model with my own
       | datasets if I want it to answer questions about multi-source
       | sensor data?
        
       | orbifold wrote:
        | This is a terrible idea and direction, but that will not stop
        | people from pursuing it, and as soon as they have a critical
        | mass of people reviewing each other it will go on for quite a
        | while. Transformers for time series are one of those things
        | that seem to make sense but don't, really.
        
         | EGreg wrote:
          | Can you elaborate as to why, actually? What specifically
          | makes this the case?
        
       | posidoli wrote:
       | That is outstanding work and will revolutionize the approaches in
       | this topic!
        
         | Y_Y wrote:
         | Bad bot
        
       | yawnxyz wrote:
        | Would be cool to use this to predict series of passages for
        | directed evolution, e.g. the Appelmans protocol or similar, in
        | phage/host interactions.
        
       | copypaper wrote:
       | I understand this provides a way to interact with ts data via
       | natural language, but is there any benefit to this over tool
       | calling to a library that uses signal processing and/or rule
       | based algos (or using machine learning if the data is
       | noisy/variable)?
       | 
       | For example, you ask an off-the-shelf LLM to analyze your ECG
       | data. The LLM uses a tool to call out to your ECG ts analysis
       | library. The library iterates over the data and finds stats & ECG
       | events. It returns something like "Average heart rate: 60bpm,
       | AFib detected at <time>, etc...". The LLM has all the info it
        | needs to give an accurate analysis at a fraction of the
        | computational cost.
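        | 
        | As a hypothetical sketch, the tool side could be as simple as
        | this (scipy R-peak detection standing in for a real ECG
        | library):
        | 
        |     import numpy as np
        |     from scipy.signal import find_peaks
        |     
        |     def ecg_summary(ecg: np.ndarray, fs: float = 250.0) -> str:
        |         """Text summary an LLM can consume via tool calling."""
        |         peaks, _ = find_peaks(ecg, distance=int(0.4 * fs),
        |                               prominence=np.std(ecg))
        |         if len(peaks) < 2:
        |             return "No beats detected"
        |         rr = np.diff(peaks) / fs           # R-R intervals (s)
        |         bpm = 60.0 / rr.mean()
        |         cv = rr.std() / rr.mean()          # irregularity proxy
        |         return (f"Average heart rate: {bpm:.0f} bpm; R-R "
        |                 f"variability {cv:.0%} "
        |                 f"({'irregular' if cv > 0.15 else 'regular'})")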
       | 
       | On top of that, this requires a large annotated dataset and a
       | pre-trained model. And correct me if I'm wrong, but I don't think
       | it's possible to have a "general" model that could handle
       | arbitrary time series data. I.e. a model that is trained on ECG
       | data would not be compatible with stock market data. And there
       | isn't a way to have a model that understands both stock market
       | data and ECG data.
        
       ___________________________________________________________________
       (page generated 2025-10-01 23:00 UTC)