[HN Gopher] TimesFM: Time Series Foundation Model for time-series forecasting
___________________________________________________________________
TimesFM: Time Series Foundation Model for time-series forecasting
Author : yeldarb
Score : 306 points
Date : 2024-05-08 13:34 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| uoaei wrote:
| "Time series" is such an over-subscribed term. What sorts of time
| series is this actually useful for?
|
| For instance, will it be able to predict dynamics for a machine
| with thousands of sensors?
| techwizrd wrote:
| Specifically, it's referring to univariate, contiguous point
| forecasts. Honestly, I'm a little puzzled by the benchmarks.
| sarusso wrote:
| Even if it were for multivariate time series, the model would
| first need to infer which machine we are talking about, then its
| working conditions, and only then make a reasonable forecast
| based on a hypothesis of its dynamics. I don't know, seems
| pretty hard.
| uoaei wrote:
| Indeed. An issue I ran into over and over while doing
| research for semiconductor manufacturing.
|
| My complaint was more illustrative than earnest.
| iamgopal wrote:
| How can a time series model be pre-trained? I think I'm missing
| something.
| melenaboija wrote:
| Third paragraph of the introduction of the mentioned paper[1]
| in the first paragraph of the repo.
|
| [1] https://arxiv.org/abs/2310.10688
| jurgenaut23 wrote:
| I guess they pre-trained the model to exploit common patterns
| found in any time-series (e.g., seasonalities, trends,
| etc.)... What would be interesting, though, is to see if it
| spots patterns that are domain-specific (e.g., the
| ventricular systole dip in an electrocardiogram), and
| possibly transfer those (that would be obviously useless in
| this specific example, but maybe there are interesting domain
| transfers out there)
| malux85 wrote:
| If you have a univariate series, just single values following
| each other -
|
| [5, 3, 3, 2, 2, 2, 1, ...]
|
| What is the next number? Well let's start with the search space
| - what is the possible range of the next number? Assuming
| unsigned 32bit integers (for explanation simplicity) it's
| 0-(2^32-1)
|
| So are all of those possible outputs equally likely? The next
| number could be 1, or it could be 345,654,543 ... are those
| outputs equally likely?
|
| Even though we know nothing about this sequence, most time
| series don't make enormous random jumps, so no, they are not
| equally likely, 1 is the more likely of the two we discussed.
|
| Ok, so some patterns are more likely than others, let's analyse
| lots and lots of time series data and see if we can build a
| generalised model that can be fine tuned or used as a feature
| extractor.
|
| Many time series datasets have repeating patterns, momentum,
| symmetries, all of these can be learned. Is it perfect? No, but
| what model is? And things don't have to be perfect to be
| useful.
|
| There you go - that's a pre-trained time series model in a
| nutshell
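The intuition above can be sketched concretely. The following is a minimal, hypothetical illustration (not TimesFM's actual method): learn an empirical prior over step sizes from many training series, then score candidate next values by how plausible their implied step is.

```python
from collections import Counter

def learn_step_prior(training_series):
    """Estimate an empirical distribution over step sizes (deltas)
    from many example series."""
    counts = Counter()
    for series in training_series:
        for a, b in zip(series, series[1:]):
            counts[b - a] += 1
    total = sum(counts.values())
    return {step: n / total for step, n in counts.items()}

def next_value_scores(series, candidates, prior):
    """Score each candidate next value by the prior probability of
    the step it would imply. Unseen steps score zero."""
    last = series[-1]
    return {c: prior.get(c - last, 0.0) for c in candidates}

# Toy training set: series that mostly decay in small steps
train = [[5, 3, 3, 2, 2, 2, 1], [9, 8, 8, 7, 7, 6], [4, 4, 3, 2, 2, 1]]
prior = learn_step_prior(train)
scores = next_value_scores([5, 3, 3, 2, 2, 2, 1], [0, 1, 345654543], prior)
# A small step down (to 0 or 1) scores far higher than a huge jump
```

A real foundation model replaces the delta histogram with a deep network trained on a huge corpus of series, but the principle, that some continuations are far likelier than others, is the same.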
| sarusso wrote:
| My understanding is that, while your eye can naturally spot a
| dependency over time in time series data, machines can't. So as
| we did for imaging, where we pre-trained models to let machines
| easily identify objects in pictures, we are now doing the same
| to let machines "see" dependencies over time. How those
| dependencies actually work is another story.
| nwoli wrote:
| Seems like a pretty small (low latency) model. Would be
| interesting to hook up to mouse input (x and y) and see how well
| it predicts where I'm gonna move the mouse (maybe with and
| without seeing the predicted path)
| jarmitage wrote:
| What is the latency?
| throwtappedmac wrote:
| Curious George here: why are you trying to predict where the
| mouse is going? :)
| nwoli wrote:
| Just to see how good the model is (maybe it's creepily good
| in a fun way)
| Timon3 wrote:
| There's a fun game idea in there! Imagine having to
| outmaneuver a constantly learning model. Not to mention the
| possibilities of using this in genres like bullet hell...
| teaearlgraycold wrote:
| Think of the sweet sweet ad revenue!
| throwtappedmac wrote:
| Haha as if advertisers don't know me better than I know me
| tasty_freeze wrote:
| Game developers are constantly trying to minimize lag. I have
| no idea if computers are so fast these days that it is a
| "solved" problem, but I knew a game developer ages ago who
| used a predictive mouse model to reduce the apparent lag by
| guessing where the mouse would be at the time the frame was
| displayed (considering it took 30 ms or whatever to render
| the frame).
| orbital-decay wrote:
| The only thing worse than lag is uneven lag, which is what
| you're going to end up with. Constant lag can be dealt with
| by players, jitter can't.
| aeyes wrote:
| Quake internet play only became acceptable when client side
| prediction was implemented, I'm sure it would be better to
| have real prediction instead of simple interpolation.
|
| https://raw.githubusercontent.com/ESWAT/john-carmack-plan-
| ar...
| ukuina wrote:
| What an amazing look into one of the greatest minds in
| programming!
|
| Thank you for this treasure.
|
| The relevant bits:
|
| > I am now allowing the client to guess at the results of
| the users movement until the authoritative response from
| the server comes through. This is a biiiig architectural
| change. The client now needs to know about solidity of
| objects, friction, gravity, etc. I am sad to see the
| elegent client-as-terminal setup go away, but I am
| practical above idealistic.
|
| > The server is still the final word, so the client is
| allways repredicting it's movement based off of the last
| known good message from the server.
| wongarsu wrote:
| Competitive online games commonly predict the player's
| movement. Network latencies have improved and are now
| usually <16ms (useful milestone since at 60fps you render a
| frame every 16.6ms), but players expect to still be able to
| smoothly play when joining from the other side of the
| continent to play with their friends. You usually want
| every client to agree where everyone is, and predicting
| movement leads to less disagreement than what you would get
| from using "outdated" state because of speed-of-light
| delays.
|
| If you want to predict not just position but also
| orientation in a shooter game, that's basically predicting
| the mouse movements.
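The client-side prediction described above can be sketched as simple dead reckoning: extrapolate the last known position forward by the expected latency, using velocity estimated from recent samples. A minimal illustration, not any particular engine's code:

```python
def predict_position(samples, latency):
    """Dead reckoning: extrapolate the latest position forward by
    `latency` seconds using velocity from the last two samples.
    Each sample is a (time, x, y) tuple."""
    (t0, x0, y0), (t1, x1, y1) = samples[-2], samples[-1]
    dt = t1 - t0
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return x1 + vx * latency, y1 + vy * latency

# Player moving right at 100 units/s; predict 16 ms ahead
samples = [(0.00, 0.0, 0.0), (0.05, 5.0, 0.0)]
predicted = predict_position(samples, 0.016)  # roughly (6.6, 0.0)
```

Real engines reconcile this guess against the authoritative server state when it arrives, as the Carmack plan file quoted above describes.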
| brigadier132 wrote:
| Catching cheaters in games might seem like a good use.
| dangerclose wrote:
| is it better than prophet from meta?
| VHRanger wrote:
| I imagine they're both worse than good old exponential
| smoothing or SARIMAX.
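For reference, simple exponential smoothing, one of the classical baselines mentioned here, fits in a few lines. A minimal sketch; the `alpha` value is an arbitrary choice for illustration:

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: maintain a level that is a
    weighted average discounting older observations geometrically.
    The final level is the one-step-ahead forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

forecast = exponential_smoothing([10, 12, 11, 13, 12, 14])
```

Production implementations (e.g. Holt-Winters variants with trend and seasonality terms) build on exactly this recursion.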
| Pseudocrat wrote:
| Depends on use case. Hybrid approaches have been dominating
| the M-Competitions, but there are generally small percentage
| differences in variance of statistical models vs machine
| learning models.
|
| And exponentially higher cost for ML models.
| VHRanger wrote:
| At the end of the day, if training or doing inference on
| the ML model is massively more costly in time or compute,
| you'll iterate much less with it.
|
| I also think it's a dead end to try to have foundation
| models for "time series" - it's a class of data! Like when
| people tried to have foundation models for any general
| graph type.
|
| You could make foundation models for data within that type
| - eg. meteorological time series, or social network graphs.
| But for the abstract class type it seems like a dead end.
| rockinghigh wrote:
| These models may be helpful if they speed up convergence
| when fine tuned on business-specific time series.
| SpaceManNabs wrote:
| is there a ranking of the methods that actually work on
| benchmark datasets? Hybrid, "ML" or old stats? I remember
| eamonnkeogh doing this on r/ML a few years ago.
| efrank3 wrote:
| Prophet was pretty bad so yes, but it doesn't seem much better
| than ARIMA
| l2dy wrote:
| Blog link (Feb 2024): https://research.google/blog/a-decoder-
| only-foundation-model...
|
| Previous discussion:
| https://news.ycombinator.com/item?id=39235983
| whimsicalism wrote:
| I'm curious why we seem convinced that this is a task that is
| possible or something worthy of investigation.
|
| I've worked on language models since 2018, even then it was
| obvious why language was a useful _and transferable_ task. I do
| not at all feel the same way about general univariate time series
| that could have any underlying process.
| sarusso wrote:
| +1 for "any underlying process". It would be interesting to
| know what use case they had in mind.
| baq wrote:
| well... if you look at a language in a certain way, it is just
| a way to put bits in a certain order. if you forget about the
| 'language' part, it kinda makes sense to try because why
| shouldn't it work?
| IshKebab wrote:
| Why not? There are plenty of time series that have underlying
| patterns which means you can do better than a total guess even
| without any knowledge of what you are predicting.
|
| Think about something like traffic patterns. You probably won't
| predict higher traffic on game days, but predicting rush hour
| is going to be pretty trivial.
| smokel wrote:
| The things that we are typically interested in have very clear
| patterns. In a way, if we find that there are no patterns, we
| don't even try to do any forecasting.
|
| "The Unreasonable Effectiveness of Mathematics in the Natural
| Sciences" [1] hints that there might be some value here.
|
| [1]
| https://en.m.wikipedia.org/wiki/The_Unreasonable_Effectivene...
| yonixw wrote:
| Exactly. So for example, I think the use of this model is in
| cases where you expect user count to follow some pattern over
| time, and want to be alerted if it spikes.
|
| But you wouldn't want this model for file upload storage
| usage, which only increases; there you would put alerts based
| on max values rather than patterns/periodic values.
| zeroxfe wrote:
| > I'm curious why we seem convinced that this is a task that is
| possible or something worthy of investigation.
|
| There's a huge industry around time series forecasting used for
| all kinds of things like engineering, finance, climate science,
| etc. and many of the modern ones incorporate some kind of
| machine learning because they deal with very high dimensional
| data. Given the very surprising success of LLMs in non-language
| fields, it seems reasonable that people would work on this.
| whimsicalism wrote:
| Task specific time series models, not time series "foundation
| models" - we are discussing different things.
| zeroxfe wrote:
| I don't think we are. The premise of this is that the
| foundation model can learn some kind of baseline ability to
| reason about forecasting, that is generalizable across
| different domains (each which needs fine tuning.) I don't
| know if it will find anything, but LLMs totally surprised
| us, and this kind of thing seems totally worthy of
| investigation.
| cscurmudgeon wrote:
| Foundational time series models have been around since 2019
| and show competitive levels of performance with task
| specific models.
|
| https://arxiv.org/abs/1905.10437
| wuj wrote:
| Time series data are inherently context sensitive, unlike
| natural languages which follow predictable grammar patterns.
| The patterns in time series data vary based on context. For
| example, flight data often show seasonal trends, while electric
| signals depend on the type of sensor used. There's also data
| that appear random, like stock data, though firms like Rentech
| manage to consistently find underlying alphas. Training on
| multivariate time series data would be challenging, but I don't
| see why not for specific applications.
| Xcelerate wrote:
| Is Rentech the only group that genuinely manages to predict
| stock price? Seems like the very observation that it's still
| possible would be enough motivation for other groups to catch
| up over such a long period.
|
| Also, the first realistic approximation of Solomonoff
| induction we achieve is going to be interesting because it
| will destroy the stock market.
| icapybara wrote:
| Agreed, if stock prices were predictable by some technical
| means, they would be quickly driven to unpredictability by
| people trading on those technical indicators.
| frankc wrote:
| This is that old finance chestnut. Two finance professors
| are walking down the hall and one of them spots a twenty
| dollar bill. He goes to pick it up but the other
| professor stops him and says "no don't bother. If there
| was twenty dollars there someone would have already
| picked it up"
|
| Yes, people arbitrage away these anomalies, and make
| billions doing it.
| amelius wrote:
| And that would make the stock markets accessible to fewer
| people, further widening the wealth gap.
| belter wrote:
| Rentech does not seem to be able to predict the stock
| market for their customers...
|
| "Jim Simons' Renaissance Technologies suffers $11 billion
| of client withdrawals in 7 months" -
| https://markets.businessinsider.com/news/stocks/jim-
| simons-r...
| amelius wrote:
| Maybe that would be a good thing. I wouldn't mourn the
| destruction of the stock market as it's just a giant
| wealth-gap increasing casino. Trading has nothing to do
| with underlying value.
| pvorb wrote:
| I fully agree. The stock market is just a giant machine
| that pulls money out of systems.
| shaism wrote:
| Fundamentally, the pre-trained model would need to learn a
| "world model" to predict well in distinct domains. This should
| be possible, setting aside compute requirements and the exact
| architecture.
|
| After all, the physical world (down to the subatomic level) is
| governed by physical laws. Ilya Sutskever from OpenAI stated
| that next-token prediction might be enough to learn a world
| model (see [1]). That would imply that a model can learn a
| "world model" indirectly, which seems even less likely than
| learning the world model directly through pre-training on
| time-series data.
|
| [1] https://www.youtube.com/watch?v=YEUclZdj_Sc
| whimsicalism wrote:
| But the data generating process could be literally anything.
| We are not constrained by physics in any real sense if we are
| predicting financial markets, occurrences of a certain build
| error, or termite behavior.
| shaism wrote:
| Sure, there are limits. Not everything is predictable, not
| even physics. But that is also not the point of such a
| model. The goal is to forecast across a broad range of use
| cases that do have underlying laws. Similar to LLM, they
| could also be fine-tuned.
| wavemode wrote:
| "predicting the next token well means that you understand the
| underlying reality that led to the creation of that token"
|
| People on the AI-hype side of things tend to believe this,
| but I really fundamentally don't.
|
| It's become a philosophical debate at this point (what does
| it mean to "understand" something, etc.)
| itronitron wrote:
| There was a paper written a while back that proved
| mathematically how you can correlate any time series with any
| other time series, thus vaporizing any perception of value
| gained by correlating time series (at least for those people
| that read the paper.) just wanted to share
| bdjsiqoocwk wrote:
| What does that mean "you can correlate"? That phrase is
| meaningless.
| notnaut wrote:
| I would like to read more. Feels sort of like an expression
| of certain "universal truths" like the 80/20 rule or golden
| ratio
| jimmySixDOF wrote:
| The only other timeseries paper I am aware of is TimeGPT
|
| https://news.ycombinator.com/item?id=37874891
| nextaccountic wrote:
| Do you have a link to the paper?
| refibrillator wrote:
| Why do you think language is so special?
|
| There's an extensive body of literature across numerous domains
| that demonstrates the benefits of Multi-Task Learning (MTL).
| Actually I have a whole folder of research papers on this
| topic, here's one of the earliest references on hand that I
| feel captures the idea succinctly in the context of modern ML:
|
| "MTL improves generalization by leveraging the domain-specific
| information contained in the training signals of related tasks"
| [Caruana, 1998]
|
| I see repetition and structure everywhere in life. To me it's
| not far fetched that a model trained on daily or yearly trends
| could leverage that information in the context of e.g.
| biological signals which are influenced by circadian rhythm
| etc.
|
| Disclaimer: my background is in ML & bio-signals, I work with
| time series too much.
| owl_brawl wrote:
| For those who haven't read it, Rich Caruana's thesis on
| multi-task learning is beautifully written (the cited 1998
| paper here). It's amazing to see how far the field has come,
| and, at the same time, how advanced the thinking was in the
| 90s too.
| bigger_cheese wrote:
| There is potential for integrating ML with time series data in
| industrial applications (things like smelters, reactors, etc.),
| where you have a continuous stream of time series measurements
| from things like gauges and thermocouples. If you can detect
| (and respond to) changing circumstances faster than the humans
| in the control room reacting to trends or alarms, there are
| potentially big efficiency gains...
|
| Operator guidance is often based on heuristics - when metric A
| exceeds X value for Y seconds take action Z. Or rates of change
| if the signal is changing at a rate of more than x etc.
|
| So in these areas there exists potential for an ML solution,
| especially if it's capable of learning (i.e. the last response
| overshot by X, so trim the next response appropriately).
| kqr wrote:
| Every time i've actually tried something like this it has not
| outperformed statistical process control.
|
| It's not just that control charts are great signal detectors,
| but also managing processes like that takes a certain
| statistical literacy one gets from applying SPC faithfully
| for a while, and does not get from tossing ML onto it and
| crossing fingers.
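For readers unfamiliar with SPC: a basic Shewhart-style control chart flags points outside mean ± 3 sigma estimated from an in-control baseline period. A minimal sketch with made-up sensor readings:

```python
def control_limits(baseline):
    """Shewhart-style limits: mean plus/minus 3 standard deviations,
    estimated from an in-control baseline period."""
    n = len(baseline)
    mean = sum(baseline) / n
    var = sum((x - mean) ** 2 for x in baseline) / (n - 1)
    sigma = var ** 0.5
    return mean - 3 * sigma, mean + 3 * sigma

def out_of_control(baseline, new_points):
    """Return the new points falling outside the control limits."""
    lo, hi = control_limits(baseline)
    return [x for x in new_points if x < lo or x > hi]

baseline = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1, 49.7]
flagged = out_of_control(baseline, [50.2, 49.9, 53.5])  # only the spike
```

Full SPC practice adds run rules (e.g. several consecutive points on one side of the mean), but even this bare version is a surprisingly strong signal detector.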
| chaos_emergent wrote:
| > Every time i've actually tried something like this it has
| not outperformed statistical process control.
|
| There are clear counterexamples to your experience, most
| notably in maintaining plasma stability in tokamak
| reactors:
| https://www.nature.com/articles/s41586-021-04301-9
| whimsicalism wrote:
| task specific model
| kqr wrote:
| Interesting. Could you point me to where it is compared
| against SPC? I didn't find it from a cursory read.
| fedeb95 wrote:
| as you say, without knowing anything about the underlying
| process, we can't predict generally. Some other comments point
| to contexts in which we do know something about the underlying.
| For instance, I don't think finance is something where you can
| apply this kind of stuff.
| matt-p wrote:
| Not really. It's true it would usually need more context than a
| single series dataset but you can predict broadly accurate-ish
| bandwidth usage trends just using simple statistical
| extrapolation, we've been doing that since the early 90s. If
| you give a model your subscriber numbers and usage data as time
| series it should be able to tell you quite accurately how much
| electricity|bandwidth|gas|road traffic levels| metro passenger
| levels at station Z... you'll be using at 4pm on January 4th
| 2026.
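The kind of simple statistical extrapolation described here can be sketched as a linear trend plus a per-slot seasonal average. A toy illustration with invented usage numbers (a 3-slot "day" to keep it small):

```python
def forecast_usage(history, period, horizon):
    """Trend + seasonal extrapolation: fit a linear trend by least
    squares, average the detrended residual per season slot, and
    project both forward. `history` is one value per time step."""
    n = len(history)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # average detrended value for each position in the cycle
    seasonal = [0.0] * period
    counts = [0] * period
    for x, y in zip(xs, history):
        seasonal[x % period] += y - (intercept + slope * x)
        counts[x % period] += 1
    seasonal = [s / c for s, c in zip(seasonal, counts)]
    return [intercept + slope * x + seasonal[x % period]
            for x in range(n, n + horizon)]

# Two "days" of usage with a daily peak and a mild upward trend
history = [10, 30, 20, 12, 33, 22]
fc = forecast_usage(history, period=3, horizon=3)
# The forecast repeats the daily peak and continues the trend
```

This is roughly what "we've been doing since the early 90s" amounts to; SARIMA-family models formalize the same decomposition with proper error modeling.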
| cma wrote:
| Watch this talk from Albert Gu:
|
| Efficiently Modeling Long Sequences with Structured State
| Spaces
|
| https://www.youtube.com/watch?v=luCBXCErkCs
|
| They made one of the best time series models and it later
| became one of the best language models too (Mamba).
| whimsicalism wrote:
| I have already watched that talk and know Albert Gu. His work
| is not about a "foundational" time series model but rather a
| task specific one.
| klysm wrote:
| I think there are some generalizable notions of multiscale
| periodicity that could get embedded into some kind of latent
| space.
| polskibus wrote:
| how good is it on stocks?
| svaha1728 wrote:
| The next index fund should use AI. What could possibly go
| wrong?
| whimsicalism wrote:
| I promise you your market-making counterparties already are.
| hackerlight wrote:
| What kind of things are they doing with AI?
| whimsicalism wrote:
| Predicting price movements, finding good hedges, etc.
| claytonjy wrote:
| if I knew it was good, why would I tell you that?
| fedeb95 wrote:
| it doesn't apply. Checkout the Incerto by Nassim Nicholas
| Taleb.
| esafak wrote:
| Is anyone using neural networks for anomaly detection in
| observability? If so, which model and how many metrics are you
| supporting per core?
| leeoniya wrote:
| LSTM is common for this.
|
| also https://facebook.github.io/prophet/
| morkalork wrote:
| How data hungry is it, or what is the minimum volume of data
| needed before its worth investigating?
| viraptor wrote:
| The more complex the data is, the more you need. If your
| values are always 5, then you need only one data point.
| morkalork wrote:
| If your values were always 5, you wouldn't use an LSTM to
| model it either. So presumably there's a threshold for
| when LSTM becomes practical and useful, no?
| sarusso wrote:
| What do you mean by "observability"?
| esafak wrote:
| Telemetry. Dashboards. The application is knowing when a
| signal is anomalous.
|
| https://en.wikipedia.org/wiki/Observability_(software)
| sarusso wrote:
| Oh, yes I am working on that. Usually LSTM, exploring
| encoder-decoders and generative models, but also some
| simpler models based on periodic averages (which are
| surprisingly useful in some use cases). But I don't have
| per-core metrics.
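The "periodic averages" approach mentioned above can be sketched as: average the history per position in the cycle (e.g. per hour of day) and flag points far from their slot's average. A minimal illustration with made-up data:

```python
def periodic_baseline(history, period):
    """Build an expected profile: per-slot mean and standard
    deviation across repetitions of the cycle."""
    slots = [[] for _ in range(period)]
    for i, v in enumerate(history):
        slots[i % period].append(v)
    means = [sum(s) / len(s) for s in slots]
    stds = [(sum((v - m) ** 2 for v in s) / len(s)) ** 0.5
            for s, m in zip(slots, means)]
    return means, stds

def is_anomalous(value, t, means, stds, k=3.0):
    """Flag a point more than k deviations from its slot's average."""
    m, s = means[t % len(means)], stds[t % len(stds)]
    return abs(value - m) > k * max(s, 1e-9)

# Three "days" of a 4-slot daily profile: low, high, high, low
history = [1, 9, 10, 2, 1, 10, 9, 2, 2, 9, 11, 1]
means, stds = periodic_baseline(history, period=4)
spike = is_anomalous(25, t=13, means=means, stds=stds)  # True: slot-1 spike
```

This is the kind of "surprisingly useful" simple model referred to above: no training loop, trivially explainable, and it catches deviations from the daily rhythm.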
| tiagod wrote:
| Depending on how stable your signal is, I've had good
| experience with seasonal ARIMA and LOESS (but it's not
| neural networks)
| optimalsolver wrote:
| When it comes to time series forecasting, if the method actually
| works, it sure as hell isn't being publicly released.
| baq wrote:
| and yet we have those huge llamas publicly available. these are
| computers that talk, dammit
| speedgoose wrote:
| Some times series are more predictable than others. Being good
| at predicting the predictable ones is useful.
|
| For example, you can easily predict the weather with decent
| accuracy: tomorrow is going to be about the same as today.
| From there you can work on better models.
|
| Or predicting a failure in a factory because a vibration
| pattern on an industrial machine always ended up in a massive
| failure after a few days.
|
| But I agree that if a model is good at predicting the stock
| market, it's not going to be released.
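The "tomorrow is about the same as today" idea is the classic persistence baseline, which any forecasting model should be required to beat. A minimal sketch, with an equally minimal error metric:

```python
def persistence_forecast(series, horizon):
    """Persistence baseline: predict the last observed value for
    every future step."""
    return [series[-1]] * horizon

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hold out the last temperature and score the naive forecast
temps = [18.0, 19.5, 19.0, 20.0, 19.5]
forecast = persistence_forecast(temps[:-1], horizon=1)
error = mae(temps[-1:], forecast)  # |19.5 - 20.0| = 0.5
```

Benchmarks like the M-Competitions score models relative to naive baselines of exactly this kind, which is why "it beats persistence" is the first question to ask of any forecaster.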
| mhh__ wrote:
| Dear googler or meta-er or timeseries transformer startup
| something-er: Please make a ChatGPT/chat.lmsys.org style
| interface for one of these that I can throw data at and see what
| happens.
|
| This one looks pretty easy to setup, in fairness, but some other
| models I've looked at have been surprisingly fiddly / locked
| behind an API.
|
| Perhaps such a thing already exists somewhere?
| wuj wrote:
| On a related note, Amazon also had a model for time series
| forecasting called Chronos.
|
| https://github.com/amazon-science/chronos-forecasting
| toasted-subs wrote:
| Something I've had issues with in time series work has been
| having to use relatively custom models.
|
| It's difficult to use off the shelf tools when starting with
| math models.
| claytonjy wrote:
| And like all deep learning forecasting models thus far, it
| makes for a nice paper but is not worth anyone using for a real
| problem. Much slower than the classical methods it fails to
| beat.
| p1esk wrote:
| That's what people said about CV models in 2011.
| claytonjy wrote:
| That's fair, but they stopped saying it about CV models in
| 2012. We've been saying this about foundational forecasting
| models since...2019 at least, probably earlier. But it is a
| harder problem!
| belter wrote:
| They also have Amazon Forecast with different algos -
| https://aws.amazon.com/forecast/
| aantix wrote:
| Would this be useful in predicting lat/long coordinates along a
| path? To mitigate issues with GPS drift.
|
| If not, what would be a useful model?
| smokel wrote:
| Map matching to a road network might be helpful here. For
| example, a Hidden Markov Model gives good results. See for
| instance this paper:
|
| "Hidden Markov map matching through noise and sparseness"
| (2009)
|
| https://www.microsoft.com/en-us/research/wp-content/uploads/...
| bbstats wrote:
| Kalman filter
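A one-dimensional sketch of the suggested Kalman filter, for intuition only: real GPS filtering is multivariate with a motion model, and the noise variances here are arbitrary.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=0.25):
    """Minimal 1D Kalman filter: blend each new measurement with the
    running estimate, weighted by their relative uncertainties."""
    estimate, error = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var                # predict: uncertainty grows
        gain = error / (error + meas_var)   # how much to trust the data
        estimate += gain * (z - estimate)   # update toward measurement
        error *= 1 - gain                   # update shrinks uncertainty
        smoothed.append(estimate)
    return smoothed

# A jittery GPS-like coordinate hovering around 10.0
noisy = [10.0, 10.4, 9.7, 10.2, 9.9, 10.1]
track = kalman_1d(noisy)
# The smoothed track varies much less than the raw measurements
```

For the path-prediction use case above, the same recursion is typically run on a state of (position, velocity) per axis, which is what lets it both smooth drift and extrapolate forward.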
| chaos_emergent wrote:
| "Why would you even try to predict the weather if you know it's
| going to be wrong?"
|
| - most OCs on this thread
| david_shi wrote:
| I have a few qualms with this app: 1. For a Linux user, you can
| already build such a system yourself quite trivially by getting
| an FTP account, mounting it locally with curlftpfs, and then
| using SVN or CVS on the mounted filesystem. From Windows or
| Mac, this FTP account could be accessed through built-in
| software.
|
| 2. It doesn't actually replace a USB drive. Most people I know
| e-mail files to themselves or host them somewhere online to be
| able to perform presentations, but they still carry a USB drive
| in case there are connectivity problems. This does not solve
| the connectivity issue.
|
| 3. It does not seem very "viral" or income-generating. I know
| this is premature at this point, but without charging users for
| the service, is it reasonable to expect to make money off of
| this?
| viraptor wrote:
| I'm not sure I understand two things here. Could someone
| clarify: 1. This is a foundation model, so you're expected to
| fine-tune it for your use case, right? (But the readme doesn't
| mention tuning.) 2. When submitting two series, do they impact
| each other in predictions?
| hm-nah wrote:
| Anyone have insights working with Ikigai's "Large Graphical
| Model" and how well it does on time-series? It's proprietary, but
| I'm curious how well it performs.
| celltalk wrote:
| If I give this model the first 100 prime numbers, does it give
| me back the rest of them? If so, what is the circuit?
| fedeb95 wrote:
| how is the series of the first 100 prime numbers a time series
| ?
| DeathArrow wrote:
| It seems to me that predicting something based on time is rarely
| accurate and meaningful.
|
| Suppose you want to buy stocks. Would you look at a time-based
| graph and buy according to that? Or would you rather look at
| financial data and see earnings and profits? Wouldn't a graph
| that has financial performance on the x-axis be more meaningful
| than one that has time?
|
| What if you research real estate in a particular area? Wouldn't
| square footage be a better measure than time?
| Terretta wrote:
| > _Would you look on a time based graph and buy according to
| that? Or you rather look at financial data, see earnings,
| profits?_
|
| Things affecting financials happen through time.
| DeathArrow wrote:
| All things happen through time, but my argument is that time
| might not be the best parameter to model relations.
| htrp wrote:
| Prophet 2.0
___________________________________________________________________
(page generated 2024-05-09 23:02 UTC)