[HN Gopher] The Tradeoffs of SSMs and Transformers
___________________________________________________________________
The Tradeoffs of SSMs and Transformers
Author : jxmorris12
Score : 30 points
Date : 2025-07-08 19:12 UTC (3 hours ago)
(HTM) web link (goombalab.github.io)
(TXT) w3m dump (goombalab.github.io)
| macleginn wrote:
| The part on tokenisation is not very convincing. Replacing BPE
| with characters or even bytes will not "remove tokenisation" --
| the atoms will still be tokens, and they relate to different
| things in different cultures/writing traditions (a "Chinese
| byte" is part of a Chinese character; an "English byte" is
| basically a letter or a number) rather than to anything
| fundamentally linguistic. BPE can be thought of as just
| another way of representing linguistic sequences with symbols
| of some kind; it provides less inductive bias about the use of
| language, but it is perhaps not categorically different from
| any other kind of writing.
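|
| (To make the byte point concrete, here is a minimal Python
| sketch; it assumes UTF-8 as the byte encoding, and the example
| strings are purely illustrative:)
|
|     # An English letter is a single UTF-8 byte, so byte-level
|     # "tokens" roughly coincide with letters.
|     print(list("a".encode("utf-8")))   # [97]
|
|     # A Chinese character spans three UTF-8 bytes, so each
|     # byte is only a fragment of the character, not a
|     # linguistic unit in itself.
|     print(list("中".encode("utf-8")))  # [228, 184, 173]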
| Herring wrote:
| I'm a bit bearish on SSMs (and hybrid SSM/transformers) because
| the leading open-weight models (DeepSeek, Qwen, Gemma, Llama) are
| all transformers. There's just no way none of them tried SSMs.
| visarga wrote:
| Yes, until I see serious adoption I remain reserved too, both
| on SSMs and on diffusion-based LLMs.
| nextos wrote:
| Second-generation LSTMs (xLSTM) do have leading performance on
| zero-shot time series forecasting:
| https://arxiv.org/abs/2505.23719.
|
| I think other architectures, aside from the transformer, might
| lead to SOTA performance, but they remain a bit unexplored.
| programjames wrote:
| I mean, everyone is still using variational autoencoders for
| their latent flow models instead of the information bottleneck.
| That's because it's cheaper (in founder time) to raise 10(0)x
| more money than to design your own algorithms and architectures
| for a novel idea that _might_ work in theory but could be a
| dead end six months down the line. Just look at LiquidAI.
| Brilliant idea, but it took them ~5 years to do all the
| research and another year to get their first models to
| market... which don't yet seem to be any better than models
| with a similar compute requirement. I find it pretty plausible
| that none of the "big" LLM companies seriously tried SSMs,
| because they already have more than enough money to throw at
| transformers, or took the quick path to a big valuation.
___________________________________________________________________
(page generated 2025-07-08 23:00 UTC)