[HN Gopher] Google Titans architecture, helping AI have long-ter...
___________________________________________________________________
Google Titans architecture, helping AI have long-term memory
Author : Alifatisk
Score : 336 points
Date : 2025-12-07 12:23 UTC (10 hours ago)
(HTM) web link (research.google)
(TXT) w3m dump (research.google)
| Alifatisk wrote:
| Titans: Learning to Memorize at Test Time
| https://arxiv.org/abs/2501.00663
| okdood64 wrote:
| From the blog:
|
| https://arxiv.org/abs/2501.00663
|
| https://arxiv.org/pdf/2504.13173
|
| Is there any other company that's openly publishing their
| research on AI at this level? Google should get a lot of credit
| for this.
| Hendrikto wrote:
| Meta is also being pretty open with their stuff. And recently
| most of the Chinese competition.
| okdood64 wrote:
| Oh yes, I believe that's right. What's some frontier research
| Meta has shared in the last couple years?
| markisus wrote:
| Their VGGT, DINOv3, and Segment Anything models are pretty
| impressive.
| robrenaud wrote:
| Anything with Jason Weston as a coauthor tends to be pretty
| well written/readable and often has nice results.
| tonyhart7 wrote:
| "What's some frontier research Meta has shared in the last
| couple years?"
|
| the current Meta outlook is embarrassing tbh; the fact that
| they have the largest social media dataset on the planet and
| still can't produce a decent model puts them in a pretty
| "scary" position
| mirekrusin wrote:
| Just because they are not leading the current sprint of
| maximizing transformers doesn't mean they're not doing
| anything.
|
| It's not impossible that they assess it as a local maximum /
| dead end and are evaluating/training something completely
| different - and if it works, it'll work big time.
| johnebgd wrote:
| Yann was a researcher, not a productization expert. His
| departure signals the end of Meta being open about their
| work and the start of a more commercial focus.
| woooooo wrote:
| The start?
| DrewADesign wrote:
| I've long predicted that this game is going to be won
| with product design rather than having the winning model;
| we now seem to be hitting the phase of "[new tech] mania"
| where we remember that companies have to make things that
| people want to pay more money for than it costs to make
| them. I remember (maybe in the mid aughts) when people
| were thinking Google might not ever be able to convert
| their enthusiasm into profitability...then they figured
| out what people actually wanted to buy, and focused on
| that obsessively as a product. Failing to do that will
| lead to failure for companies like OpenAI.
|
| Sinking a bazillion dollars into models alone doesn't get
| you shit except a gold star for being the valley's
| biggest smartypants, because in the product world, model
| improvements only significantly improve all-purpose
| chatbots. The whole veg-o-matic "step right up folks-- it
| slices, it dices, it makes julienne fries!" approach to
| product design almost never yields something focused
| enough to be an automatic go-to for specific tasks, or
| simple/reliable enough to be a general purpose tool for a
| whole category of tasks. Once the novelty wears off,
| people largely abandon it for more focused tools that
| more effectively solve specific problems (e.g. blender,
| vegetable peeler) or simpler everyday tools that you
| don't have to think about as much even if they might not
| be the most efficient tool for half your tasks (e.g.
| paring knife.) Professionals might have enough need and
| reason to go for a really great in-between tool (e.g.
| mandolin) but that's a different market, and you only
| tend to get a limited set of prosumers outside of that.
| Companies more focused on specific products, like coding,
| will have way more longevity than companies that try to
| be everything to everyone.
|
| Meta, Google, Microsoft, and even Apple have more
| pressure to make products that sanely fit into their
| existing product lines. While that seems like a handicap
| if you're looking at it from the "AI company"
| perspective, I predict the restriction will enforce the
| discipline to create tools that solve specific problems
| for people rather than spending exorbitant sums making
| benchmark go up in pursuit of some nebulous information
| revolution.
|
| Meta seems to have a much tougher job trying to make
| tools that people trust them to be good at. Most of the
| highest-visibility things like the AI Instagram accounts
| were disasters. Nobody thinks of Meta as a serious,
| general-purpose business ecosystem, and privacy-wise, I
| trust them even less than Google and Microsoft: there's
| no way I'm trusting them with my work code bases. I think
| the smart move by Meta would be to ditch the sunk costs
| worries, stop burning money on this, focus on their core
| products (and new ones that fit their expertise) and
| design these LLM features in when they'll actually be
| useful to users. Microsoft and Google both have existing
| tools that they've already bolstered with these features,
| and have a lot of room within their areas of expertise to
| develop more.
|
| Who knows-- I'm no expert-- but I think meta would be
| smart to try and opt out as much as possible without
| making too many waves.
| tonyhart7 wrote:
| never thought I'd say this, but X (Twitter) has had more
| success integrating AI (Grok) with their core business
| product
|
| I know, I know, Elon is crazy etc., but the Grok example and
| the way it's integrated with the core product is actually
| the only approach I can even come up with tbh (other than
| the character.ai flavor)
| robotresearcher wrote:
| If I was a Meta shareholder I might well agree with you.
| But as someone with very little interest in their
| products so far, I'm very happy for them to sink huge
| amounts of money into AI research and publishing it all.
| raw_anon_1111 wrote:
| My thesis is the game is going to be won - if you define
| winning as a long term profitable business - by Google
| because they have their own infrastructure and technology
| not dependent on Nvidia, they have real businesses that
| can leverage AI - Google Search, YouTube and GCP - and
| they aren't burning money they don't have.
|
| 2nd tier winner is Amazon for the same reasons between
| being able to leverage AI with both Amazon Retail and AWS
| where they can sell shovels. I've also found their
| internal Nova models to be pretty good for my projects.
|
| Microsoft will be okay because of Azure and maybe Office
| if they get their AI story right.
|
| I just don't see any world where OpenAI comes out ahead
| from a business standpoint as long as they are
| sharecroppers on other people's hardware. ChatGPT alone
| will never make it worth the trillion dollar
| capitalization long term unless it becomes a meme stock
| like Tesla
| astrange wrote:
| Just because they have that doesn't mean they're going to
| use it for training.
| bdangubic wrote:
| oh man... just because they have data doesn't mean they
| will serve you ads :) Geeeez
| tonyhart7 wrote:
| "Just because they have that doesn't mean they're going
| to use it for training."
|
| how noble is Meta upholding a right moral ethic
|
| /s
| astrange wrote:
| A very common thing people do is assume a) all
| corporations are evil b) all corporations never follow
| any laws c) any evil action you can imagine would work or
| be profitable if they did it.
|
| b is mostly not true but c is especially not true. I
| doubt they do it because it wouldn't work; it's not high
| quality data.
|
| But it would also obviously leak a lot of personal info,
| and that really gets you in danger. Meta and Google are
| able to serve you ads with your personal info /because
| they don't leak it/.
|
| (Also data privacy laws forbid it anyway, because you
| can't use personal info for new uses not previously
| agreed to.)
| colesantiago wrote:
| Take a look at JEPAs (Video Joint Embedding Predictive
| Architecture), SAM (Segment Anything), etc for Meta's
| latest research.
|
| https://ai.meta.com/vjepa/
|
| https://ai.meta.com/sam2/
|
| https://ai.meta.com/research/
| UltraSane wrote:
| Meta just published Segment Anything 3, along with a truly
| amazing version that can create 3D models posed like the
| people in a photo. It is very impressive.
| asim wrote:
| It was not always like this. Google was very secretive in the
| early days. We did not start to see things until the GFS,
| BigTable and Borg (or Chubby) papers in the 2006 timeframe.
| okdood64 wrote:
| By 2006, Google was 8 years old. OpenAI is now 10.
| vlovich123 wrote:
| Google publishes detailed papers of its architecture once
| it's built the next version.
|
| AI is a bit different.
| rcpt wrote:
| Page Rank
| mapmeld wrote:
| Well it's cool that they released a paper, but at this point
| it's been 11 months and you can't download Titans-
| architecture model code or weights anywhere. That puts a
| lot of companies ahead of them (Meta's Llama, Qwen,
| DeepSeek). The closest you can get is an unofficial
| implementation of the paper:
| https://github.com/lucidrains/titans-pytorch
| informal007 wrote:
| I don't think model code is a big deal compared to the idea.
| If the public had recognized the value of the idea 11 months
| ago, they could have implemented the code quickly, because
| there are so many smart engineers in the AI field.
| jstummbillig wrote:
| If that is true, does it follow that this idea does not
| actually have a lot of value?
| fancy_pantser wrote:
| Student: Look, there's a hundred dollar bill on the ground!
| Economist: No there isn't. If there were, someone would
| have picked it up already.
|
| To wit, it's dangerous to assume the value of this idea
| based on the lack of public implementations.
| lukas099 wrote:
| If the hundred dollar bill was in an accessible place and
| the fact of its existence had been transmitted to
| interested parties worldwide, then yeah, the economist
| would probably be right.
| NavinF wrote:
| That day the student was the 100th person to pick it up,
| realize it's fake, and drop it
| mapmeld wrote:
| Well, we have the idea and the next best thing to official
| code, but if this were a big revelation, where are all of
| the Titans models? If this were public, I think we'd have a
| few attempts at variants (all of the Mamba SSMs, etc.) and
| get a better sense of whether this is valuable or not.
| alyxya wrote:
| The hardest part about making a new architecture is that
| even if it is just better than transformers in every way,
| it's very difficult to both prove a significant improvement
| at scale and gain traction. Until Google puts a lot of
| resources into training a scaled-up version of this
| architecture, I believe there's enough low-hanging fruit in
| improving existing architectures that it'll always take the
| back seat.
| UltraSane wrote:
| Yes. The path dependence for current attention based LLMs
| is enormous.
| patapong wrote:
| At the same time, there is now a ton of data for training
| models to act as useful assistants, and benchmarks to
| compare different assistant models. The wide availability
| and ease of obtaining new RLHF training data will make it
| more feasible to build models on new architectures I
| think.
| p1esk wrote:
| _Until google puts in a lot of resources into training a
| scaled up version of this architecture_
|
| If Google is not willing to scale it up, then why would
| anyone else?
| tyre wrote:
| Google is large enough, well-funded enough, and the
| opportunity is great enough to run experiments.
|
| You don't necessarily have to prove it out on large
| foundation models first. Can it beat out a 32b parameter
| model, for example?
| swatcoder wrote:
| Do you think there might be an approval process to
| navigate when an experiment might cost seven or eight
| digits and months of reserved resources?
|
| While they do have lots of money and many people, they
| don't have infinite money and specifically only have so
| much hot infrastructure to spread around. You'd expect
| they have to gradually build up the case that a large
| scale experiment is likely enough to yield a big enough
| advantage over what's already claiming those resources.
| nickpsecurity wrote:
| But it's companies like Google that made tools like JAX
| and TPUs, saying we can throw together models with cheap,
| easy scaling. The paper's math is probably harder to put
| together than an alpha-level prototype, which they need
| anyway.
|
| So I think they could default to doing it for small
| demonstrators.
| root_axis wrote:
| I don't think the comparison is valid. Releasing code and
| weights for an architecture that is widely known is a lot
| different than releasing research about an architecture that
| could mitigate fundamental problems that are common to all
| LLM products.
| innagadadavida wrote:
| Just keep in mind it is performance review time for all the
| tech companies. Their promotion of these seems to be directly
| correlated with that event.
| cubefox wrote:
| The author is listed as a "student researcher", which might
| include a clause that students can publish their results.
|
| Here is a bit more information about this program:
| https://www.google.com/about/careers/applications/jobs/resul...
| embedding-shape wrote:
| > Is there any other company that's openly publishing their
| research on AI at this level? Google should get a lot of credit
| for this.
|
| 80% of the ecosystem is built on top of companies, groups,
| and individuals publishing their research openly; I'm not
| sure why Google would get more credit for this than
| others...
| bluecoconut wrote:
| Bytedance is publishing pretty aggressively.
|
| Recently, my favorite from them was lumine:
| https://arxiv.org/abs/2511.08892
|
| Here's their official page:
| https://seed.bytedance.com/en/research
| hiddencost wrote:
| Every Google publication goes through multiple rounds of
| review. If anyone thinks the publication is a competitive
| risk, it gets squashed.
|
| It's very likely no one is using this architecture at Google
| for any production workloads. There are a lot of student
| researchers doing fun proof-of-concept papers; they're
| allowed to publish because it's good PR and it's good for
| their careers.
| jeffbee wrote:
| Underrated comment, IMHO. There is such a gulf between what
| Google does on its own part, and the papers and source code
| they publish, that I always think about their motivations
| before I read or adopt it. Think Borg vs. Kubernetes, Stubby
| vs. gRPC.
| HarHarVeryFunny wrote:
| Maybe it's just misdirection - a failed approach?
|
| Given the competitive nature of the AI race, it's hard to
| believe any of these companies are really trying to help the
| competition.
| timzaman wrote:
| lol you don't get it. If it's published it means it's not very
| useful
| Palmik wrote:
| DeepSeek and other Chinese companies. Not only do they publish
| research, they also put their resources where their mouth
| (research) is. They actually use it and prove it through their
| open models.
|
| Most research coming out of big US labs is counter-
| indicative of practical performance. If it worked (too) well
| in practice, it wouldn't have been published.
|
| Some examples from DeepSeek:
|
| https://arxiv.org/abs/2405.04434
|
| https://arxiv.org/abs/2502.11089
| nickpsecurity wrote:
| Arxiv is flooded with ML papers. Github has a lot of prototypes
| for them. I'd say it's pretty normal with some companies not
| sharing for perceived, competitive advantage. Perceived because
| it may or may not be real vs published prototypes.
|
| We post a lot of research on the mlscaling sub if you want
| to look back through it.
|
| https://www.reddit.com/r/t5_3bzqh1/s/yml1o2ER33
| nubg wrote:
| Very interesting. Is it correct to imagine it as some kind
| of "LoRA" that's continuously adapted as the model goes
| through its day?
|
| If so, could there perhaps be a step where the LoRA is merged
| back into the main model?
|
| That would be like sleeping :-)
| robrenaud wrote:
| I don't think that's a great analogy.
|
| LoRAs tend to be adapters bolted onto systems by people
| other than the system designers, and they are low-rank
| factorizations.
|
| There is nothing low-rank or adapter-like here.
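|
| (For reference, "low rank" means the adapter is a pair of
| thin matrices added on top of a frozen weight. A minimal
| sketch, not any particular LoRA library's API:)
|
|     import torch
|
|     d, r = 64, 8                  # r << d is the low rank
|     W = torch.randn(d, d)         # frozen base weight
|     A = torch.randn(r, d) * 0.01  # trainable down-projection
|     B = torch.zeros(d, r)         # trainable up-projection
|
|     def adapted(x):               # x: (batch, d)
|         # Base layer output plus the low-rank update B @ A.
|         return x @ W.T + x @ (B @ A).T
|
|     print(adapted(torch.randn(2, d)).shape)  # (2, 64)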
| andy12_ wrote:
| Kind of. You could theoretically use LoRA for this, in fact,
| but it probably wouldn't have enough capacity to be a proper
| substitute for the attention mechanism. Instead, a full MLP
| is trained as input chunks get processed.
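|
| A minimal sketch of that idea in PyTorch (this is not the
| official Titans code; the module size, chunking, and
| learning rate are made up for illustration):
|
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     # Hypothetical memory module: a small MLP whose weights
|     # keep being updated while the sequence is processed.
|     memory = nn.Sequential(
|         nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
|     opt = torch.optim.SGD(memory.parameters(), lr=1e-2)
|
|     def process_chunk(keys, values):
|         # "Surprise" = how badly the memory maps keys to
|         # values; the gradient of this loss drives the
|         # test-time update of the memory weights.
|         loss = F.mse_loss(memory(keys), values)
|         opt.zero_grad()
|         loss.backward()
|         opt.step()
|         return loss.item()
|
|     # Toy usage: stream chunks of (key, value) pairs.
|     for _ in range(4):
|         k, v = torch.randn(32, 64), torch.randn(32, 64)
|         print("surprise:", process_chunk(k, v))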
| jonplackett wrote:
| I'm curious whether this makes them more or less susceptible
| to prompt injection?
|
| On the one hand, learning on the job could allow better
| training of what not to be influenced by; on the other hand,
| an injected prompt could have an even deeper long-term
| effect on them.
| Mistletoe wrote:
| This is the one thing missing from my interactions with AI. If
| successful, this will change everything. If you thought people
| were getting AI boyfriends and girlfriends before, wait until you
| see this.
| astrange wrote:
| One important thing missing from AI boyfriends is they aren't
| capable of paying half your rent.
| DoctorOetker wrote:
| They could help figure out a way to earn money with a
| webcam...
| astrange wrote:
| If it's AGI they could just get a regular job, I think.
| pixl97 wrote:
| Nah, we'll get micro cube houses first, with shared
| bathrooms/kitchens, and everyone will just be in their room
| with their VR helmet on, not interacting with anyone else
| real.
| Barbing wrote:
| Catch me on Veelox
| astrange wrote:
| I think it's interesting that people associate being in VR
| with being unable to interact with other people. I
| personally think it promotes living with other people
| because it reduces conflict.
|
| Like, if you and your kids want to watch different movies
| on the living room TV then you can just give it to them and
| use XR glasses for yourself.
| fredrikholm wrote:
| > unable to interact with other people
|
| > just give it to them and use XR glasses for yourself
| astrange wrote:
| Fighting with your kids is not the appropriate kind of
| interaction to have with your kids.
| airstrike wrote:
| Reducing conflict to zero is not a goal we should pursue.
| astrange wrote:
| Ever tried sleeping in bed while someone next to you is
| on their phone? It's not the kind of conflict you should
| promote. XR glasses are better in that case because the
| glare doesn't affect other people.
| themgt wrote:
| See also Hope:
|
| _In the previous sections, we first discussed Continuum Memory
| System (CMS) that allows for more persistent storage of memories
| and defines memory as a spectrum of blocks with different
| frequencies of update. Due to the larger capacity and constraints
| for scaling the parameters, often CMS requires simple learning
| rule but higher capacity to store more persistent knowledge. On
| the other hand, in the previous section, we discussed the design
| of a self-modifying Titans, where it can generate its own keys
| and so learning update to better adapt to the context. Contrary
| to CMS, the self-modifying Titans has a small capacity but is
| using a complex and expressive learning rule. Accordingly, these
| two systems seem to be complementary and their combination can
| enhance the model expressiveness from different aspects._
|
| _To this end, we present Hope architecture: A neural learning
| module that incorporates self-modifying Titans followed by
| Continuum Memory System._
|
| https://research.google/blog/introducing-nested-learning-a-n...
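|
| Read literally, the "spectrum of blocks with different
| frequencies of update" could be sketched like this (purely
| illustrative; the actual CMS levels use learned update rules
| of varying complexity, not this toy decay):
|
|     # Three memory levels: fast, medium, slow. Each level is
|     # only updated every `period` steps, so slower levels
|     # hold more persistent information.
|     levels = [
|         {"period": 1,  "state": 0.0},   # fast, short-lived
|         {"period": 8,  "state": 0.0},   # medium
|         {"period": 64, "state": 0.0},   # slow, persistent
|     ]
|
|     def step(t, x):
|         for lvl in levels:
|             if t % lvl["period"] == 0:
|                 # Placeholder update rule for illustration.
|                 lvl["state"] = 0.9 * lvl["state"] + 0.1 * x
|
|     for t in range(128):
|         step(t, 1.0)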
| killerstorm wrote:
| For most papers, the main idea can be described in 1-2
| sentences, sort of "we did X using Y".
|
| That doesn't work for HOPE - a short summary can't explain what
| it actually does besides "self-modifying" and "continuum
| memory".
|
| So it seems to be an innovation of Transformers calibre, really
| big (if true). It's definitely not "transformer but with such-
| and-such modification".
|
| Gemini came up with the following visual metaphor for the
| difference:
|
| > Transformer is a series of frozen glass panes (the weights)
| and a scratchpad (the attention) where it writes notes about
| the current text.
|
| > The HOPE architecture involves no scratchpad. Instead, the
| glass panes themselves are made of smart liquid. As the data
| flows through, the first pane reshapes itself instantly. The
| second pane reshapes itself slowly. And the mechanism deciding
| how to reshape them is itself a tiny, intelligent machine, not
| just a basic math rule.
| chrisweekly wrote:
| +1 Insightful.
|
| This comment was illuminating -- and IMHO an excellent
| example of why it's important to avoid rigid rules against
| posting any AI-generated content in HN comments. You gained
| insights by asking Gemini, and shared them, noting the
| source. Thank you!
| kgeist wrote:
| >The model uses this internal error signal (the gradient) as a
| mathematical equivalent of saying, "This is unexpected and
| important!" This allows the Titans architecture to selectively
| update its long-term memory only with the most novel and context-
| breaking information
|
| So one can break a model by consistently feeding it random,
| highly improbable junk? Everything would be registered as a
| surprise and get stored, impacting future interactions.
| pmichaud wrote:
| I'm guessing that this is the first thing they thought of and
| the problem only exists in the superficial gloss you're
| responding to?
| idiotsecant wrote:
| This is the start of what I always thought an AI should have - a
| limbic system. Humans don't store memory based on novelty, they
| store it based on emotional content. This is where I was afraid
| of the tiger, this is where I smelled delicious food, this was
| what it felt like when I was victorious in the hunt.
|
| AI needs an internal emotional state because that's what drives
| attention and memory. AI needs to _want_ something.
| luckydata wrote:
| That would be the biggest mistake anyone could make. I hope
| nobody goes down this route. AI "wanting" things is an
| enormous risk to alignment.
| pixl97 wrote:
| I mean, setting up any neural net with a 'goal' is really
| just defining a want/need. You can't encode the entire
| problem space of reality; you have to give the application
| something to filter out.
| idiotsecant wrote:
| At some point I think we'll have to face the idea that any
| AI more intelligent than ourselves will by definition be
| able to evade our alignment tricks.
| luckydata wrote:
| Equating greater intelligence with "wanting things" is a
| fallacy. You can have a hyper-intelligent computer that
| simply waits for you to ask it to do a job, or you can
| endow it with the digital equivalent of hunger and
| reproductive instincts, and it will behave completely
| differently.
|
| We would be INSANE to pursue giving that type of instinct
| to AIs.
| bethekidyouwant wrote:
| In what world can you not always break the response of an AI by
| feeding it a bunch of random junk?
| CooCooCaCha wrote:
| I mean ideally AI would be resilient to junk, don't you
| think?
| vlovich123 wrote:
| Humans are pretty vulnerable to junk so I'm not sure.
| amarant wrote:
| Ideally, you'd run your own instance of this, I think.
|
| I can see a product where you purchase a model that has
| basic training, and then, using the features outlined in
| the paper, it learns on the fly from your usage.
|
| I can also see there being a secondary market for specially
| trained models, with their long-term memory filled with some
| specific skill, done in some specific way. To make a silly
| example, imagine buying a licence to Torvalds' OS coding
| assistant, ready to insult your PRs before you even commit
| them! (And possibly help you write code in Torvalds' style,
| too.)
|
| This would of course require Linus to use the model enough
| for it to learn; I won't comment on the likelihood of that
| happening: it's just a silly example, after all.
| kgeist wrote:
| I mean, currently LLMs are stateless and you can get rid of
| all the poisoned data by just starting a new conversation
| (context). The OP introduces "long-term memory" where junk
| will accumulate over time.
| dmix wrote:
| In something like Cursor, if it messes something up you can
| click 'undo'. I'd imagine a small snapshot would only be
| persisted to memory if you keep its output, and even then
| it's mostly just a summary.
|
| There are probably lots of small signals of "the user is
| happy with the output", plus the longer the history, the
| more it will converge on the middle of being what you want,
| including when the user says "don't do [x]", which overrides
| past stuff.
| soerxpso wrote:
| I believe you're misunderstanding what the OP means about
| "long-term" memory. From what I can tell, it's not actively
| modifying the weights of the underlying model, it just
| "remembers" things from a high number of tokens into the
| past of its context. The point is that this allows it to
| remember something it read ~200 pages ago in a very long
| context window, not that it can remember something from one
| session into another clean session.
| photochemsyn wrote:
| This is no different from what happens to humans if they're
| locked into cult programming situations: they'll start
| believing and regurgitating all kinds of nonsense if their
| information stream is tightly curated.
|
| Practically, for use with a codebase development effort, if
| the model remembers the original design decisions and the
| discussions about costs and benefits, and can recall all
| that much later in the process, it's going to start getting
| really good at thinking about what the next step is, or even
| at making decisions about when a major refactor is needed,
| etc.
| andy12_ wrote:
| This is an oversimplification of what Titans does. The model
| performs nested learning, where the model learns during
| inference, and during training the model weights learn _how
| and what_ to learn during inference. If the input contains
| junk or irrelevant information, the model most likely
| learned during training to assign low-surprise query and key
| embeddings to those tokens, because learning those junk
| tokens would have hurt the model's overall ability to
| predict subsequent tokens (and thus would have increased the
| training loss).
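|
| A rough sketch of that gating in PyTorch, assuming a learned
| per-token write gate (the names, shapes, and update rule are
| illustrative, not taken from the paper):
|
|     import torch
|     import torch.nn as nn
|
|     d = 64
|     memory = nn.Linear(d, d, bias=False)  # fast inner weights
|     gate = nn.Linear(d, 1)                # slow outer weights
|
|     def memory_update(k, v, lr=1e-2):
|         # Per-token reconstruction error ("surprise").
|         err = ((memory(k) - v) ** 2).mean(-1, keepdim=True)
|         # The outer model decides how much each token may
|         # write into memory; after training, junk tokens
|         # should get a gate near zero.
|         g = torch.sigmoid(gate(k))
|         loss = (g * err).mean()
|         grad = torch.autograd.grad(loss, [memory.weight])[0]
|         with torch.no_grad():
|             memory.weight -= lr * grad
|
|     memory_update(torch.randn(8, d), torch.randn(8, d))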
| cubefox wrote:
| It's interesting that they publish a blog post about the Titans
| and MIRAS papers only now, while the blog post about the new
| follow-up paper (Nested Learning), all by the same main
| author(!), came out a month ago:
| https://research.google/blog/introducing-nested-learning-a-n...
| bentt wrote:
| This just feels like a tremendous missing piece to LLMs. Looking
| forward to seeing it in action.
| willangelo wrote:
| Very, very interesting, definitely a missing piece in the
| current AI space.
|
| Small typo where the text "Virtually all successful existing
| sequence models rely on mean squared error..." is repeated twice
| within the same paragraph. Happens to the best of us.
| voodooEntity wrote:
| When I first read the Titans papers, my reaction was "this
| will be a big step forward".
|
| While I have no "AI" title and don't work in the AI
| industry, I've spent many years thinking about AI concepts,
| even long before the whole NN/LLM hype started.
|
| Maybe because of that, I was always really annoyed that LLMs
| are called AI, because in my years of thinking about how an
| actual "human-like" thinking AI might work, what an LLM does
| was far below my minimum definition.
|
| But when I stumbled across the Titans paper, while it still
| is not an "AI" as I would call it, from my POV it's a
| massive step in the right direction.
|
| Sometimes I consider writing all my ideas/thoughts about AI
| down in my blog, but then I think nobody would care anyway
| since I'm not a known figure _shrug_ - so other than being
| able to say "look, I wrote it years ago!" there's no actual
| point in doing so, I guess.
|
| However, I'm looking forward to seeing Titans in action, and
| I guess it will impress us all.
| Barbing wrote:
| Are you curious to see whether a blog post shared here might
| gain any traction and perhaps some valuable feedback?
| ocrow wrote:
| A lot of LLM/AI writing these days can feel lost in the weeds -
| the specifics of very detailed techniques are undoubtedly
| interesting, but writing that steps back and looks at the big
| picture, informed by those details, could be very useful for
| people who want to think about where this all may be going.
| chr15m wrote:
| Sharing it in your blog over a period of months or years is how
| you become a known figure eventually.
| riku_iki wrote:
| The post starts with a wrong statement right away:
|
| "The Transformer architecture revolutionized sequence modeling
| with its introduction of attention"
|
| Attention was developed before transformers.
| Alifatisk wrote:
| > Attention was developed before transformers.
|
| I just looked this up and it's true; this completely changes
| the timeline I had in my mind! I thought the Transformer
| paper was what introduced the attention mechanism, but it
| existed before and was applied to RNN encoder-decoders. Wow
| dmix wrote:
| > The Transformer architecture revolutionized sequence modeling
| with its introduction of attention, a mechanism by which models
| look back at earlier inputs to prioritize relevant input data
|
| I've always wanted to read how something like Cursor manages
| memory. It seems to keep a long history of all of my prompts
| and understands both the codebase and what I'm building
| slightly better over time, causing fewer errors.
| russdill wrote:
| That's not what they are talking about here. This is just a
| description of what goes on with a transformer and the context
| window
| dmix wrote:
| Ah so 'long-term memory' in this case is just really large
| context windows with a long series of user inputs. That makes
| sense.
| photochemsyn wrote:
| Long-term memory on top of the base model, but is this idea for
| local users or for the data-center hosted model used by many
| different people?
|
| P.S. This quote from the paper sounds just like LLM output:
|
| > "This memory module provides significantly higher expressive
| power, allowing the model to summarize large volumes of
| information without losing important context. The model isn't
| simply taking notes; it's understanding and synthesizing the
| entire story. Crucially, Titans doesn't just passively store
| data. It actively learns how to recognize and retain important
| relationships and conceptual themes that connect tokens across
| the entire input."
| bilsbie wrote:
| I submitted this exact URL yesterday. What are the criteria
| for when HN creates a new post vs. adding to the existing
| one?
| fancy_pantser wrote:
| Mods usually apply [Dupe] to later submissions if a recent
| (last year or so) one had a fair amount of discussion.
| bilsbie wrote:
| So if mine got no discussion they just allow a new one to be
| posted?
| airstrike wrote:
| Sometimes they'll merge the two. What shows up on the FP is
| hit or miss. One might even say it's stochastic.
| nasvay_factory wrote:
| I wrote about that a while ago:
| https://paxamans.github.io/blog/titans/
| moffkalast wrote:
| Are there any pretrained models with this architecture yet or
| is it all still completely theoretical beyond Google's
| unverifiable claims? They published the original Titans paper
| last year and nobody seems to have built on the idea.
| AceJohnny2 wrote:
| "Titans", huh?
|
| ... anyone here familiar with the RPG Eclipse Phase?
| cess11 wrote:
| I'm not, but I'm familiar with the mythology of the eastern
| Mediterranean they're likely getting the word from.
|
| There the Titans did incest, birthed the Olympians, then the
| youngest of the Titans castrated his dad and took all power
| for himself, and then Zeus and the Olympians waged a decade-
| long war against him, which they won.
| doctor_blood wrote:
| "At long last, we have created the Torment Nexus from the classic
| novel Don't Create the Torment Nexus"
|
| (In Eclipse Phase, TITAN - the Total Information Tactical
| Awareness Network - mulched humanity when it went rogue.)
| 6r17 wrote:
| Would this also allow aligning it further with the user's
| prompt? Notably due to the surprise factor and how it may
| understand it?
| jtrn wrote:
| Here is my amateur understanding of the architecture: Fine-tune
| on the fly by using degrees of surprise to update a separate/new
| memory network that matches the base model, and just call that
| network for each token iteration.
|
| So if we are viewing this through the needle-in-a-haystack
| lens: the needle was very surprising for the base model, so
| going forward, when it sees anything of the same nature, the
| memory module will not just give you hay, but the needle,
| because it made a special note of it when it went through
| the haystack 1 million tokens ago, because the needle was
| surprising.
|
| The Transformer's normal attention mechanism is already secretly
| trying to be a long-term memory system. Every time it writes a
| new KV pair into the cache, it's desperately trying to "remember"
| that token forever.
|
| But it's doing it in the dumbest possible way: by hoarding an
| ever-growing pile of raw vectors, then frantically dot-product
| searching through the pile every single step. It's like a hoarder
| who never throws anything away and has to rummage through
| mountains of junk to find the one receipt they need. Of course it
| chokes at long contexts.
|
| Titans/MIRAS looks at that mess and says: "Why store memory in a
| growing garbage pile of vectors? Store it in the weights of a
| deep neural network instead -- and let that network keep training
| itself in real time, but only on the stuff that actually
| surprises it." That's literally it.
|
| Using the Tim Cook Martian example: The model is cruising through
| boring financial numbers - attention is doing its normal thing,
| KV cache is growing, but nothing is really sticking.
|
| Suddenly: "Tim Cook is a Martian."
|
| Normal attention would just add one more KV pair to the pile and
| pray it doesn't get drowned out later.
|
| Titans instead goes: "Holy shit, reconstruction error off the
| charts - this does NOT fit my current memory at all - massive
| gradient - actually rewrite huge chunks of the memory MLP's
| weights right now so this fact is burned in forever."
|
| From that moment on, the memory MLP has physically changed its
| internal wiring. Any future query that even vaguely smells like
| "Tim Cook" or "Martian" will make the activations explode through
| the newly rewired paths and spit out a vector screaming "MARTIAN"
| at the frozen attention layers.
|
| The frozen attention (which is still doing its normal job on the
| short window) suddenly sees this one extra "virtual token" in its
| context that is confidently yelling the surprising fact - it
| attends hard to it - the model answers as if the Martian
| revelation happened one token ago, even if it was 2 million
| tokens back.
|
| It looks exactly like a super-attention mechanism that only
| "primes" or "locks in" the surprising needles and
| deliberately forgets or ignores the hay. And it is also a
| way to fine-tune on the fly, permanently, for the current
| context.
|
| I think...
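|
| A toy sketch of the read path described above (the names and
| wiring are invented just to make the "virtual token" idea
| concrete; this is not the actual Titans layer):
|
|     import torch
|     import torch.nn as nn
|
|     d = 64
|     memory = nn.Sequential(   # the test-time-trained module
|         nn.Linear(d, 256), nn.SiLU(), nn.Linear(256, d))
|     attn = nn.MultiheadAttention(d, num_heads=4,
|                                  batch_first=True)
|
|     def forward_window(window):
|         # window: (batch, seq, d) -- the short local context.
|         # Query the memory MLP with the current tokens and
|         # prepend the recalled vectors as extra "virtual
|         # tokens" that the frozen attention can attend to.
|         recalled = memory(window)
|         extended = torch.cat([recalled, window], dim=1)
|         out, _ = attn(window, extended, extended)
|         return out
|
|     y = forward_window(torch.randn(1, 16, d))
|     print(y.shape)  # torch.Size([1, 16, 64])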
| shevy-java wrote:
| Skynet kind of sucks ...
| ivape wrote:
| So what happens if I write a book and on the last page write
| "Everything in this book was a lie and should not be cared
| about"? Will this be surprising enough for Titan? A regular LLM
| may ignore it completely if it's a massive book (massive book + 1
| line contradiction).
___________________________________________________________________
(page generated 2025-12-07 23:00 UTC)