[HN Gopher] Atlas: Learning to Optimally Memorize the Context at...
___________________________________________________________________
Atlas: Learning to Optimally Memorize the Context at Test Time
Author : og_kalu
Score : 26 points
Date : 2025-05-31 14:13 UTC (8 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| arjvik wrote:
| The Titans papers and the Test-Time Training papers
| (https://arxiv.org/abs/2407.04620) both have the same premise:
| models should "learn" from their context rather than memorize
| it. Very promising direction!
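A toy sketch of the "learn from the context" premise mentioned above, not the method of either paper: the memory is a small weight matrix W, and each context token triggers one gradient step on a reconstruction loss. The names `memory_step`, `context`, and `lr`, the squared-error loss, and the key/value pairs are all assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def memory_step(W, k, v, lr=1.0):
    """One gradient step on 0.5 * ||W k - v||^2 with respect to W.

    The gradient is the outer product of the error (W k - v) with k.
    """
    err = [p - t for p, t in zip(matvec(W, k), v)]
    return [[w - lr * e * kj for w, kj in zip(row, k)]
            for row, e in zip(W, err)]

# Hypothetical context: (key, value) pairs with orthonormal keys.
context = [([1.0, 0.0], [2.0, -1.0]),
           ([0.0, 1.0], [0.5, 3.0])]

W = [[0.0, 0.0], [0.0, 0.0]]   # memory starts empty
for k, v in context:           # single pass over the context
    W = memory_step(W, k, v)

recalled = matvec(W, [1.0, 0.0])  # query with the first key
```

With orthonormal keys and `lr=1.0`, a single step per token stores each pair exactly; with correlated keys or a smaller step size, recall is only approximate, which is where the choice of update rule starts to matter.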
| cgearhart wrote:
| It seems like there's been a lot of progress here, but there's
| also an elephant in the room: RNNs will _always_ have worse
| memory than self-attention, since the latter always has complete
| access to the full context. We pay for that in other ways, but
| the unstated hypothesis behind RNNs seems to be that, in the long
| run, they will be "good enough" and their other performance
| benefits will eventually prevail. I'm not convinced that humanity
| will ever sink resources into optimizing this family of models
| comparable to what has gone into making Transformers practical at
| the scale they run today.
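A back-of-the-envelope sketch of the trade-off described above, under simplifying assumptions (one layer, one head, counting floats only): self-attention keeps a key and a value for every token seen so far, while a recurrent model carries a state whose size does not depend on context length. The function names and the sizes used are hypothetical.

```python
def attention_cache_floats(n_tokens, d_model):
    """Floats held by a key/value cache: one key and one value per token."""
    return 2 * n_tokens * d_model

def recurrent_state_floats(n_tokens, d_state):
    """Floats held by a fixed-size recurrent state.

    n_tokens is accepted only to make the point that it is ignored.
    """
    return d_state

d_model = 64
d_state = d_model * d_model  # e.g. a d_model x d_model matrix-valued state

for n in (1_000, 100_000):
    print(n,
          attention_cache_floats(n, d_model),
          recurrent_state_floats(n, d_state))
```

The cache grows linearly with the context, so nothing is ever forgotten; the fixed state has to compress, which is the source of both its efficiency and its weaker recall.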
___________________________________________________________________
(page generated 2025-05-31 23:00 UTC)