[HN Gopher] The Continual Learning Problem
___________________________________________________________________
The Continual Learning Problem
Author : kiyanwang
Score : 23 points
Date : 2025-10-23 06:38 UTC (5 days ago)
(HTM) web link (jessylin.com)
(TXT) w3m dump (jessylin.com)
| mynti wrote:
| Super interesting blog post. I just wonder how this actually
| differs from LoRA, since LoRA also adds some parameters and
| freezes the rest of the model. This seems like a sparse, memory-
| efficient LoRA with a couple of extra steps, since it uses
| attention again to make the sparsity work. All while being a lot
| more effective than LoRA (a performance drop of only 11% compared
| to 71%).
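A minimal LoRA sketch for comparison, assuming a PyTorch nn.Linear base
layer; the class and parameter names (LoRALinear, rank, alpha) are
illustrative, not from the post. The pretrained weight is frozen and only
the low-rank factors A and B are trained, which is the adapter setup the
commenter is contrasting with the post's approach.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update B @ A."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights
            # standard LoRA init: A is small random, B starts at zero, so the
            # wrapped layer initially behaves exactly like the base layer
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # frozen path + scaled low-rank correction
            return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling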
| alyxya wrote:
| I think the solution to continual learning is as simple as using
| context distillation. We know that models are good at in-context
| learning, so we just want an efficient way to distill context
| into the weights. I suspect context rot may come from how the
| softmax in attention gets diluted with a longer context, so this
| wouldn't be an issue with context distillation.
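A minimal context-distillation sketch of the idea in the comment above,
assuming a HuggingFace-style causal LM interface (model(input_ids).logits)
and a tokenizer; the function name distill_step and its arguments are
illustrative. A frozen teacher sees the long context plus a probe prompt,
and a trainable student sees only the prompt and is trained to match the
teacher's next-token distribution, so the context gets baked into the
weights.

    import torch
    import torch.nn.functional as F

    def distill_step(student, teacher, tokenizer, context, prompt, optimizer):
        # teacher: frozen model conditioned on the full context
        with torch.no_grad():
            teacher_ids = tokenizer(context + prompt, return_tensors="pt").input_ids
            teacher_logits = teacher(teacher_ids).logits[:, -1, :]

        # student: trainable model that sees only the probe prompt
        student_ids = tokenizer(prompt, return_tensors="pt").input_ids
        student_logits = student(student_ids).logits[:, -1, :]

        # KL(teacher || student): pull the context-free prediction toward
        # the in-context one
        loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                        F.softmax(teacher_logits, dim=-1),
                        reduction="batchmean")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()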
___________________________________________________________________
(page generated 2025-10-28 23:00 UTC)