[HN Gopher] The Continual Learning Problem
       ___________________________________________________________________
        
       The Continual Learning Problem
        
       Author : kiyanwang
       Score  : 23 points
       Date   : 2025-10-23 06:38 UTC (5 days ago)
        
 (HTM) web link (jessylin.com)
 (TXT) w3m dump (jessylin.com)
        
        | mynti wrote:
        | Super interesting blog post. I just wonder how this actually
        | differs from LoRA, since LoRA also adds some parameters and
        | freezes the rest of the model. This seems like a sparse,
        | memory-efficient LoRA with a couple of extra steps, since it
        | uses attention again to make the sparsity work, all while
        | being a lot more effective than LoRA (a performance drop of
        | only 11% versus 71%).
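         
        For reference, a minimal sketch of the LoRA mechanism the
        comment describes (freeze the pretrained weights, train a small
        low-rank update on top), assuming a PyTorch nn.Linear base
        layer; the rank r and scaling alpha below are illustrative
        values, not taken from the post:
         
        import torch
        import torch.nn as nn

        class LoRALinear(nn.Module):
            """Frozen linear layer plus a trainable low-rank update:
            y = W x + (alpha / r) * B A x, training only A and B."""
            def __init__(self, base: nn.Linear, r: int = 8,
                         alpha: float = 16.0):
                super().__init__()
                self.base = base
                for p in self.base.parameters():
                    p.requires_grad = False  # freeze pretrained weights
                self.A = nn.Parameter(
                    torch.randn(r, base.in_features) * 0.01)
                self.B = nn.Parameter(
                    torch.zeros(base.out_features, r))
                self.scale = alpha / r

            def forward(self, x):
                return self.base(x) + \
                    self.scale * (x @ self.A.T) @ self.B.T

        # usage: adapt one projection, the rest of the model stays frozen
        layer = LoRALinear(nn.Linear(512, 512))
        out = layer(torch.randn(4, 512))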
        
        | alyxya wrote:
        | I think the solution to continual learning is as simple as
        | using context distillation. We know that models are good at
        | in-context learning, so we just want an efficient way to
        | distill context into the weights. I suspect context rot may
        | come from the attention softmax getting diluted over a longer
        | context, so distilling the context into the weights would
        | sidestep that issue.
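         
        A minimal sketch of the context-distillation idea in the
        comment above, assuming a Hugging Face causal LM ("gpt2" and
        the single query below are just illustrative stand-ins): the
        teacher sees the context plus the query, the student sees only
        the query and is trained to match the teacher's next-token
        distributions.
         
        import torch
        import torch.nn.functional as F
        from transformers import AutoModelForCausalLM, AutoTokenizer

        name = "gpt2"  # stand-in model for illustration
        tok = AutoTokenizer.from_pretrained(name)
        student = AutoModelForCausalLM.from_pretrained(name)
        teacher = AutoModelForCausalLM.from_pretrained(name).eval()

        context = "You are a terse assistant. Answer in one word.\n"
        query = "Q: What is the capital of France?\nA:"

        ctx_ids = tok(context + query, return_tensors="pt").input_ids
        qry_ids = tok(query, return_tensors="pt").input_ids
        n = qry_ids.shape[1]  # a real run would align tokens carefully

        opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

        # teacher distributions at the query positions, with context
        with torch.no_grad():
            t_logits = teacher(ctx_ids).logits[:, -n:, :]

        # student sees only the query; minimize KL to the teacher
        s_logits = student(qry_ids).logits
        loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1),
                        reduction="batchmean")
        loss.backward()
        opt.step()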
        
       ___________________________________________________________________
       (page generated 2025-10-28 23:00 UTC)