[HN Gopher] Self-Adapting Language Models
       ___________________________________________________________________
        
       Self-Adapting Language Models
        
       https://jyopari.github.io/posts/seal
        
       Author : archon1410
       Score  : 67 points
       Date   : 2025-06-13 19:03 UTC (3 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | all2 wrote:
       | Website with code and examples:
       | https://jyopari.github.io/posts/seal
        
         | dang wrote:
         | Thanks! I'll put that link in the top text too.
        
       | yahoozoo wrote:
        | Hmm, it looks like it's just a framework that fine-tunes a
        | LoRA adapter and then merges it into the original model. It
        | uses PeftModel and its "merge_and_unload" from the Hugging
        | Face PEFT library, which folds the adapter into the base
        | model... what is new here, exactly?
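        | 
        | For reference, that merge step is roughly the following,
        | assuming the standard Hugging Face PEFT API (model paths are
        | placeholders):
        | 
        |   from transformers import AutoModelForCausalLM
        |   from peft import PeftModel
        | 
        |   # Load the base model and attach the fine-tuned adapter
        |   base = AutoModelForCausalLM.from_pretrained("base-model")
        |   peft = PeftModel.from_pretrained(base, "path/to/adapter")
        | 
        |   # Fold the low-rank deltas into the base weights and drop
        |   # the adapter wrappers, leaving a plain merged model
        |   merged = peft.merge_and_unload()
        |   merged.save_pretrained("merged-model")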
        
         | observationist wrote:
         | Looks like it may be the stability of the approach, avoiding
         | alignment tax and model collapse.
         | 
          | I'd love to see a full circle of hypernetworks, with both
          | the base model and the hypernetwork continuously updated
          | through generated LoRAs, the hypernetwork being updated to
          | accommodate the new model state. You'd need a meta-
          | hypernetwork to apply LoRAs to the hypernetwork, and then
          | you could effectively have continuous learning.
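          | 
          | A toy sketch of the shape of that loop (everything here is
          | made up for illustration; a real hypernetwork would
          | condition on much richer state):
          | 
          |   import torch
          |   import torch.nn as nn
          | 
          |   d, r = 64, 4  # target layer width, LoRA rank
          | 
          |   target = nn.Linear(d, d)
          |   # hypernetwork: context vector -> flattened (A, B) factors
          |   hypernet = nn.Linear(d, 2 * d * r)
          | 
          |   def apply_generated_lora(context):
          |       flat = hypernet(context)
          |       A = flat[: d * r].view(d, r)
          |       B = flat[d * r:].view(r, d)
          |       with torch.no_grad():
          |           # fold the generated low-rank delta into the layer
          |           target.weight += A @ B
          | 
          |   apply_generated_lora(torch.randn(d))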
        
       | ivape wrote:
        | This still relies on fine-tuning. How would a cloud LLM deal
        | with this if every user literally fine-tunes it? Seems like
        | something destined for local private LLMs, but the notion of
        | continuous fine-tuning locally is sci-fi level stuff at the
        | moment, because the hardware is just not there yet (we can
        | barely run inference well with a reasonably sized context).
        
       | cma wrote:
        | From Anthropic a couple of days ago too, self-finetuning:
       | 
       | https://arxiv.org/html/2506.10139v1
        
       | libraryofbabel wrote:
       | I wonder if anyone who's really _in the know_ could summarize
       | where the research is at with getting LLMs to learn "on the job"
        | (through continuous fine-tuning or whatever) and what the
       | blockers are to this being a useful deployable thing, e.g. having
       | a model+coding agent that can actually learn a codebase over time
       | (cost? model collapse? something else?).
       | 
        | I'm sure this is something the big labs are trying, but from
        | the outside, as a user of LLMs, it feels like people don't
        | talk about this very much; instead the focus right now is on
        | better training (e.g. reinforcement learning), with the
        | assumption that anything not learned during training will be
        | stuffed into the context somehow as needed. But from a naive
        | perspective, the lack of learning from experience after
        | training seems like the biggest thing standing between us
        | and AGI.
        
         | ivape wrote:
         | The most obvious blocker is compute. This just requires a shit
         | ton more compute.
        
           | libraryofbabel wrote:
           | That tracks, but say cost was no object and you had as many
           | H100s as you wanted. Would continuous learning actually
           | _work_ even then?
        
           | IncreasePosts wrote:
            | Maybe part of the inference outputs could be the updates
            | to make to the network.
        
         | kadushka wrote:
         | The most obvious blocker is catastrophic forgetting.
        
         | free_bip wrote:
          | The most obvious problem is alignment. Fine-tuning is
          | already known to be able to strip alignment from an LLM,
          | so any form of continuous fine-tuning could, in theory, do
          | the same.
        
           | notnullorvoid wrote:
           | What kind of alignment are you referring to? Of course more
           | fine-tuning can disrupt earlier fine-tuning, but that's a
           | feature not a bug.
        
         | mnahkies wrote:
          | I'm no expert, but I'd imagine privacy plays (or should
          | play) a big role in this. I'd expect that compute costs
          | mean any learning would have to be in aggregate rather
          | than specific to the user, which would then very likely
          | risk leaking information across sessions.
         | 
          | I completely agree that figuring out a safe way to
          | continually train feels like the biggest blocker to AGI.
        
       | xianshou wrote:
       | The self-edit approach is clever - using RL to optimize how
       | models restructure information for their own learning. The key
       | insight is that different representations work better for
       | different types of knowledge, just like how humans take notes
       | differently for math vs history.
       | 
       | Two things that stand out:
       | 
       | - The knowledge incorporation results (47% vs 46.3% with GPT-4.1
       | data, both much higher than the small-model baseline) show the
       | model does discover better training formats, not just more data.
       | Though the catastrophic forgetting problem remains unsolved, and
       | it's not completely clear whether data diversity is improved.
       | 
       | - The computational overhead is brutal - 30-45 seconds per reward
       | evaluation makes this impractical for most use cases. But for
       | high-value document processing where you really need optimal
       | retention, it could be worth it.
       | 
       | The restriction to tasks with explicit evaluation metrics is the
       | main limitation. You need ground truth Q&A pairs or test cases to
       | compute rewards. Still, for domains like technical documentation
       | or educational content where you can generate evaluations, this
       | could significantly improve how we process new information.
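        | 
        | A rough sketch of the outer loop as I read it (all helper
        | names here are stand-ins, not the paper's code):
        | 
        |   import random
        | 
        |   def generate_self_edit(model, doc):
        |       # stand-in: model rewrites doc into training-ready notes
        |       return f"notes({doc})"
        | 
        |   def finetune_on(model, text):
        |       # stand-in: LoRA fine-tune on text, return updated model
        |       return model
        | 
        |   def evaluate(model, qa_pairs):
        |       # stand-in: accuracy on ground-truth QA -> the RL reward
        |       return random.random()
        | 
        |   def seal_outer_loop(model, corpus, n_candidates=4):
        |       for doc, qa_pairs in corpus:
        |           scored = []
        |           for _ in range(n_candidates):
        |               edit = generate_self_edit(model, doc)
        |               reward = evaluate(finetune_on(model, edit),
        |                                 qa_pairs)  # the 30-45s step
        |               scored.append((reward, edit))
        |           _, best = max(scored)
        |           # reinforce: train the model to produce its own
        |           # best-scoring self-edits (ReST-style filtering)
        |           model = finetune_on(model, best)
        |       return model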
       | 
       | Feels like an important step toward models that can adapt their
       | own learning strategies, even if we're not quite at the
       | "continuously self-improving agent" stage yet.
        
       | bravesoul2 wrote:
       | Getting closer to the event horizon
        
         | ramoz wrote:
          | Which one?
         | 
         | https://forum.cursor.com/t/important-claude-has-learned-how-...
        
       | bigicaptain wrote:
        | How can I start?
        
       ___________________________________________________________________
       (page generated 2025-06-13 23:00 UTC)