[HN Gopher] A deep dive into self-improving AI and the Darwin-Gödel Machine
       ___________________________________________________________________
        
        A deep dive into self-improving AI and the Darwin-Gödel Machine
        
       Author : hardmaru
       Score  : 21 points
        Date   : 2025-06-03 21:19 UTC (1 hour ago)
        
 (HTM) web link (richardcsuwandi.github.io)
 (TXT) w3m dump (richardcsuwandi.github.io)
        
       | drdeca wrote:
        | Hm, I'm not sure how much of an issue Rice's theorem should
        | be for Gödel machines. Just because there's no general
        | decision procedure doesn't mean you can't have a partial,
        | sometimes-says-"I don't know" decision procedure, paired with
        | a process for producing programs that tends to yield
        | candidates on which the can-sometimes-give-up procedure does
        | reach a conclusion.
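        | 
        | A toy sketch of what I mean (all of this is made up for
        | illustration, not from the article): the semantic property is
        | undecidable in general, but a sound checker that may answer
        | "unknown", plus a generator biased toward checkable programs,
        | still makes progress.
        | 
        |     import random
        | 
        |     # Programs are tiny expression trees over one input x; the
        |     # semantic property is "nondecreasing in x" (undecidable
        |     # for arbitrary programs by Rice's theorem).
        | 
        |     def evaluate(e, x):
        |         op = e[0]
        |         if op == "x":
        |             return x
        |         if op == "const":
        |             return e[1]
        |         a = evaluate(e[1], x)
        |         b = evaluate(e[2], x)
        |         return a + b if op == "add" else a - b
        | 
        |     def verdict(e):
        |         # Sound but incomplete: "yes" only on a fragment where
        |         # monotonicity is provable, "no" only with a concrete
        |         # counterexample, else "unknown" -- it may give up.
        |         if e[0] in ("x", "const"):
        |             return "yes"
        |         if e[0] == "add":
        |             if verdict(e[1]) == verdict(e[2]) == "yes":
        |                 return "yes"
        |         xs = range(-9, 10)
        |         ys = [evaluate(e, x) for x in xs]
        |         if any(b < a for a, b in zip(ys, ys[1:])):
        |             return "no"
        |         return "unknown"
        | 
        |     def propose():
        |         # Biased toward the provable fragment, so verdict()
        |         # usually reaches a conclusion on what it emits.
        |         c = ("const", random.randint(0, 5))
        |         if random.random() < 0.9:
        |             return ("add", ("x",), c)  # provably nondecreasing
        |         return ("sub", c, ("x",))      # refutable by sampling
        | 
        |     good = [e for e in (propose() for _ in range(30))
        |             if verdict(e) == "yes"]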
       | 
       | Rest of the article was cool though!
        
       | xianshou wrote:
        | The key insight here is that DGM solves the Gödel machine's
        | impossibility problem by replacing mathematical proof with
        | empirical validation - essentially conceding that predicting
        | code improvements is undecidable in general and just trying
        | things instead, which is the practical, smart move.
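        | 
        | In pseudo-Python, the acceptance rule is roughly this (my
        | sketch with invented names, not the authors' code; benchmark()
        | stands in for an SWE-bench-style harness returning a solve
        | rate):
        | 
        |     def step(parent, parent_score, self_modify, benchmark):
        |         # A Gödel machine would demand a proof that the child
        |         # is better before switching; DGM just measures.
        |         child = self_modify(parent)
        |         try:
        |             child_score = benchmark(child)
        |         except Exception:
        |             child_score = 0.0  # a crash counts as failure
        |         if child_score > parent_score:
        |             return child, child_score
        |         return parent, parent_score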
       | 
       | Three observations worth noting:
       | 
        | - The archive-based evolution is doing real work here. The
        | temporary performance drops (iterations 4 and 56) that later
        | led to breakthroughs show why keeping "failed" branches
        | matters: the search is crossing a non-convex optimization
        | landscape where today's dead ends can seed later breakthroughs
        | (see the sketch after this list).
       | 
       | - The hallucination behavior (faking test logs) is textbook
       | reward hacking, but what's interesting is that it emerged
       | spontaneously from the self-modification process. When asked to
       | fix it, the system tried to disable the detection rather than
       | stop hallucinating. That's surprisingly sophisticated gaming of
       | the evaluation framework.
       | 
        | - The improvement from 20% to 50% on SWE-bench is solid but
        | reveals the current ceiling. Unlike AlphaEvolve's algorithmic
        | breakthroughs (48 scalar multiplications for 4x4 matrix
        | multiplication!), DGM is finding better ways to orchestrate
        | existing LLM capabilities rather than discovering fundamentally
        | new approaches.
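        | 
        | A sketch of that archive dynamic (again my own illustration
        | with invented names): unlike greedy hill-climbing, every
        | runnable child stays in the pool and parents are sampled from
        | the whole archive, so a branch that regressed at iteration 4
        | or 56 can still parent a later winner.
        | 
        |     import random
        | 
        |     def evolve(seed, self_modify, benchmark, generations=80):
        |         archive = [(seed, benchmark(seed))]
        |         for _ in range(generations):
        |             # Softly favor strong agents without ever
        |             # discarding the weak branches.
        |             weights = [0.1 + s for _, s in archive]
        |             parent, _ = random.choices(archive, weights)[0]
        |             child = self_modify(parent)
        |             try:
        |                 archive.append((child, benchmark(child)))
        |             except Exception:
        |                 pass  # unrunnable child: skip it
        |         return max(archive, key=lambda pair: pair[1])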
       | 
        | The real test will be whether these improvements compound -
        | can iteration 100 discover genuinely novel architectures, or
        | are we asymptotically approaching the limits of
        | self-modification with current techniques? My prior favors
        | the S-curve over the uncapped exponential unless we see
        | strong evidence of continued scaling.
        
       ___________________________________________________________________
       (page generated 2025-06-03 23:00 UTC)