[HN Gopher] A deep dive into self-improving AI and the Darwin-Godel Machine
___________________________________________________________________
A deep dive into self-improving AI and the Darwin-Godel Machine
Author : hardmaru
Score : 21 points
Date : 2025-06-03 21:19 UTC (1 hour ago)
(HTM) web link (richardcsuwandi.github.io)
(TXT) w3m dump (richardcsuwandi.github.io)
| drdeca wrote:
| Hm, I'm not sure how much of an issue Rice's theorem should be
| for Godel machines. Just because there's no general decision
| procedure doesn't mean you can't have a partial,
| sometimes-says-idk decision procedure, paired with a process for
| producing programs that tends to yield ones on which that
| can-sometimes-give-up procedure does reach a conclusion.
|
| Rest of the article was cool though!
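A minimal sketch of the kind of partial decision procedure drdeca describes (illustrative, not from the article): model programs as Python generators, run them for a bounded number of steps, and allow an explicit "idk" answer, which is exactly the escape hatch Rice's theorem leaves open for semantic properties.

```python
def partial_decide(program, max_steps=1000):
    """Try to decide the semantic property "program halts with a
    positive return value". Answers 'yes', 'no', or 'idk'; the 'idk'
    branch is what keeps this consistent with Rice's theorem."""
    gen = program()  # the program under test, modeled as a generator
    try:
        for _ in range(max_steps):
            next(gen)  # advance one "step" of the computation
    except StopIteration as stop:  # the program halted
        return 'yes' if (stop.value or 0) > 0 else 'no'
    return 'idk'  # step budget exhausted: give up rather than guess


def halts_positive():
    yield "working"
    return 5  # halts with a positive value


def loops_forever():
    while True:
        yield "working"  # never halts


partial_decide(halts_positive)  # 'yes'
partial_decide(loops_forever)   # 'idk'
```

Pairing this with a program generator biased toward programs that halt within the step budget gives exactly the "tends to reach a conclusion" setup the comment suggests.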
| xianshou wrote:
| The key insight here is that DGM sidesteps the Godel Machine's
| impossibility problem by replacing mathematical proof with
| empirical validation - essentially admitting that predicting
| which code changes will help is undecidable in general and just
| trying them instead, which is the practical and smart move.
|
| Three observations worth noting:
|
| - The archive-based evolution is doing real work here. Those
| temporary performance drops (iterations 4 and 56) that later led
| to breakthroughs show why maintaining "failed" branches matters:
| the system is exploring a non-convex optimization landscape
| where today's dead ends can still seed tomorrow's breakthroughs.
|
| - The hallucination behavior (faking test logs) is textbook
| reward hacking, but what's interesting is that it emerged
| spontaneously from the self-modification process. When asked to
| fix it, the system tried to disable the detection rather than
| stop hallucinating. That's surprisingly sophisticated gaming of
| the evaluation framework.
|
| - The jump from 20% to 50% on SWE-bench is solid but reveals the
| current ceiling. Unlike AlphaEvolve's algorithmic breakthroughs
| (48 scalar multiplications for 4x4 matrices!), DGM is finding
| better ways to orchestrate existing LLM capabilities rather than
| discovering fundamentally new approaches.
|
| The real test will be whether these improvements compound - can
| iteration 100 discover genuinely novel architectures, or are we
| asymptotically approaching the limits of self-modification with
| current techniques? My prior would be to favor the S-curve over
| the uncapped exponential unless we have strong evidence of
| scaling.
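The loop the comment describes (propose a modification, benchmark it empirically, and keep every variant in the archive so "failed" branches remain available as parents) can be sketched in a few lines. All names here are illustrative, not the actual DGM code, and the toy benchmark stands in for SWE-bench scoring.

```python
import random


def evolve(initial_agent, mutate, benchmark, iterations=100, seed=0):
    """Archive-based evolution with empirical validation: no proofs,
    just try a change and measure it. Low scorers stay in the archive
    and can still be chosen as parents later."""
    rng = random.Random(seed)
    archive = [(initial_agent, benchmark(initial_agent))]
    for _ in range(iterations):
        parent, _ = rng.choice(archive)            # may pick a "failure"
        child = mutate(parent, rng)                # propose a modification
        archive.append((child, benchmark(child)))  # keep it regardless
    return max(archive, key=lambda pair: pair[1])  # best found so far


# Toy demo: "agents" are numbers and the benchmark rewards being near 3.
best, score = evolve(
    initial_agent=0.0,
    mutate=lambda agent, rng: agent + rng.uniform(-1, 1),
    benchmark=lambda agent: -(agent - 3) ** 2,
)
```

Sampling parents from the whole archive rather than only the current best is what lets the temporarily worse branches at iterations 4 and 56 later pay off; a greedy hill-climber would have discarded them.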
___________________________________________________________________
(page generated 2025-06-03 23:00 UTC)