[HN Gopher] The surprising effectiveness of test-time training f...
       ___________________________________________________________________
        
       The surprising effectiveness of test-time training for abstract
       reasoning [pdf]
        
       Author : trott
       Score  : 67 points
       Date   : 2024-11-11 16:23 UTC (6 hours ago)
        
 (HTM) web link (www.mit.edu)
 (TXT) w3m dump (www.mit.edu)
        
       | mikeknoop wrote:
       | Context: ARC Prize 2024 just wrapped up yesterday. ARC Prize's
       | goal is to be a north star towards AGI. This year's progress
       | seems to fall into two major categories: "program synthesis"
       | and "test-time fine-tuning". Both techniques are used by
       | DeepMind's impressive AlphaProof system [1]. And I'm personally
       | excited to finally see actual code implementations of these
       | ideas [2]!
       | 
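       | For readers new to the idea, test-time fine-tuning means
       | adapting a copy of the model on each task's few demonstration
       | pairs before predicting that task's test output. Here is a toy
       | sketch of that loop (a tiny fixed-size MLP for illustration
       | only -- the pipeline in [2] adapts a pretrained LLM on the
       | task's augmented demonstration pairs):
       | 
       |   # Toy sketch of per-task test-time fine-tuning. The fixed
       |   # grid size and tiny MLP are assumptions for illustration,
       |   # not the setup used in [2].
       |   import torch
       |   import torch.nn as nn
       |   
       |   GRID, COLORS = 5, 10      # toy grid side; ARC uses 10 colors
       |   
       |   class ToySolver(nn.Module):
       |       def __init__(self):
       |           super().__init__()
       |           self.net = nn.Sequential(
       |               nn.Linear(GRID * GRID * COLORS, 256), nn.ReLU(),
       |               nn.Linear(256, GRID * GRID * COLORS))
       |   
       |       def forward(self, grids):   # grids: (B, GRID, GRID) ints
       |           x = nn.functional.one_hot(grids, COLORS).float()
       |           logits = self.net(x.flatten(1))
       |           return logits.view(-1, GRID, GRID, COLORS)
       |   
       |   def test_time_finetune(base, demos, steps=50, lr=1e-3):
       |       """Adapt a copy of the base model on one task's demos."""
       |       model = ToySolver()
       |       model.load_state_dict(base.state_dict())  # copy weights
       |       opt = torch.optim.Adam(model.parameters(), lr=lr)
       |       for _ in range(steps):
       |           for inp, out in demos:  # few (input, output) grids
       |               logits = model(inp.unsqueeze(0)).flatten(0, 2)
       |               loss = nn.functional.cross_entropy(
       |                   logits, out.flatten())
       |               opt.zero_grad()
       |               loss.backward()
       |               opt.step()
       |       return model
       |   
       |   # Adapt on one task's demo pairs, then predict its test input.
       |   base = ToySolver()
       |   demos = [(torch.randint(0, COLORS, (GRID, GRID)),
       |             torch.randint(0, COLORS, (GRID, GRID)))
       |            for _ in range(3)]
       |   adapted = test_time_finetune(base, demos)
       |   test_input = torch.randint(0, COLORS, (GRID, GRID))
       |   prediction = adapted(test_input.unsqueeze(0)).argmax(-1)
       | 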
       | We still have a long way to go for the grand prize -- we'll be
       | back next year. Also got some new stuff in the works for 2025.
       | 
       | Watch for the official ARC Prize 2024 paper coming Dec 6, where
       | we'll give an overview of all the new AI reasoning code and
       | approaches open sourced via the competition [3].
       | 
       | [1] https://deepmind.google/discover/blog/ai-solves-imo-
       | problems...
       | 
       | [2] https://github.com/ekinakyurek/marc
       | 
       | [3] https://x.com/arcprize
        
         | aithrowawaycomm wrote:
         | I am a bit uncertain about the rules of the ARC-AGI contest,
         | but would this program count? A good chunk of the logic of ARC
         | is essentially hardcoded, including a Python function that
         | checks whether or not the proposed solution makes sense.
         | 
         | The point of the contest is to measure intelligence in
         | general-purpose AI systems: it does not seem in the spirit of
         | the contest that this AI would completely fail if the test
         | were presented on a hexagonal grid.
        
           | 0x1064 wrote:
           | The point of the contest is to measure an algorithm's
           | ability to solve ARC problems specifically; no one believes
           | that it's general-purpose AI. They're highly contrived
           | problems by design.
        
             | aithrowawaycomm wrote:
             | My point is that the contest really should be "can solve
             | ARC problems without having anything about ARC problems in
             | its pre-training data or hard-coded in the design of the
             | program." Otherwise these claims from ARC-AGI are simply
             | false:
             | 
             |     Solving ARC-AGI represents a material stepping stone
             |     toward AGI. At minimum, solving ARC-AGI would result
             |     in a new programming paradigm. It would allow anyone,
             |     even those without programming knowledge, to create
             |     programs simply by providing a few input-output
             |     examples of what they want. This would dramatically
             |     expand who is able to leverage software and
             |     automation. Programs could automatically refine
             |     themselves when exposed to new data, similar to how
             |     humans learn. If found, a solution to ARC-AGI would be
             |     more impactful than the discovery of the Transformer.
             |     The solution would open up a new branch of technology.
             | 
             | This program does not represent a "new paradigm" because
             | it requires a bunch of human programming work specifically
             | tailored to the problem, and it cannot be generalized. If
             | software like this wins the contest, that really shows the
             | contest has nothing whatsoever to do with AGI.
        
               | antonvs wrote:
               | > Otherwise these claims from ARC-AGI are simply false:
               | 
               | Currently, any claim about AGI other than "we're probably
               | not anywhere close to strong AGI" is simply false.
               | 
               | Of course a lot depends on one's definition of AGI. From
               | another perspective, one could argue that ChatGPT 4 and
               | similar models are already AGI.
        
           | benchmarkist wrote:
           | The contest is misnamed; solving ARC will not get us any
           | closer to AGI.
        
             | naasking wrote:
             | Why?
        
               | benchmarkist wrote:
               | Because it's a set of puzzles on a 2D grid. We don't live
               | on a 2D grid, so it's already on the wrong track. A set
               | of puzzles on a 3D sphere wouldn't get us any closer to
               | AGI either, but at least it would be a more realistic
               | representation of the world and of how a general-purpose
               | problem solver should approach reality. Even Minecraft
               | would be a better test, and lately people have started
               | testing LLMs in virtual worlds, which is a much better
               | test case than ARC.
               | 
               | Insofar as ARC is being used as a benchmark for code
               | synthesis it might be somewhat successful, but it doesn't
               | seem like people are using code synthesis to solve the
               | puzzles, so it's not really clear how much success on ARC
               | will advance the state of the art in AI and in code
               | synthesis from a logical specification.
        
       | sthlmb wrote:
       | I initially read that as "Tea-Time" training, and my inner Brit
       | got a little excited...
        
         | antonvs wrote:
         | We won't achieve true AGI until the AGIs are demanding second
         | breakfast.
        
       | arjvik wrote:
       | Test-Time Training is incredibly powerful. Most recently, it
       | has been shown that Self-Attention can in fact be viewed
       | through the lens of test-time training, with a kernel smoother
       | "learning" from context. Simply replacing that kernel smoother
       | with a more expressive model results in very capable and
       | scalable models!
       | 
       | https://arxiv.org/abs/2407.04620
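       | 
       | Roughly: the layer's "hidden state" is itself a small model
       | that takes a gradient step on a self-supervised loss for each
       | incoming token, and the layer's output is the freshly updated
       | model applied to a projection of that token. A much-simplified
       | sketch of that idea (the specific projections and squared-error
       | inner loss below are illustrative assumptions, not the paper's
       | exact parameterization):
       | 
       |   # TTT-Linear-style sketch: the recurrent state W is a linear
       |   # model trained online at test time. Simplified illustration
       |   # of https://arxiv.org/abs/2407.04620, not the paper's code.
       |   import numpy as np
       |   
       |   d, eta = 8, 0.1              # feature dim, inner learning rate
       |   rng = np.random.default_rng(0)
       |   theta_K, theta_V, theta_Q = (
       |       rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
       |   
       |   def ttt_linear(tokens):
       |       """Process a sequence; W 'learns' from the context."""
       |       W = np.zeros((d, d))     # inner model, updated at test time
       |       outputs = []
       |       for x in tokens:         # x is a d-dimensional token vector
       |           k, v, q = theta_K @ x, theta_V @ x, theta_Q @ x
       |           # Inner loss ||W k - v||^2; take one gradient step.
       |           grad = np.outer(W @ k - v, k)   # dL/dW up to a factor 2
       |           W = W - eta * grad
       |           outputs.append(W @ q)  # updated model applied to query
       |       return np.stack(outputs)
       |   
       |   seq = rng.normal(size=(16, d))   # toy sequence of 16 tokens
       |   print(ttt_linear(seq).shape)     # -> (16, 8)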
        
       | zbyforgotp wrote:
       | Is test time the same thing as inference time?
        
         | whoisnnamdi wrote:
         | yes
        
       ___________________________________________________________________
       (page generated 2024-11-11 23:01 UTC)