[HN Gopher] The surprising effectiveness of test-time training for abstract reasoning [pdf]
___________________________________________________________________
The surprising effectiveness of test-time training for abstract
reasoning [pdf]
Author : trott
Score : 67 points
Date : 2024-11-11 16:23 UTC (6 hours ago)
(HTM) web link (www.mit.edu)
(TXT) w3m dump (www.mit.edu)
| mikeknoop wrote:
| Context: ARC Prize 2024 just wrapped up yesterday. ARC Prize's
| goal is to be a north star towards AGI. The two major categories
| of this year's progress seem to fall into "program synthesis" and
| "test-time fine tuning". Both of these techniques are adopted by
| DeepMind's impressive AlphaProof system [1]. And I'm personally
| excited to finally see actual code implementation of these ideas
| [2]!
|
| We still have a long way to go for the grand prize -- we'll be
| back next year. Also got some new stuff in the works for 2025.
|
| Watch for the official ARC Prize 2024 paper coming Dec 6. We're
| going to be overviewing all the new AI reasoning code and
| approaches open sourced via the competition [3].
|
| [1] https://deepmind.google/discover/blog/ai-solves-imo-
| problems...
|
| [2] https://github.com/ekinakyurek/marc
|
| [3] https://x.com/arcprize
| aithrowawaycomm wrote:
| I am a bit uncertain about the rules of the ARC-AGI contest,
| but would this program count? A good chunk of the logic of ARC
| is essentially hardcoded, including a Python function that
| checks whether or not the proposed solution makes sense.
|
| The point of the contest is to measure intelligence in general-
| purpose AI systems: it does not seem in the spirit of the
| contest that this AI would completely fail if the test was
| presented on a hexagonal grid.
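The "Python function that checks whether or not the proposed solution makes sense" mentioned above is presumably something like a grid-validity filter. This is a hedged guess at the kind of hardcoded check being described, not the actual contest or submission code:

```python
def is_valid_arc_grid(grid):
    """Return True if `grid` is a non-empty rectangular list of
    lists whose cells are ints in 0..9 (ARC's color palette).
    Hypothetical sanity check; not taken from the real codebase."""
    if not isinstance(grid, list) or not grid:
        return False
    width = None
    for row in grid:
        if not isinstance(row, list) or not row:
            return False
        if width is None:
            width = len(row)
        elif len(row) != width:
            return False
        if not all(isinstance(c, int) and 0 <= c <= 9 for c in row):
            return False
    return True

print(is_valid_arc_grid([[0, 1], [2, 3]]))  # True
print(is_valid_arc_grid([[0, 1], [2]]))     # False (ragged rows)
```

A check like this bakes in the 2D rectangular-grid assumption, which is exactly the commenter's complaint: it would be useless if the tasks were posed on a hexagonal grid.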
| 0x1064 wrote:
| The point of the contest is to measure an algorithm's
| ability to solve ARC problems specifically; no one believes
| that it's general-purpose AI. They're highly contrived
| problems by design.
| aithrowawaycomm wrote:
| My point is that the contest really should be "can solve
| ARC problems without having anything about ARC problems in
| its pre-training data or hard-coded in the design of the
| program." Otherwise these claims from ARC-AGI are simply
| false:
|
| > Solving ARC-AGI represents a material stepping stone
| > toward AGI. At minimum, solving ARC-AGI would result in a
| > new programming paradigm. It would allow anyone, even
| > those without programming knowledge, to create programs
| > simply by providing a few input-output examples of what
| > they want. This would dramatically expand who is able to
| > leverage software and automation. Programs could
| > automatically refine themselves when exposed to new data,
| > similar to how humans learn. If found, a solution to
| > ARC-AGI would be more impactful than the discovery of the
| > Transformer. The solution would open up a new branch of
| > technology.
|
| This program does not represent a "new paradigm" because it
| requires a bunch of human programming work specifically
| tailored to the problem, and it cannot be generalized. If
| software like this wins the contest that really shows the
| contest has nothing whatsoever to do with AGI.
| antonvs wrote:
| > Otherwise these claims from ARC-AGI are simply false:
|
| Currently, any claim about AGI other than "we're probably
| not anywhere close to strong AGI" is simply false.
|
| Of course a lot depends on one's definition of AGI. From
| another perspective, one could argue that ChatGPT 4 and
| similar models are already AGI.
| benchmarkist wrote:
| The contest is misnamed; solving ARC will not get us any
| closer to AGI.
| naasking wrote:
| Why?
| benchmarkist wrote:
| Because it's a set of puzzles on a 2D grid. We don't live
| on a 2D grid so it's already on the wrong track. A set of
| puzzles for a 3D sphere wouldn't get us any closer to AGI
| either but at least it would be a more realistic
| representation of the world and how a general purpose
| problem solver should approach reality. Even Minecraft
| would be a better test and lately people have started
| testing LLMs in virtual worlds which is a much better
| test case than ARC.
|
| Insofar as ARC is being used as a benchmark for code
| synthesis it might be somewhat successful, but it doesn't
| seem like people are using code synthesis to solve the
| puzzles, so it's not really clear how much success on ARC
| will advance the state of the art in AI and code synthesis
| according to a logical specification.
| sthlmb wrote:
| I initially read that as "Tea-Time" training and my inner Brit
| got a little excited.
| antonvs wrote:
| We won't achieve true AGI until the AGIs are demanding second
| breakfast.
| arjvik wrote:
| Test-Time Training is incredibly powerful. Most recently,
| it has been shown that self-attention can in fact be viewed
| through the lens of test-time training, with a kernel
| smoother "learning" from context. Simply replacing that
| kernel smoother with a more powerful inner model results in
| very capable and scalable models!
|
| https://arxiv.org/abs/2407.04620
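The idea in the linked paper can be sketched in a heavily simplified form: the layer's hidden state is itself a small model, updated by one gradient step per token on a self-supervised loss, and each token's output is the updated model applied to that token. This toy uses plain reconstruction as the self-supervised task; the actual paper learns the projections and is far more elaborate.

```python
import numpy as np

def ttt_linear(tokens, lr=0.5):
    """Toy TTT-style layer: the hidden state is a linear model W,
    trained at test time by one gradient step per token on a
    reconstruction loss, then used to produce that token's output.
    (Simplified illustration, not the paper's architecture.)"""
    d = tokens.shape[1]
    W = np.zeros((d, d))                # inner model = hidden state
    outputs = []
    for x in tokens:
        grad = np.outer(W @ x - x, x)   # d/dW of 0.5*||Wx - x||^2
        W -= lr * grad                  # "learn" from this token
        outputs.append(W @ x)           # output from the updated model
    return np.stack(outputs)

seq = np.eye(3)        # three one-hot "tokens"
out = ttt_linear(seq)  # each output is the token shrunk toward itself
```

The "context window" here is replaced by whatever the inner model has absorbed through its updates, which is why such layers can scale linearly with sequence length.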
| zbyforgotp wrote:
| Is test time the same thing as inference time?
| whoisnnamdi wrote:
| yes
___________________________________________________________________
(page generated 2024-11-11 23:01 UTC)