[HN Gopher] Getting 50% (SoTA) on Arc-AGI with GPT-4o
___________________________________________________________________
Getting 50% (SoTA) on Arc-AGI with GPT-4o
Author : tomduncalf
Score : 26 points
Date : 2024-06-17 21:51 UTC (1 hour ago)
(HTM) web link (redwoodresearch.substack.com)
(TXT) w3m dump (redwoodresearch.substack.com)
| artninja1988 wrote:
| This pretty much confirms my suspicion that ARC is too small of
| a domain to function as a scale-resistant test for AGI, and the
| fact that this has happened in such a short time means that we
| will almost certainly see >85-90% results within a year from now,
| unless they really increase the difficulty of the hidden test set.
| traject_ wrote:
| We don't actually know if it is SOTA; the previous SOTA solution
| also got around the same score on the evaluation set.
| extr wrote:
| Very cool. When GPT-4 first came out I tried some very naive
| approaches using JSON representations of the puzzles [0], [1].
| GPT-4 did "okay", but in some cases it felt like it was falling
| for the classic LLM issue of saying all the right things but
| then failing to grasp some critical bit of logic and missing the
| solution entirely.
|
| At the time I noticed that many of the ARC problems rely on
| visual-spatial priors that are "obvious" when viewing the grids,
| but become less so when transmuted to some other representation.
| Many of them rely on some kind of symmetry, counting, or the very
| human bias to assume a velocity or continued movement when seeing
| particular patterns.
|
| I had always thought maybe multimodality was key: the model needs
| to have similar priors around grounded physical spaces and
| movement to be able to do well. I'm not sure the OP really
| fleshes this line of thinking out; brute-forcing Python solutions
| is a very "non-human" approach.
|
| [0] https://x.com/eatpraydiehard/status/1632671307254099968
|
| [1] https://x.com/eatpraydiehard/status/1632683214329479169
| greatpostman wrote:
| You know you're approaching AGI when creating benchmarks gets
| difficult. This is only just beginning.
| rgbrgb wrote:
| [delayed]
___________________________________________________________________
(page generated 2024-06-17 23:00 UTC)