[HN Gopher] Getting 50% (SoTA) on Arc-AGI with GPT-4o
       ___________________________________________________________________
        
       Getting 50% (SoTA) on Arc-AGI with GPT-4o
        
       Author : tomduncalf
       Score  : 26 points
       Date   : 2024-06-17 21:51 UTC (1 hours ago)
        
 (HTM) web link (redwoodresearch.substack.com)
 (TXT) w3m dump (redwoodresearch.substack.com)
        
       | artninja1988 wrote:
       | This pretty much confirms my suspicion that ARC is too small a
       | domain to function as a scale-resistant test for AGI. The fact
       | that this happened in such a short time means we will almost
       | certainly see >85-90% results within a year, unless they really
       | increase the difficulty of the hidden test set.
        
       | traject_ wrote:
       | We don't actually know if it is SOTA; the previous SOTA solution
       | also got around the same on the evaluation set.
        
       | extr wrote:
       | Very cool. When GPT-4 first came out I tried some very naive
       | approaches using JSON representations on the puzzles [0], [1].
       | GPT-4 did "okay", but in some cases it felt like it was falling
       | for the classic LLM issue of saying all the right things but then
       | failing to grasp some critical bit of logic and missing the
       | solution entirely.
       | 
       | At the time I noticed that many of the ARC problems rely on
       | visual-spatial priors that are "obvious" when viewing the grids,
       | but become less so when transmuted to some other representation.
       | Many of them rely on some kind of symmetry, counting, or the very
       | human bias to assume a velocity or continued movement when seeing
       | particular patterns.
       | 
       | I had always thought maybe multimodality was key: the model needs
       | to have similar priors around grounded physical spaces and
       | movement to be able to do well. I'm not sure the OP really
       | fleshes this line of thinking out; brute-forcing Python solutions
       | is a very "non-human" approach.
       | 
       | [0] https://x.com/eatpraydiehard/status/1632671307254099968
       | 
       | [1] https://x.com/eatpraydiehard/status/1632683214329479169
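
A minimal sketch of the representation point in the comment above, assuming a small made-up grid: the same puzzle grid serialized as JSON and as one row per line, plus a symmetry check that is obvious to the eye but has to be spelled out in code. The grid and the helper names (`as_json`, `as_rows`, `is_left_right_symmetric`, `is_top_bottom_symmetric`) are hypothetical illustrations, not the prompts or code used by the commenter or by the linked article.

```python
import json

# Hypothetical 3x3 ARC-style grid; integers stand for colors (0 = background).
grid = [
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 0],
]


def as_json(g):
    """Serialize the grid as a single-line JSON string (nested lists)."""
    return json.dumps(g)


def as_rows(g):
    """Serialize the grid as one text row per line, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in g)


def is_left_right_symmetric(g):
    """True if every row reads the same forwards and backwards."""
    return all(row == row[::-1] for row in g)


def is_top_bottom_symmetric(g):
    """True if the list of rows reads the same top-to-bottom and bottom-to-top."""
    return g == g[::-1]


if __name__ == "__main__":
    print(as_json(grid))
    print(as_rows(grid))
    print(is_left_right_symmetric(grid), is_top_bottom_symmetric(grid))
```

In the row-per-line form the mirror symmetry is at least partly visible in the text layout, while the single-line JSON form flattens it into one string, which may be part of why spatial priors are harder to pick up from such a representation.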
        
       | greatpostman wrote:
       | You know you're approaching AGI when creating benchmarks gets
       | difficult. This is only just beginning.
        
       ___________________________________________________________________
       (page generated 2024-06-17 23:00 UTC)