[HN Gopher] Francois Chollet: The Arc Prize and How We Get to AGI
       ___________________________________________________________________
        
       Francois Chollet: The Arc Prize and How We Get to AGI [video]
        
       Author : sandslash
       Score  : 159 points
       Date   : 2025-07-03 14:00 UTC (4 days ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | qoez wrote:
       | I feel like I'm the only one who isn't convinced getting a high
       | score on the ARC eval test means we have AGI. It's mostly about
        | pattern matching (and some of it is ambiguous even for humans as
        | to what the true response ought to be). It's like how in humans
       | there's lots of different 'types' of intelligence, and just
       | overfitting on IQ tests doesn't in my mind convince me a person
       | is actually that smart.
        
         | avmich wrote:
         | Roughly speaking, the job of a medical doctor is to diagnose
         | the patient - and then, after the diagnosis is made, to apply
         | the healing from the book, corresponding to the diagnosis.
         | 
         | The diagnosis is pattern matching (again, roughly). It kinda
         | suggests that a lot of "intelligent" problems are focused on
         | pattern matching, and (relatively straightforward) application
         | of "previous experience". So, pattern matching can bring us a
         | great deal towards AGI.
        
           | AnimalMuppet wrote:
           | Pattern matching is instinct. (Or at least, instinct is a
           | kind of pattern matching. And once you learn the patterns,
           | pattern matching can become almost instinctual). And that's
           | fine, for things that fit the pattern. But a human-level
           | intelligence can also deal with problems for which there is
           | no pattern. (I mean, not always successfully - finding a
           | correct solution to a novel problem is difficult. But it is
           | within the capability of at least some humans.)
        
         | yorwba wrote:
         | I think the people behind the ARC Prize agree that getting a
         | high score doesn't mean we have AGI. (They already updated the
         | benchmark once to make it harder.) But an AGI should get a
         | similarly high score as humans do. So current models that get
         | very low scores are definitely not AGI, and likely quite far
         | away from it.
        
           | cubefox wrote:
           | > I think the people behind the ARC Prize agree that getting
           | a high score doesn't mean we have AGI
           | 
           | The benchmark was literally called ARC-AGI. Only after OpenAI
            | cracked it did they start backtracking and saying that it
           | doesn't test for true AGI. Which undermines the whole premise
           | of a benchmark.
        
         | whiplash451 wrote:
         | You're not the only one. ARC-AGI is a laudable effort, but its
         | fundamental premise is indeed debatable:
         | 
         | "We argue that human cognition follows strictly the same
         | pattern as human physical capabilities: both emerged as
         | evolutionary solutions to specific problems in specific
         | evironments" (from page 22 of On the Measure of Intelligence)
         | 
         | https://arxiv.org/pdf/1911.01547
        
           | Davidzheng wrote:
            | But because of this "uneven edge" thing people talk about,
            | where AI weaknesses are not necessarily the same as human
            | weaknesses, once we run out of tests on which AI is worse than
            | humans, it will in effect already be very much superhuman. My
            | main evidence for this is Leela Zero, the Go AI, which
            | struggled with ladders and some other aspects of Go play well
            | into the superhuman regime (in Go it's easier to see when
            | something is superhuman because you have Elo ratings, win
            | rates, etc., and there's less room for debate).
        
         | energy123 wrote:
         | https://en.m.wikipedia.org/wiki/AI_effect
         | 
         | But on a serious note, I don't think Chollet would disagree.
         | ARC is a necessary but not sufficient condition, and he says
         | that, despite the unfortunate attention-grabbing name choice of
         | the benchmark. I like Chollet's view that we will know that AGI
         | is here when we can't come up with new benchmarks that separate
         | humans from AI.
        
         | loki_ikol wrote:
         | Well for most, the next steps are probably towards removing the
         | highly deterministic and discrete characteristics of current
          | approaches (we certainly don't think in lockstep). There are no
          | good measures for those. Even the creative aspect is undermined
          | by those characteristics.
        
         | kubb wrote:
         | AGI isn't defined anywhere, so it can be anything you want.
        
           | FrustratedMonky wrote:
           | Yes. And a lot of humans also don't pass for having AGI.
        
           | mindcrime wrote:
           | Oh, it's defined in lots of places. The problem is.. it's
           | defined in _lots_ of places!
        
         | oldge wrote:
          | Today's LLMs are fancy autocomplete but lack test-time self-
          | learning or a persistent drive. By contrast, an AGI would
          | require:
          | 
          | - A goal-generation mechanism (G) that can propose objectives
          | without external prompts
          | 
          | - A utility function (U) and policy p(a|s) enabling action
          | selection and hierarchy formation over extended horizons
          | 
          | - Stateful memory (M) + feedback integration to evaluate
          | outcomes, revise plans, and execute real-world interventions
          | autonomously
          | 
          | Without G, U, p, and M operating, LLMs remain reactive
          | statistical predictors, not human-level intelligence.
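          | 
          | As a rough sketch (Python, names purely illustrative, not any
          | real system), those pieces might wire together in an agent
          | loop something like this:
          | 
          |     import random
          | 
          |     class Agent:
          |         def __init__(self):
          |             # M: stateful memory of (state, action, outcome, score)
          |             self.memory = []
          | 
          |         def generate_goal(self, state):
          |             # G: propose an objective without an external prompt
          |             return random.choice(["explore", "exploit"])
          | 
          |         def utility(self, outcome, goal):
          |             # U: score an outcome against the current goal
          |             return 1.0 if outcome == goal else 0.0
          | 
          |         def policy(self, state, goal):
          |             # p(a|s): pick an action given state and goal
          |             return f"act-toward-{goal}"
          | 
          |         def step(self, state, env):
          |             goal = self.generate_goal(state)
          |             action = self.policy(state, goal)
          |             outcome = env(action)  # real-world intervention
          |             score = self.utility(outcome, goal)
          |             # feedback integration: memory can inform later goals
          |             self.memory.append((state, action, outcome, score))
          |             return outcome
          | 
          | Roughly, a plain LLM gives you only the policy part; the point
          | is that G, U, and M have to live somewhere outside the
          | next-token predictor.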
        
           | KoolKat23 wrote:
           | I'd say we're not far off.
           | 
           | Looking at the human side, it takes a while to actually learn
           | something. If you've recently read something it remains in
           | your "context window". You need to dream about it, to think
            | about it, to revisit and repeat until you actually learn it and
           | "update your internal model". We need a mechanism for
           | continuous weight updating.
           | 
            | Goal generation is pretty much covered by your body
            | constantly drip-feeding your brain various hormones, its
            | "ongoing input prompts".
        
             | onemoresoop wrote:
             | > I'd say we're not far off.
             | 
             | How are we not far off? How can LLMs generate goals and
             | based on what?
        
               | NetRunnerSu wrote:
               | Minimize prediction errors.
        
               | tsurba wrote:
               | But are we close to doing that in real-time on any
               | reasonably large model? I don't think so.
        
               | FeepingCreature wrote:
               | You just train it on the goal. Then it has that goal.
               | 
               | Alternately, you can train it on following a goal and
               | then you have a system where you can specify a goal.
               | 
                | At sufficient scale, a model will already contain goal-
                | following algorithms, because those help predict the next
                | token when the model is base-trained on goal-following
                | entities, i.e. humans. Goal-driven RL then brings those
                | algorithms to prominence.
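                | 
                | A toy, goal-conditioned version of "train it on following
                | a goal, then specify the goal" (pure Python bandit-style
                | update, nothing LLM-specific, names made up):
                | 
                |     import random
                | 
                |     # action scores per goal; the "policy" samples
                |     # proportional to these scores
                |     model = {"fetch": {"walk": 1.0, "grab": 1.0},
                |              "greet": {"walk": 1.0, "wave": 1.0}}
                | 
                |     def act(goal):
                |         actions = list(model[goal])
                |         weights = [model[goal][a] for a in actions]
                |         return random.choices(actions, weights=weights)[0]
                | 
                |     def reward(goal, action):
                |         wanted = {"fetch": "grab", "greet": "wave"}
                |         return 1.0 if action == wanted[goal] else 0.0
                | 
                |     for _ in range(1000):
                |         goal = random.choice(list(model))  # specified goal
                |         a = act(goal)
                |         # reinforce actions that achieved the given goal
                |         model[goal][a] += 0.1 * reward(goal, a)
                | 
                | The RL step doesn't create the goal-following machinery,
                | it just shifts probability toward it, which is the claim
                | above about base-trained models.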
        
               | kelseyfrog wrote:
               | How do you figure goal generation and supervised goal
               | training are interchangeable?
        
               | kordlessagain wrote:
                | Random goal use is turning out to be more important than
                | training. Although, last year someone trained on the fly
               | during the competition, which is pretty awesome when you
               | think about it.
        
             | NetRunnerSu wrote:
             | Yes, you're right, that's what we're doing.
             | 
             | https://github.com/dmf-archive/PILF
        
               | KoolKat23 wrote:
               | Very interesting, thanks for the link.
        
           | NetRunnerSu wrote:
            | In fact, there is no technical hurdle anymore. As long as
            | the theory is in place, you could see such an AGI within half
            | a year at most. It would even be more energy efficient than
            | the current dense models.
           | 
           | https://dmf-archive.github.io/docs/posts/beyond-snn-
           | plausibl...
        
         | TheAceOfHearts wrote:
          | Getting a high score on ARC doesn't mean we have AGI, and
          | Chollet has always said as much, AFAIK; it's meant to push the
          | AI research space in a positive direction. Being able to solve
          | ARC problems is probably a prerequisite for AGI. It's a
         | directional push into the fog of war, with the claim being that
         | we should explore that area because we expect it's relevant to
         | building AGI.
        
           | lostphilosopher wrote:
           | We don't really have a true test that means "if we pass this
           | test we have AGI" but we have a variety of tests (like ARC)
           | that we believe any true AGI would be able to pass. It's a
           | "necessary but not sufficient" situation. Also ties directly
           | to the challenge in defining what AGI really means. You see a
           | lot of discussions of "moving the goal posts" around AGI, but
           | as I see it we've never had goal posts, we've just got a
           | bunch of lines we'd expect to cross before reaching them.
        
             | MPSimmons wrote:
             | I don't think we actually even have a good definition of
             | "This is what AGI is, and here are the stationary goal
             | posts that, when these thresholds are met, then we will
             | have AGI".
             | 
             | If you judged human intelligence by our AI standards, then
             | would humans even pass as Natural General Intelligence?
             | Human intelligence tests are constantly changing, being
             | invalidated, and rerolled as well.
             | 
              | I maintain that today's modern LLMs would pass sufficiently
              | for AGI, and are also very close to passing a Turing Test, if
              | measured in 1950 when the test was proposed.
        
               | fvdessen wrote:
                | The Turing test is not really that meaningful anymore
                | because you can always detect the AI by text and timing
                | patterns rather than actual intelligence. In fact, the most
                | reliable way to test for AI is probably to ask trivia
                | questions on various niche topics; I don't think any
                | human has as much breadth of general knowledge as current
                | AIs.
        
               | QuadmasterXLII wrote:
               | The current definition and goal of AGI is "Artificial
               | intelligence good enough to replace every employee for
               | cheaper" and much of the difficulty people have in
               | defining it is cognitive dissonance about the goal.
        
               | fasterik wrote:
               | _> I don't think we actually even have a good definition
               | of "This is what AGI is, and here are the stationary goal
               | posts that, when these thresholds are met, then we will
               | have AGI"._
               | 
               | Not only do we not have that, I don't think it's possible
               | to have it.
               | 
               | Philosophers have known about this problem for centuries.
               | Wittgenstein recognized that most concepts don't have
               | precise definitions but instead behave more like family
               | resemblances. When we look at a family we recognize that
               | they share physical characteristics, even if there's no
               | single characteristic shared by all of them. They don't
               | need to unanimously share hair color, skin complexion,
               | mannerisms, etc. in order to have a family resemblance.
               | 
               | Outside of a few well-defined things in logic and
               | mathematics, concepts operate in the same way.
               | Intelligence isn't a well-defined concept, but that
               | doesn't mean we can't talk about different types of human
               | intelligence, non-human animal intelligence, or machine
               | intelligence in terms of family resemblances.
               | 
               | Benchmarks are useful tools for assessing relative
               | progress on well-defined tasks. But the decision of what
               | counts as AGI will always come down to fuzzy comparisons
               | and qualitative judgments.
        
             | tedy1996 wrote:
              | I have graduated with a degree in software engineering and
              | I am bilingual (Bulgarian and English). Currently AI is
              | better than me at everything except adding big numbers or
              | writing code on really niche topics - for example, code
              | golfing a Brainfuck interpreter or writing a Rubik's cube
              | solver. I believe AGI has been here for at least a year
              | now.
        
               | fvdessen wrote:
                | I suggest you try letting the AI think through race-
                | condition scenarios in asynchronous programs; it is not
                | that good at these abstract reasoning tasks.
        
           | ummonk wrote:
           | "Being able to solve ARC problems is probably a pre-requisite
           | to AGI." - is it? Humans have general intelligence and most
           | can't solve the harder ARC problems.
        
             | adastra22 wrote:
             | They, and the other posters posting similar things, don't
             | mean human-like intelligence, or even the rigorously
             | defined solving of unconstrained problem spaces that
             | originally defined Artificial General Intelligence (in
              | contrast to "narrow" intelligence).
             | 
             | They mean an artificial god, and it has become a god of the
             | gaps: we have made artificial general intelligence, and it
             | is more human-like than god-like, and so to make a god we
             | must have it do XYZ precisely because that is something
             | which people can't do.
        
               | ummonk wrote:
               | Right, but there is a very clear term for that which they
               | should be using: ASI
        
             | satellite2 wrote:
              | Didn't he say that 70% of a random sample of the population
             | should get it right?
        
             | singron wrote:
             | https://arcprize.org/leaderboard
             | 
             | "Avg. Mturker" has 77% on ARC1 and costs $3/task. "Stem
             | Grad" has 98% on ARC1 and costs $10/task. I would love a
             | segment like "typical US office employee" or something else
             | in between since I don't think you need a stem degree to do
             | better than 77%.
             | 
             | It's also worth noting the "Human Panel" gets 100% on ARC2
             | at $17/task. All the "Human" models are on the score/cost
             | frontier and exceptional in their score range although too
             | expensive to win the prize obviously.
             | 
             | I think the real argument is that the ARC problems are too
             | abstract and obscure to be relevant to useful AGI, but I
             | think we need a little flexibility in that area so we can
             | have tests that can be objectively and mechanically graded.
             | E.g. "write a NYT bestseller" is an impractical test in
             | many ways even if it's closer to what AGI should be.
        
           | echelon wrote:
           | My problem with AGI is the lack of a simple, concrete
           | definition.
           | 
           | Can we formalize it as giving out a task expressible in, say,
           | n^m bytes of information that encodes a task of n^(m+q) real
           | algorithmic and verification complexity -- then solving that
           | task within a certain time, compute, and attempt bounds?
           | 
           | Something that captures "the AI was able to unwind the
           | underlying unspoken complexity of the novel problem".
           | 
           | I feel like one could map a variety of easy human "brain
           | teaser" type tasks to heuristics that fit within some
           | mathematical framework and then grow the formalism from
           | there.
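            | 
            | One way to write that out, purely as a sketch (the notation
            | below is mine, not a standard definition):
            | 
            |     \mathrm{Pass}(S, t) \iff |d_t| \le n^{m}
            |       \;\wedge\; K(t) \ge n^{m+q}
            |       \;\wedge\; S \text{ solves } t \text{ within } (T, C, k)
            | 
            | where d_t is the task statement handed to the system, K(t) is
            | the task's underlying algorithmic/verification complexity, and
            | (T, C, k) are the time, compute, and attempt budgets. The AGI
            | claim would then be about passing for novel tasks t, not a
            | fixed suite.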
        
             | kordlessagain wrote:
             | After researching this a fair amount, my opinion is that
             | consciousness/intelligence (can you have one without the
             | other?) emerges from some sort of weird entropy exchange in
             | domains in the brain. The theory goes that we aren't
             | conscious, but we DO consciousness, sometimes. Maybe
             | entropy, or the inverse of it, gives way to intelligence,
             | somehow.
             | 
             | This entropy angle has real theoretical backing. Some
             | researchers propose consciousness emerges from the brain's
             | ability to integrate information across different scales
             | and timeframes. This would essentially create temporary
             | "islands of low entropy" in neural networks. Giulio
             | Tononi's Integrated Information Theory suggests
             | consciousness corresponds to a system's ability to generate
             | integrated information, which relates to how it reduces
              | uncertainty (entropy) about its internal states. Then there
              | are Hameroff and Penrose, whom I commented about on here
              | years ago and got blasted for. Meh. I'm a learner, and I
             | learn by entertaining truths. But I always remain critical
             | of theories until I'm sold.
             | 
             | I'm not selling any of this as a truth, because the fact
             | remains we have no idea what "consciousness" is. We have a
             | better handle on "intelligence", but as others point out,
             | most humans aren't that intelligent. They still manage to
             | drive to the store and feed their dogs, however.
             | 
             | A lot of the current leading ARC solutions use random
             | sampling, which sorta makes sense once you start thinking
             | about having to handle all the different types of problems.
             | At least it seems to be helping out in paring down the
             | decision tree.
        
             | glenstein wrote:
             | >My problem with AGI is the lack of a simple, concrete
             | definition.
             | 
             | You can't always start from definitions. There are many
             | research areas where the object of research is to know
             | something well enough that you could converge on such a
             | thing as a definition, e.g. dark matter, consciousness,
             | intelligence, colony collapse syndrome, SIDS. We
             | nevertheless can progress in our understanding of them in a
             | whole motley of strategic ways, by case studies that best
             | exhibit salient properties, trace the outer boundaries of
             | the problem space, track the central cluster of "family
             | resemblances" that seem to characterize the problem,
             | entertain candidate explanations that are closer or further
             | away, etc. Essentially a practical attitude.
             | 
             | I don't doubt in principle that we could arrive at such a
             | thing as a definition that satisfies most people, but I
             | suspect you're more likely to have that at the end than the
             | beginning.
        
             | apwell23 wrote:
              | one of those cases where defining it and solving it are the
             | same. If you know how to define it then you've solved it.
        
           | kordlessagain wrote:
           | ARC is definitely about achieving AGI and it doesn't matter
           | whether we "have" it or not right now. That is the goal:
           | 
           | > where he introduced the "Abstract and Reasoning Corpus for
           | Artificial General Intelligence" (ARC-AGI) benchmark to
           | measure intelligence
           | 
           | So, a high enough score is a threshold to claim AGI. And, if
           | you use an LLM to work these types of problems, it becomes
           | pretty clear that passing more tests indicates a level of
           | "awareness" that goes beyond rational algorithms.
           | 
           | I thought I had seen everything until I started working on
           | some of the problems with agents. I'm still sorta in awe
           | about how the reasoning manifests. (And don't get me wrong,
           | LLMs like Claude still go completely off the rails where even
           | a less intelligent human would know better.)
        
             | MPSimmons wrote:
             | >a high enough score is a threshold to claim AGI
             | 
             | I'm pretty sure he said that AGI would achieve a high
             | score, not that a high score was indicative of AGI
        
           | cubefox wrote:
           | > Getting a high score on ARC doesn't mean we have AGI and
           | Chollet has always said as much AFAIK
           | 
           | He only seems to say this recently, since OpenAI cracked the
           | ARC-AGI benchmark. But in the original 2019 abstract he said
           | this:
           | 
           | > We argue that ARC can be used to measure a human-like form
           | of general fluid intelligence and that it enables fair
           | general intelligence comparisons between AI systems and
           | humans.
           | 
           | https://arxiv.org/abs/1911.01547
           | 
           | Now he seems to backtrack, with the release of harder ARC-
           | like benchmarks, implying that the first one didn't actually
           | test for really general human-like intelligence.
           | 
           | This sounds a bit like saying that a machine beating chess
           | would require general intelligence -- but then adding, after
           | Deep Blue beats chess, that chess doesn't actually count as a
           | test for AGI, and that Go is the real AGI benchmark. And
           | after a narrow system beats Go, moving the goalpost to
           | beating Atari, and then to beating StarCraft II, then to
           | MineCraft, etc.
           | 
           | At some point, intuitively real "AGI" will be necessary to
           | beat one of these increasingly difficult benchmarks, but only
           | because otherwise yet another benchmark would have been
           | invented. Which makes these benchmarks mostly post hoc
           | rationalizations.
           | 
           | A better approach would be to question what went wrong with
           | coming up with the very first benchmark, and why a similar
           | thing wouldn't occur with the second.
        
         | ben_w wrote:
         | You're not alone in this; I expect us to have not yet
         | enumerated all the things that we ourselves mean by
         | "intelligence".
         | 
          | But conversely, _not_ passing this test is proof of _not_
          | being as general as a human's intelligence.
        
           | NetRunnerSu wrote:
           | Unfortunately, we did it. All that is left is to assemble the
           | parts.
           | 
           | https://news.ycombinator.com/item?id=44488126
        
           | kypro wrote:
           | I find the "what is intelligence?" discussion a little
            | pointless, if I'm honest. It's similar to asking a question
            | like what it means to be a "good person", and would we know
            | whether an AI or person is really "good"?
           | 
           | While understanding why a person or AI is doing what it's
           | doing can be important (perhaps specifically in safety
           | contexts) at the end of the day all that's really going to
           | matter to most people is the outcomes.
           | 
           | So if an AI can use what appears to be intelligence to solve
           | general problems and can act in ways that are broadly good
           | for society, whether or not it meets some philosophical
           | definition of "intelligent" or "good" doesn't matter much -
           | at least in most contexts.
           | 
           | That said, my own opinion on this is that the truth is likely
           | in between. LLMs today seem extremely good at being glorified
           | auto-completes, and I suspect most (95%+) of what they do is
           | just recalling patterns in their weights. But unlike
           | traditional auto-completes they do seem to have some ability
           | to reason and solve truly novel problems. As it stands I'd
           | argue that ability is fairly poor, but this might only
           | represent 1-2% of what we use intelligence for.
           | 
           | If I were to guess why this is I suspect it's not that LLM
           | architecture today is completely wrong, but that the way LLMs
           | are trained means that in general knowledge recall is
           | rewarded more than reasoning. This is similar to the trade-
           | off we humans have with education - do you prioritise the
            | acquisition of knowledge or critical thinking? Many believe
            | critical thinking is more important and should be prioritised
            | more, but I suspect that for the vast majority of tasks we're
           | interested in solving knowledge storage and recall is
           | actually more important.
        
             | ben_w wrote:
             | That's certainly a valid way of looking at their abilities
             | at any given task -- "The question of whether a computer
             | can think is no more interesting than the question of
             | whether a submarine can swim".
             | 
              | But when the question is "are they going to be more important
              | to the economy than humans?", then they have to be good at
              | basically everything a human can do; otherwise we just see
              | a variant of Amdahl's law in action: the AIs perform an
              | arbitrary speed-up of n% of the economy while humans are
              | needed for the remaining 100-n%.
             | 
             | I may be wrong, but it seems to me that the ARC prize is
             | more about the latter.
        
               | IanCal wrote:
                | > are they going to be more important to the economy than
               | humans?", then they have to be good at basically
               | everything a human can do,
               | 
               | I really don't think that's the case. A robot that can
               | stack shelves faster than a human is more valuable at
               | that job than someone who can move items and also
               | appreciate comedy. One that can write software more
               | reliably than person X is more valuable than them at that
               | job even if X is well rounded and can do cryptic
               | crosswords and play the guitar.
               | 
               | Also many tasks they can be worse but cheaper.
               | 
               | I do wonder how many tasks something like o3 or o3 pro
               | can't do as well as a median employee.
        
               | ben_w wrote:
               | > I really don't think that's the case. A robot that can
               | stack shelves faster than a human is more valuable at
               | that job than someone who can move items and also
               | appreciate comedy.
               | 
               | Yes, until all the shelves are stacked and that is no
               | longer your limiting factor.
               | 
               | > One that can write software more reliably than person X
               | is more valuable than them at that job even if X is well
               | rounded and can do cryptic crosswords and play the
               | guitar.
               | 
               | Cryptic crosswords and guitar playing are already
               | something computers can do, so they're not great
               | examples.
               | 
               | Consider a different example: "computer" used to be a job
                | title of a person who computes. A single Raspberry Pi
                | Zero, given away for free on a magazine cover at
               | launch, can do this faster than the entire human
               | population combined even if we all worked at the speed of
               | the world record holder 24/7. But that wasn't enough to
               | replace all human labour.
        
               | tedy1996 wrote:
               | AFAIK 55% of PRs written by latest GPT model get
               | approved.
        
         | OtomotO wrote:
         | You're not alone in this, no.
         | 
         | My definition of AGI is the one I was brought up with, not an
         | ever moving goal post (to the "easier" side).
         | 
         | And no, I also don't buy that we are just stochastic parrots.
         | 
         | But whatever. I've seen many hypes and if I don't die and the
         | world doesn't go to shit, I'll see a few more in the next
         | couple of decades
        
         | NetRunnerSu wrote:
          | To pass ARC, you need a living model with sentient abilities,
          | not the dead frog we have now.
         | 
         | https://news.ycombinator.com/item?id=44488126
        
         | nxobject wrote:
          | I understand Chollet is transparent that the "branding" of the
          | ARC-AGI-n suites is meant to be suggestive of their purpose
          | rather than substantive.
         | 
         | However, it does rub me the wrong way - as someone who's
         | cynical of how branding can enable breathless AI hype by bad
         | journalism. A hypothetical comparison would be labelling
         | SHRDLU's (1968) performance on Block World planning tasks as
         | "ARC-AGI-(-1)".[0]
         | 
         | A less loaded name like (bad strawman option) "ARC-
         | VeryToughSymbolicReasoning" should capture how the ARC-AGI-n
         | suite is genuinely and intrinsically very hard for current AIs,
         | and what progress satisfactory performance on the benchmark
         | suite would represent. Which Chollet has done, and has grounded
         | him throughout! [1]
         | 
         | [0] https://en.wikipedia.org/wiki/SHRDLU [1]
         | https://arxiv.org/abs/1911.01547
        
           | heymijo wrote:
           | I get what you're saying about perception being reality and
           | that ARC-AGI suggests beating it means AGI has been achieved.
           | 
           | In practice when I have seen ARC brought up, it has more
           | nuance than any of the other benchmarks.
           | 
            | Unlike Humanity's Last Exam, which is the most egregious
            | example I have seen, both in its naming and in how it is
            | referenced in terms of an LLM's capability.
        
         | maaaaattttt wrote:
          | I've said this somewhere else, but we have the perfect test for
          | AGI in the form of any open-world game. Give the AGI the
          | instructions that it should finish the game and how to control
          | it. Give it the frames as input and wait. When I think of the
          | latest Zelda games, and especially how the Shrine challenges
          | are designed, they feel like the perfect environment
          | for an AGI test.
        
           | Lerc wrote:
           | And if someone makes a machine that does all that and another
           | person says
           | 
           | "That's not really AGI because xyz"
           | 
           | What then? The difficulty in coming up with a test for AGI is
           | coming up with something that people will accept a passing
           | grade as AGI.
           | 
           | In many respects I feel like all of the claims that models
           | don't really understand or have internal representation or
           | whatever tend to lean on nebulous or circular definitions of
           | the properties in question. Trying to pin the arguments down
            | usually ends up with dualism and/or religion.
           | 
            | Doing what Chollet has done is infinitely better: if a person
            | can easily do something and a model cannot, then there is
            | clearly something significant missing.
           | 
           | It doesn't matter what the property is or what it is called.
           | Such tests might even help us see what those properties are.
           | 
            | Anyone who wants to claim a fundamental inability of these
            | models should be able to provide a task where it is clearly
            | possible to tell when it has been solved, and to show that
            | humans can do it (if that's the bar we are claiming can't be
            | met). If they are right, then no future model should be able
            | to solve that class of problems.
        
             | maaaaattttt wrote:
             | Given your premise (which I agree with) I think the issue
             | in general comes from the lack of a good, broadly accepted
             | definition of what AGI is. My initial comment originates
             | from the fact that in my internal definition, an AGI would
             | have a de facto understanding of the physics of "our
             | world". Or better, could infer them by trial and error.
             | But, indeed, it doesn't have to be the case. (The other
             | advantage of the Zelda games is that they introduce new
              | abilities that don't exist in our world, and for which most
              | children I've seen understand the mechanisms and how they
              | could be applied to solve a problem quite naturally, even
              | though they've never had that ability before).
        
               | wat10000 wrote:
               | I'd say the issue is the lack of a good, broadly accepted
               | definition of what I is. We all know "smart" when we see
               | it, but actually defining it in a rigorous way is tough.
        
               | ta8645 wrote:
               | This difficulty is interesting in and of itself.
               | 
               | When people catalogue the deficiencies in AI systems,
               | they often (at least implicitly) forgive all of our own
               | such limitations. When someone points to something that
               | an AI system clearly doesn't understand, they say that
               | proves it isn't AGI. But if you point at any random
               | human, who fails at the very same task, you wouldn't say
               | they lack "HGI", even if they're too personally limited
               | to ever be taught the skill.
               | 
                | All of which is to say, I don't think pointing at a
                | limitation of an AI system really proves it lacks AGI.
                | It's a more slippery definition than that.
        
             | jcranmer wrote:
             | > The difficulty in coming up with a test for AGI is coming
             | up with something that people will accept a passing grade
             | as AGI.
             | 
             | The difficulty with intelligence is we don't even know what
             | it is in the first place (in a psychology sense, we don't
             | even have a reliable model of anything that corresponds to
             | what humans point at and call intelligence; IQ and g are
             | really poor substitutes).
             | 
             | Add into that Goodhart's Law (essentially, propose a test
             | as a metric for something, and people will optimize for the
             | test rather than what the test is trying to measure), and
             | it's really no surprise that there's no test for AGI.
        
             | bonoboTP wrote:
             | > It doesn't matter what the property is or what it is
             | called. Such tests might even help us see what those
             | properties are.
             | 
             | This is a very good point and somewhat novel to me in its
             | explicitness.
             | 
             | There's no reason to think that we already have the
             | concepts and terminology to point out the gaps between the
             | current state and human-level intelligence and beyond. It's
              | incredibly naive to think we have already armchair-generated
              | those concepts by pure self-reflection and
              | philosophizing. This is obvious in fields like physics.
             | Experiments were necessary to even come up with the basic
             | concepts of electromagnetism or relativity or quantum
             | mechanics.
             | 
             | I think the reason is that pure philosophizing is still
             | more prestigious than getting down in the weeds and dirt
             | and doing limited-scope well-defined experiments on
             | concrete things. So people feel smart by wielding poorly
             | defined concepts like "understanding" or "reasoning" or
             | "thinking", contrasting it with "mere pattern matching", a
             | bit like the stalemate that philosophy as a field often
             | hits, as opposed to the more pragmatic approach in the
             | sciences, where empirical contact with reality allows more
             | consensus and clarity without getting caught up in mere
             | semantics.
        
         | davidclark wrote:
         | In the video, Francois Chollet, creator of the ARC benchmarks,
         | says that beating ARC does not equate to AGI. He specifically
         | says they will be able to be beaten without AGI.
        
           | cubefox wrote:
           | He only says this because otherwise he would have to say that
           | 
           | - OpenAI's o3 counts as "AGI" when it did unexpectedly beat
           | the ARC-AGI benchmark or
           | 
           | - Explicitly admit that he was wrong when assuming that ARC-
           | AGI would test for AGI
        
             | sweezyjeezy wrote:
             | FWIW the original ARC was published in 2019, just after
             | GPT-2 but a while before GPT-3. I work in the field, I
             | think that discussing AGI seriously is actually kind of a
             | recent thing (I'm not sure I ever heard the term 'AGI'
             | until a few years ago). I'm not saying I know he didn't
             | feel that, but he doesn't talk in such terms in the
             | original paper.
        
         | cainxinth wrote:
         | > It's mostly about pattern matching...
         | 
         | For all we know, human intelligence is just an emergent
         | property of really good pattern matching.
        
         | cttet wrote:
          | The point is not that a high score -> AGI; the idea is more
          | that a low score -> we don't have AGI yet.
        
         | CamperBob2 wrote:
         | If you can write code to solve ARC by "overfitting," then give
         | it a shot! There's prize money to be won, as long as your model
         | does a good job on the hidden test set. Zuckerberg is said to
         | be throwing around 8-figure signing bonuses for talent like
         | that.
         | 
         | But then, I guess it wouldn't be "overfitting" after all, would
         | it?
        
         | gonzobonzo wrote:
         | I agree with you but I'll go a step further - these benchmarks
         | are a good example of how far we are from AGI.
         | 
         | A good base test would be to give a manager a mixed team of
          | remote workers, half being human and half being AI, and see
         | if the manager or any of the coworkers would be able to tell
         | the difference. We wouldn't be able to say that AI that passed
         | that test would necessarily be AGI, since we would have to test
         | it in other situations. But we could say that AI that couldn't
         | pass that test wouldn't qualify, since it wouldn't be able to
         | successfully accomplish some tasks that humans are able to.
         | 
         | But of course, current AI is nowhere near that level yet. We're
         | left with benchmarks, because we all know how far away we are
         | from actual AGI.
        
           | criddell wrote:
           | The AGI test I think makes sense is to put it in a robot body
           | and let it navigate the world. Can I take the robot to my
           | back yard and have it weed my vegetable garden? Can I show it
           | how to fold my laundry? Can I take it to the grocery store
           | and tell it "go pick up 4 yellow bananas and two avocados
           | that will be ready to eat in the next day or two, and then
           | meet me in dairy"? Can I ask it to dice an onion for me
           | during meal prep?
           | 
           | These are all things my kids would do when they were pretty
           | young.
        
             | gonzobonzo wrote:
             | I agree, I think of that as the next level beyond the
             | digital assistant test - a physical assistant test. Once
             | there are sufficiently capable robots, hook one up to the
              | AI. Tell it to mow your lawn, drive your car to the
              | mechanic and have it checked, box up an
              | item, take it to the post office and have it shipped, pick
              | up your dry cleaning, buy ingredients from a grocery store,
              | cook dinner, etc. Basic tasks a low-skilled worker would
              | do as someone's assistant.
        
             | bumby wrote:
             | I think the next harder level in AGI testing would be
             | "convince my kids to weed the garden and fold the laundry"
             | :-)
        
           | godshatter wrote:
           | The problem with "spot the difference" tests, imho, is that I
           | would expect an AGI to be easily spotted. There's going to be
           | a speed of calculation difference, at the very least. If
           | nothing else, typing speed would be completely different
           | unless the AGI is supposed to be deceptive. Who knows what
           | it's personality would be like. I'd say it's a simple enough
           | test just to see if an AGI could be hired as, for example, an
           | entry level software developer and keep it's job based on the
           | same criteria base-level humans have to meet.
           | 
           | I agree that current AI is nowhere near that level yet. If AI
           | isn't even trying to extract meaning from the words it smiths
           | or the pictures it diffuses then it's nothing more than a
           | cute (albeit useful) parlor trick.
        
         | SubiculumCode wrote:
         | [1]https://app.rescript.info/public/share/W_T7E1OC2Wj49ccqlIOOz
         | ...
         | 
         | Perhaps it's because the representations are fractured. The
         | link above is to the transcript of an episode of Machine
          | Learning Street Talk with Kenneth O. Stanley about The Fractured
          | Entangled Representation Hypothesis. [1]
        
         | crazylogger wrote:
         | I think next year's AI benchmarks are going to be like this
         | project: https://www.anthropic.com/research/project-vend-1
         | 
         | Give the AI tools and let it do _real stuff_ in the world:
         | 
         | "FounderBench": Ask the AI to build a successful business,
         | whatever that business may be - the AI decides. Maybe try to
         | get funded by YC - hiring a human presenter for Demo Day is
         | allowed. They will be graded on profit / loss, and valuation.
         | 
          | Testing a plain LLM on whiteboard-style questions is meaningless
         | now. Going forward, it will all be multi-agent systems with
         | computer use, long-term memory & goals, and delegation.
        
         | mindcrime wrote:
         | > I feel like I'm the only one who isn't convinced getting a
         | high score on the ARC eval test means we have AGI.
         | 
         | Wait, what? Approximately nobody is claiming that "getting a
         | high score on the ARC eval test means we have AGI". It's a
         | useful eval for measuring progress along the way, but I don't
         | think anybody considers it the final word.
        
         | andoando wrote:
         | Who says intelligence is anything more than "pattern matching"?
          | Everything is patterns.
        
         | sva_ wrote:
         | It is a necessary condition, but not a sufficient one.
        
         | tippytippytango wrote:
         | He's playing the game. You have to say AGI is your goal to get
         | attention. It's just like the YouTube thumbnail game. You can
         | hate it, but you still have to play if you want people to pay
         | attention.
        
       | hackinthebochs wrote:
       | Has Chollet ever talked about his change of heart regarding AGI?
       | It wasn't that long ago when he was one of the loudest voices
       | decrying even the concept of AGI, let alone us being on the path
       | to creating it. Now he's an advocate and has his own prize
       | dataset? Seems rather convenient to change your tune once
       | hundreds of billions are being thrown at AGI (not that I would
       | blame him).
        
         | zamderax wrote:
         | People are allowed to evolve opinions. It seems to me he
          | believes that a combination of transformers and program
          | synthesis is key. The big unknown at the moment is how to do
         | program search.
        
           | hackinthebochs wrote:
            | Absolutely. Presumably there are some specific considerations
           | or evidence that helped him evolve his opinion. I would be
           | interested in seeing a writeup about it. With him having been
           | a very public advocate against AGI, a writeup of his
           | evolution seems appropriate and would be very edifying for a
           | lot of people.
        
             | blibble wrote:
              | > Presumably there are some specific considerations or
             | evidence that helped him evolve his opinion.
             | 
             | suitcases full of money?
        
             | Bjorkbat wrote:
             | I recall it as less an evolution and more a complete tonal
             | shift the moment o3 was evaluated on ARC-AGI. I remember on
             | Twitter Sam made some dumb post suggesting they had beaten
             | the benchmark internally and Francois calling him out on
             | his vagueposting. Soon as they publicly released the
             | scores, it was like he was all-in on reasoning.
             | 
             | Which I have to admit I was kind of disappointed by.
        
         | cubefox wrote:
         | ARC-AGI was introduced in 2019:
         | 
         | https://arxiv.org/abs/1911.01547
         | 
         | GPT-3 didn't come out until 2020.
        
           | hackinthebochs wrote:
           | In my view that just makes his evolution more interesting as
           | it wasn't just a matter of being wow'ed by what ChatGPT could
           | do.
        
         | 0xCE0 wrote:
          | He has recently co-founded the company Ndea, so he has to align
          | himself with that. The same kind of vibe change can be felt with
          | Joscha Bach after he took a position at the company Liquid AI.
          | Communication is not so relaxed anymore.
         | 
         | That said, I'd still listen these two guys (+ Schmidhuber) more
         | than any other AI-guy.
        
       | roenxi wrote:
       | By both definitions of intelligence in the presentation we should
       | be saying "how we got to AGI" in the past tense. We're already
        | there. AIs can deal with situations they weren't prepared for in
       | any sense that a human can. They might not do well, but they'll
       | have a crack at it. We can trivially build systems that collect
       | data and do a bit more offline training if that is what someone
       | wants to see, but there doesn't really seem to be a commercial
       | need for that right now. Similarly, AIs can whip most humans at
       | most domains that require intelligence.
       | 
        | I think the debate has been caught flat-footed by the speed at
        | which all this happened. We're not talking about AGI any more;
        | we're talking about how to build superintelligences hitherto
        | unseen in nature.
        
         | cubefox wrote:
         | Well, there is also robotics, active inference, online
         | learning, etc. Things animals can do well.
        
           | AIPedant wrote:
           | Current robots perform very badly on my patented and highly
           | scientific ROACH-AGI benchmark - "is this thing smarter at
           | navigating unfamiliar 3D spaces than a cockroach?"
        
         | tmvphil wrote:
         | According to this presentation at least, ARC-AGI-2 shows that
         | there is a big meaningful gap in fluid intelligence between
         | normal non-genius humans and the best models currently, which
         | seems to indicate we are not "already there".
        
           | saberience wrote:
           | There's already a big meaningful gap between the things AIs
           | can do which humans can't, so why do you only count as
           | "meaningful" the things humans can do which AIs can't?
           | 
           | I enjoy seeing people repeatedly move the goalposts for
           | "intelligence" as AIs simply get smarter and smarter every
           | week. Soon AI will have to beat Einstein in Physics, Usain
           | Bolt in running, and Steve Jobs in marketing to be considered
           | AGI...
        
             | tmvphil wrote:
             | > There's already a big meaningful gap between the things
             | AIs can do which humans can't, so why do you only count as
             | "meaningful" the things humans can do which AIs can't?
             | 
              | Where did I say there was nothing meaningful about current
              | capabilities? I'm saying that what is novel about a claim
              | of "AGI" (as opposed to a claim of "computer does something
              | better than humans", which has been an obviously true
              | statement since the ENIAC) is the ability to do, at some
              | level, _everything_ a normal human intelligence can do.
        
       | TheAceOfHearts wrote:
       | The first highlight from this video is getting to see a preview
       | of the next ARC dataset. Otherwise it feels like most of what
       | Chollet says here has already been repeated in his other podcast
       | appearances and videos. It's a good video if you're not
        | familiar with his work, but if you've seen some of his recent
       | interviews then you can probably skip the first 20 minutes.
       | 
       | The second highlight from this video is the section from 29
       | minutes onward, where he talks about designing systems that can
       | build up rich libraries of abstractions which can be applied to
       | new problems. I wish he had lingered more on exploring and
       | explaining this approach, but maybe they're trying to keep a bit
       | of secret sauce because it's what his company is actively working
       | on.
       | 
       | One of the major points which seems to be emerging from recent AI
       | discourse is that the ability to integrate continuous learning
       | seems like it'll be a key element in building AGI. Context is
       | fine for short tasks, but if lessons are never preserved you're
       | severely capped with how far the system can go.
        
       | vixen99 wrote:
       | Is the text available for those who don't hear so well?
        
         | jasonlotito wrote:
         | At the very least, YouTube provides a transcript and a "Show
         | Transcript" button in the video description, which you can
         | click on to follow along.
        
           | heymijo wrote:
           | When I watched the video I had the subtitles on. The
            | automatic transcript is pretty good. "Test-time", which is
            | used frequently, gets transcribed as "Tesla", so watch out for
            | that.
        
       | saberience wrote:
       | The Arc prize/benchmark is a terrible judge of whether we got to
       | AGI.
       | 
        | If we assume that humans have "general intelligence", we would
        | assume all humans could ace ARC... but they can't. Try asking
        | your average person, i.e. supermarket workers, gas station
        | attendants, etc., to do the ARC puzzles; they will do poorly,
        | especially on the newer ones. But AI has to do perfectly to prove
        | it has general intelligence? (Not trying to throw shade here,
        | but the reality is this test is more like an IQ test than an AGI
        | test.)
       | 
       | Arc is a great example of AI researchers moving the goal posts
       | for what we consider intelligent.
       | 
       | Let's get real, Claude Opus is smarter than 99% of people right
       | now, and I would trust its decision making over 99% of people I
       | know in most situations, except perhaps emotion driven ones.
       | 
        | The ARC-AGI benchmark is just a gimmick. Also, since it's a visual
        | test and the current models are text-based, it's actually a rigged
        | (against the AI models) test anyway, since their datasets were
        | completely text-based.
       | 
       | Basically, it's a test of some kind, but it doesn't mean quite as
       | much as Chollet thinks it means.
        
         | leumon wrote:
          | He said in the video that they tested regular people (Uber
          | drivers, etc.) on ARC-AGI-2 and at least 2 people were able to
          | solve each task (an average of 9-10 people saw each task). Also
         | this quote from the paper: _None of the self-reported
         | demographic factors recorded for all participants--including
         | occupation, industry, technical experience, programming
         | proficiency, mathematical background, puzzle-solving aptitude,
          | and various other measured attributes--demonstrated clear,
         | statistically significant relationships with performance
         | outcomes. This finding suggests that ARC-AGI-2 tasks assess
         | general problem-solving capabilities rather than domain-
         | specific knowledge or specialized skills acquired through
         | particular professional or educational experiences._
        
         | daveguy wrote:
         | It is not a judge of whether we got to AGI. _And literally no
          | one except straw-manning critics is trying to claim it is_.
          | The point is, an AGI should easily be able to pass it. But it
          | can obviously be passed without getting to AGI. It's a
          | necessary but not sufficient criterion. If something can't pass
          | a test as simple as ARC (which _no AI currently can_) then
          | it's definitely not AGI. Anyone claiming AGI should be able to
          | point their AI at the problem and have an 80+% solution rate.
         | Current attempts on the second ARC are less than 10% with zero
         | shot attempts even worse. Even the better performing LLMs on
         | the first ARC couldn't do well without significant pre-
         | training. In short, the G in AGI stands for _general_.
        
           | saberience wrote:
           | So do you agree that a human that CANNOT solve ARC doesn't
           | have general intelligence?
           | 
           | If we think humans have "GI" then I think we have AIs right
           | now with "GI" too. Just like humans do, AIs spike in various
           | directions. They are amazing at some things and weak at
           | visual/IQ test type problems like ARC.
        
             | adamgordonbell wrote:
             | It's a good question, but only complicated answers are
             | possible. A puppy, a crow, and a raccoon all have
             | intelligence, but they certainly can't all pass the ARC
             | challenge.
             | 
             | I think the charitable interpretation is that intelligence
             | is made up of many skills, and AIs are superhuman at some
             | of them, like image recognition.
             | 
             | Therefore, future efforts need to focus on the areas where
             | AIs are significantly less skilled. And since they are
             | good at memorizing things, knowledge questions are the
             | wrong direction; anything most humans can solve but AIs
             | cannot, especially something as generic as pattern
             | matching, should be an important target.
        
         | cttet wrote:
         | Maybe it is a cultural difference, but I feel that the
         | "supermarket workers, gas station attendants" (in an Asian
         | country) that I know should be quite capable of most ARC
         | tasks.
        
         | profchemai wrote:
         | Out of 100s of evals, ARC is a very distinct and unique one.
         | Most frontier models are also visual now, so I don't see the
         | harm in having this instead of another text eval.
        
         | Workaccount2 wrote:
         | This is what is called "spiky" intelligence, where a model
         | might be able to crack PhD physics problems and solve
         | byzantine pattern-matching games at the 90th percentile, but
         | also can't figure out how to look up a company and copy its
         | address onto the "customer" line of an invoice.
        
       | chromaton wrote:
       | Current AI systems don't have a great ability to take
       | instructions or information about the state of the world and
       | produce new output based upon that. Benchmarks that emphasize
       | this ability help greatly in progress toward AGI.
        
       | jacquesm wrote:
       | Let's not. Seriously. I absolutely love Francois and have used
       | his work extensively. But looking around me at the social impact
       | of AI, I am really not convinced that this is what the world
       | needs right now; if we can stave off the turning point for
       | another decade or two, humanity will likely benefit from that.
       | The last thing we need is to inject yet another instability into
       | a planet that is already fighting existential crises on a number
       | of fronts.
        
         | thatguy0900 wrote:
         | It doesn't matter what should or should not happen. Technology
         | will continue to race forward at breakneck speed while everyone
         | involved pats each other on the back for making a bunch of
         | money before the consequences hit
        
           | nessbot wrote:
           | technology doesn't just advance itself
        
             | lo_zamoyski wrote:
             | This is true. We have a choice...in principle.
             | 
             | But in practice, it's like stopping an arms race.
        
             | bnchrch wrote:
             | No, but one thing is certain, in large human systems you
             | can only redirect greed, you can't stop it.
        
             | alex_duf wrote:
             | If the incentive is there, the technology will advance. I
             | hear "we need to slow down the progress of technology",
             | but that misunderstands _why_ it progresses. The slow-down
             | camp really needs to ask what the incentive to slow down
             | would be.
             | 
             | Personally I don't think it's possible at this stage. The
             | cat's out of the bag (this new class of tools is working),
             | and the economic incentive is way too strong.
        
       | modeless wrote:
       | ARC-AGI-3 reminds me of PuzzleScript games:
       | https://www.puzzlescript.net/Gallery/index.html
       | 
       | There are dozens of ready-made, well-designed, and very creative
       | games there. All are tile-based and solved with only arrow keys
       | and a single action button. Maybe someone should make a
       | PuzzleScript AGI benchmark?
        
         | mNovak wrote:
         | This game is great!
         | 
         | https://nebu-soku.itch.io/golfshall-we-golf
         | 
         | Maybe someone can make an MCP connection for the AIs to
         | practice on. But I think the idea of the benchmark is to
         | reserve
         | some puzzles for private evaluation, so that they're not in the
         | training data.
        
       | visarga wrote:
       | I think intelligence is search. Search is exploration + learning.
       | So intelligence is not in the model or in the environment, but in
       | their mutual dance. A river is not the banks, nor the water, but
       | their relation. ARC is just a frozen snapshot of the banks, not
       | the dynamic environment we have.
        
         | ipunchghosts wrote:
         | I agree strongly with this take but find it hard to convince
         | others of it. Instead, people keep thinking there is a magic
         | bullet to discover, resulting in a lot of wasted resources and
         | money.
        
       | bogtog wrote:
       | I wonder how much of the slow progress on ARC can be explained
       | by the tasks' visual properties making them easy for humans but
       | hard for LLMs.
       | 
       | My impression is that models are pretty bad at interpreting
       | grids of characters. Yesterday, I was trying to get Claude to
       | convert a message into a cipher: turning a 98-character string
       | into a 7x14 grid where sequential letters move 2 right and
       | 1 down (i.e., like a knight in chess). Claude seriously
       | struggled.
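       | 
       | Roughly the kind of transformation I mean, as a minimal Python
       | sketch (the wrap-around and collision handling here are my own
       | assumptions for illustration, not exactly what I asked Claude
       | for):
       | 
       |   import string
       | 
       |   def knight_grid(message, rows=7, cols=14):
       |       # Place characters one at a time, stepping 2 right and
       |       # 1 down with wrap-around; if the target cell is already
       |       # filled, scan forward to the next empty cell (my guess at
       |       # a collision rule, since a pure (1, 2) step on a 7x14
       |       # torus returns to its start after 7 moves).
       |       grid = [[None] * cols for _ in range(rows)]
       |       r = c = 0
       |       for ch in message:
       |           while grid[r][c] is not None:
       |               c = (c + 1) % cols
       |               if c == 0:
       |                   r = (r + 1) % rows
       |           grid[r][c] = ch
       |           r, c = (r + 1) % rows, (c + 2) % cols
       |       return grid
       | 
       |   msg = (string.ascii_lowercase * 4)[:98]  # any 98-char string
       |   for row in knight_grid(msg):
       |       print("".join(ch or "." for ch in row))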
       | 
       | Yet Francois always pumps up the "fluid intelligence" component
       | of this test and emphasizes how easy these are for humans. But
       | humans would presumably be terrible at the tasks if they had to
       | look at them character by character.
       | 
       | This feels like a somewhat similar (intuition-lie?) case to the
       | Apple paper showing how reasoning models can't do Tower of Hanoi
       | past 10+ disks. Readers will intuitively think about how they
       | themselves could tediously work through an arbitrarily long
       | Tower of Hanoi, which is what the paper alludes to. However, the
       | more appropriate analogy would be writing out all 1000+ moves on
       | a piece of paper at once and being 100% correct, which is
       | obviously much harder.
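       | 
       | For scale, n disks take 2^n - 1 moves, so 10 disks already means
       | 1,023 moves written out in one shot; a quick sketch:
       | 
       |   def hanoi(n, src="A", dst="C", aux="B"):
       |       # Classic recursion: move n-1 disks out of the way, move
       |       # the largest disk, then move the n-1 disks back on top.
       |       if n == 0:
       |           return []
       |       return (hanoi(n - 1, src, aux, dst)
       |               + [(src, dst)]
       |               + hanoi(n - 1, aux, dst, src))
       | 
       |   print(len(hanoi(10)))  # 1023 moves, i.e. 2**10 - 1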
        
         | krackers wrote:
         | I thought so too back when the test was first released, but now
         | that we have multimodal models which can take images directly
         | as input, shouldn't this point be moot?
        
       | ltbarcly3 wrote:
       | There is some kind of massive brigading happening on this thread.
       | Lots of thoughtful comments are downmodded or flagged (including
       | mine, which I thought was pretty thoughtful. I even said poop
       | instead of shit.).
       | 
       | https://news.ycombinator.com/item?id=44492241
       | 
       | My comment was basically instantly flagged. I see at least 3
       | other flagged comments that I can't imagine deserve to be
       | flagged.
        
         | layer8 wrote:
         | You didn't address anything from the actual talk.
        
           | ltbarcly3 wrote:
           | I addressed the entire concept of the talk, and made other
           | relevant points. The correct response to "let me tell you
           | something I can't possibly know" isn't to argue the points
           | within that frame.
           | 
           | If you see a talk like: "How we will develop diplomacy with
           | the rat-people of TRAPPIST-5." you don't have to make some
           | argument about super-earths and gravity and the rocket
           | equation. You can just point out it's absurd to pretend to
           | know something like whether there are rat-people there.
           | 
           | Either way, it isn't flag-able!
        
             | layer8 wrote:
             | Did you actually watch the talk?
             | 
             | The flagging is probably due to your aggressively indignant
             | style.
        
       | lawlessone wrote:
       | How do we define AGI?
       | 
       | I would have considered AGI to be something that is constantly
       | aware; a biological brain is always on, whereas an LLM is only
       | on briefly while it's inferring.
       | 
       | A biological brain also constantly updates itself and adds
       | memories of things, and those memories generally stick around.
        
       | khalic wrote:
       | This quest for an ill-defined AGI is going to create a million
       | Captain Ahabs.
        
       | gtech1 wrote:
       | This may be a silly question, I'm no expert. But why not simply
       | define as AGI any system that can answer a question that no
       | human can? For example, ask the AGI to work out, from current
       | knowledge, how to reconcile gravity and QED.
        
         | m11a wrote:
         | That would be ASI I think.
         | 
         | But consider: technically AlphaTensor found new algorithms
         | that no human had found before (https://en.wikipedia.org/wiki/
         | Matrix_multiplication_algorith...). So isn't it AGI by your
         | definition of answering a question no human could answer
         | before: how to do 4x4 matrix multiplication in 47 steps?
        
         | imiric wrote:
         | "What is the meaning of life, the universe, and everything?"
        
           | ta8645 wrote:
           | 42
        
         | soVeryTired wrote:
         | Computers can already do a lot of things that no human can,
         | though. They can reliably find better chess or Go moves than
         | any human.
         | 
         | It's conceivable (though not likely) that, given enough
         | training in symbolic mathematics and some experimental data,
         | an LLM-style AI could figure out a neat reconciliation of the
         | two theories. I wouldn't say that makes it AGI though. You
         | could achieve that unification with an AI that was limited to
         | mathematics rather than something that can function in many
         | domains like a human can.
        
         | layer8 wrote:
         | Aside from other objections already mentioned, your example
         | would require feasible experiments for verification, and likely
         | the process of finding a successful theory of quantum gravity
         | requires a back and forth between experimenters and theorists.
        
       ___________________________________________________________________
       (page generated 2025-07-07 23:00 UTC)