[HN Gopher] ARC-AGI without pretraining
       ___________________________________________________________________
        
       ARC-AGI without pretraining
        
       Author : georgehill
       Score  : 129 points
       Date   : 2025-03-04 19:52 UTC (3 hours ago)
        
 (HTM) web link (iliao2345.github.io)
 (TXT) w3m dump (iliao2345.github.io)
        
       | pona-a wrote:
       | I feel like extensive pretraining goes against the spirit of
       | generality.
       | 
       | If you can create a general machine that can take 3 examples and
       | synthesize a program that predicts the 4th, you've just solved
       | oracle synthesis. If you train a network on all human knowledge,
       | including puzzle making, and then fine-tune it on 99% of the
       | dataset and give it a dozen attempts for the last 1%, you've just
       | made an expensive compressor for test-maker's psychology.
        
         | ta8645 wrote:
         | The issue is that general intelligence is useless without vast
         | knowledge. The pretraining is the knowledge, not the
         | intelligence.
        
           | raducu wrote:
           | > The pretraining is the knowledge, not the intelligence.
           | 
           | I thought the knowledge is the training set and the
           | intelligence is the emergent/side effect of reproducing that
           | knowledge by making sure the reproduction is not rote
           | memorisation?
        
             | ta8645 wrote:
             | I'd say that it takes intelligence to encode knowledge, and
             | the more knowledge you have, the more intelligently you can
             | encode further knowledge, in a virtuous cycle. But once you
             | have a data set of knowledge, there's nothing to emerge,
             | there are no side effects. It just sits there doing
             | nothing. The intelligence is in the algorithms that access
             | that encoded knowledge to produce something else.
        
               | esafak wrote:
               | The data set is flawed, noisy, and its pieces are
               | disconnected. It takes intelligence to correct its flaws
               | and connect them parsimoniously.
        
               | ta8645 wrote:
               | It takes knowledge to even know they're flawed, noisy,
               | and disconnected. There's no reason to "correct"
               | anything, unless you have knowledge that applying
               | previously "understood" data has in fact produced
               | deficient results in some application.
               | 
                | That's reinforcement learning -- an algorithm which
                | requires accurate knowledge acquisition to be effective.
        
           | pona-a wrote:
           | I don't think so. A lot of useful specialized problems are
           | just patterns. Imagine your IDE could take 5 examples of
           | matching strings and produce a regex you can count on
           | working? It doesn't need to know the capital of Togo,
           | metabolic pathways of the eukaryotic cell, or human
           | psychology.
           | 
           | For that matter, if it had no pre-training, it means it can
           | generalize to any new programming languages, libraries, and
           | entire tasks. You can use it to analyze the grammar of a
           | dying African language, write stories in the style of
           | Hemingway, and diagnose cancer on patient data. In all of
           | these, there are only so many samples to fit on.
        
             | ta8645 wrote:
             | Of course, none of us have exhaustive knowledge. I don't
             | know the capital of Togo.
             | 
              | But I do have enough knowledge to know what an IDE is, and
              | where that sits in a technological stack; I know what a
              | string is, and all that it relies on, etc. There's a huge
             | body of knowledge that is required to even begin
             | approaching the problem. If you posted that challenge to an
             | intelligent person from 2000 years ago, they would just
             | stare at you blankly. It doesn't matter how intelligent
             | they are, they have no context to understand anything about
             | the task.
        
               | pona-a wrote:
               | > If you posted that challenge to an intelligent person
                | from 2000 years ago, they would just stare at you
               | blankly.
               | 
                | That depends on how you pose it. If I give you a long
                | enough series of ordered cards, you'll begin, on some
                | basic level, to understand their spatiotemporal
                | dynamics. You'll
               | get the intuition that there's a stack of heads scanning
               | the input, moving forward each turn, either growing the
               | mark, falling back, or aborting. If not constrained by
               | using matrices, I can draw you a state diagram, which
               | would have much clearer immediate metaphors than colored
               | squares.
               | 
               | Do these explanations correspond to some priors in human
               | cognition? I suppose. But I don't think you strictly need
               | them for effective few-shot learning. My main point is
               | that learning itself is a skill, which generalist LLMs do
               | possess, but only as one of their competencies.
        
               | ta8645 wrote:
               | Well Dr. Michael Levin would agree with you in the sense
               | that he ascribes intelligence to any system that can
               | accomplish a goal through multiple pathways. So for
               | instance the single-celled Lacrymaria, lacking a brain or
               | nervous system, can still navigate its environment to
               | find food and fulfill its metabolic needs.
               | 
               | However, I assumed what we're talking about when we
               | discuss AGI is what we'd expect a human to be able to
               | accomplish in the world at our scale. The examples of
               | learning without knowledge you've given, to my mind at
               | least, are a lower level of intelligence that doesn't
               | really approach human level AGI.
        
             | bloomingkales wrote:
             | _A lot of useful specialized problems are just patterns._
             | 
              |  _It doesn't need to know the capital of Togo, metabolic
              | pathways of the eukaryotic cell, or human psychology._
             | 
             | What if knowing those things distills down to a pattern
             | that matches a pattern of your code and vice versa? There's
             | a pattern in everything, so know everything, and be ready
             | to pattern match.
             | 
             | If you just look at object oriented programming, you can
             | easily see how knowing a lot translates to abstract
             | concepts. There's no reason those concepts can't be
             | translated bidirectionally.
        
           | dchichkov wrote:
            | With long enough context, AGI is not useless without vast
           | knowledge. You could always put a bootstrap sequence into the
           | context (think Arecibo Message), followed by your prompt. A
           | general enough reasoner with enough compute should be able to
           | establish the context and reason about your prompt.
        
             | conradev wrote:
             | Isn't knowledge of language necessary to decode prompts?
        
             | ta8645 wrote:
             | Yes, but that just effectively recreates the pretraining.
             | You're going to have to explain everything down to what an
             | atom is, and essentially all human knowledge if you want to
             | have any ability to consider abstract solutions that call
             | on lessons from foreign domains.
             | 
             | There's a reason people with comparable intelligence
             | operate at varying degrees of effectiveness, and it has to
             | do with how knowledgeable they are.
        
               | pona-a wrote:
               | Would that make in-context learning a superset or a
               | subset of pretraining?
               | 
               | This paper claimed transformers learn a gradient-descent
               | mesa-optimizer as part of in-context learning, while
               | being guided by the pretraining objective, and as the
               | parent mentioned, any general reasoner can bootstrap a
               | world model from first principles.
               | 
               | [0] https://arxiv.org/pdf/2212.07677
        
               | ta8645 wrote:
               | > Would that make in-context learning a superset or a
               | subset of pretraining?
               | 
               | I guess a superset. But it doesn't really matter either
               | way. Ultimately, there's no useful distinction between
               | pretraining and in-context learning. They're just an
               | artifact of the current technology.
        
           | tripplyons wrote:
           | I'm not at all experienced in neuroscience, but I think that
           | humans and other animals primarily gain intelligence by
           | learning from their sensory input.
        
             | FergusArgyll wrote:
             | You don't think a lot is encoded in genes from before we're
             | born?
        
               | aaronblohowiak wrote:
               | >a lot
               | 
                | This is pretty vague. I certainly don't think mastery of
                | any concept invented in the last thousand years would be
                | considered encoded in genes, though we would want or
                | expect an AGI to be able to learn calculus, for instance.
                | In terms of "encoded in genes", I'd say most of what is
                | asked or expected of AGI goes beyond what feral children
                | (https://en.wikipedia.org/wiki/Feral_child) were able to
                | demonstrate.
        
         | tripplyons wrote:
         | I think that most human learning comes from years of sensory
         | input. Why should we expect a machine to generalize well
         | without any background?
        
           | Krasnol wrote:
           | I'd guess it's because we don't want to have another human.
           | We want something better. Therefore, the expectations on the
           | learning process are way beyond what humans do. I guess some
           | are expecting some magic word (formula) which would be like a
           | seed with unlimited potential.
           | 
           | So like humans after all but faster.
           | 
           | I guess it's just hard to write a book about the way you
           | write that book.
        
           | andoando wrote:
            | It does, but it also generalizes extremely well.
        
           | aithrowawaycomm wrote:
           | Newborns (and certainly toddlers) seem to understand the
           | underlying concepts for these things when it comes to
            | visual/haptic object identification and "folk physics":
            | 
            |   A short list of abilities that cannot be performed by
            |   CompressARC includes:
            | 
            |   - Assigning two colors to each other (see puzzle 0d3d703e)
            |   - Repeating an operation in series many times (see puzzle
            |     0a938d79)
            |   - Counting/numbers (see puzzle ce9e57f2)
            |   - Translation, rotation, reflections, rescaling, image
            |     duplication (see puzzles 0e206a2e, 5ad4f10b, and 2bcee788)
            |   - Detecting topological properties such as connectivity (see
            |     puzzle 7b6016b9)
           | 
           | Note: I am _not_ saying newborns can solve the corresponding
           | ARC problems! The point is there is a lot of evidence that
           | many of the concepts ARC-AGI is (allegedly) measuring are
           | innate in humans, and maybe most animals; e.g. cockroaches
            | can quickly identify connected/disconnected components when
           | it comes to pathfinding. Again, not saying cockroaches can
           | solve ARC :) OTOH even if orcas were smarter than humans they
           | would struggle with ARC - it would be way too baffling and
           | obtuse if your culture doesn't have the concept of written
            | standardized tests. (I'd been solving state-mandated ARCish
            | problems since elementary school.) This also applies to
           | hunter-gatherers, and note the converse: if you plopped me
           | down among the Khoisan in the Kalahari, they would think I
           | was an ignorant moron. But it makes as much sense
           | scientifically to say "human-level intelligence" entails
           | "human-level hunter-gathering" instead of "human-level IQ
           | problems."
        
             | Ukv wrote:
             | > there is a lot of evidence that many of the concepts ARC-
             | AGI is (allegedly) measuring are innate in humans
             | 
             | I'd argue that "innate" here still includes a brain
             | structure/nervous system that evolved on 3.5 billion years
             | worth of data. Extensive pre-training of one kind or
             | another currently seems the best way to achieve generality.
        
         | jshmrsn wrote:
         | If the machine can decide how to train itself (adjust weights)
         | when faced with a type of problem it hasn't seen before, then I
         | don't think that would go against the spirit of general
         | intelligence. I think that's basically what humans do when they
         | decide to get better at something, they figure out how to
         | practice that task until they get better at it.
        
           | pona-a wrote:
           | In-context learning is a very different problem from regular
           | prediction. It is quite simple to fit a stationary solution
           | to noisy data, that's just a matter of tuning some parameters
           | with fairly even gradients. In-context learning implies
           | you're essentially learning a mesa-optimizer for the class of
            | problems you're facing, which in the case of transformers
            | essentially means fitting something not far from a
            | differentiable Turing machine with no inductive biases.
        
         | fsndz wrote:
          | Exactly. That's basically the problem with a lot of current
          | paradigms: they don't allow true generalisation. That's why some
         | people say there won't be any AGI anytime soon:
         | https://www.lycee.ai/blog/why-no-agi-openai
        
           | exe34 wrote:
           | "true generalisation" isn't really something a lot of humans
           | can do.
        
             | fsndz wrote:
              | The thing is, LLMs don't even do the kinds of generalisation
              | the dumbest human can do, while simultaneously doing some
              | things the smartest human probably can't.
        
       | AIorNot wrote:
        | I was thinking about this Lex Fridman podcast with Marcus
        | Hutter. Also, Joscha Bach defined intelligence as the ability to
        | accurately model reality. Is lossless compression itself
        | intelligence, or a best-fit model, and is there a difference?
       | https://www.youtube.com/watch?v=E1AxVXt2Gv4
        
       | d--b wrote:
       | > ARC-AGI, introduced in 2019, is an artificial intelligence
       | benchmark designed to test a system's ability to infer and
       | generalize abstract rules from minimal examples. The dataset
       | consists of IQ-test-like puzzles, where each puzzle provides
       | several example images that demonstrate an underlying rule, along
       | with a test image that requires completing or applying that rule.
       | While some have suggested that solving ARC-AGI might signal the
       | advent of artificial general intelligence (AGI), its true purpose
       | is to spotlight the current challenges hindering progress toward
        | AGI.
       | 
       | Well they kind of define intelligence as the ability to compress
       | information into a set of rules, so yes, compression does that...
        
       | programjames wrote:
       | Here's what they did:
       | 
       | 1. Choose random samples z ~ N(m, S) as the "encoding" of a
       | puzzle, and a distribution of neural network weights p(th) ~
       | N(th, <very small variance>).
       | 
       | 2. For a given z and th, you can decode to get a distribution of
       | pixel colors. We want these pixel colors to match the ones in our
       | samples, but they're not guaranteed to, so we'll have to add some
       | correction e.
       | 
       | 3. Specifying e takes KL(decoded colors || actual colors) bits.
       | If we had sources of randomness q(z), q(th), specifying z and th
       | would take KL(p(z) || q(z)) and KL(p(th) || q(th)) bits.
       | 
        | 4. The authors choose q(z) ~ N(0, 1) so KL(p(z) || q(z)) =
        | 0.5(m^2 + S^2 - 1 - 2 ln S). Similarly, they choose q(th) ~
        | N(0, 1/(2l)), and since Var(th) is very small, this gives
        | KL(p(th) || q(th)) = l * th^2.
       | 
       | 5. The fewer bits they use, the lower the Kolmogorov complexity,
       | and the more likely it is to be correct. So, they want to
       | minimize the number of bits
       | 
        | a * 0.5(m^2 + S^2 - 1 - 2 ln S) + l * th^2 + c * KL(decoded colors
       | || actual colors).
       | 
       | 6. Larger a gives a smaller latent, larger l gives a smaller
       | neural network, and larger c gives a more accurate solution. I
       | think all they mention is they choose c = 10a, and that l was
       | pretty large.
       | 
       | They can then train m, S, th until it solves the examples for a
       | given puzzle. Decoding will then give all the answers, including
        | the unknown answer! The main drawback to this method is that,
        | like Gaussian splatting, they have to train an entire neural
        | network for every puzzle. But the neural networks are pretty
        | small, so
       | you could train a "hypernetwork" that predicts m, S, th for a
       | given puzzle, and even predicts how to train these parameters.
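        | 
        | A minimal PyTorch sketch of that per-puzzle loop (every size,
        | hyperparameter, and architecture choice below is an assumption
        | made for illustration, not the authors' actual code):
        | 
        |   import torch
        |   import torch.nn as nn
        |   import torch.nn.functional as F
        | 
        |   # assumed sizes: latent dim, hidden dim, ARC colors, grid shape
        |   LATENT, HIDDEN, COLORS, H, W = 16, 64, 10, 30, 30
        |   a, l, c = 1.0, 1e-2, 10.0  # loss weights; the post says c = 10a
        | 
        |   # tiny per-puzzle decoder (the weights th)
        |   decoder = nn.Sequential(
        |       nn.Linear(LATENT, HIDDEN), nn.ReLU(),
        |       nn.Linear(HIDDEN, COLORS * H * W))
        |   m = torch.zeros(LATENT, requires_grad=True)      # latent mean
        |   log_S = torch.zeros(LATENT, requires_grad=True)  # latent log-std
        |   opt = torch.optim.Adam([m, log_S, *decoder.parameters()],
        |                          lr=1e-3)
        | 
        |   # stand-in for one example output grid of the puzzle
        |   target = torch.randint(0, COLORS, (1, H, W))
        | 
        |   for step in range(2000):
        |       S = log_S.exp()
        |       z = m + S * torch.randn_like(S)   # sample z ~ N(m, S)
        |       logits = decoder(z).view(1, COLORS, H, W)
        |       recon = F.cross_entropy(logits, target)  # correction e
        |       kl_z = 0.5 * (m**2 + S**2 - 1 - 2 * log_S).sum()
        |       wd = sum((p**2).sum() for p in decoder.parameters())
        |       loss = a * kl_z + l * wd + c * recon  # bits for z, th, e
        |       opt.zero_grad(); loss.backward(); opt.step()
        | 
        | In the real system the decoder reconstructs every grid in the
        | puzzle at once, so the held-out answer comes out of the same
        | decoding pass; the sketch above only shows the shape of the loss.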
        
       ___________________________________________________________________
       (page generated 2025-03-04 23:00 UTC)