[HN Gopher] ARC-AGI without pretraining
___________________________________________________________________
ARC-AGI without pretraining
Author : georgehill
Score : 129 points
Date : 2025-03-04 19:52 UTC (3 hours ago)
(HTM) web link (iliao2345.github.io)
(TXT) w3m dump (iliao2345.github.io)
| pona-a wrote:
| I feel like extensive pretraining goes against the spirit of
| generality.
|
| If you can create a general machine that can take 3 examples and
| synthesize a program that predicts the 4th, you've just solved
| oracle synthesis. If you train a network on all human knowledge,
| including puzzle making, and then fine-tune it on 99% of the
| dataset and give it a dozen attempts for the last 1%, you've just
| made an expensive compressor for test-maker's psychology.
| ta8645 wrote:
| The issue is that general intelligence is useless without vast
| knowledge. The pretraining is the knowledge, not the
| intelligence.
| raducu wrote:
| > The pretraining is the knowledge, not the intelligence.
|
| I thought the knowledge is the training set and the
| intelligence is the emergent/side effect of reproducing that
| knowledge by making sure the reproduction is not rote
| memorisation?
| ta8645 wrote:
| I'd say that it takes intelligence to encode knowledge, and
| the more knowledge you have, the more intelligently you can
| encode further knowledge, in a virtuous cycle. But once you
| have a data set of knowledge, there's nothing to emerge,
| there are no side effects. It just sits there doing
| nothing. The intelligence is in the algorithms that access
| that encoded knowledge to produce something else.
| esafak wrote:
| The data set is flawed, noisy, and its pieces are
| disconnected. It takes intelligence to correct its flaws
| and connect them parsimoniously.
| ta8645 wrote:
| It takes knowledge to even know they're flawed, noisy,
| and disconnected. There's no reason to "correct"
| anything, unless you have knowledge that applying
| previously "understood" data has in fact produced
| deficient results in some application.
|
| That's reinforcement learning -- an algorithm that requires
| accurate knowledge acquisition to be effective.
| pona-a wrote:
| I don't think so. A lot of useful specialized problems are
| just patterns. Imagine your IDE could take 5 examples of
| matching strings and produce a regex you could count on
| working. It doesn't need to know the capital of Togo,
| metabolic pathways of the eukaryotic cell, or human
| psychology.
|
| For that matter, if it needed no pre-training, that means it
| could generalize to any new programming language, library, or
| entire task. You could use it to analyze the grammar of a
| dying African language, write stories in the style of
| Hemingway, and diagnose cancer from patient data. In all of
| these, there are only so many samples to fit on.
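|
| A toy sketch of that regex-from-examples idea (my own
| illustration, not something from the article; it assumes
| equal-length examples and only generalizes per-position
| character classes):
      import re

      def infer_regex(examples):
          # Naive few-shot inference: align equal-length examples
          # column by column and generalize each column.
          if len(set(len(e) for e in examples)) != 1:
              raise ValueError("toy version: equal-length examples only")
          parts = []
          for chars in zip(*examples):
              if len(set(chars)) == 1:
                  parts.append(re.escape(chars[0]))  # constant character
              elif all(c.isdigit() for c in chars):
                  parts.append(r"\d")                # digits vary
              elif all(c.isalpha() for c in chars):
                  parts.append(r"[A-Za-z]")          # letters vary
              else:
                  parts.append(".")                  # give up: any character
          # Collapse runs of identical tokens into {n} quantifiers.
          pattern, prev, count = "", None, 0
          for p in parts + [None]:
              if p == prev:
                  count += 1
              else:
                  if prev is not None:
                      pattern += prev + (f"{{{count}}}" if count > 1 else "")
                  prev, count = p, 1
          return "^" + pattern + "$"

      examples = ["2024-01-05", "1999-12-31", "2025-03-04"]
      rx = infer_regex(examples)   # something like ^\d{4}\-\d{2}\-\d{2}$
      assert all(re.fullmatch(rx, e) for e in examples)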
| ta8645 wrote:
| Of course, none of us have exhaustive knowledge. I don't
| know the capital of Togo.
|
| But I do have enough knowledge to know what an IDE is and
| where it sits in the technology stack; I know what a
| string is, and all that it relies on, etc. There's a huge
| body of knowledge that is required to even begin
| approaching the problem. If you posted that challenge to an
| intelligent person from 2000 years ago, they would just
| stare at you blankly. It doesn't matter how intelligent
| they are, they have no context to understand anything about
| the task.
| pona-a wrote:
| > If you posted that challenge to an intelligent person
| from 2000 years ago, they would just stare at you
| blankly.
|
| Depending on how you pose it. If I give you a long enough
| series of ordered cards, you'll on some basic level begin
| to understand the spatiotemporal dynamics of them. You'll
| get the intuition that there's a stack of heads scanning
| the input, moving forward each turn, either growing the
| mark, falling back, or aborting. If not constrained by
| using matrices, I can draw you a state diagram, which
| would have much clearer immediate metaphors than colored
| squares.
|
| Do these explanations correspond to some priors in human
| cognition? I suppose. But I don't think you strictly need
| them for effective few-shot learning. My main point is
| that learning itself is a skill, which generalist LLMs do
| possess, but only as one of their competencies.
| ta8645 wrote:
| Well Dr. Michael Levin would agree with you in the sense
| that he ascribes intelligence to any system that can
| accomplish a goal through multiple pathways. So for
| instance the single-celled Lacrymaria, lacking a brain or
| nervous system, can still navigate its environment to
| find food and fulfill its metabolic needs.
|
| However, I assumed what we're talking about when we
| discuss AGI is what we'd expect a human to be able to
| accomplish in the world at our scale. The examples of
| learning without knowledge you've given, to my mind at
| least, are a lower level of intelligence that doesn't
| really approach human level AGI.
| bloomingkales wrote:
| _A lot of useful specialized problems are just patterns._
|
| _It doesn't need to know the capital of Togo, metabolic
| pathways of the eukaryotic cell, or human psychology._
|
| What if knowing those things distills down to a pattern
| that matches a pattern of your code and vice versa? There's
| a pattern in everything, so know everything, and be ready
| to pattern match.
|
| If you just look at object oriented programming, you can
| easily see how knowing a lot translates to abstract
| concepts. There's no reason those concepts can't be
| translated bidirectionally.
| dchichkov wrote:
| With a long enough context, AGI is not useless without vast
| knowledge. You could always put a bootstrap sequence into the
| context (think Arecibo Message), followed by your prompt. A
| general enough reasoner with enough compute should be able to
| establish the context and reason about your prompt.
| conradev wrote:
| Isn't knowledge of language necessary to decode prompts?
| ta8645 wrote:
| Yes, but that just effectively recreates the pretraining.
| You're going to have to explain everything down to what an
| atom is, and essentially all human knowledge if you want to
| have any ability to consider abstract solutions that call
| on lessons from foreign domains.
|
| There's a reason people with comparable intelligence
| operate at varying degrees of effectiveness, and it has to
| do with how knowledgeable they are.
| pona-a wrote:
| Would that make in-context learning a superset or a
| subset of pretraining?
|
| This paper claimed transformers learn a gradient-descent
| mesa-optimizer as part of in-context learning, while
| being guided by the pretraining objective, and as the
| parent mentioned, any general reasoner can bootstrap a
| world model from first principles.
|
| [0] https://arxiv.org/pdf/2212.07677
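|
| The identity that paper leans on can be checked in a few
| lines: starting from zero weights, one gradient-descent step
| on the in-context (x, y) pairs predicts the same thing as an
| unnormalized linear-attention layer that uses those pairs as
| keys and values. A toy numpy check (my paraphrase of the
| setup, not code from the paper):
      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(8, 4))    # 8 in-context inputs, dim 4
      y = rng.normal(size=(8, 1))    # their targets
      x_q = rng.normal(size=(4, 1))  # query input
      eta = 0.1                      # learning rate

      # One GD step on 0.5 * sum_i ||W x_i - y_i||^2, starting from W = 0:
      W1 = eta * y.T @ X             # shape (1, 4)
      pred_gd = W1 @ x_q

      # Unnormalized linear attention: sum_i value_i * (key_i . query)
      pred_attn = eta * (y * (X @ x_q)).sum(axis=0, keepdims=True)

      assert np.allclose(pred_gd, pred_attn)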
| ta8645 wrote:
| > Would that make in-context learning a superset or a
| subset of pretraining?
|
| I guess a superset. But it doesn't really matter either
| way. Ultimately, there's no useful distinction between
| pretraining and in-context learning. They're just an
| artifact of the current technology.
| tripplyons wrote:
| I'm not at all experienced in neuroscience, but I think that
| humans and other animals primarily gain intelligence by
| learning from their sensory input.
| FergusArgyll wrote:
| You don't think a lot is encoded in genes from before we're
| born?
| aaronblohowiak wrote:
| >a lot
|
| This is pretty vague. I certainly don't think mastery of
| any concept invented in the last thousand years could be
| considered encoded in genes, though we would want or
| expect an AGI to be able to learn calculus, for instance.
| In terms of "encoded in genes", I'd say most of what is
| asked or expected of AGI goes beyond what feral children
| (https://en.wikipedia.org/wiki/Feral_child) were able to
| demonstrate.
| tripplyons wrote:
| I think that most human learning comes from years of sensory
| input. Why should we expect a machine to generalize well
| without any background?
| Krasnol wrote:
| I'd guess it's because we don't want to have another human.
| We want something better. Therefore, the expectations on the
| learning process are way beyond what humans do. I guess some
| are expecting some magic word (formula) which would be like a
| seed with unlimited potential.
|
| So like humans after all, but faster.
|
| I guess it's just hard to write a book about the way you
| write that book.
| andoando wrote:
| It does, but it also generalizes extremely well.
| aithrowawaycomm wrote:
| Newborns (and certainly toddlers) seem to understand the
| underlying concepts for these things when it comes to
| visual/haptic object identification and "folk physics":
      A short list of abilities that cannot be performed by
      CompressARC includes:
        - Assigning two colors to each other (see puzzle 0d3d703e)
        - Repeating an operation in series many times (see puzzle
          0a938d79)
        - Counting/numbers (see puzzle ce9e57f2)
        - Translation, rotation, reflections, rescaling, image
          duplication (see puzzles 0e206a2e, 5ad4f10b, and 2bcee788)
        - Detecting topological properties such as connectivity
          (see puzzle 7b6016b9)
|
| Note: I am _not_ saying newborns can solve the corresponding
| ARC problems! The point is there is a lot of evidence that
| many of the concepts ARC-AGI is (allegedly) measuring are
| innate in humans, and maybe in most animals; e.g. cockroaches
| can quickly identify connected/disconnected components when
| it comes to pathfinding. Again, not saying cockroaches can
| solve ARC :) OTOH even if orcas were smarter than humans they
| would struggle with ARC - it would be way too baffling and
| obtuse if your culture doesn't have the concept of written
| standardized tests. (I had been solving state-mandated ARCish
| problems since elementary school.) This also applies to
| hunter-gatherers, and note the converse: if you plopped me
| down among the Khoisan in the Kalahari, they would think I
| was an ignorant moron. But it makes as much sense
| scientifically to say "human-level intelligence" entails
| "human-level hunter-gathering" as to say it entails
| "human-level IQ problems."
| Ukv wrote:
| > there is a lot of evidence that many of the concepts ARC-
| AGI is (allegedly) measuring are innate in humans
|
| I'd argue that "innate" here still includes a brain
| structure/nervous system that evolved on 3.5 billion years'
| worth of data. Extensive pre-training of one kind or
| another currently seems the best way to achieve generality.
| jshmrsn wrote:
| If the machine can decide how to train itself (adjust weights)
| when faced with a type of problem it hasn't seen before, then I
| don't think that would go against the spirit of general
| intelligence. I think that's basically what humans do when they
| decide to get better at something: they figure out how to
| practice that task until they get better at it.
| pona-a wrote:
| In-context learning is a very different problem from regular
| prediction. It is quite simple to fit a stationary solution
| to noisy data; that's just a matter of tuning some parameters
| with fairly even gradients. In-context learning implies
| you're essentially learning a mesa-optimizer for the class of
| problems you're facing, which in the form of transformers
| essentially means fitting something not that far from a
| differentiable Turing machine with no inductive biases.
| fsndz wrote:
| Exactly. That's basically the problem with a lot of the current
| paradigms: they don't allow true generalisation. That's why some
| people say there won't be any AGI anytime soon:
| https://www.lycee.ai/blog/why-no-agi-openai
| exe34 wrote:
| "true generalisation" isn't really something a lot of humans
| can do.
| fsndz wrote:
| The thing is, LLMs don't even do the kind of generalisation
| the dumbest human can do, while simultaneously doing some
| stuff the smartest human probably can't.
| AIorNot wrote:
| I was thinking about this Lex Fridman podcast with Marcus
| Hutter. Also, Joscha Bach defined intelligence as the ability to
| accurately model reality. Is lossless compression itself
| intelligence, or a best-fit model - is there a difference?
| https://www.youtube.com/watch?v=E1AxVXt2Gv4
| d--b wrote:
| > ARC-AGI, introduced in 2019, is an artificial intelligence
| benchmark designed to test a system's ability to infer and
| generalize abstract rules from minimal examples. The dataset
| consists of IQ-test-like puzzles, where each puzzle provides
| several example images that demonstrate an underlying rule, along
| with a test image that requires completing or applying that rule.
| While some have suggested that solving ARC-AGI might signal the
| advent of artificial general intelligence (AGI), its true purpose
| is to spotlight the current challenges hindering progress toward
| AGI
|
| Well they kind of define intelligence as the ability to compress
| information into a set of rules, so yes, compression does that...
| programjames wrote:
| Here's what they did:
|
| 1. Choose random samples z ~ N(m, S) as the "encoding" of a
| puzzle, and a distribution of neural network weights p(th) ~
| N(th, <very small variance>).
|
| 2. For a given z and th, you can decode to get a distribution of
| pixel colors. We want these pixel colors to match the ones in our
| samples, but they're not guaranteed to, so we'll have to add some
| correction e.
|
| 3. Specifying e takes KL(decoded colors || actual colors) bits.
| If we had sources of randomness q(z), q(th), specifying z and th
| would take KL(p(z) || q(z)) and KL(p(th) || q(th)) bits.
|
| 4. The authors choose q(z) ~ N(0, 1) so KL(p(z) || q(z)) =
| 0.5(m^2 + S^2 - 1 - 2 ln S). Similarly, they choose q(th) ~ N(0,
| 1/2l), and since Var(th) is very small, this gives KL(p(th) ||
| q(th)) = l * th^2.
|
| 5. The fewer bits they use, the lower the Kolmogorov complexity,
| and the more likely it is to be correct. So, they want to
| minimize the number of bits
|
| a * 0.5(m^2 + S^2 - 1 - 2ln S) + l * th^2 + c * KL(decoded colors
| || actual colors).
|
| 6. Larger a gives a smaller latent, larger l gives a smaller
| neural network, and larger c gives a more accurate solution. I
| think all they mention is they choose c = 10a, and that l was
| pretty large.
|
| They can then train m, S, th until it solves the examples for a
| given puzzle. Decoding will then give all the answers, including
| the unknown answer! The main drawback to this method is that,
| like Gaussian splatting, they have to train an entire neural
| network for every puzzle. But the neural networks are pretty
| small, so
| you could train a "hypernetwork" that predicts m, S, th for a
| given puzzle, and even predicts how to train these parameters.
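|
| A rough PyTorch-style sketch of that objective as I read it
| (the decoder, shapes, and hyperparameters are made up for
| illustration; the real CompressARC architecture is different):
      import torch

      m = torch.zeros(64, requires_grad=True)      # latent mean
      log_S = torch.zeros(64, requires_grad=True)  # latent log-std
      decoder = torch.nn.Sequential(               # stand-in for "th"
          torch.nn.Linear(64, 128), torch.nn.ReLU(),
          torch.nn.Linear(128, 10 * 30 * 30))      # 10 colors, 30x30 grid
      opt = torch.optim.Adam([m, log_S, *decoder.parameters()], lr=1e-2)

      a, l, c = 1.0, 0.1, 10.0                     # weights; the post says c = 10a
      target = torch.randint(0, 10, (30 * 30,))    # placeholder puzzle answer grid

      for _ in range(1000):
          S = log_S.exp()
          z = m + S * torch.randn_like(S)          # reparameterized z ~ N(m, S)
          logits = decoder(z).view(30 * 30, 10)
          # bits for the correction e: cross-entropy of decoded vs. actual colors
          recon = torch.nn.functional.cross_entropy(logits, target, reduction="sum")
          # bits for z: KL(N(m, S) || N(0, 1)) = 0.5 * sum(m^2 + S^2 - 1 - 2 ln S)
          kl_z = 0.5 * (m**2 + S**2 - 1 - 2 * log_S).sum()
          # bits for th: with q(th) ~ N(0, 1/2l) and a tiny posterior variance,
          # this collapses to roughly l * ||th||^2
          kl_th = l * sum((p**2).sum() for p in decoder.parameters())
          loss = a * kl_z + kl_th + c * recon      # description length (up to units)
          opt.zero_grad(); loss.backward(); opt.step()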
___________________________________________________________________
(page generated 2025-03-04 23:00 UTC)