[HN Gopher] There are no new ideas in AI, only new datasets
       ___________________________________________________________________
        
       There are no new ideas in AI, only new datasets
        
       Author : bilsbie
       Score  : 246 points
       Date   : 2025-06-30 14:43 UTC (8 hours ago)
        
 (HTM) web link (blog.jxmo.io)
 (TXT) w3m dump (blog.jxmo.io)
        
       | ctoth wrote:
       | Reinforcement learning from self-play/AlphaWhatever? Nah must
       | just be datasets. :)
        
         | grumpopotamus wrote:
         | https://en.wikipedia.org/wiki/TD-Gammon
        
           | Y_Y wrote:
           | You raise a really interesting point. I'm sure it's just
           | missed my notice, but I'm not familiar with any projects from
           | antediluvian AI that have been resurrected to run on modern
           | hardware and see where they'd really asymptote if they'd had
           | the compute they deserved.
        
             | FeepingCreature wrote:
             | To be fair, usually those projects would need considerable
             | work to be ported to modern multicore machines, let alone
             | GPUs.
        
               | genewitch wrote:
               | can you name a couple so i can see how much work is
               | involved? markov chains compile fast and respond fast,
               | sure, and neural nets train pretty quick too, so i'm
               | wondering where the cutoff is; expert systems?
        
           | zahlman wrote:
           | For that matter, https://gist.github.com/deebs67/8fbcf8b127a6
           | 3e70d4a3f8590c97... .
        
         | NitpickLawyer wrote:
          | And architecture stuff, like actually useful long context.
          | Whatever they did with Gemini 2.5 is miles ahead of previous
          | models in useful long-context results. I'd be very surprised
          | if Gemini 2.5 is "just" Gemini 1 w/ better data.
        
           | shwouchk wrote:
            | i dont know what all the hype is with gemini 2.5, at least
            | the currently running instance. from my experience, at
            | least in conversation mode, it cannot remember my
            | instructions to avoid apologies and similar platitudes,
            | whether from the "persona", personal instructions, or from
            | one message to the next.
        
         | nyrikki wrote:
         | Big difference between a perfect information, completely
         | specified zero sum game and the real world.
         | 
         | As a simple analogy, read out the following sentence multiple
         | times, stressing a different word each time.
         | 
         | "I never said she stole my money"
         | 
          | Note how the meaning changes each time, often uniquely?
          | 
          | That is a lens into the frame problem and its inverse, the
          | specification problem.
         | 
         | The above problem quickly becomes tower-complete, and recent
         | studies suggest that RL is reinforcing or increasing the weight
         | of existing patterns.
         | 
         | As the open domain frame problem and similar challenges are
         | equivalent to HALT, finding new ways to extract useful
         | information will be important for generalization IMHO.
         | 
         | Synthetic data is useful, but not a complete solution,
         | especially for tower problems.
        
           | genewitch wrote:
           | The one we use is "I always pay my taxes"
           | 
           | and as far as synthetic vs real data, there's a lot of gaps
           | in LLM knowledge; and vision models suffer from "limited
           | tags", which used to have workarounds with textual embeddings
           | and the like, but those went by the wayside as LoRA,
           | controlnet, etc. appeared.
           | 
           | There's people who are fairly well known that LLMs have no
           | idea about. There's things in books i own that the AI
           | confidently tells me are either wrong or don't exist.
           | 
           | That one page about compressing 1 gig wikipedia as small as
           | possible implicitly and explicitly states that AI is
           | "basically compression" - and if the data isn't there, it's
           | not in the compressed set (weights) either.
           | 
           | And i'll reply to another comment here, about "24/7 rolling/
           | for looped" AI - i thought of doing this when i first found
           | out about LLMs, but context windows are the enemy, here. I
           | have a couple of ideas about how to have a continuous AI, but
           | i don't have the capital to test it out.
        
       | kogus wrote:
       | To be fair, if you imagine a system that successfully reproduced
       | human intelligence, then 'changing datasets' would probably be a
       | fair summary of what it would take to have different models.
       | After all, our own memories, training, education, background, etc
       | are a very large component of our own problem solving abilities.
        
       | jschveibinz wrote:
        | I will respectfully disagree. All "new" ideas come from old
        | ideas. AI is a tool to access old ideas with speed and with
        | new perspectives that haven't been available up until now.
        | 
        | Innovation is in the cracks: recognition of holes,
        | intersections, tangents, etc. in old ideas. It has been said
        | that innovation is done on the shoulders of giants.
        | 
        | So AI can be an express elevator up to an army of giants'
        | shoulders? It all depends on how you use the tools.
        
         | gametorch wrote:
         | Exactly!
         | 
         | Can you imagine if we applied the same gatekeeping logic to
         | science?
         | 
         | Imagine you weren't allowed to use someone else's scientific
         | work or any derivative of it.
         | 
         | We would make no progress.
         | 
         | The only legitimate defense I have ever seen here revolves
         | around IP and copyright infringement, which I couldn't care
         | less about.
        
         | alfalfasprout wrote:
         | Access old ideas? Yes. With new perspectives? Not necessarily.
         | An LLM may be able to assist in interpreting data with new
         | perspectives but in practice they're still fairly bad at
         | greenfield work.
         | 
         | As with most things, the truth lies somewhere in the middle.
         | LLMs can be helpful as a way of accelerating certain kinds and
         | certain aspects of research but not others.
        
           | stevep98 wrote:
           | > Access old ideas? Yes. With new perspectives?
           | 
           | I wonder if we can mine patent databases for old ideas that
           | never worked out in the past, but now are more useful.
           | Perhaps due to modern machining or newer materials or just
           | new applications of the idea.
        
         | bcrosby95 wrote:
          | The article is discussing working on AI innovation vs.
          | focusing on getting more and better data. While there have
          | been key breakthroughs in new ideas, one of the best ways to
          | increase the performance of these systems is getting more
          | and better data, and many people think data is the primary
          | avenue to improvement.
         | 
         | It reminds me of an AI talk a few decades ago, about how the
         | cycle goes: more data -> more layers -> repeat...
         | 
         | Anyways, I'm not sure how your comment relates to these two
         | avenues of improvement.
        
         | jjtheblunt wrote:
         | > I will respectfully disagree. All "new" ideas come from old
         | ideas.
         | 
          | The insight into the structure of the benzene ring famously
          | came in a dream, hadn't been seen before, but was imagined
          | as a snake biting its own tail.
        
           | troupo wrote:
           | And as we all know, it came in a dream to a complete novice
           | in chemistry with zero knowledge of any old ideas in
           | chemistry: https://en.wikipedia.org/wiki/August_Kekul%C3%A9
           | 
           | --- start quote ---
           | 
           | The empirical formula for benzene had been long known, but
           | its highly unsaturated structure was a challenge to
           | determine. Archibald Scott Couper in 1858 and Joseph
           | Loschmidt in 1861 suggested possible structures that
           | contained multiple double bonds or multiple rings, but the
           | study of aromatic compounds was in its earliest years, and
           | too little evidence was then available to help chemists
           | decide on any particular structure.
           | 
           | More evidence was available by 1865, especially regarding the
           | relationships of aromatic isomers.
           | 
           | [ Kekule claimed to have had the dream in 1865 ]
           | 
           | --- end quote ---
           | 
            | The dream claim came from Kekule himself _25 years after
            | his proposal_, a proposal he had to modify 10 years after
            | making it.
        
         | baxtr wrote:
         | Imagine a human had read every book/publication in every field
         | of knowledge that mankind has ever produced AND couldn't come
         | up with anything entirely new. Hard to imagine.
        
       | tippytippytango wrote:
       | Sometimes we get confused by the difference between technological
       | and scientific progress. When science makes progress it unlocks
       | new S-curves that progress at an incredible pace until you get
        | into the diminishing returns region. People complain of
        | slowing progress, but it was always slow; you just didn't
        | notice that nothing new was happening during the exponential
        | takeoff of the S-curve, just furious optimization.
        
         | baxtr wrote:
         | Fully agree.
         | 
         | And at the same time I have noticed that people don't
         | understand the difference between an S-curve and an exponential
         | function. They can look almost identical at certain intervals.
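          | 
          | A minimal sketch of that (a toy comparison; L, k, and t0 are
          | assumed logistic parameters, not from the thread):
          | 
          |     import math
          | 
          |     def logistic(t, L=1.0, k=1.0, t0=10.0):
          |         # Classic S-curve: saturates at L around t0.
          |         return L / (1.0 + math.exp(-k * (t - t0)))
          | 
          |     def early_exp(t, k=1.0, t0=10.0):
          |         # Early-time approximation of the same curve.
          |         return math.exp(k * (t - t0))
          | 
          |     for t in [0, 2, 4, 6]:
          |         print(t, logistic(t), early_exp(t))
          |     # The two track closely until t nears t0, where the
          |     # S-curve bends toward saturation.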
        
       | ks2048 wrote:
       | The latest LLMs are simply multiplying and adding various numbers
       | together... Babylonians were doing that 4000 years ago.
        
         | bobson381 wrote:
         | You are just a lot of interactions of waves. All meaning is
         | assigned. I prefer to think of this like the Goedel generator
         | that found new formal expressions for the Principia - because
         | we have a way of indexing concept-space, there's no telling
         | what we might find in the gaps.
        
         | thenaturalist wrote:
          | But on clay tablets, not in semi-conductive electron prisons
          | separated by one-atom-thick walls.
          | 
          | Slight difference in methods, wouldn't you agree?
        
       | voxleone wrote:
       | I'd say with confidence: we're living in the early days. AI has
       | made jaw-dropping progress in two major domains: language and
       | vision. With large language models (LLMs) like GPT-4 and Claude,
        | and vision models like CLIP and DALL-E, we've seen machines that
       | can generate poetry, write code, describe photos, and even hold
       | eerily humanlike conversations.
       | 
       | But as impressive as this is, it's easy to lose sight of the
       | bigger picture: we've only scratched the surface of what
       | artificial intelligence could be -- because we've only scaled two
       | modalities: text and images.
       | 
       | That's like saying we've modeled human intelligence by mastering
       | reading and eyesight, while ignoring touch, taste, smell, motion,
       | memory, emotion, and everything else that makes our cognition
       | rich, embodied, and contextual.
       | 
       | Human intelligence is multimodal. We make sense of the world
       | through:
       | 
       | Touch (the texture of a surface, the feedback of pressure, the
        | warmth of skin); Smell and taste (deeply tied to memory, danger,
       | pleasure, and even creativity); Proprioception (the sense of
       | where your body is in space -- how you move and balance);
       | Emotional and internal states (hunger, pain, comfort, fear,
       | motivation).
       | 
       | None of these are captured by current LLMs or vision
       | transformers. Not even close. And yet, our cognitive lives depend
       | on them.
       | 
       | Language and vision are just the beginning -- the parts we were
        | able to digitize first -- not necessarily the most central to
       | intelligence.
       | 
       | The real frontier of AI lies in the messy, rich, sensory world
       | where people live. We'll need new hardware (sensors), new data
       | representations (beyond tokens), and new ways to train models
       | that grow understanding from experience, not just patterns.
        
         | Swizec wrote:
         | > The real frontier of AI lies in the messy, rich, sensory
         | world where people live. We'll need new hardware (sensors), new
         | data representations (beyond tokens), and new ways to train
         | models that grow understanding from experience, not just
         | patterns.
         | 
         | Like Dr. Who said: DALEKs aren't brains in a machine, they
         | _are_ the machine!
         | 
         | Same is true for humans. We really are the whole body, we're
         | not just driving it around.
        
           | nomel wrote:
           | There are many people who mentally developed while paralyzed
           | that literally drive around their bodies via motorized
           | wheelchair. I don't think there's any evidence that a brain
           | couldn't exist or develop in a jar, given only the inputs
           | modern AI now has (text, video, audio).
        
             | Swizec wrote:
             | > any evidence that a brain couldn't exist or develop in a
             | jar
             | 
              | The brain _could_. Of course it could. It's just a
              | signals processing machine.
             | 
             | But would it be missing anything we consider core to the
             | way humans think? Would it struggle with parts of
             | cognition?
             | 
             | For example: experiments were done with cats growing up in
             | environments with vertical lines only. They were then put
             | in a normal room and had a hard time understanding flat
             | surfaces.
             | 
             | https://computervisionblog.wordpress.com/2013/06/01/cats-
             | and...
        
               | nomel wrote:
               | This isn't remotely a hypothetical, so I imagine there
               | are some examples out there, especially from back when
               | polio was a problem. Although, for practical reasons,
               | they might have had limited _exposure to novelty_ , which
               | could have negative consequences.
        
         | skydhash wrote:
         | Yeah, but are there new ideas or only wishes?
        
           | jdgoesmarching wrote:
           | It's pure magical thinking that would be correctly dismissed
           | if it didn't have AI attached to it. Imagine talking this way
           | about anything else.
           | 
           | "We've barely scratched the surface with Rust, so far we're
           | only focused on code and haven't even explored building
           | mansions or ending world hunger"
        
             | tim333 wrote:
             | AI has some real possibilities of building mansions and
             | ending hunger in a way that Rust doesn't.
        
         | dinfinity wrote:
         | > Language and vision are just the beginning -- the parts we
          | were able to digitize first -- not necessarily the most central
         | to intelligence.
         | 
         | I respectfully disagree. Touch gives pretty cool skills, but
         | language, video and audio are all that are needed _for all
         | online interactions_. We use touch for typing and pointing, but
         | that is only because we don 't have a more efficient and
         | effective interface.
         | 
         | Now I'm not saying that all other senses are uninteresting.
         | Integrating touch, extensive proprioception, and olfaction is
         | going to unlock a lot of 'real world' behavior, but your
         | comment was specifically about intelligence.
         | 
         | Compare humans to apes and other animals and the thing that
         | sets us apart is definitely not in the 'remaining' senses, but
         | firmly in the realm of audio, video and language.
        
           | voxleone wrote:
           | > Language and vision are just the beginning -- the parts we
            | were able to digitize first -- not necessarily the most
           | central to intelligence.
           | 
            | I probably made a mistake when I asserted that -- I should
            | have thought it over. Vision is evolutionarily older and
            | more "primitive", while language is uniquely human [or
            | maybe, more broadly, primate, cetacean, cephalopod,
            | avian...], symbolic, and abstract -- arguably a different
            | order of cognition altogether. But I maintain that each
            | and every sense is important as far as human cognition --
            | and its replication -- is concerned.
        
             | wizzwizz4 wrote:
             | People who lack one of those senses, or even two of them,
             | tend to do just fine.
        
         | chasd00 wrote:
         | > Language and vision are just the beginning..
         | 
         | Based on the architectures we have they may also be the ending.
         | There's been a lot of news in the past couple years about LLMs
         | but has there been any breakthroughs making headlines anywhere
         | else in AI?
        
           | dragonwriter wrote:
           | > There's been a lot of news in the past couple years about
           | LLMs but has there been any breakthroughs making headlines
           | anywhere else in AI?
           | 
           | Yeah, lots of stuff tied to robotics, for instance; this
           | overlaps with vision, but the advances go beyond vision.
           | 
           | Audio has seen quite a bit. And I imagine there is stuff
           | happening in niche areas that just aren't as publicly
           | interesting as language, vision/imagery, audio, and robotics.
        
           | nomel wrote:
           | Two Nobel prizes in chemistry:
           | https://www.nature.com/articles/s41746-024-01345-9
        
           | edanm wrote:
           | Sure. In physics, math, chemistry, biology. To name a few.
        
         | mr_world wrote:
          | Organic adaptation and persistence of memory I would say are
          | the two major advancements that need to happen.
          | 
          | Human neural networks are dynamic; they change and
          | rearrange, grow and sever. An LLM is fixed and relies on
          | context: if you give it the right answer, it won't "learn"
          | that it is the correct answer unless it is fed back into the
          | system and trained over months. What if it's only the right
          | answer for a limited period of time?
         | 
         | To build an intelligent machine, it must be able train itself
         | in real time and remember.
        
           | specialist wrote:
           | Yes and: and forget.
        
       | anon291 wrote:
        | I mean, there are no new ideas in SaaS, just new applications,
        | and that worked out pretty well.
        
       | luppy47474 wrote:
       | Hmmm
        
       | rar00 wrote:
       | disagree, there are a few organisations exploring novel paths.
       | It's just that throwing new data at an "old" algorithm is much
       | easier and has been a winning strategy. And, also, there's no
       | incentive for a private org to advertise a new idea that seems to
       | be working (mine's a notable exception :D).
        
       | tantalor wrote:
       | > If data is the only thing that matters, why are 95% of people
       | working on new methods?
       | 
       | Because new methods unlock access to new datasets.
       | 
       | Edit: Oh I see this was a rhetorical question answered in the
       | next paragraph. D'oh
        
       | piinbinary wrote:
       | AI training is currently a process of making the AI remember the
       | dataset. It doesn't involve the AI thinking about the dataset and
       | drawing (and remembering) conclusions.
       | 
       | It can probably remember more facts about a topic than a PhD in
       | that topic, but the PhD will be better at thinking about that
       | topic.
        
         | jayd16 wrote:
          | It's a bit more complex than that. It's more about baking
          | out the dataset into heuristics that a machine can use to
          | match a satisfying result to an input. Sometimes these
          | heuristics are surprising to a human and can solve a problem
          | in a novel way.
          | 
          | "Thinking" is too broad a term to apply usefully, but I
          | would say it's pretty clear we are not close to AGI.
        
         | nkrisc wrote:
         | > It can probably remember more facts about a topic than a PhD
         | in that topic
         | 
         | So can a notebook.
        
         | tantalor wrote:
         | Maybe that's why PhDs keep the textbooks they use at hand, so
         | they don't have to remember everything.
         | 
         | Why should the model need to memorize facts we already have
         | written down somewhere?
        
       | EternalFury wrote:
       | What John Carmack is exploring is pretty revealing. Train models
       | to play 2D video games to a superhuman level, then ask them to
       | play a level they have not seen before or another 2D video game
       | they have not seen before. The transfer function is negative. So,
       | in my definition, no intelligence has been developed, only
       | expertise in a narrow set of tasks.
       | 
        | It's apparently much easier to scare the masses with visions
        | of ASI than to build a general intelligence that can pick up a
        | new 2D video game faster than a human being.
        
         | ferguess_k wrote:
         | Can you please explain "the transfer function is negative"?
         | 
         | I'm wondering whether one has tested with the same model but on
         | two situations:
         | 
         | 1) Bring it to superhuman level in game A and then present game
         | B, which is similar to A, to it.
         | 
         | 2) Present B to it without presenting A.
         | 
         | If 1) is not significantly better than 2) then maybe it is not
         | carrying much "knowledge", or maybe we simply did not program
         | it correctly.
        
           | tough wrote:
           | I think the problem is we train models to pattern match, not
           | to learn or reason about world models
        
             | NBJack wrote:
             | In other words, they learn the game, not how to _play
             | games_.
        
               | fsmv wrote:
               | They memorize the answers not the process to arrive at
               | answers
        
               | IshKebab wrote:
               | This has been disproven so many times... They clearly do
               | both. You can trivially prove this yourself.
        
               | 0xWTF wrote:
               | > You can trivially prove this yourself.
               | 
               | Given the long list of dead philosophers of mind, if you
               | have a trivial proof, would you mind providing a link?
        
               | pdabbadabba wrote:
               | It's really easy: go to Claude and ask it a novel
               | question. It will generally reason its way to a perfectly
               | good answer even if there is no direct example of it in
               | the training data.
        
               | MichaelZuo wrote:
               | How do you know it's a novel question?
        
               | IshKebab wrote:
               | It's not exactly difficult to come up with a question
               | that's so unusual the chance of it being in the training
               | set is effectively zero.
        
               | troupo wrote:
               | And as any programmer will tell you: they immediately
               | devolve into "hallucinating" answers, not trying to
               | actually reason about the world. Because that's what they
               | do: they create statistically plausible answers even if
               | those answers are complete nonsense.
        
               | MichaelZuo wrote:
               | Can you provide some examples of these genuinely unique
               | questions?
        
               | keerthiko wrote:
               | When LLM's come up with answers to questions that aren't
               | directly exampled in the training data, that's not proof
               | at all that it reasoned its way there -- it can very much
               | still be pattern matching without insight from the actual
               | code execution of the answer generation.
               | 
               | If we were taking a walk and you asked me for an
               | explanation for a mathematical concept I have not
               | actually studied, I am fully capable of hazarding a
               | casual guess based on the other topics I _have_ studied
               | within seconds. This is the default approach of an LLM,
               | except with much greater breadth and recall of studied
               | topics than I, as a human, have.
               | 
               | This would be very different than if we sat down at a
               | library and I applied the various concepts and theorems I
               | already knew to make inferences, built upon them, and
               | then derived an understanding based on reasoning of the
               | steps I took (often after backtracking from several
               | reasoning dead ends) before providing the explanation.
               | 
               | If you ask an LLM to explain their reasoning, it's
               | unclear whether it just guessed the explanation and
               | reasoning too, or if that was actually the set of steps
               | it took to get to the first answer they gave you. This is
               | why LLMs are able to correct themselves after claiming
               | strawberry has 2 rs, but when providing (guessing again)
               | their explanations they make more "relevant" guesses.
        
               | IshKebab wrote:
               | LLMs clearly don't reason in the same way that humans or
               | SMT solvers do. That doesn't mean they aren't reasoning.
        
               | IshKebab wrote:
               | Just go and ask ChatGPT or Claude something that can't
               | possibly be in its training set. Make something up. If it
               | is _only_ memorising answers then it will be impossible
               | for it to get the correct result.
               | 
               | A simple nonsense programming task would suffice. For
               | example "write a Python function to erase every character
               | from a string unless either of its adjacent characters
               | are also adjacent to it in the alphabet. The string only
               | contains lowercase a-z"
               | 
               | That task isn't anywhere in its training set so they
               | can't memorise the answer. But I bet ChatGPT and Claude
               | can still do it.
               | 
               | Honestly this is sooooo obvious to anyone that has used
               | these tools, it's really insane that people are still
               | parroting (heh) the "it just memorises" line.
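                | 
                | For what it's worth, a minimal sketch of one reading
                | of that toy task (checking neighbours in the original
                | string; the wording leaves some ambiguity):
                | 
                |     def squash(s: str) -> str:
                |         # Keep c only if a neighbour in the
                |         # string is also a neighbour of c in
                |         # the alphabet (e.g. 'a' beside 'b').
                |         def adj(a, b):
                |             return abs(ord(a) - ord(b)) == 1
                |         keep = []
                |         for i, c in enumerate(s):
                |             left = i > 0 and adj(s[i - 1], c)
                |             right = (i + 1 < len(s)
                |                      and adj(s[i + 1], c))
                |             if left or right:
                |                 keep.append(c)
                |         return "".join(keep)
                | 
                |     print(squash("abzcd"))  # -> "abcd"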
        
               | troupo wrote:
                | People who say that LLMs memorize stuff are just as
                | clueless as those who assume that there's any
                | reasoning happening.
                | 
                | They generate statistically plausible answers (to
                | simplify) based on the training set and weights they
                | have.
        
               | imiric wrote:
               | LLMs don't "memorize" concepts like humans do. They
               | generate output based on token patterns in their training
               | data. So instead of having to be trained on every
               | possible problem, they can still generate output that
               | solves it by referencing the most probable combination of
               | tokens for the specified input tokens. To humans this
               | seems like they're truly solving novel problems, but it's
               | merely a trick of statistics. These tools can reference
               | and generate patterns that no human ever could. This is
               | what makes them useful and powerful, but I would argue
               | not intelligent.
        
               | EternalFury wrote:
               | They learn the value of specific actions in specific
               | contexts based on the rewards they received during their
               | play time. Specific actions and specific contexts are not
                | transferable for various reasons. John noted that
               | varying frame rates and variable latency between action
               | and effect really confuse the models.
        
               | nightpool wrote:
               | Okay, so fuzz the frame rate and latency? That feels very
               | easy to fix.
        
               | IshKebab wrote:
               | Well yeah... If you only ever played one game in your
               | life you would probably be pretty shit at other games
               | too. This does not seem very revealing to me.
        
               | trainerxr50 wrote:
               | I am decent at chess but barely know how the pieces in Go
               | move.
               | 
                | Of course, this is because I have spent a lot of time
                | TRAINING to play chess and basically none training to
                | play Go.
               | 
               | I am good on guitar because I started training young but
               | can't play the flute or piano to save my life.
               | 
               | Most complicated skills have basically no transfer or
               | carry over other than knowing how to train on a new
               | skill.
        
               | beefnugs wrote:
               | yeahhhh why isnt there a training structure where you
               | play 5000 games, and the reward function is based on
               | doing well in all of them?
               | 
                | I guess it's a totally different level of control:
                | instead of immediately choosing a certain button to
                | press, you need to set longer-term goals: "press
                | whatever sequence over this time i need to do to end
                | up closer to this result"
               | 
               | There is some kind of nested multidimensional thing to
               | train on here instead of immediate limited choices
        
             | antisthenes wrote:
             | Where do you draw the line between pattern matching and
             | reasoning about world models?
             | 
             | A lot of intelligence is just pattern matching and being
             | quick about it.
        
             | singron wrote:
              | I think this is clearly a case of overfitting and failure
             | to generalize, which are really well understood concepts.
             | We don't have to philosophize about what pattern matching
             | really means.
        
             | ferguess_k wrote:
             | I kinda think I'm more or less the same...OK maybe we have
             | different definitions of "pattern matching".
        
               | veqz wrote:
               | It's Plato's cave:
               | 
               | We train the models on what are basically shadows, and
               | they learn how to pattern match the shadows.
               | 
               | But the shadows are only depictions of the real world,
               | and the LLMs never learn about that.
        
               | EternalFury wrote:
               | 100%
        
         | YokoZar wrote:
         | I wonder if this is a case of overfitting from allowing the
         | model to grow too large, and if you might cajole it into
         | learning more generic heuristics by putting some constraints on
         | it.
         | 
         | It sounds like the "best" AI without constraint would just be
         | something like a replay of a record speedrun rather than a
         | smaller set of heuristics of getting through a game, though the
         | latter is clearly much more important with unseen content.
        
         | moralestapia wrote:
         | I wonder how much performance decreases if they just use
         | slightly modified versions of the same game. Like a different
         | color scheme, or a couple different sprites.
        
         | vladimirralev wrote:
          | He is not using appropriate models for this conclusion, nor
          | is he using state-of-the-art models in this research;
          | moreover, he doesn't have an expensive foundational model to
          | build upon for 2D games. It's just a fun project.
         | 
         | A serious attempt at video/vision would involve some
         | probabilistic latent space that can be noised in ways that make
         | sense for games in general. I think veo3 proves that ai can
         | generalize 2d and even 3d games, generating a video under
         | prompt constraints is basically playing a game. I think you
         | could prompt veo3 to play any game for a few seconds and it
         | will generally make sense even though it is not fine tuned.
        
           | sigmoid10 wrote:
           | Veo3's world model is still pretty limited. That becomes
           | obvious very fast once you prompt out of distribution video
           | content (i.e. stuff that you are unlikely to find on
           | youtube). It's extremely good at creating photorealistic
           | surfaces and lighting. It even has some reasonably solid
           | understanding of fluid dynamics for simulating water. But for
           | complex human behaviour (in particular certain motions) it
           | simply lacks the training data. Although that's not really a
            | fault of the model and I'm pretty sure there will be a way
            | to overcome this as well. Maybe some kind of physics-based
            | simulation as supplemental training data.
        
           | keerthiko wrote:
           | > generating a video under prompt constraints is basically
           | playing a game
           | 
           | Besides static puzzles (like a maze or jigsaw) I don't
           | believe this analogy holds? A model working with prompt
           | constraints that aren't evolving or being added over the
           | course of "navigating" the generation of the model's output
           | means it needs to process 0 new information that it didn't
           | come up with itself -- playing a game is different from other
           | generation because it's primarily about reacting to input you
           | didn't know the precise timing/spatial details of, but can
           | learn that they come within a known set of higher order
           | rules. Obviously the more finite/deterministic/predictably
           | probabilistic the video game's solution space, the more it
            | can be inferred from the initial state (aka reduces to the
            | same type of problem as generating a video from a prompt),
            | which is why models are still able to play video games.
            | But as GP pointed out, the transfer function is negative
            | in such cases -- the overarching rules are not predictable
            | enough across disparate genres.
           | 
           | > I think you could prompt veo3 to play any game for a few
           | seconds
           | 
           | I'm curious what your threshold for what constitutes "play
           | any game" is in this claim? If I wrote a script that maps
           | button combinations to average pixel color of a portion of
           | the screen buffer, by what metric(s) would veo3 be "playing"
           | the game more or better than that script "for a few seconds"?
           | 
           | edit: removing knee-jerk reaction language
        
             | hluska wrote:
             | Nothing the parent said makes this level of aggression
             | necessary or even tasteful. This isn't the Colosseum - we
             | can learn from each other and consider different points of
             | view without acting like savages.
        
               | keerthiko wrote:
               | fair, and I edited my choice of words, but if you're
               | reading that much aggression from my initial comment
               | (which contains topical discussion) to say what you did,
               | you must find the internet a far more savage place than
               | it really is :/
        
               | hluska wrote:
               | Thanks for your concern.
               | 
               | The internet is fine - your comment was way beyond what
               | any reasonable person should tolerate. You don't need to
               | overcompensate for your lack of education in such
               | aggressive ways. Most people don't care.
               | 
               | Good job on the edit though and have an excellent day.
        
             | vladimirralev wrote:
             | It's not ideal, but you can prompt it with an image of a
             | game frame, explain the objects and physics in text and let
             | it generate a few frames of gameplay as a substitute for
             | controller input as well as what it expects as an outcome.
             | I am not talking about real interactive gameplay.
             | 
             | I am just saying we have proof that it can understand
             | complex worlds and sets of rules, and then abide by them.
             | It doesn't know how to use a controller and it doesn't know
             | how to explore the game physics on its own, but those steps
             | are much easier to implement based on how coding agents are
             | able to iterate and explore solutions.
        
           | altairprime wrote:
            | Is _any_ model currently known to succeed in the scenario
            | where Carmack's inappropriate model failed?
        
             | outofpaper wrote:
              | No monolithic models, but using hybrid approaches we've
              | been able to beat humans for some time now.
        
           | 317070 wrote:
           | What you're thinking of is much more like the Genie model
           | from DeepMind [0]. That one is like Veo, but interactive (but
            | not publicly available)
           | 
           | [0] https://deepmind.google/discover/blog/genie-2-a-large-
           | scale-...
        
           | troupo wrote:
           | > I think veo3 proves that ai can generalize 2d and even 3d
           | games
           | 
           | It doesn't. And you said it yourself:
           | 
           | > generating a video under prompt constraints is basically
           | playing a game.
           | 
           | No. It's neither generating a game (that people can play) nor
            | is it playing a game (it's generating _a video_).
           | 
           | Since it's not a model of the world in any sense of the word,
            | there are issues with even the most basic object permanence.
           | E.g. here's veo3 generating a GTA-style video. Oh look, the
           | car spins 360 and ends up on a completely different street
           | than the one it was driving down previously:
           | https://www.youtube.com/watch?v=ja2PVllZcsI
        
             | vladimirralev wrote:
              | It is still doing a great job for a few frames, and you
              | could keep it more anchored to the state of the game if
              | you prompt it, much like you can prompt coding agents to
              | keep a log of all decisions previously made. Permanence
              | is excellent; it slips often, but mostly because it is
              | not grounded to specific game state by the prompt or by
              | the decision log.
        
           | pshc wrote:
           | I think we need a spatial/physics model handling movement and
           | tactics watched over by a high level strategy model (maybe an
           | LLM).
        
         | smokel wrote:
         | The subject you are referring to is most likely Meta-
         | Reinforcement Learning [1]. It is great that John Carmack is
         | looking into this, but it is not a new field of research.
         | 
         | [1] https://instadeep.com/2021/10/a-simple-introduction-to-
         | meta-...
        
         | t55 wrote:
         | this is what deepmind did 10 years ago lol
        
           | smokel wrote:
           | No, they (and many others before them) are genuinely trying
           | to improve on the original research.
           | 
           | The original paper "Playing Atari with Deep Reinforcement
           | Learning" (2013) from Deepmind describes how agents can play
           | Atari games, but these agents would have to be specifically
           | trained on every individual game using millions of frames. To
           | accomplish this, simulators were run in parallel, and much
           | faster than in real-time.
           | 
           | Also, additional trickery was added to extract a reward
           | signal from the games, and there is some minor cheating on
           | supplying inputs.
           | 
           | What Carmack (and others before him) is interested in, is
           | trying to learn in a real-life setting, similar to how humans
           | learn.
        
         | justanotherjoe wrote:
          | I don't get why people are so invested in framing it this
          | way. I'm sure there are ways to do the stated objective.
          | John Carmack isn't even an AI guy; why is he suddenly the
          | standard?
        
           | varjag wrote:
           | What in your opinion constitutes an AI guy?
        
           | qaq wrote:
          | Keen includes researchers like Richard Sutton, Joseph
          | Modayil, etc. Also, John has been doing it full time for
          | almost 5 years now, so given his background and aptitude for
          | learning I would imagine by this time he is more of an AI
          | guy than a fairly large percentage of AI PhDs.
        
           | refulgentis wrote:
           | Names >> all, and increasingly so.
           | 
            | One phenomenon that laid this bare for me, in a
            | substantive way, was noticing an increasing # of reverent
            | comments re: Geohot in odd places here, which are just as
            | quickly replied to by people with a sense of how he
            | _works_, as opposed to the _keywords he associates himself
            | with_. But that only happens here AFAIK.
           | 
           | Yapping, or, inducing people to yap about me, unfortunately,
           | is much more salient to my expected mindshare than the work I
           | do.
           | 
           | It's getting claustrophobic intellectually, as a result.
           | 
           | Example from the last week is the phrase "context
           | engineering" - Shopify CEO says he likes it better than
           | prompt engineering, Karpathy QTs to affirm, SimonW writes it
           | up as fait accompli. Now I have to rework my site to not use
           | "prompt engineering" and have a Take(tm) on "context
           | engineering". Because of a couple tweets + a blog
           | reverberating over 2-3 days.
           | 
           | Nothing against Carmack, or anyone else named, at all. i.e.
           | in the context engineering case, they're just sharing their
           | thoughts in realtime. (i.e. I don't wanna get rolled up into
           | a downvote brigade because it seems like I'm affirming the
           | loose assertion Carmack is "not an AI guy", or, that it seems
           | I'm criticizing anyone's conduct at all)
           | 
           | EDIT: The context engineering example was not in reference to
           | another post at the time of writing, now one is the top of
           | front page.
        
             | dvfjsdhgfv wrote:
             | > Now I have to rework my site to not use "prompt
             | engineering" and have a Take(tm) on "context engineering".
             | Because of a couple tweets + a blog reverberating over 2-3
             | days.
             | 
             | The difference here is that your example shows a trivial
             | statement and a change period of 3 days, whereas what
             | Carmack is doing is taking years.
        
               | refulgentis wrote:
               | Right. Nothing against Carmack. Grew up on the guy. I
               | haven't looked into, at all, any of the disputed stuff
               | and should actively proclaim I'm a yuge Carmack fanboy.
        
           | raincole wrote:
           | Because it "confirms" what they already believe in.
        
         | goatlover wrote:
         | I've wondered about the claim that the models played those
         | Atari/2D video games at superhuman levels, because I clearly
         | recall some humans achieving superhuman levels before models
         | were capable of it. Must have been superhuman compared to
         | average human player, not someone who spent an inordinate
         | amount of time mastering the game.
        
           | raincole wrote:
            | I'm not sure why you think so. AI outperforms humans in
            | many games already -- basically all the games we care to
            | put money into training a model for.
           | 
           | AI has beat the best human players in Chess, Go, Mahjong,
           | Texas hold'em, Dota, Starcraft, etc. It would be really,
           | really surprising that some Atari game is the holy grail of
           | human performance that AI cannot beat.
        
             | tsimionescu wrote:
             | I recall this not being true at all for Dota and Starcraft.
             | I recall AlphaStar performed much better than the top non-
             | pro players, but it couldn't consistently beat the pro
             | players with the budget that Google was willing to spend,
              | and I believe the same was true of Dota 2 (and there they
             | were even playing a limited form of the game, with fewer
             | heroes and without the hero choice part, I believe).
        
         | fullshark wrote:
         | Just sounds like an example of overfitting. This is all machine
         | learning at its root.
        
         | hluska wrote:
         | When I finished my degree, the idea that a software system
         | could develop that level of expertise was relegated to science
         | fiction. It is an unbelievable human accomplishment to get to
         | that point and honestly, a bit of awe makes life more pleasant.
         | 
         | Less quality of life focused, I don't believe that the models
         | he uses for this research are capable of more. Is it really
         | that revealing?
        
         | Uehreka wrote:
         | These questions of whether the model is "really intelligent" or
         | whatever might be of interest to academics theorizing about
         | AGI, but to the vast swaths of people getting useful stuff out
         | of LLMs, it doesn't really matter. We don't care if the current
         | path leads to AGI. If the line stopped at Claude 4 I'd still
         | keep using it.
         | 
         | And like I get it, it's fun to complain about the obnoxious and
         | irrational AGI people. But the discussion about how people are
         | using these things in their everyday lives is way more
         | interesting.
        
       | Kapura wrote:
       | Here's an idea: make the AIs consistent at doing things computers
       | are good at. Here's an anecdote from a friend who's living in
       | Japan:
       | 
       | > i used chatgpt for the first time today and have some lite rage
       | if you wanna hear it. tldr it wasnt correct. i thought of one
       | simple task that it should be good at and it couldnt do that.
       | 
       | > (The kangxi radicals are neatly in order in unicode so you can
       | just ++ thru em. The cjks are not. I couldnt see any clear
       | mapping so i asked gpt to do it. Big mess i had to untangle
       | manually anyway it woulda been faster to look them up by hand
       | (theres 214))
       | 
       | > The big kicker was like, it gave me 213. And i was like, "why
       | is one missing?" Then i put it back in and said count how many
       | numbers are here and it said 214, and there just werent. Like
       | come on you SHOULD be able to count.
       | 
       | If you can make the language models actually interface with what
       | we've been able to do with computers for decades, i imagine many
       | paths open up.
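        | 
        | For reference, the friend's first claim checks out: the Kangxi
        | radicals occupy one contiguous Unicode block, U+2F00..U+2FD5,
        | so enumerating all 214 needs no model at all (a minimal
        | sketch):
        | 
        |     # Kangxi Radicals block: exactly 214 code points.
        |     radicals = [chr(0x2F00 + i) for i in range(214)]
        |     print(len(radicals))              # 214
        |     print(radicals[0], radicals[-1])  # first, last radical
        | 
        | The mapping to CJK unified ideographs is the part with no such
        | neat ordering, which is what the friend was asking the model
        | to produce.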
        
         | cheevly wrote:
         | Many of us have solved this with internal tooling that has not
         | yet been shared or released to the public.
        
           | layer8 wrote:
           | This needs to be generalized however. For example, if you
           | present an AI with a drawing of some directed graph (a state
           | diagram, for example), it should be able to answer questions
           | based on the precise set of all possible paths in that graph,
           | without someone having to write tooling for diagram or graph
           | processing and traversal. Or, given a photo of a dropped box
           | of matches, an AI should be able to precisely count the
           | matches, as far as they are individually visible (which a
           | human could do by keeping a tally while coloring the
           | matches). There are probably better examples, these are off
           | the cuff.
           | 
           | There's an infinite repertoire of such tasks that combine AI
           | capabilities with traditional computer algorithms, and I
           | don't think we have a generic way of having AI autonomously
           | outsource whatever parts require precision in a reliable way.
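            | 
            | The traditional-algorithm half of the first example (all
            | paths in a directed graph) is the easy part; a minimal
            | sketch, with a hypothetical three-node diagram:
            | 
            |     def all_paths(graph, start, end):
            |         # Enumerate simple paths in a directed
            |         # graph via depth-first search.
            |         stack = [(start, [start])]
            |         while stack:
            |             node, path = stack.pop()
            |             if node == end:
            |                 yield path
            |                 continue
            |             for nxt in graph.get(node, []):
            |                 if nxt not in path:  # no cycles
            |                     stack.append((nxt, path + [nxt]))
            | 
            |     g = {"A": ["B", "C"], "B": ["C"], "C": []}
            |     print(list(all_paths(g, "A", "C")))
            |     # [['A', 'C'], ['A', 'B', 'C']]
            | 
            | The unsolved part is having the AI reach for something
            | like this on its own, with precise inputs extracted from
            | the drawing.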
        
             | snapcaster wrote:
             | What you're describing sounds like agentic tool usage. Have
              | you kept up with the latest developments on that? It's
              | already solved, depending on how strictly you define
              | your criteria above.
        
               | layer8 wrote:
               | My understanding is that you need to provide and
               | configure task-specific tools. You can't combine the AI
               | with just a general-purpose computer and have the AI
               | figure out on its own how to make use of it to achieve
               | with reliability and precision whatever task it is given.
               | In other words, the current tool usage isn't general-
               | purpose in the way the LLM itself is, and also the LLM
               | doesn't reason about its own capabilities in order to
               | decide how to incorporate computer use to compensate for
               | its own weaknesses. Instead you have to tell the LLM what
               | it should apply the tooling for.
        
       | b0a04gl wrote:
        | if datasets are the new codebases, then the real IP can be
        | dataset version control: how you fork, diff, merge and audit
        | datasets like code. every team says 'we trained on 10B tokens'
        | but what if we could answer 'which 5M tokens made reasoning
        | better', 'which 100k made it worse'. then we could start
        | applying targeted leverage
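        | 
        | A minimal sketch of that idea (content-address each record so
        | dataset diffs reduce to set operations; all names here are
        | made up):
        | 
        |     import hashlib
        | 
        |     def ids(records):
        |         # A dataset version is just a set of record
        |         # hashes, like git blobs.
        |         return {hashlib.sha256(r.encode()).hexdigest()
        |                 for r in records}
        | 
        |     v1 = ids(["chunk a", "chunk b"])
        |     v2 = ids(["chunk b", "chunk c"])
        |     print("added:", len(v2 - v1),
        |           "removed:", len(v1 - v2))  # added: 1 removed: 1
        | 
        | Attributing quality changes to specific tokens ('which 5M made
        | reasoning better') is the genuinely hard, unsolved part.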
        
       | krunck wrote:
       | Until these "AI" systems become always-on, always-thinking,
       | always-processing, progress is stuck. The current push button AI
       | - meaning it only processes when we prompt it - is not how the
       | kind of AI that everyone is dreaming of needs to function.
        
         | fwip wrote:
         | From a technical perspective, we can do that with a for loop.
         | 
         | The reason we don't do it isn't because it's hard, it's because
         | it yields worse results for increased cost.
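          | 
          | Literally a for loop, e.g. (a minimal sketch; model_step is
          | a hypothetical stand-in for any chat-completion call):
          | 
          |     import time
          | 
          |     def model_step(context):
          |         # Hypothetical placeholder for an LLM call.
          |         return f"thought {len(context)}"
          | 
          |     context = []
          |     for step in range(10):  # while True for "always-on"
          |         context.append(model_step(context))
          |         time.sleep(1)  # each real step costs money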
        
       | nyrulez wrote:
       | Things haven't changed much in terms of truly new ideas since
       | electricity was invented. Everything else is just applications on
       | top of that. Make the electrons flow in a different way and you
       | get a different outcome.
        
         | nomel wrote:
         | > Make the electrons flow in a different way and you get a
         | different outcome.
         | 
         | This happens to be the basis of every aspect of our biology.
        
       | seydor wrote:
       | There are new ideas, people are finding new ways to build vision
       | models, which then are applied to language models and vice versa
       | (like diffusion).
       | 
       | The original idea of connectionism is that neural networks can
       | represent any function, which is the fundamental mathematical
       | fact. So we should be optimistic, neural nets will be able to do
       | anything. Which neural nets? So far people have stumbled on a few
       | productive architectures, but it appears to be more alchemy than
       | science. There is no reason why we should think there won't be
       | both new ideas and new data. Biology did it, humans will do it
       | too.
       | 
       | > we're engaged in a decentralized globalized exercise of
       | Science, where findings are shared openly
       | 
        | Maybe the findings are shared, if they make the company look
        | good. But the methods are not shared anymore.
        
       | lossolo wrote:
       | I wrote about it around a year ago here:
       | 
       | "There weren't really any advancements from around 2018. The
       | majority of the 'advancements' were in the amount of parameters,
       | training data, and its applications. What was the GPT-3 to
       | ChatGPT transition? It involved fine-tuning, using specifically
       | crafted training data. What changed from GPT-3 to GPT-4? It was
       | the increase in the number of parameters, improved training data,
       | and the addition of another modality. From GPT-4 to GPT-40? There
       | was more optimization and the introduction of a new modality. The
       | only thing left that could further improve models is to add one
       | more modality, which could be video or other sensory inputs,
       | along with some optimization and more parameters. We are
       | approaching diminishing returns." [1]
       | 
       | 10 months ago around o1 release:
       | 
       | "It's because there is nothing novel here from an architectural
       | point of view. Again, the secret sauce is only in the training
       | data. O1 seems like a variant of RLRF
       | https://arxiv.org/abs/2403.14238
       | 
       | Soon you will see similar models from competitors." [2]
       | 
       | Winter is coming.
       | 
       | 1. https://news.ycombinator.com/item?id=40624112
       | 
       | 2. https://news.ycombinator.com/item?id=41526039
        
         | tolerance wrote:
         | And when winter does arrive, then what? The technology is
         | slowing down while its popularity picks up. Can sparks fly out
         | of snow?
        
       | LarsDu88 wrote:
       | If datasets are what we are talking about, I'd like to bring
       | attention to the biological datasets out there that have yet to
       | be fully harnessed.
       | 
       | The ability to collect gene expression data at a
       | tissue-specific level was only developed and automated in the
       | last 4-5 years (see 10X Genomics Xenium, MERFISH). We've only
       | recently figured
       | out how to collect this data at the scale of millions of cells. A
       | breakthrough on this front may be the next big area of
       | advancement.
        
       | Night_Thastus wrote:
       | Man I can't wait for this '''''AI''''' stuff to blow over. The
       | back and forth gets a bit exhausting.
        
       | alganet wrote:
       | Dataset? That's so 2000s.
       | 
       | Each crawl on the internet is actually a discrete chunk of a more
       | abstractly defined, constant influx of information streams. Let's
       | call them rivers (it's a big stream).
       | 
       | These rivers can dry up, present seasonal shifts, be poisoned, be
       | barraged.
       | 
       | It will never "get there" and gather enough data to "be done".
       | 
       | --
       | 
       | Regarding "new ideas in AI", I think there could be. But this
       | whole thing is not about AI anymore.
        
       | strangescript wrote:
       | If you work with model architecture and read papers, how could
       | you not know there is a flood of new ideas? Only a few yield
       | interesting results, though.
       | 
       | I kind of wonder if libraries like pytorch have hurt
       | experimental development. So many basic concepts no one thinks
       | about anymore because they just use the out-of-the-box
       | solutions. And maybe those solutions are great and those parts
       | are "solved", but I am not sure. How many models are using
       | someone else's tokenizer, or someone else's strapped-on vision
       | model, just to check a box in the model card?
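       | 
       | The tokenizer case really is one line with, for example, the
       | Hugging Face transformers library:
       | 
       |   from transformers import AutoTokenizer
       | 
       |   # someone else's vocabulary, merge rules, and design
       |   # decisions, adopted without a second thought
       |   tok = AutoTokenizer.from_pretrained("gpt2")
       |   ids = tok("There are no new ideas in AI")["input_ids"]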
        
         | kevmo314 wrote:
         | The people who don't think about such things probably wouldn't
         | develop experimentally sans pytorch either.
        
         | thenaturalist wrote:
         | That's been the normal way of the human world.
         | 
         | When the foundation layer at a given moment doesn't yield an
         | ROI on intellectual exploration - say, because you can
         | overcompensate with VC-funded raw compute and make more
         | progress elsewhere - few(er) will go there.
         | 
         | But inevitably, as other domains reach diminishing returns,
         | bright minds will look around for where significant gains
         | for their effort can be found.
         | 
         | And so the next generation of PyTorch or foundational
         | technologies will evolve.
        
       | russellbeattie wrote:
       | Paradigm shifts are often just a conglomeration of previous ideas
       | with one little tweak that suddenly propels a technology ahead
       | 10x which opens up a whole new era.
       | 
       | The iPhone is a perfect example. There were smartphones with
       | cameras and web browsers before. But when the iPhone launched, it
       | added a capacitive touch screen that was so responsive there was
       | no need for a keyboard. The importance of that one technical
       | innovation can't be overstated.
       | 
       | Then the "new new thing" is followed by a period of years where
       | the innovation is refined, distributed, applied to different
       | contexts, and incrementally improved.
       | 
       | The iPhone launched in 2007 is not really that much different
       | from the one you have in your pocket today. The years since
       | have been about improvements. The web browser before that is
       | also pretty much the same as the one you use today.
       | 
       | We've seen the same pattern happen with LLMs. The author of the
       | article points out that many of AI's breakthroughs have been
       | around since the 1990s. Sure! And the Internet was created in the
       | 1970s and mobile phones were invented in the 1980s. That doesn't
       | mean the web and smartphones weren't monumental technological
       | events. And it doesn't mean LLMs and AI innovation is somehow not
       | proceeding apace.
       | 
       | It's just how this stuff works.
        
       | lsy wrote:
       | This seems simplistic; tech and infrastructure play a huge
       | part here. A short and incomplete list of things that
       | contributed:
       | 
       | - Moore's law petering out, steering hardware advancements
       | towards parallelism
       | 
       | - Fast-enough internet creating shift to processing and storage
       | in large server farms, enabling both high-cost training and
       | remote storage of large models
       | 
       | - Social media + search both enlisting consumers as data
       | producers, and necessitating the creation of armies of
       | MTurkers for content moderation + evaluation, later becoming
       | available for tagging and RLHF
       | 
       | - A long-term shift to a text-oriented society, beginning with
       | print capitalism and continuing through the rise of "knowledge
       | work" through to the migration of daily tasks (work, bill paying,
       | shopping) online, that allows a program that only produces text
       | to appear capable of doing many of the things a person does
       | 
       | We may have previously had the technical ideas in the 1990s but
       | we certainly didn't have the ripened infrastructure to put them
       | into practice. If we had the dataset to create an LLM in the 90s,
       | it still would have been astronomically cost-prohibitive to
       | train, both in CPU and human labor, and it wouldn't have as much
       | of an effect on society because you wouldn't be able to hook it
       | up to commerce or day-to-day activities (far fewer texts, emails,
       | ecommerce).
        
       | somebodythere wrote:
       | I don't know if it matters. Even if the best we can do is get
       | really good at interpolating between solutions to cognitive tasks
       | on the data manifold, the only economically useful human labor
       | left asymptotes toward frontier work; work that only a single-
       | digit percentage of people can actually perform.
        
       | tim333 wrote:
       | An interesting step forward, although an old idea, that we
       | seem close to is recursive self-improvement: get the AI to
       | make a modified version of itself to try to think better.
        
       | cadamsdotcom wrote:
       | What about actively obtained data - models seeking data rather
       | than being fed? Human babies put things in their mouths; they
       | try to stand and fall over. They "do stuff" to learn what
       | works.
       | Right now we're just telling models what works.
       | 
       | What about simulation: models can make 3D objects, so why not
       | give them a physics simulator? We have amazing high-fidelity
       | (and low-cost!) game engines that would be a great building
       | block.
       | 
       | What about rumination: behind every Cursor rule, for example,
       | is a whole story of why a user added it. Why not take the
       | rule, ask a reasoning model to hypothesize about why it was
       | created,
       | and add that rumination (along with the rule) to the training
       | data. Providing opportunities to reflect on the choices made by
       | their users might deepen any insights, squeezing more juice out
       | of the data.
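       | 
       | A minimal sketch of that rumination step, given some
       | collection cursor_rules and assuming a hypothetical
       | ask_model() call (any reasoning model would do):
       | 
       |   augmented = []
       |   for rule in cursor_rules:
       |       # ask_model() is a stand-in for any reasoning model
       |       why = ask_model(
       |           "Hypothesize why a user added this rule: " + rule)
       |       augmented.append({"rule": rule, "rationale": why})
       |   # the (rule, rationale) pairs then join the training data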
        
         | kevmo314 wrote:
         | That would be reinforcement learning. The juice is quite hard
         | to squeeze.
        
           | cadamsdotcom wrote:
           | Agreed for most cases.
           | 
           | Each Cursor rule is a byproduct of tons of work and probably
           | contains lots that can be unpacked. Any research on that?
        
         | Centigonal wrote:
         | Simulation and embodied AI (putting the AI in a robotic arm or
         | a car so it can try stuff and gather information about the
         | results) are very actively being explored.
        
           | cadamsdotcom wrote:
            | What about at inference time? I.e., in response to a query.
           | 
            | We let models write code and run it, which gives them a
            | high chance of getting arithmetic right.
           | 
           | Solving the "crossing the river" problem by letting the model
           | create and run a simulation would give a pretty high chance
           | of getting it right.
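            | 
            | A toy sketch: the classic wolf/goat/cabbage crossing
            | falls to a short breadth-first search - exactly the kind
            | of program a model can write and run (plain Python,
            | nothing assumed beyond the standard library):
            | 
            |   from collections import deque
            | 
            |   ITEMS = {"wolf", "goat", "cabbage"}
            |   UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]
            | 
            |   def safe(left, farmer_left):
            |       # only the unattended bank can go wrong
            |       alone = ITEMS - left if farmer_left else left
            |       return not any(p <= alone for p in UNSAFE)
            | 
            |   start = (frozenset(ITEMS), True)
            |   goal = (frozenset(), False)
            |   q, seen = deque([(start, [])]), {start}
            |   while q:
            |       (left, fl), path = q.popleft()
            |       if (left, fl) == goal:
            |           print(" -> ".join(path))
            |           break
            |       bank = left if fl else ITEMS - left
            |       for cargo in [None, *bank]:
            |           move = {cargo} - {None}
            |           nl = left - move if fl else left | move
            |           st = (nl, not fl)
            |           if safe(*st) and st not in seen:
            |               seen.add(st)
            |               q.append((st, path + [cargo or "alone"]))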
        
       ___________________________________________________________________
       (page generated 2025-06-30 23:00 UTC)