[HN Gopher] There are no new ideas in AI only new datasets
___________________________________________________________________
There are no new ideas in AI only new datasets
Author : bilsbie
Score : 246 points
Date : 2025-06-30 14:43 UTC (8 hours ago)
(HTM) web link (blog.jxmo.io)
(TXT) w3m dump (blog.jxmo.io)
| ctoth wrote:
| Reinforcement learning from self-play/AlphaWhatever? Nah must
| just be datasets. :)
| grumpopotamus wrote:
| https://en.wikipedia.org/wiki/TD-Gammon
| Y_Y wrote:
| You raise a really interesting point. I'm sure it's just
| missed my notice, but I'm not familiar with any projects from
| antediluvian AI that have been resurrected to run on modern
| hardware and see where they'd really asymptote if they'd had
| the compute they deserved.
| FeepingCreature wrote:
| To be fair, usually those projects would need considerable
| work to be ported to modern multicore machines, let alone
| GPUs.
| genewitch wrote:
| can you name a couple so i can see how much work is
| involved? markov chains compile fast and respond fast,
| sure, and neural nets train pretty quick too, so i'm
| wondering where the cutoff is; expert systems?
| zahlman wrote:
| For that matter, https://gist.github.com/deebs67/8fbcf8b127a6
| 3e70d4a3f8590c97... .
| NitpickLawyer wrote:
| And architecture stuff like actually useful long context.
| Whatever they did with gemini 2.5 is miles ahead in long
| context useful results compared to the previous models. I'd be
| very surprised if gemini 2.5 is "just" gemini 1 w/ better data.
| shwouchk wrote:
| i dont know what all the hype is with gemini 2.5, at least
| the currently running instance. from my experience at least
| in conversation mode, it cannot remember my instructions to
| avoid apologies and similar platitudes from either the
| "persona", personal instructions, or from ine message to the
| next.
| nyrikki wrote:
| Big difference between a perfect information, completely
| specified zero sum game and the real world.
|
| As a simple analogy, read out the following sentence multiple
| times, stressing a different word each time.
|
| "I never said she stole my money"
|
| Note how the meaning changes and is often unique?
|
| That is a lens into the frame problem and its inverse, the
| specification problem.
|
| The above problem quickly becomes tower-complete, and recent
| studies suggest that RL is reinforcing or increasing the weight
| of existing patterns.
|
| As the open domain frame problem and similar challenges are
| equivalent to HALT, finding new ways to extract useful
| information will be important for generalization IMHO.
|
| Synthetic data is useful, but not a complete solution,
| especially for tower problems.
| genewitch wrote:
| The one we use is "I always pay my taxes"
|
| and as far as synthetic vs real data, there are a lot of gaps
| in LLM knowledge; and vision models suffer from "limited
| tags", which used to have workarounds with textual embeddings
| and the like, but those went by the wayside as LoRA,
| controlnet, etc. appeared.
|
| There are people who are fairly well known that LLMs have no
| idea about. There's things in books i own that the AI
| confidently tells me are either wrong or don't exist.
|
| That one page about compressing 1 gig wikipedia as small as
| possible implicitly and explicitly states that AI is
| "basically compression" - and if the data isn't there, it's
| not in the compressed set (weights) either.
|
| And i'll reply to another comment here, about "24/7 rolling/
| for looped" AI - i thought of doing this when i first found
| out about LLMs, but context windows are the enemy, here. I
| have a couple of ideas about how to have a continuous AI, but
| i don't have the capital to test it out.
| kogus wrote:
| To be fair, if you imagine a system that successfully reproduced
| human intelligence, then 'changing datasets' would probably be a
| fair summary of what it would take to have different models.
| After all, our own memories, training, education, background, etc
| are a very large component of our own problem solving abilities.
| jschveibinz wrote:
| I will respectfully disagree. All "new" ideas come from old
| ideas. AI is a tool to access old ideas with speed and with new
| perspectives that haven't been available until now.
|
| Innovation is in the cracks: recognition of holes, intersections,
| tangents, etc. in old ideas. It has been said that innovation is
| done on the shoulders of giants.
|
| So AI can be an express elevator up to an army of giants'
| shoulders? It all depends on how you use the tools.
| gametorch wrote:
| Exactly!
|
| Can you imagine if we applied the same gatekeeping logic to
| science?
|
| Imagine you weren't allowed to use someone else's scientific
| work or any derivative of it.
|
| We would make no progress.
|
| The only legitimate defense I have ever seen here revolves
| around IP and copyright infringement, which I couldn't care
| less about.
| alfalfasprout wrote:
| Access old ideas? Yes. With new perspectives? Not necessarily.
| An LLM may be able to assist in interpreting data with new
| perspectives but in practice they're still fairly bad at
| greenfield work.
|
| As with most things, the truth lies somewhere in the middle.
| LLMs can be helpful as a way of accelerating certain kinds and
| certain aspects of research but not others.
| stevep98 wrote:
| > Access old ideas? Yes. With new perspectives?
|
| I wonder if we can mine patent databases for old ideas that
| never worked out in the past, but now are more useful.
| Perhaps due to modern machining or newer materials or just
| new applications of the idea.
| bcrosby95 wrote:
| The article is discussing working in AI innovation vs focusing
| on getting more and better data. And while there have been key
| breakthroughs in new ideas, one of the best ways to increase
| the performance of these systems is getting more and better
| data, and many people think data is the primary avenue to
| improvement.
|
| It reminds me of an AI talk a few decades ago, about how the
| cycle goes: more data -> more layers -> repeat...
|
| Anyways, I'm not sure how your comment relates to these two
| avenues of improvement.
| jjtheblunt wrote:
| > I will respectfully disagree. All "new" ideas come from old
| ideas.
|
| The insight into the structure of the benzene ring famously
| came in a dream, hadn't been seen before, but was imagined as a
| snake biting its own tail.
| troupo wrote:
| And as we all know, it came in a dream to a complete novice
| in chemistry with zero knowledge of any old ideas in
| chemistry: https://en.wikipedia.org/wiki/August_Kekul%C3%A9
|
| --- start quote ---
|
| The empirical formula for benzene had been long known, but
| its highly unsaturated structure was a challenge to
| determine. Archibald Scott Couper in 1858 and Joseph
| Loschmidt in 1861 suggested possible structures that
| contained multiple double bonds or multiple rings, but the
| study of aromatic compounds was in its earliest years, and
| too little evidence was then available to help chemists
| decide on any particular structure.
|
| More evidence was available by 1865, especially regarding the
| relationships of aromatic isomers.
|
| [ Kekule claimed to have had the dream in 1865 ]
|
| --- end quote ---
|
| The dream claim came from Kekule himself _25 years after his
| proposal_, a proposal he had to modify 10 years after he made
| it.
| baxtr wrote:
| Imagine a human had read every book/publication in every field
| of knowledge that mankind has ever produced AND couldn't come
| up with anything entirely new. Hard to imagine.
| tippytippytango wrote:
| Sometimes we get confused by the difference between technological
| and scientific progress. When science makes progress it unlocks
| new S-curves that progress at an incredible pace until you get
| into the diminishing returns region. People complain of slowing
| progress, but it was always slow; you just didn't notice that
| nothing new was happening during the exponential take-off of the
| S-curve, just furious optimization.
| baxtr wrote:
| Fully agree.
|
| And at the same time I have noticed that people don't
| understand the difference between an S-curve and an exponential
| function. They can look almost identical at certain intervals.
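|
| A quick numerical illustration (parameters are arbitrary): a
| logistic curve and an exponential matched at x = 0 stay nearly
| identical early on and only diverge as the S-curve nears
| saturation.
|
|   import math
|
|   L, k, x0 = 100.0, 1.0, 10.0  # illustrative logistic parameters
|
|   def logistic(x):
|       return L / (1 + math.exp(-k * (x - x0)))
|
|   def exponential(x):
|       return logistic(0) * math.exp(k * x)  # matched at x = 0
|
|   for x in range(0, 10, 2):
|       print(x, round(logistic(x), 2), round(exponential(x), 2))
|   # x = 0..6: nearly identical; by x = 8 the curves visibly diverge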
| ks2048 wrote:
| The latest LLMs are simply multiplying and adding various numbers
| together... Babylonians were doing that 4000 years ago.
| bobson381 wrote:
| You are just a lot of interactions of waves. All meaning is
| assigned. I prefer to think of this like the Goedel generator
| that found new formal expressions for the Principia - because
| we have a way of indexing concept-space, there's no telling
| what we might find in the gaps.
| thenaturalist wrote:
| But on clay tablets, not in semi-conductive electron prisons
| separated by one-atom-thick walls.
|
| Slight difference to those methods, wouldn't you agree?
| voxleone wrote:
| I'd say with confidence: we're living in the early days. AI has
| made jaw-dropping progress in two major domains: language and
| vision. With large language models (LLMs) like GPT-4 and Claude,
| and vision models like CLIP and DALL-E, we've seen machines that
| can generate poetry, write code, describe photos, and even hold
| eerily humanlike conversations.
|
| But as impressive as this is, it's easy to lose sight of the
| bigger picture: we've only scratched the surface of what
| artificial intelligence could be -- because we've only scaled two
| modalities: text and images.
|
| That's like saying we've modeled human intelligence by mastering
| reading and eyesight, while ignoring touch, taste, smell, motion,
| memory, emotion, and everything else that makes our cognition
| rich, embodied, and contextual.
|
| Human intelligence is multimodal. We make sense of the world
| through:
|
| - Touch (the texture of a surface, the feedback of pressure, the
| warmth of skin);
| - Smell and taste (deeply tied to memory, danger, pleasure, and
| even creativity);
| - Proprioception (the sense of where your body is in space -- how
| you move and balance);
| - Emotional and internal states (hunger, pain, comfort, fear,
| motivation).
|
| None of these are captured by current LLMs or vision
| transformers. Not even close. And yet, our cognitive lives depend
| on them.
|
| Language and vision are just the beginning -- the parts we were
| able to digitize first - not necessarily the most central to
| intelligence.
|
| The real frontier of AI lies in the messy, rich, sensory world
| where people live. We'll need new hardware (sensors), new data
| representations (beyond tokens), and new ways to train models
| that grow understanding from experience, not just patterns.
| Swizec wrote:
| > The real frontier of AI lies in the messy, rich, sensory
| world where people live. We'll need new hardware (sensors), new
| data representations (beyond tokens), and new ways to train
| models that grow understanding from experience, not just
| patterns.
|
| Like Dr. Who said: DALEKs aren't brains in a machine, they
| _are_ the machine!
|
| Same is true for humans. We really are the whole body, we're
| not just driving it around.
| nomel wrote:
| There are many people who mentally developed while paralyzed
| that literally drive around their bodies via motorized
| wheelchair. I don't think there's any evidence that a brain
| couldn't exist or develop in a jar, given only the inputs
| modern AI now has (text, video, audio).
| Swizec wrote:
| > any evidence that a brain couldn't exist or develop in a
| jar
|
| The brain _could_. Of course it could. It's just a signal-
| processing machine.
|
| But would it be missing anything we consider core to the
| way humans think? Would it struggle with parts of
| cognition?
|
| For example: experiments were done with cats growing up in
| environments with vertical lines only. They were then put
| in a normal room and had a hard time understanding flat
| surfaces.
|
| https://computervisionblog.wordpress.com/2013/06/01/cats-
| and...
| nomel wrote:
| This isn't remotely a hypothetical, so I imagine there
| are some examples out there, especially from back when
| polio was a problem. Although, for practical reasons,
| they might have had limited _exposure to novelty_, which
| could have negative consequences.
| skydhash wrote:
| Yeah, but are there new ideas or only wishes?
| jdgoesmarching wrote:
| It's pure magical thinking that would be correctly dismissed
| if it didn't have AI attached to it. Imagine talking this way
| about anything else.
|
| "We've barely scratched the surface with Rust, so far we're
| only focused on code and haven't even explored building
| mansions or ending world hunger"
| tim333 wrote:
| AI has some real possibilities of building mansions and
| ending hunger in a way that Rust doesn't.
| dinfinity wrote:
| > Language and vision are just the beginning -- the parts we
| were able to digitize first - not necessarily the most central
| to intelligence.
|
| I respectfully disagree. Touch gives pretty cool skills, but
| language, video and audio are all that are needed _for all
| online interactions_. We use touch for typing and pointing, but
| that is only because we don't have a more efficient and
| effective interface.
|
| Now I'm not saying that all other senses are uninteresting.
| Integrating touch, extensive proprioception, and olfaction is
| going to unlock a lot of 'real world' behavior, but your
| comment was specifically about intelligence.
|
| Compare humans to apes and other animals and the thing that
| sets us apart is definitely not in the 'remaining' senses, but
| firmly in the realm of audio, video and language.
| voxleone wrote:
| > Language and vision are just the beginning -- the parts we
| were able to digitize first - not necessarily the most
| central to intelligence.
|
| I probably made a mistake when i asserted that -- should have
| thought it over. Vision is evolutionarily older and more
| "primitive", while language is uniquely human [or maybe, more
| broadly, primate, cetacean, cephalopod, avian...] symbolic,
| and abstract -- arguably a different order of cognition
| altogether. But i maintain that each and every sense is
| important as far as human cognition -- and its replication --
| is concerned.
| wizzwizz4 wrote:
| People who lack one of those senses, or even two of them,
| tend to do just fine.
| chasd00 wrote:
| > Language and vision are just the beginning..
|
| Based on the architectures we have they may also be the ending.
| There's been a lot of news in the past couple years about LLMs
| but has there been any breakthroughs making headlines anywhere
| else in AI?
| dragonwriter wrote:
| > There's been a lot of news in the past couple years about
| LLMs but has there been any breakthroughs making headlines
| anywhere else in AI?
|
| Yeah, lots of stuff tied to robotics, for instance; this
| overlaps with vision, but the advances go beyond vision.
|
| Audio has seen quite a bit. And I imagine there is stuff
| happening in niche areas that just aren't as publicly
| interesting as language, vision/imagery, audio, and robotics.
| nomel wrote:
| Two Nobel prizes in chemistry:
| https://www.nature.com/articles/s41746-024-01345-9
| edanm wrote:
| Sure. In physics, math, chemistry, biology. To name a few.
| mr_world wrote:
| Organic adaption and persistence of memory I would say are the
| two major advancements that need to happen.
|
| Human neural networks are dynamic, they change and rearrange,
| grow and sever. An LLM is fixed and relies on context, if you
| give it the right answer it won't "learn" that is the correct
| answer unless it is fed back into the system and trained over
| months. What if it's only the right answer for a limited period
| of time?
|
| To build an intelligent machine, it must be able to train itself
| in real time and remember.
| specialist wrote:
| Yes and: and forget.
| anon291 wrote:
| I mean there are no new ideas for saas, just new applications,
| and that worked out pretty well
| luppy47474 wrote:
| Hmmm
| rar00 wrote:
| disagree, there are a few organisations exploring novel paths.
| It's just that throwing new data at an "old" algorithm is much
| easier and has been a winning strategy. And, also, there's no
| incentive for a private org to advertise a new idea that seems to
| be working (mine's a notable exception :D).
| tantalor wrote:
| > If data is the only thing that matters, why are 95% of people
| working on new methods?
|
| Because new methods unlock access to new datasets.
|
| Edit: Oh I see this was a rhetorical question answered in the
| next paragraph. D'oh
| piinbinary wrote:
| AI training is currently a process of making the AI remember the
| dataset. It doesn't involve the AI thinking about the dataset and
| drawing (and remembering) conclusions.
|
| It can probably remember more facts about a topic than a PhD in
| that topic, but the PhD will be better at thinking about that
| topic.
| jayd16 wrote:
| It's a bit more complex than that. It's more about baking out the
| dataset into heuristics that a machine can use to match a
| satisfying result to an input. Sometimes these heuristics are
| surprising to a human and can solve a problem in a novel way.
|
| "Thinking" is too broad a term to apply usefully but I would
| say its pretty clear we are not close to AGI.
| nkrisc wrote:
| > It can probably remember more facts about a topic than a PhD
| in that topic
|
| So can a notebook.
| tantalor wrote:
| Maybe that's why PhDs keep the textbooks they use at hand, so
| they don't have to remember everything.
|
| Why should the model need to memorize facts we already have
| written down somewhere?
| EternalFury wrote:
| What John Carmack is exploring is pretty revealing. Train models
| to play 2D video games to a superhuman level, then ask them to
| play a level they have not seen before or another 2D video game
| they have not seen before. The transfer function is negative. So,
| in my definition, no intelligence has been developed, only
| expertise in a narrow set of tasks.
|
| It's apparently much easier to scare the masses with visions of
| ASI, than to build a general intelligence that can pick up a new
| 2D video game faster than a human being.
| ferguess_k wrote:
| Can you please explain "the transfer function is negative"?
|
| I'm wondering whether one has tested with the same model but on
| two situations:
|
| 1) Bring it to superhuman level in game A and then present game
| B, which is similar to A, to it.
|
| 2) Present B to it without presenting A.
|
| If 1) is not significantly better than 2) then maybe it is not
| carrying much "knowledge", or maybe we simply did not program
| it correctly.
| tough wrote:
| I think the problem is we train models to pattern match, not
| to learn or reason about world models
| NBJack wrote:
| In other words, they learn the game, not how to _play
| games_.
| fsmv wrote:
| They memorize the answers not the process to arrive at
| answers
| IshKebab wrote:
| This has been disproven so many times... They clearly do
| both. You can trivially prove this yourself.
| 0xWTF wrote:
| > You can trivially prove this yourself.
|
| Given the long list of dead philosophers of mind, if you
| have a trivial proof, would you mind providing a link?
| pdabbadabba wrote:
| It's really easy: go to Claude and ask it a novel
| question. It will generally reason its way to a perfectly
| good answer even if there is no direct example of it in
| the training data.
| MichaelZuo wrote:
| How do you know it's a novel question?
| IshKebab wrote:
| It's not exactly difficult to come up with a question
| that's so unusual the chance of it being in the training
| set is effectively zero.
| troupo wrote:
| And as any programmer will tell you: they immediately
| devolve into "hallucinating" answers, not trying to
| actually reason about the world. Because that's what they
| do: they create statistically plausible answers even if
| those answers are complete nonsense.
| MichaelZuo wrote:
| Can you provide some examples of these genuinely unique
| questions?
| keerthiko wrote:
| When LLM's come up with answers to questions that aren't
| directly exampled in the training data, that's not proof
| at all that it reasoned its way there -- it can very much
| still be pattern matching without insight from the actual
| code execution of the answer generation.
|
| If we were taking a walk and you asked me for an
| explanation for a mathematical concept I have not
| actually studied, I am fully capable of hazarding a
| casual guess based on the other topics I _have_ studied
| within seconds. This is the default approach of an LLM,
| except with much greater breadth and recall of studied
| topics than I, as a human, have.
|
| This would be very different than if we sat down at a
| library and I applied the various concepts and theorems I
| already knew to make inferences, built upon them, and
| then derived an understanding based on reasoning of the
| steps I took (often after backtracking from several
| reasoning dead ends) before providing the explanation.
|
| If you ask an LLM to explain their reasoning, it's
| unclear whether it just guessed the explanation and
| reasoning too, or if that was actually the set of steps
| it took to get to the first answer they gave you. This is
| why LLMs are able to correct themselves after claiming
| strawberry has 2 rs, but when providing (guessing again)
| their explanations they make more "relevant" guesses.
| IshKebab wrote:
| LLMs clearly don't reason in the same way that humans or
| SMT solvers do. That doesn't mean they aren't reasoning.
| IshKebab wrote:
| Just go and ask ChatGPT or Claude something that can't
| possibly be in its training set. Make something up. If it
| is _only_ memorising answers then it will be impossible
| for it to get the correct result.
|
| A simple nonsense programming task would suffice. For
| example "write a Python function to erase every character
| from a string unless either of its adjacent characters
| are also adjacent to it in the alphabet. The string only
| contains lowercase a-z"
|
| That task isn't anywhere in its training set so they
| can't memorise the answer. But I bet ChatGPT and Claude
| can still do it.
|
| Honestly this is sooooo obvious to anyone that has used
| these tools, it's really insane that people are still
| parroting (heh) the "it just memorises" line.
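|
| For illustration, here is one plausible reading of that task in
| Python (the spec is loose, so this is just a sketch, not a claim
| about what any particular model would produce):
|
|   def erase_unless_alphabet_adjacent(s: str) -> str:
|       """Keep a character only if at least one of its neighbours
|       in the string is also its neighbour in the alphabet."""
|       def keep(i: int) -> bool:
|           for j in (i - 1, i + 1):
|               if 0 <= j < len(s) and abs(ord(s[i]) - ord(s[j])) == 1:
|                   return True
|           return False
|       return "".join(c for i, c in enumerate(s) if keep(i))
|
|   print(erase_unless_alphabet_adjacent("abzcdqx"))  # -> "abcd"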
| troupo wrote:
| People who say that LLMs memorize stuff are just as clueless as
| those who assume that there's any reasoning happening.
|
| They generate statistically plausible answers (to
| simplify the answer) based on the training set and
| weights they have.
| imiric wrote:
| LLMs don't "memorize" concepts like humans do. They
| generate output based on token patterns in their training
| data. So instead of having to be trained on every
| possible problem, they can still generate output that
| solves it by referencing the most probable combination of
| tokens for the specified input tokens. To humans this
| seems like they're truly solving novel problems, but it's
| merely a trick of statistics. These tools can reference
| and generate patterns that no human ever could. This is
| what makes them useful and powerful, but I would argue
| not intelligent.
| EternalFury wrote:
| They learn the value of specific actions in specific
| contexts based on the rewards they received during their
| play time. Specific actions and specific contexts are not
| transferable for various reasons. John noted that
| varying frame rates and variable latency between action
| and effect really confuse the models.
| nightpool wrote:
| Okay, so fuzz the frame rate and latency? That feels very
| easy to fix.
| IshKebab wrote:
| Well yeah... If you only ever played one game in your
| life you would probably be pretty shit at other games
| too. This does not seem very revealing to me.
| trainerxr50 wrote:
| I am decent at chess but barely know how the pieces in Go
| move.
|
| Of course, this is because I have spent a lot of time
| TRAINING to play chess and basically none training to
| play go.
|
| I am good on guitar because I started training young but
| can't play the flute or piano to save my life.
|
| Most complicated skills have basically no transfer or
| carry over other than knowing how to train on a new
| skill.
| beefnugs wrote:
| yeahhhh why isnt there a training structure where you
| play 5000 games, and the reward function is based on
| doing well in all of them?
|
| I guess it's a totally different level of control: instead
| of immediately choosing a certain button to press, you
| need to set longer term goals. "press whatever sequence
| over this time i need to do to end up closer to this
| result"
|
| There is some kind of nested multidimensional thing to
| train on here instead of immediate limited choices
| antisthenes wrote:
| Where do you draw the line between pattern matching and
| reasoning about world models?
|
| A lot of intelligence is just pattern matching and being
| quick about it.
| singron wrote:
| I think this is clearly a case of overfitting and failure
| to generalize, which are really well understood concepts.
| We don't have to philosophize about what pattern matching
| really means.
| ferguess_k wrote:
| I kinda think I'm more or less the same...OK maybe we have
| different definitions of "pattern matching".
| veqz wrote:
| It's Plato's cave:
|
| We train the models on what are basically shadows, and
| they learn how to pattern match the shadows.
|
| But the shadows are only depictions of the real world,
| and the LLMs never learn about that.
| EternalFury wrote:
| 100%
| YokoZar wrote:
| I wonder if this is a case of overfitting from allowing the
| model to grow too large, and if you might cajole it into
| learning more generic heuristics by putting some constraints on
| it.
|
| It sounds like the "best" AI without constraint would just be
| something like a replay of a record speedrun rather than a
| smaller set of heuristics of getting through a game, though the
| latter is clearly much more important with unseen content.
| moralestapia wrote:
| I wonder how much performance decreases if they just use
| slightly modified versions of the same game. Like a different
| color scheme, or a couple different sprites.
| vladimirralev wrote:
| He is not using appropriate models for this conclusion, nor is
| he using state-of-the-art models in this research; moreover, he
| doesn't have an expensive foundational model to build upon for
| 2d games. It's just a fun project.
|
| A serious attempt at video/vision would involve some
| probabilistic latent space that can be noised in ways that make
| sense for games in general. I think veo3 proves that ai can
| generalize 2d and even 3d games; generating a video under
| prompt constraints is basically playing a game. I think you
| could prompt veo3 to play any game for a few seconds and it
| will generally make sense even though it is not fine tuned.
| sigmoid10 wrote:
| Veo3's world model is still pretty limited. That becomes
| obvious very fast once you prompt out of distribution video
| content (i.e. stuff that you are unlikely to find on
| youtube). It's extremely good at creating photorealistic
| surfaces and lighting. It even has some reasonably solid
| understanding of fluid dynamics for simulating water. But for
| complex human behaviour (in particular certain motions) it
| simply lacks the training data. Although that's not really a
| fault of the model and I'm pretty sure there will be a way to
| overcome this as well. Maybe some kind of physics-based
| simulation as supplemental training data.
| keerthiko wrote:
| > generating a video under prompt constraints is basically
| playing a game
|
| Besides static puzzles (like a maze or jigsaw) I don't
| believe this analogy holds? A model working with prompt
| constraints that aren't evolving or being added over the
| course of "navigating" the generation of the model's output
| means it needs to process 0 new information that it didn't
| come up with itself -- playing a game is different from other
| generation because it's primarily about reacting to input you
| didn't know the precise timing/spatial details of, but can
| learn that they come within a known set of higher order
| rules. Obviously the more finite/deterministic/predictably
| probabilistic the video game's solution space, the more it
| can be inferred from the initial state (aka reduced to the
| same type of problem as generating a video from a prompt),
| which is why models are still able to play video games. But
| as GP pointed out, transfer function negative in such cases
| -- the overarching rules are not predictable enough across
| disparate genres.
|
| > I think you could prompt veo3 to play any game for a few
| seconds
|
| I'm curious what your threshold for what constitutes "play
| any game" is in this claim? If I wrote a script that maps
| button combinations to average pixel color of a portion of
| the screen buffer, by what metric(s) would veo3 be "playing"
| the game more or better than that script "for a few seconds"?
|
| edit: removing knee-jerk reaction language
| hluska wrote:
| Nothing the parent said makes this level of aggression
| necessary or even tasteful. This isn't the Colosseum - we
| can learn from each other and consider different points of
| view without acting like savages.
| keerthiko wrote:
| fair, and I edited my choice of words, but if you're
| reading that much aggression from my initial comment
| (which contains topical discussion) to say what you did,
| you must find the internet a far more savage place than
| it really is :/
| hluska wrote:
| Thanks for your concern.
|
| The internet is fine - your comment was way beyond what
| any reasonable person should tolerate. You don't need to
| overcompensate for your lack of education in such
| aggressive ways. Most people don't care.
|
| Good job on the edit though and have an excellent day.
| vladimirralev wrote:
| It's not ideal, but you can prompt it with an image of a
| game frame, explain the objects and physics in text and let
| it generate a few frames of gameplay as a substitute for
| controller input as well as what it expects as an outcome.
| I am not talking about real interactive gameplay.
|
| I am just saying we have proof that it can understand
| complex worlds and sets of rules, and then abide by them.
| It doesn't know how to use a controller and it doesn't know
| how to explore the game physics on its own, but those steps
| are much easier to implement based on how coding agents are
| able to iterate and explore solutions.
| altairprime wrote:
| Is _any_ model currently known to succeed in the scenario
| that Carmack's inappropriate model failed?
| outofpaper wrote:
| No monolithic models, but using hybrid approaches we've been
| able to beat humans for some time now.
| 317070 wrote:
| What you're thinking of is much more like the Genie model
| from DeepMind [0]. That one is like Veo, but interactive (but
| not publicly available)
|
| [0] https://deepmind.google/discover/blog/genie-2-a-large-
| scale-...
| troupo wrote:
| > I think veo3 proves that ai can generalize 2d and even 3d
| games
|
| It doesn't. And you said it yourself:
|
| > generating a video under prompt constraints is basically
| playing a game.
|
| No. It's neither generating a game (that people can play) nor
| is it playing a game (it's generating _a video_).
|
| Since it's not a model of the world in any sense of the word,
| there are issues with even the most basic object permanence.
| E.g. here's veo3 generating a GTA-style video. Oh look, the
| car spins 360 and ends up on a completely different street
| than the one it was driving down previously:
| https://www.youtube.com/watch?v=ja2PVllZcsI
| vladimirralev wrote:
| It is still doing a great job for a few frames, you could
| keep it more anchored to the state of the game if you
| prompt it. Much like you can prompt coding agents to keep a
| log of all decisions previously made. Permanence is excellent;
| it slips often, but mostly because it is not grounded to
| specific game state by the prompt or by the decision log.
| pshc wrote:
| I think we need a spatial/physics model handling movement and
| tactics watched over by a high level strategy model (maybe an
| LLM).
| smokel wrote:
| The subject you are referring to is most likely Meta-
| Reinforcement Learning [1]. It is great that John Carmack is
| looking into this, but it is not a new field of research.
|
| [1] https://instadeep.com/2021/10/a-simple-introduction-to-
| meta-...
| t55 wrote:
| this is what deepmind did 10 years ago lol
| smokel wrote:
| No, they (and many others before them) are genuinely trying
| to improve on the original research.
|
| The original paper "Playing Atari with Deep Reinforcement
| Learning" (2013) from Deepmind describes how agents can play
| Atari games, but these agents would have to be specifically
| trained on every individual game using millions of frames. To
| accomplish this, simulators were run in parallel, and much
| faster than in real-time.
|
| Also, additional trickery was added to extract a reward
| signal from the games, and there is some minor cheating on
| supplying inputs.
|
| What Carmack (and others before him) is interested in, is
| trying to learn in a real-life setting, similar to how humans
| learn.
| justanotherjoe wrote:
| I don't get why people are so invested in framing it this way.
| I'm sure there are ways to do the stated objective. John
| Carmack isn't even an AI guy why is he suddenly the standard.
| varjag wrote:
| What in your opinion constitutes an AI guy?
| qaq wrote:
| Keen includes researchers like Richard Sutton, Joseph Modayil
| etc. Also John has been doing it full time for almost 5
| years now, so given his background and aptitude for learning I
| would imagine by this time he is more of an AI guy than a
| fairly large percentage of AI PhDs.
| refulgentis wrote:
| Names >> all, and increasingly so.
|
| One phenomenon that laid this bare for me, in a substantive way,
| was noticing an increasing # of reverent comments re: Geohot
| in odd places here, which are just as quickly replied to by
| people with a sense of how he _works_, as opposed to the
| _keywords he associates himself with_. But that only happens
| here AFAIK.
|
| Yapping, or, inducing people to yap about me, unfortunately,
| is much more salient to my expected mindshare than the work I
| do.
|
| It's getting claustrophobic intellectually, as a result.
|
| Example from the last week is the phrase "context
| engineering" - Shopify CEO says he likes it better than
| prompt engineering, Karpathy QTs to affirm, SimonW writes it
| up as fait accompli. Now I have to rework my site to not use
| "prompt engineering" and have a Take(tm) on "context
| engineering". Because of a couple tweets + a blog
| reverberating over 2-3 days.
|
| Nothing against Carmack, or anyone else named, at all. i.e.
| in the context engineering case, they're just sharing their
| thoughts in realtime. (i.e. I don't wanna get rolled up into
| a downvote brigade because it seems like I'm affirming the
| loose assertion Carmack is "not an AI guy", or, that it seems
| I'm criticizing anyone's conduct at all)
|
| EDIT: The context engineering example was not in reference to
| another post at the time of writing, now one is the top of
| front page.
| dvfjsdhgfv wrote:
| > Now I have to rework my site to not use "prompt
| engineering" and have a Take(tm) on "context engineering".
| Because of a couple tweets + a blog reverberating over 2-3
| days.
|
| The difference here is that your example shows a trivial
| statement and a change period of 3 days, whereas what
| Carmack is doing is taking years.
| refulgentis wrote:
| Right. Nothing against Carmack. Grew up on the guy. I
| haven't looked into, at all, any of the disputed stuff
| and should actively proclaim I'm a yuge Carmack fanboy.
| raincole wrote:
| Because it "confirms" what they already believe in.
| goatlover wrote:
| I've wondered about the claim that the models played those
| Atari/2D video games at superhuman levels, because I clearly
| recall some humans achieving superhuman levels before models
| were capable of it. Must have been superhuman compared to
| average human player, not someone who spent an inordinate
| amount of time mastering the game.
| raincole wrote:
| I'm not sure why you think so. AI outperforms humans in many
| games already. Basically all the games we care to put money
| to train a model.
|
| AI has beat the best human players in Chess, Go, Mahjong,
| Texas hold'em, Dota, Starcraft, etc. It would be really,
| really surprising that some Atari game is the holy grail of
| human performance that AI cannot beat.
| tsimionescu wrote:
| I recall this not being true at all for Dota and Starcraft.
| I recall AlphaStar performed much better than the top non-
| pro players, but it couldn't consistently beat the pro
| players with the budget that Google was willing to spend,
| and I believe the same was true of Dota II (and there they
| were even playing a limited form of the game, with fewer
| heroes and without the hero choice part, I believe).
| fullshark wrote:
| Just sounds like an example of overfitting. This is all machine
| learning at its root.
| hluska wrote:
| When I finished my degree, the idea that a software system
| could develop that level of expertise was relegated to science
| fiction. It is an unbelievable human accomplishment to get to
| that point and honestly, a bit of awe makes life more pleasant.
|
| Less quality of life focused, I don't believe that the models
| he uses for this research are capable of more. Is it really
| that revealing?
| Uehreka wrote:
| These questions of whether the model is "really intelligent" or
| whatever might be of interest to academics theorizing about
| AGI, but to the vast swaths of people getting useful stuff out
| of LLMs, it doesn't really matter. We don't care if the current
| path leads to AGI. If the line stopped at Claude 4 I'd still
| keep using it.
|
| And like I get it, it's fun to complain about the obnoxious and
| irrational AGI people. But the discussion about how people are
| using these things in their everyday lives is way more
| interesting.
| Kapura wrote:
| Here's an idea: make the AIs consistent at doing things computers
| are good at. Here's an anecdote from a friend who's living in
| Japan:
|
| > i used chatgpt for the first time today and have some lite rage
| if you wanna hear it. tldr it wasnt correct. i thought of one
| simple task that it should be good at and it couldnt do that.
|
| > (The kangxi radicals are neatly in order in unicode so you can
| just ++ thru em. The cjks are not. I couldnt see any clear
| mapping so i asked gpt to do it. Big mess i had to untangle
| manually anyway it woulda been faster to look them up by hand
| (theres 214))
|
| > The big kicker was like, it gave me 213. And i was like, "why
| is one missing?" Then i put it back in and said count how many
| numbers are here and it said 214, and there just werent. Like
| come on you SHOULD be able to count.
|
| If you can make the language models actually interface with what
| we've been able to do with computers for decades, i imagine many
| paths open up.
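|
| For what it's worth, the "neatly in order" part really is a few
| lines; a rough Python sketch (Unicode's NFKC decompositions also
| map each radical to a CJK unified ideograph, which may be the
| mapping the friend was looking for):
|
|   import unicodedata
|
|   # Kangxi Radicals block: U+2F00..U+2FD5, all 214 in order.
|   radicals = [chr(cp) for cp in range(0x2F00, 0x2F00 + 214)]
|   print(len(radicals))                               # 214
|   print(unicodedata.name(radicals[0]))               # KANGXI RADICAL ONE
|   print(unicodedata.normalize("NFKC", radicals[0]))  # U+4E00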
| cheevly wrote:
| Many of us have solved this with internal tooling that has not
| yet been shared or released to the public.
| layer8 wrote:
| This needs to be generalized however. For example, if you
| present an AI with a drawing of some directed graph (a state
| diagram, for example), it should be able to answer questions
| based on the precise set of all possible paths in that graph,
| without someone having to write tooling for diagram or graph
| processing and traversal. Or, given a photo of a dropped box
| of matches, an AI should be able to precisely count the
| matches, as far as they are individually visible (which a
| human could do by keeping a tally while coloring the
| matches). There are probably better examples, these are off
| the cuff.
|
| There's an infinite repertoire of such tasks that combine AI
| capabilities with traditional computer algorithms, and I
| don't think we have a generic way of having AI autonomously
| outsource whatever parts require precision in a reliable way.
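|
| The path-enumeration half of that example is exactly the kind of
| thing a few lines of conventional code handle precisely; a
| minimal sketch over a hypothetical three-node graph:
|
|   from collections import defaultdict
|
|   def all_simple_paths(edges, start, goal):
|       adj = defaultdict(list)
|       for u, v in edges:
|           adj[u].append(v)
|       paths, stack = [], [(start, [start])]
|       while stack:
|           node, path = stack.pop()
|           if node == goal:
|               paths.append(path)
|               continue
|           for nxt in adj[node]:
|               if nxt not in path:  # simple paths only: no revisits
|                   stack.append((nxt, path + [nxt]))
|       return paths
|
|   print(all_simple_paths([("A", "B"), ("A", "C"), ("B", "C")], "A", "C"))
|   # [['A', 'C'], ['A', 'B', 'C']]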
| snapcaster wrote:
| What you're describing sounds like agentic tool usage. Have
| you kept up with the latest developments on that? it's
| already solved depending on how strict you define your
| criteria above
| layer8 wrote:
| My understanding is that you need to provide and
| configure task-specific tools. You can't combine the AI
| with just a general-purpose computer and have the AI
| figure out on its own how to make use of it to achieve
| with reliability and precision whatever task it is given.
| In other words, the current tool usage isn't general-
| purpose in the way the LLM itself is, and also the LLM
| doesn't reason about its own capabilities in order to
| decide how to incorporate computer use to compensate for
| its own weaknesses. Instead you have to tell the LLM what
| it should apply the tooling for.
| b0a04gl wrote:
| if datasets are the new codebases, then the real IP can be
| dataset version control: how you fork, diff, merge and audit
| datasets like code. every team says 'we trained on 10B tokens'
| but what if we can answer 'which 5M tokens made reasoning
| better', 'which 100k made it worse'. then we can start applying
| targeted leverage
| krunck wrote:
| Until these "AI" systems become always-on, always-thinking,
| always-processing, progress is stuck. The current push button AI
| - meaning it only processes when we prompt it - is not how the
| kind of AI that everyone is dreaming of needs to function.
| fwip wrote:
| From a technical perspective, we can do that with a for loop.
|
| The reason we don't do it isn't because it's hard, it's because
| it yields worse results for increased cost.
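|
| Concretely, something like the following (call_llm is a
| placeholder, not a real API), which also shows where the cost and
| context-window problems come from:
|
|   import time
|
|   def call_llm(prompt: str) -> str:
|       raise NotImplementedError  # stand-in for a real completion call
|
|   context = "You are an always-on agent. Keep thinking and taking notes."
|   while True:
|       thought = call_llm(context)
|       context = (context + "\n" + thought)[-8000:]  # crude rolling window
|       time.sleep(1)  # every iteration costs tokens, prompted or not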
| nyrulez wrote:
| Things haven't changed much in terms of truly new ideas since
| electricity was invented. Everything else is just applications on
| top of that. Make the electrons flow in a different way and you
| get a different outcome.
| nomel wrote:
| > Make the electrons flow in a different way and you get a
| different outcome.
|
| This happens to be the basis of every aspect of our biology.
| seydor wrote:
| There are new ideas, people are finding new ways to build vision
| models, which then are applied to language models and vice versa
| (like diffusion).
|
| The original idea of connectionism is that neural networks can
| represent any function, which is the fundamental mathematical
| fact. So we should be optimistic, neural nets will be able to do
| anything. Which neural nets? So far people have stumbled on a few
| productive architectures, but it appears to be more alchemy than
| science. There is no reason why we should think there won't be
| both new ideas and new data. Biology did it, humans will do it
| too.
|
| > we're engaged in a decentralized globalized exercise of
| Science, where findings are shared openly
|
| Maybe the findings are shared, if they make the Company look
| good. But the methods are not anymore
| lossolo wrote:
| I wrote about it around a year ago here:
|
| "There weren't really any advancements from around 2018. The
| majority of the 'advancements' were in the amount of parameters,
| training data, and its applications. What was the GPT-3 to
| ChatGPT transition? It involved fine-tuning, using specifically
| crafted training data. What changed from GPT-3 to GPT-4? It was
| the increase in the number of parameters, improved training data,
| and the addition of another modality. From GPT-4 to GPT-4o? There
| was more optimization and the introduction of a new modality. The
| only thing left that could further improve models is to add one
| more modality, which could be video or other sensory inputs,
| along with some optimization and more parameters. We are
| approaching diminishing returns." [1]
|
| 10 months ago around o1 release:
|
| "It's because there is nothing novel here from an architectural
| point of view. Again, the secret sauce is only in the training
| data. O1 seems like a variant of RLRF
| https://arxiv.org/abs/2403.14238
|
| Soon you will see similar models from competitors." [2]
|
| Winter is coming.
|
| 1. https://news.ycombinator.com/item?id=40624112
|
| 2. https://news.ycombinator.com/item?id=41526039
| tolerance wrote:
| And when winter does arrive, then what? The technology is
| slowing down while its popularity picks up. Can sparks fly out
| of snow?
| LarsDu88 wrote:
| If datasets are what we are talking about, I'd like to bring
| attention to the biological datasets out there that have yet to
| be fully harnessed.
|
| The ability to collect gene expression data at a tissue specific
| level has only been invented and automated in the last 4-5 years
| (see 10X Genomics Xenium, MERFISH). We've only recently figured
| out how to collect this data at the scale of millions of cells. A
| breakthrough on this front may be the next big area of
| advancement.
| Night_Thastus wrote:
| Man I can't wait for this '''''AI''''' stuff to blow over. The
| back and forth gets a bit exhausting.
| alganet wrote:
| Dataset? That's so 2000s.
|
| Each crawl on the internet is actually a discrete chunk of a more
| abstractly defined, constant influx of information streams. Let's
| call them rivers (it's a big stream).
|
| These rivers can dry up, present seasonal shifts, be poisoned, be
| barraged.
|
| It will never "get there" and gather enough data to "be done".
|
| --
|
| Regarding "new ideas in AI", I think there could be. But this
| whole thing is not about AI anymore.
| strangescript wrote:
| If you work with model architecture and read papers, how could
| you not know there is a flood of new ideas? Only a few yield
| interesting results though.
|
| I kind of wonder if libraries like pytorch have hurt experimental
| development. So many basic concepts no one thinks about anymore
| because they just use the out of the box solutions. And maybe
| those solutions are great and those parts are "solved", but I am
| not sure. How many models are using someone else's tokenizer, or
| someone else's strapped on vision model just to check a box in
| the model card?
| kevmo314 wrote:
| The people who don't think about such things probably wouldn't
| develop experimentally sans pytorch either.
| thenaturalist wrote:
| That's been the very normal way of the human world.
|
| When the foundation layer at a given moment doesn't yield an
| ROI on intellectual exploration - say because you can
| overcompensate with VC funded raw compute and make more progress
| elsewhere -, few(er) will go there.
|
| But inevitably, as other domains reach diminishing returns,
| bright minds will take a look around where significant gains
| for their effort can be found.
|
| And so will the next generation of PyTorch or foundational
| technologies evolve.
| russellbeattie wrote:
| Paradigm shifts are often just a conglomeration of previous ideas
| with one little tweak that suddenly propels a technology ahead
| 10x which opens up a whole new era.
|
| The iPhone is a perfect example. There were smartphones with
| cameras and web browsers before. But when the iPhone launched, it
| added a capacitive touch screen that was so responsive there was
| no need for a keyboard. The importance of that one technical
| innovation can't be overstated.
|
| Then the "new new thing" is followed by a period of years where
| the innovation is refined, distributed, applied to different
| contexts, and incrementally improved.
|
| The iPhone launched in 2007 is not really that much different
| than the one you have in your pocket today. The last 20 years has
| been about improvements. The web browser before that is also
| pretty much the same as the one you use today.
|
| We've seen the same pattern happen with LLMs. The author of the
| article points out that many of AI's breakthroughs have been
| around since the 1990s. Sure! And the Internet was created in the
| 1970s and mobile phones were invented in the 1980s. That doesn't
| mean the web and smartphones weren't monumental technological
| events. And it doesn't mean LLMs and AI innovation is somehow not
| proceeding apace.
|
| It's just how this stuff works.
| lsy wrote:
| This seems simplistic, tech and infrastructure play a huge part
| here. A short and incomplete list of things that contributed:
|
| - Moore's law petering out, steering hardware advancements
| towards parallelism
|
| - Fast-enough internet creating shift to processing and storage
| in large server farms, enabling both high-cost training and
| remote storage of large models
|
| - Social media + search both enlisting consumers as data
| producers, and necessitating the creation of armies of Mturkers
| for content moderation + evaluation, later becoming available for
| tagging and rlhf
|
| - A long-term shift to a text-oriented society, beginning with
| print capitalism and continuing through the rise of "knowledge
| work" through to the migration of daily tasks (work, bill paying,
| shopping) online, that allows a program that only produces text
| to appear capable of doing many of the things a person does
|
| We may have previously had the technical ideas in the 1990s but
| we certainly didn't have the ripened infrastructure to put them
| into practice. If we had the dataset to create an LLM in the 90s,
| it still would have been astronomically cost-prohibitive to
| train, both in CPU and human labor, and it wouldn't have as much
| of an effect on society because you wouldn't be able to hook it
| up to commerce or day-to-day activities (far fewer texts, emails,
| ecommerce).
| somebodythere wrote:
| I don't know if it matters. Even if the best we can do is get
| really good at interpolating between solutions to cognitive tasks
| on the data manifold, the only economically useful human labor
| left asymptotes toward frontier work; work that only a single-
| digit percentage of people can actually perform.
| tim333 wrote:
| An interesting step forward, although an old idea, that we seem
| close to is recursive self-improvement: get the AI to make a
| modified version of itself that tries to think better.
| cadamsdotcom wrote:
| What about actively obtained data - models seeking data, rather
| than being fed. Human babies put things in their mouths, they try
| to stand and fall over. They "do stuff" to learn what works.
| Right now we're just telling models what works.
|
| What about simulation: models can make 3D objects so why not give
| them a physics simulator? We have amazing high fidelity (and low
| cost!) game engines that would be a great building block.
|
| What about rumination: behind every Cursor rule for example, is a
| whole story of why a user added it. Why not take the rule, ask a
| reasoning model to hypothesize about why that rule was created,
| and add that rumination (along with the rule) to the training
| data. Providing opportunities to reflect on the choices made by
| their users might deepen any insights, squeezing more juice out
| of the data.
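|
| A rough sketch of that last idea (ask_model is a placeholder, not
| a real API; the point is just the shape of the extra data):
|
|   def ask_model(prompt: str) -> str:
|       raise NotImplementedError  # stand-in for any reasoning model
|
|   def ruminate(rules: list[str]) -> list[dict]:
|       examples = []
|       for rule in rules:
|           rationale = ask_model(
|               "A user added this rule to their coding assistant:\n"
|               f"{rule}\n"
|               "Hypothesize the incident or preference behind it."
|           )
|           examples.append({"rule": rule, "rationale": rationale})
|       return examples  # add to the training set alongside the rules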
| kevmo314 wrote:
| That would be reinforcement learning. The juice is quite hard
| to squeeze.
| cadamsdotcom wrote:
| Agreed for most cases.
|
| Each Cursor rule is a byproduct of tons of work and probably
| contains lots that can be unpacked. Any research on that?
| Centigonal wrote:
| Simulation and embodied AI (putting the AI in a robotic arm or
| a car so it can try stuff and gather information about the
| results) are very actively being explored.
| cadamsdotcom wrote:
| What about at inference time? ie. in response to a query.
|
| We let models write code and run it. Which gives them a high
| chance of getting arithmetic right.
|
| Solving the "crossing the river" problem by letting the model
| create and run a simulation would give a pretty high chance
| of getting it right.
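|
| For instance, a minimal brute-force simulation of the classic
| wolf/goat/cabbage version of that puzzle, the kind of thing a
| model could write and execute at inference time:
|
|   from collections import deque
|
|   ITEMS = {"wolf", "goat", "cabbage"}
|   UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # unattended pairs
|
|   def safe(bank):
|       return not any(pair <= bank for pair in UNSAFE)
|
|   def solve():
|       start = (frozenset(ITEMS), "left")  # everything on the left bank
|       goal = (frozenset(), "right")
|       queue, seen = deque([(start, [])]), {start}
|       while queue:
|           (left, farmer), moves = queue.popleft()
|           if (left, farmer) == goal:
|               return moves
|           here = left if farmer == "left" else ITEMS - left
|           for cargo in list(here) + [None]:  # carry one item or nothing
|               new_left = set(left)
|               if cargo:
|                   (new_left.remove if farmer == "left" else new_left.add)(cargo)
|               new_farmer = "right" if farmer == "left" else "left"
|               unattended = new_left if new_farmer == "right" else ITEMS - new_left
|               state = (frozenset(new_left), new_farmer)
|               if safe(unattended) and state not in seen:
|                   seen.add(state)
|                   queue.append((state, moves + [(cargo or "nothing", new_farmer)]))
|
|   print(solve())  # 7 crossings, goat first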
___________________________________________________________________
(page generated 2025-06-30 23:00 UTC)