[HN Gopher] All AI models might be the same
       ___________________________________________________________________
        
       All AI models might be the same
        
       Author : jxmorris12
       Score  : 95 points
       Date   : 2025-07-17 17:28 UTC (5 hours ago)
        
 (HTM) web link (blog.jxmo.io)
 (TXT) w3m dump (blog.jxmo.io)
        
       | tyronehed wrote:
       | Especially if they are all me-too copies of a Transformer.
       | 
       | When we arrive at AGI, you can be certain it will not contain a
       | Transformer.
        
         | jxmorris12 wrote:
         | I don't think architecture matters. It seems to be more a
         | function of the data somehow.
         | 
         | I once saw a LessWrong post claiming that the Platonic
         | Representation Hypothesis doesn't hold when you only embed
         | random noise, as opposed to natural images:
         | http://lesswrong.com/posts/Su2pg7iwBM55yjQdt/exploring-the-p...
        
           | blibble wrote:
           | > I don't think architecture matters. It seems to be more a
           | function of the data somehow.
           | 
           | of course it matters
           | 
           | if I supply the ants in my garden with instructions on how to
           | build tanks and stealth bombers they're still not going to be
           | able to conquer my front room
        
       | TheSaifurRahman wrote:
       | This only works when different sources share similar feature
       | distributions and semantic relationships.
       | 
       | The M or B game breaks down when you play with someone who knows
       | obscure people you've never heard of. Either you can't recognize
       | their references, or your sense of "semantic distance" differs
       | from theirs. The solution is to match knowledge levels: experts
       | play with experts, generalists with generalists.
       | 
        | The same applies to decoding ancient texts: if ancient
        | civilizations focused on completely different concepts than we do
        | today, our modern semantic models won't help us understand their
        | writing.
        
         | npinsker wrote:
         | I've played this game with friends occasionally and -- when
         | it's a person -- don't think I've ever completed a game.
        
       | TheSaifurRahman wrote:
       | Has there been research on using this to make models smaller? If
       | models converge on similar representations, we should be able to
       | build more efficient architectures around those core features.
        
         | yorwba wrote:
         | It's more likely that such an architecture would be bigger
         | rather than smaller. https://arxiv.org/abs/2412.20292
         | demonstrated that score-matching diffusion models approximate a
         | process that combines patches from different training images.
         | To build a model that makes use of this fact, all you need to
         | do is look up the right patch in the training data. Of course a
         | model the size of its training data would typically be rather
         | unwieldy to use. If you want something smaller, we're back to
         | approximations created by training the old-fashioned way.
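          | 
          | As a toy illustration of that lookup-table view (mine, not the
          | paper's algorithm): answer a query patch by returning its
          | nearest neighbor among all flattened training patches. The
          | "model" is literally the training data plus a distance search.
          | 
          |     import numpy as np
          | 
          |     def nearest_patch(query, patch_bank):
          |         # patch_bank: (n_patches, d) flattened training patches
          |         # query: (d,) patch we want the model to produce
          |         dists = np.linalg.norm(patch_bank - query, axis=1)
          |         return patch_bank[np.argmin(dists)]
          | 
          |     def reconstruct(query_patches, patch_bank):
          |         # Rebuild an image by swapping every patch for its
          |         # closest training patch -- a "model" whose size is
          |         # the size of its training set.
          |         return np.stack([nearest_patch(p, patch_bank)
          |                          for p in query_patches])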
        
         | giancarlostoro wrote:
          | I've been thinking about this a lot. I want to know how small a
          | model can be before letting it browse search engines, or files
          | you host locally, stops being a viable way for it to give you
          | more informed answers. Is it 2GB? 8GB? Would love to know.
        
       | empath75 wrote:
       | This is kind of fascinating because I just tried to play
        | Mussolini or bread with ChatGPT and it is absolutely _awful_ at
       | it, even with reasoning models.
       | 
       | It just assumes that your answers are going to be reasonably
       | bread-like or reasonably mussolini-like, and doesn't think
       | laterally at all.
       | 
       | It just kept asking me about varieties of baked goods.
       | 
       | edit: It did much better after I added some extra explanation --
        | that it could be anything, that it may be very unlike either
        | choice, and not to try to narrow down too quickly.
        
         | fsmv wrote:
         | I think an LLM is a bit too high level for this game or maybe
         | it just would need a lengthy prompt to explain the game.
         | 
          | Word2vec, used directly, is exactly the right thing to play
          | this game with. Those embeddings exist inside an LLM, but the
          | LLM is trained to respond like text found online, not to play
          | this game.
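          | 
          | A rough sketch of what that looks like (my own example, using
          | gensim's pretrained "glove-wiki-gigaword-50" vectors rather
          | than word2vec proper, and assuming the secret word is in
          | vocabulary):
          | 
          |     import gensim.downloader as api
          | 
          |     kv = api.load("glove-wiki-gigaword-50")  # small download
          | 
          |     def closer_to(secret, option_a, option_b):
          |         # Answer "is the secret closer to A or B?" purely by
          |         # cosine similarity in the embedding space.
          |         sim_a = kv.similarity(secret, option_a)
          |         sim_b = kv.similarity(secret, option_b)
          |         return option_a if sim_a > sim_b else option_b
          | 
          |     print(closer_to("croissant", "mussolini", "bread"))
          |     # expected: "bread"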
        
       | Xcelerate wrote:
       | Edit: I wrote my comment a bit too early before finishing the
       | whole article. I'll leave my comment below, but it's actually not
       | very closely related to the topic at hand or the author's paper.
       | 
       | I agree with the gist of the article (which IMO is basically that
       | universal computation is universal regardless of how you perform
       | it), but there are two big issues that prevent this observation
       | from helping us in a practical sense:
       | 
       | 1. Not all models are equally _efficient_. We already have many
        | methods to perform universal search (e.g., Levin's, Hutter's,
       | and Schmidhuber's versions), but they are painfully slow despite
       | being optimal in a narrow sense that doesn't extrapolate well to
       | real world performance.
       | 
       | 2. Solomonoff induction is only optimal for _infinite_ data
       | (i.e., it can be used to create a predictor that asymptotically
       | dominates any other algorithmic predictor). As far as I can tell,
       | the problem remains totally unsolved for _finite_ data, due to
       | the additive constant that results from the question: _which_
       | universal model of computation should be applied to finite data?
       | You can easily construct a Turing machine that is universal and
       | perfectly reproduces the training data, yet nevertheless
       | dramatically fails to generalize. No one has made a strong case
       | for any specific natural prior over universal Turing machines
        | (and if you try to define some measure to quantify the "size" of
       | a Turing machine you realize this method starts to fail once the
       | number of transition tables becomes large enough to start
       | exhibiting redundancy).
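        | 
        | (For concreteness, the standard textbook objects behind point 2,
        | not anything from the article: the Solomonoff prior over a
        | universal prefix machine U, and the invariance theorem whose
        | additive constant is exactly the trouble for finite data,
        | 
        |     M_U(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}
        |     K_U(x) \le K_V(x) + c_{U,V}
        | 
        | where c_{U,V} depends on the machines U and V but not on x;
        | harmless asymptotically, decisive when the data is finite.)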
        
         | im3w1l wrote:
         | Regarding your second point I think there are two cases here
         | that should be kept separate. The first is that you are
         | teleported into a parallel dimension where literally everything
         | works differently from here. In that case I do agree that there
         | are several reasonable choices of models of computation. You
         | simply have to pick one and hope it wasn't too bad.
         | 
         | But the second case is that you encounter some phenomenon here
         | in our ordinary world. And in that case I think you can do way
         | better by reasoning about the phenomenon and trying to guess at
         | plausible mechanics based on your preexisting knowledge of how
         | the world works. In particular, I think guessing that "there is
         | some short natural language description of how the phenomenon
         | works, based on a language grounded in the corpus of human
         | writing" is a very reasonable prior.
        
       | dr_dshiv wrote:
       | What about the platonic bits? Any other articles that give more
       | details there?
        
       | somethingsome wrote:
       | Mmmh I'm deeply skeptical of some parts.
       | 
       | > One explanation for why this game works is that there is only
       | one way in which things are related
       | 
        | There is not; this is a completely non-transitive relationship.
       | 
        | On another point: suppose you keep the same vocabulary but
        | permute the meanings of the words. The neural network will still
        | learn relationships, completely different ones, and its
        | representation may converge toward a better compression for that
        | set of words, but I'm dubious that this new compression scheme
        | will resemble the previous one (?)
       | 
       | I would say that given an optimal encoding of the relationships,
       | we can achieve an extreme compression, but not all encodings lead
       | to the same compression at the end.
       | 
        | If I add 'bla' between every word in a text, that is easy to
        | compress, but if I instead add an increasing sequence of words
        | between each word, the meaning is still there, yet the
        | compression will not be the same, as the network will try to
        | generate the words in between.
       | 
       | (thinking out loud)
        
       | coffeecoders wrote:
       | I think "we might decode whale speech or ancient languages" is a
       | huge stretch. Context is the most important part of what makes
       | language useful.
       | 
        | There are billions of human-written texts, grounded in shared
        | experience, and that is what makes our AI good at language. We
        | don't have that for whales.
        
         | klank wrote:
         | If a lion could speak, would we understand it?
        
           | eddythompson80 wrote:
           | There is nothing really special about speech as a form of
           | communication. All animals communicate with each other and
           | with other animals. Informational density and, uhhhhh,
           | cyclomatic complexity might be different between speech and a
           | dance or a grunt or whatever.
        
             | klank wrote:
             | I was referencing Wittgenstein's "If a lion could speak, we
             | would not understand it." Wittgenstein believed (and I am
             | strongly inclined to agree with him) that our ability to
             | convey meaning through communication was intrinsically tied
             | to (or, rather, sprang forth from) our physical, lived
             | experiences.
             | 
             | Thus, to your point, assuming communication, because
             | "there's nothing really special about speech", does that
             | mean we would be able to understand a lion, if the lion
             | could speak? Wittgenstein would say probably not. At least
             | not initially and not until we had built shared lived
             | experiences.
        
               | cdrini wrote:
               | Hmm I'm not convinced we don't have a lot of shared
               | experience. We live on the same planet. We both hunger,
               | eat, and drink. We see the sun, the grass, the sky. We
               | both have muscles that stretch and compress. We both
               | sleep and yawn.
               | 
               | I mean who knows, maybe their perception of these shared
               | experiences would be different enough to make
               | communication difficult, but still, I think it's
               | undeniably shared experience.
        
               | klank wrote:
                | That's fair. To me, though, the point of Wittgenstein's
                | lion thought experiment was not necessarily that _any_
                | communication would be impossible, but that we could not
                | understand what it truly means to be a lion, as opposed
                | to just what it means to be an animal. We have no shared
                | lion experiences, nor does a lion have human experiences.
                | So would we be able to have human-to-lion communication
                | even if we could both speak human speech?
               | 
               | I think that's the core question being asked and that's
               | the one I have a hard time seeing how it'd work.
        
               | cdrini wrote:
               | Hmm, I'm finding the premise a bit confusing, "understand
               | what it truly meant to be a lion". I think that's quite
               | different than having meaningful communication. One could
               | make the same argument for "truly understanding" what it
               | means to be someone else.
               | 
               | My thinking is that if something is capable of human-
               | style speech, then we'd be able to communicate with them.
               | We'd be able to talk about our shared experiences of the
               | planet, and, if we're capable of human-style speech,
               | likely also talk about more abstract concepts of what it
               | means to be a human or lion. And potentially create new
               | words for concepts that don't exist in each language.
               | 
               | I think the fact that human speech is capable of abstract
               | concepts, not just concrete concepts, means that shared
               | experience isn't necessary to have meaningful
               | communication? It's a bit handwavy, depends a bit on how
               | we're defining "understand" and "communicate".
        
               | klank wrote:
               | > I think the fact that human speech is capable of
               | abstract concepts, not just concrete concepts, means that
               | shared experience isn't necessary to have meaningful
               | communication?
               | 
               | I don't follow that line of reasoning. To me, in that
               | example, you're still communicating with a human, who
               | regardless of culture, or geographic location, still
               | shares an immense amount of shared life experiences with
               | you.
               | 
                | Or, they're not. As an intentionally extreme example, I
                | bet we'd have a super hard time talking about homotopy
                | type theory with someone from an Amazon rainforest
                | tribe. Similarly, I'd bet they have their own abstract
                | concepts that they would not be able to easily explain
                | to us.
        
               | Isamu wrote:
               | If we had a sufficiently large corpus of lion-speech we
               | could build an LLM (Lion Language Model) that would
               | "understand" as well as any model could.
               | 
                | Which isn't saying much: it still couldn't explain Lion
                | Language to us; it could just generate or recognize
                | statistically plausible examples.
               | 
               | To translate Lion speech you'd need to train a
               | transformer on a parallel corpus of Lion to English, the
               | existence of which would require that you already
               | understand Lion.
        
               | klank wrote:
                | And even assuming the existence of a Lion-to-English
                | corpus, it would only give us human word approximations.
                | We already experience how lossy that type of translation
                | is between human languages, or sometimes between
                | dialects within the same language.
               | 
                | Who knows; we don't really have good insight into how
                | this information loss or disparity grows. Is it linear?
                | Exponential? Presumably there is a threshold beyond which
                | we simply have no ability to translate while retaining a
                | meaningful amount of the original meaning.
               | 
               | Would we know it when we tried to go over that threshold?
               | 
               | Sorry, I know I'm rambling. But it has always been
               | regularly on my mind and it's easy for me to get on a
               | roll. All this LLM stuff only kicked it all into
               | overdrive.
        
               | cdrini wrote:
                | Hmm, I don't think we'd need a Rosetta Stone. In the same
               | way LLMs associate via purely contextual usage the
               | meaning of words, two separate data sets of lion and
               | English, encoded into the same vector space, might pick
               | up patterns of contextual usage at a high enough level to
               | allow for mapping between the two languages.
               | 
               | For example, given thousands of English sentences with
               | the word "sun", the vector embedding encodes the meaning.
               | Assuming the lion word for "sun" is used in much the same
               | context (near lion words for "hot", "heat", etc), it
               | would likely end up in a similar spot near the English
                | word for sun. And because of our shared context living on
                | Earth/being animals, I reckon many words likely will be
               | used in similar contexts.
               | 
               | That's my guess though, note I don't know a ton about the
               | internals of LLMs.
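                | 
                | A toy sketch of that intuition (mine, not a claim about
                | how the article does it), in the spirit of unsupervised
                | bilingual lexicon induction: guess a few pairings, fit a
                | rotation between the two spaces, re-match by nearest
                | neighbor, repeat. It only works to the extent both
                | spaces really do share the same relational geometry,
                | which is the hypothesis in question.
                | 
                |     import numpy as np
                | 
                |     def fit_rotation(A, B):
                |         # Orthogonal R minimizing ||A @ R - B||
                |         # (Procrustes), for already-paired rows.
                |         u, _, vt = np.linalg.svd(A.T @ B)
                |         return u @ vt
                | 
                |     def align(src, tgt, seed_pairs, iters=5):
                |         # src, tgt: (n, d) row-normalized embeddings;
                |         # seed_pairs: initial guessed (i, j) matches,
                |         # e.g. words of similar frequency rank.
                |         pairs = list(seed_pairs)
                |         for _ in range(iters):
                |             A = src[[i for i, _ in pairs]]
                |             B = tgt[[j for _, j in pairs]]
                |             R = fit_rotation(A, B)
                |             sims = (src @ R) @ tgt.T
                |             pairs = [(i, int(s.argmax()))
                |                      for i, s in enumerate(sims)]
                |         return R, pairs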
        
           | ecocentrik wrote:
           | That was a philosophical position on the difficulty of
           | understanding alien concepts and language, not a hard
           | technological limit.
        
             | klank wrote:
             | I'm missing why that distinction matters given the thread
             | of conversation.
             | 
             | Would you care to expound?
        
           | kouru225 wrote:
           | Knowing lions I bet all they'd talk about is being straight
           | up dicks to anyone and everyone around them so yea I think we
           | probably could ngl
        
           | UltraSane wrote:
           | We should understand common concepts like hungry, tired,
           | horny, pain, etc.
        
       | streptomycin wrote:
       | _Is it closer to Mussolini or bread? Mussolini._
       | 
       |  _Is it closer to Mussolini or David Beckham? Uhh, I guess
       | Mussolini. (Ok, they're definitely thinking of a person.)_
       | 
       | That reasoning doesn't follow. Many things besides people would
       | have the same answers, for instance any animal that seems more
       | like Mussolini than Beckham.
        
         | jxmorris12 wrote:
         | Whoops. I hope you can overlook this minor logical error.
        
           | streptomycin wrote:
           | Oh yeah it's absolutely an interesting article!
        
         | pjio wrote:
         | I believe the joke is about David Beckham not really being
         | (perceived as) human, even when compared to personified evil
        
         | Fomite wrote:
         | Oswald Mosley
        
       | gerdesj wrote:
       | The devil is in the details.
       | 
       | I recently gave the "Veeam Intelligence" a spin.
       | 
       | Veeam is a backup system spanning quite a lot of IT systems with
       | a lot of options - it is quite complicated but it is also a
       | bounded domain - the app does as the app does. It is very mature
       | and has extremely good technical documentation and a massive
       | amount of technical information docs (TIDs) and a vibrant and
       | very well informed set of web forums, staffed by ... staff and
       | even the likes of Anton Gostev -
       | https://www.veeam.com/company/management-team.html
       | 
       | Surely they have close to the perfect data set to train on?
       | 
       | I asked a question about moving existing VMware replicas from one
       | datastore to another and how to keep my replication jobs working
       | correctly. In this field, you may not be familiar with my
       | particular requirements but this is not a niche issue.
       | 
       | The "VI" came up with a reasonable sounding answer involving a
       | wizard. I hunted around the GUI looking for it (I had actually
       | used that wizard a while back). So I asked where it was and was
       | given directions. It wasn't there. The wizard was genuine but its
       | usage here was a hallucination.
       | 
       | A human might have done the same thing with some half remembered
       | knowledge but would soon fix that with the docs or the app
       | itself.
       | 
       | I will stick to reading the docs. They are really well written
       | and I am reasonably proficient in this field so actually - a
       | decent index is all I need to get a job done. I might get some of
       | my staff to play with this thing when given a few tasks that they
       | are unfamiliar with and see what it comes up with.
       | 
        | I am sure that domain-specific LLMs are where it is at but we
       | need some sort of efficient "fact checker" system.
        
       | ieie3366 wrote:
        | LLMs are brute-force reverse-engineered human brains. Think about
        | it. Any written text out there is written by human brains. The
        | "function" that outputs this is whatever happens inside the
        | brain, insanely complex.
       | 
        | LLM "training" is just brute-forcing the same function into
        | existence: "The human brain outputs X, the LLM outputs Y; mutate
        | it a billion times until X and Y start matching."
        
       | tgsovlerkhgsel wrote:
       | I've noticed that many of the large, separately developed AIs
       | often answer with remarkably similar wording to the same
       | question.
        
       | kindkang2024 wrote:
       | The Dao can be spoken of, yet what is spoken is not the eternal
       | Dao.
       | 
       | So, what is the Dao? Personally, I see it as will -- something we
       | humans express through words. Even though we speak different
       | languages -- Chinese, Japanese, English... -- behind them all
       | lies a similar will.
       | 
       | Large language models learn from word tokens and begin to grasp
       | this will -- and in doing so, they become the Dao.
       | 
       | In that sense, I agree: "All AI models might be the same."
        
       | foxes wrote:
        | So in the limit the model's representation space has one
        | dimension per "concept" or something, but making it couple
        | things together is what actually makes it useful?
       | 
        | An infinite-dimensional model with just one dim per concept would
        | be sorta useless, but you need things tied together?
        
       | IAmNotACellist wrote:
       | I agree LLMs are converging on a current representation of
       | reality based on the collective works of humanity. What we need
       | to do is provide AIs with realtime sensory input, simulated
        | hormones each with their own half-lives based on metabolic
       | conditions and energy usage, a constant thinking loop, and
       | discover a synthetic psilocybin that's capable of causing
       | creative, cross-neural connections similar to human brains. We
       | have the stoned ape theory, we need the stoned AI theory.
        
       ___________________________________________________________________
       (page generated 2025-07-17 23:00 UTC)