[HN Gopher] The Einstein AI Model
       ___________________________________________________________________
        
       The Einstein AI Model
        
       Author : 9woc
       Score  : 86 points
       Date   : 2025-03-08 14:14 UTC (2 days ago)
        
 (HTM) web link (thomwolf.io)
 (TXT) w3m dump (thomwolf.io)
        
       | Agingcoder wrote:
       | The author seems to assume that conjuring up a conjecture is
       | the hard part - yet the work that follows will be filled with
       | the same standard mathematics (granted, sometimes wrapped as
       | new tools, and the proof ends up being as important as the
       | result), often at great cost.
       | 
       | Having powerful assistants that allow people to try out crazy
       | mathematical ideas without fear of risking their careers, or
       | just to have fun with ideas, is likely to have an outsized
       | impact anyway, I think.
        
         | timewizard wrote:
         | New things AI will magically fix by existing: The completely
         | broken university career publishing pipeline. *fingers crossed*
        
         | kristianc wrote:
         | As Isaac Newton himself put it, "if I have seen further it is
         | by standing on the shoulders of Giants." It was ever thus.
        
         | aleksiy123 wrote:
         | The Bitter Lesson seems relevant here again.
         | http://www.incompleteideas.net/IncIdeas/BitterLesson.html
         | 
         | I think I read somewhere about Erdos having this somewhat brute
         | force approach. Whenever fresh techniques were developed (by
         | himself or others), he would go back to see if they could be
         | used on one of his long-standing open questions.
        
       | jrimbault wrote:
       | What about the non-LLM work?
       | 
       | I know barely anything about it but it seems some people are
       | interested and excited about protein engineering powered by
       | neural networks.
        
         | tim333 wrote:
         | DeepMind is working on simulating a whole cell, which will
         | be interesting and potentially useful.
        
       | neilv wrote:
       | A nice post (that should be somewhere smarter than contemporary
       | Twitter/X).
       | 
       | > _PS: You might be wondering what such a benchmark could look
       | like. Evaluating it could involve testing a model on some recent
       | discovery it should not know yet (a modern equivalent of special
       | relativity) and explore how the model might start asking the
       | right questions on a topic it has no exposure to the answers or
       | conceptual framework of. This is challenging because most models
       | are trained on virtually all human knowledge available today but
       | it seems essential if we want to benchmark these behaviors.
       | Overall this is really an open question and I'll be happy to hear
       | your insightful thoughts._
       | 
       | Why benchmarks?
       | 
       | A genius (human or AI) could produce _novel_ insights, some of
       | which could practically be tested in the real world.
       | 
       | "We can gene-edit using such-and-such approach" => Go try it.
       | 
       | No sales brochure claims, research paper comparison charts to
       | show incremental improvement, individual KPIs/OKRs to hit, nor
       | promotion packets required.
        
         | vessenes wrote:
         | The reason you'd have a benchmark is that you want to be able
         | to check in on your model programmatically. DNA wet-lab work
         | is slow and expensive. While you're absolutely right that
         | benchmarks aren't the best thing ever and that they are used
         | for marketing and sales purposes, they also do seem to
         | generally create
         | capacity momentum in the market. For instance, nobody running
         | local LLMs right now would prefer a 12-month-old model to one
         | of the top models today at the same size - they are
         | significantly more capable, and many researchers believe that
         | training on new and harder benchmarks has been a way to
         | increase that capacity.
        
       | internet_points wrote:
       | If an LLM is trained on knowledge up until, say, September 2023,
       | could you use a corpus of interesting/insightful scientific
       | discoveries and new methods developed after that date to
       | evaluate/tune it? (Though I fear it would be a small corpus.)
        
         | kingkongjaffa wrote:
         | Get a research paper and look at its references. Give an LLM
         | all of the references but not the current paper. See if it
         | can conclude something like the current paper, or at least
         | design the same experiment as detailed in the paper.
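         | 
         | A minimal sketch of that evaluation loop, assuming the
         | OpenAI Python client and a placeholder model name (both are
         | assumptions, nothing from the thread):
         | 
         |   # Give a model only a paper's references and ask it to
         |   # propose the paper's conclusion / experiment design.
         |   from openai import OpenAI  # assumed client library
         | 
         |   client = OpenAI()
         | 
         |   def propose_from_references(reference_texts):
         |       refs = "\n\n".join(reference_texts)
         |       prompt = (
         |           "Here are the references of an unnamed paper:\n"
         |           f"{refs}\n\n"
         |           "What result do they point towards, and what "
         |           "experiment would you design to test it?"
         |       )
         |       resp = client.chat.completions.create(
         |           model="gpt-4o",  # placeholder model name
         |           messages=[{"role": "user", "content": prompt}],
         |       )
         |       return resp.choices[0].message.content
         | 
         |   # Score "rediscovery" by comparing the output against the
         |   # held-out paper, by hand or with a judge model.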
        
         | Yizahi wrote:
         | One of the problems would be acquiring said corpus. NN
         | corporations got away with scraping all human-made content
         | for free (arguably stealing it all), but no one can really
         | prove that their specific content was taken without asking,
         | so there are no lawsuits. The NYT tried, but that was a
         | workaround and I don't know the status of that case. If an
         | NN corporation came out and explicitly said "here, we are
         | using a Nature journal dump from 2024", then the Nature
         | journal would come to them and say "oh, really?".
        
       | wewewedxfgdf wrote:
       | I'm still waiting for the end of the world caused by AI as
       | predicted by a very large number of prominent figures such as Sam
       | Altman, Hinton, Musk, signers of the Center for AI Safety
        | statement, Shane Legg, Marvin Minsky, Eliezer Yudkowsky.
       | 
       | No sign yet.
       | 
       | On the other hand, LLMs are writing code which I can debug and
       | eventually get to work in a real code base - and script writers
       | everywhere are writing scripts more quickly, marketing people are
       | writing better ad copy, employers are writing better job ads,
       | and real estate agents are writing better ads for houses.
        
         | lionkor wrote:
         | Prominent figure says baseless thing to boost stock prices,
         | more news at 6
        
           | godelski wrote:
           | The oddity isn't that people lie, the oddity is that people
           | continue to believe those who lie. They even give more trust
           | to those who constantly lie. This is certainly odd.
        
           | TeMPOraL wrote:
           | Yes, except half of the list isn't made of prominent people.
           | Whose stock price was Eliezer boosting when he was talking
           | about these things _15 years ago_?
           | 
           | Nah, it's more that the masses got exposed to those ideas
           | recently - ideas which existed long ago, in obscurity - and
           | of course _now_ everyone is a fucking expert in this New
           | Thing No One Talked About Before ChatGPT.
           | 
           | Even the list GP gave, the specific names on it - the
           | only thing that this particular grouping communicates is
           | that its author doesn't have the first clue what they're
           | talking about.
        
         | empiko wrote:
         | I am still waiting to see the impact on GDP or any other
         | economic measure.
        
         | netdevphoenix wrote:
         | The fact is, even if the world were to end, finding the
         | causes would be extremely difficult because... well, the
         | world would have ended.
        
       | berkes wrote:
       | I've had some luck instructing AI to "Don't make up anything. If
       | there's no answer, say I don't know".
       | 
       | Which made me think that AI would be far more useful (for me?) if
       | it was tuned to "Dutchness" rather than "Americanness".
       | 
       | "Dutch" famously known for being brutally blunt, rude, honest,
       | and pushing back.
       | 
       | Yet we seem to have "American" AI, tuned to "the customer is
       | always right", inventing stuff just to not let you down, always
       | willing to help even if that makes things worse.
       | 
       | Not "critical thinking" or "revolutionary" yet. Just less polite
       | and less willing to always please you. In human interaction, the
       | Dutch bluntness and honesty can be very off-putting, but it is
       | quite efficient and effective. Two traits I very much prefer my
       | software to have. I don't need my software to be polite or to not
       | hurt my feelings. It's just a tool!
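       | 
       | For what it's worth, that instruction fits in a plain system
       | prompt; a minimal sketch, assuming the OpenAI Python client
       | (the model name and wording are placeholders):
       | 
       |   # "Blunt mode" as a system prompt rather than model tuning.
       |   from openai import OpenAI
       | 
       |   client = OpenAI()
       |   resp = client.chat.completions.create(
       |       model="gpt-4o",  # placeholder model name
       |       messages=[
       |           {"role": "system", "content":
       |            "Be blunt and push back. Don't make up anything. "
       |            "If there's no answer, say 'I don't know'."},
       |           {"role": "user", "content": "Who won the 2031 "
       |            "World Cup?"},
       |       ],
       |   )
       |   print(resp.choices[0].message.content)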
        
         | XCabbage wrote:
         | Obvious thought that I haven't tested: can you literally
         | achieve this by getting it to answer in Dutch, or training an
         | AI on Dutch text? Plausibly* Dutch-language training data will
         | reflect this cultural difference by virtue of being written
         | primarily by Dutch people.
         | 
         | * (though not necessarily, since the Internet is its own
         | country with its own culture, and much training data comes from
         | the Internet)
        
           | zoover2020 wrote:
           | I've tried Dutch answers and it is more than happy to
           | hallucinate and give me answers that are very "American".
           | It doesn't help that our culture has also been heavily
           | influenced by US pop culture since the internet.
           | 
           | Haven't tried prompt engineering with the Dutch stereotype,
           | though.
        
           | berkes wrote:
           | That hardly works. Though from my limited experiments,
           | Claude's models are better at this than OpenAI's. OpenAI
           | will, quite often, come up with suggestions that are
           | literal translations of English phrases (anglicisms).
           | 
           | Such as "Ik hoop dat deze email u gezond vindt" (I hope
           | this email finds you well), which is so wrong that not
           | even "simple" translation tools would suggest it.
           | 
           | Seeing that OpenAI's models could not even use properly
           | localized phrases but used American ones (this is from a
           | large test we did months ago), I highly doubt they can or
           | will respond by refusing to answer when they have none
           | based on the training data.
        
         | asddubs wrote:
         | I suspect it's a balancing act between the AI being
         | generally willing to help and avoiding responses like this,
         | e.g.:
         | 
         | https://www.sandraandwoo.com/wp-content/uploads/2024/02/twit...
         | 
         | or it just telling you to google it
        
           | shreyshnaccount wrote:
           | What (hypothetically) happens when the cost to run the
           | next giant LLM exceeds the cost to hire a person for tasks
           | like this?
        
             | EGreg wrote:
             | the R&D continues
        
             | Rescis wrote:
             | Given current models can accomplish this task quite
             | successfully and cheaply, I'd say that if/when that happens
             | it would be a failure of the user (or the provider) for not
             | routing the request to the smaller, cheaper model.
             | 
             | Similar to how it would be the failure of the user/provider
             | if someone thought it was too expensive to order food in,
             | but the reason they thought that was they were looking at
             | the cost of chartering a helicopter from the restaurant to
             | their house.
        
             | vlovich123 wrote:
             | Realtime LLM generation is ~$15/million "words". By
             | comparison a human writer at the beginning of a career
             | typically earns ~$50k/million words up to
             | ~$1million/million words for experienced writers. That's
             | about 3-5 orders of magnitude.
             | 
             | Inference costs generally have many orders of magnitude to
             | go before they approach raw human costs & there's always
             | going to be innovation to keep driving down the cost of
             | inference. This is also ignoring that humans aren't
             | available 24/7, have varying quality of output depending on
             | what's going on in their personal lives (& ignoring that
             | digital LLMs can respond quicker than humans, reducing the
             | time a task takes) & require more laborious editing than
             | might be present with an LLM. Basically the hypothetical
             | case seems unlikely to ever come to reality unless you've
             | got a supercomputer AI that's doing things no human
             | possibly could because of the amount of data it's operating
             | on (at which point, it might exceed the cost but a
             | competitive human wouldn't exist).
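             | 
             | The arithmetic behind that gap, as a quick back-of-the-
             | envelope check (a sketch using the figures above):
             | 
             |   import math
             | 
             |   llm = 15            # $ / million words (LLM)
             |   junior = 50_000     # $ / million words (new writer)
             |   senior = 1_000_000  # $ / million words (experienced)
             | 
             |   for label, human in [("junior", junior),
             |                        ("senior", senior)]:
             |       ratio = human / llm
             |       print(label, f"{ratio:,.0f}x",
             |             f"~{math.log10(ratio):.1f} OOM")
             | 
             |   # junior 3,333x  ~3.5 OOM
             |   # senior 66,667x ~4.8 OOM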
        
         | j45 wrote:
         | Also, the more accuracy that is put into the prompt and its
         | attachments, the more accurate the processing is.
        
         | Kabukks wrote:
         | I suspect instructing the model to respond with "I don't know"
         | more readily will result in more of those responses even though
         | there are other options that seem viable according to the
         | training data / model.
         | 
         | Remember, LLMs are just statistical sentence completion
         | machines. So telling it what to respond with will increase the
         | likelihood of that happening, even if there are other options
         | that are viable.
         | 
         | But since you can't blindly trust LLM output anyway, I guess
         | increasing "I don't know" responses is a good way of reducing
         | incorrect responses (which will still happen frequently enough)
         | at the cost of missing some correct ones.
        
           | berkes wrote:
           | > Remember, LLMs are just statistical sentence completion
           | machines. So telling it what to respond with will increase
           | the likelihood of that happening, even if there are other
           | options that are viable.
           | 
           | Obviously. When I say "tuned" I don't mean adding stuff to a
           | prompt. I mean tuning in the way models are also tuned to be
           | more or less professional, tuned to defer certain tasks to
           | other models (i.e. counting or math, something statistical
           | models are almost unable to do) and so on.
           | 
           | I am almost certain that the chain of models we use on
            | chatgpt.com is "tuned" to always give an answer, and not to
           | answer with "I am just a model, I don't have information on
           | this". Early models and early toolchains did this far more
           | often, but today they are quite probably tuned to "always be
           | of service".
           | 
           | "Quite probably" because I have no proof, other than that it
           | will gladly hallucinate, invent urls and references, etc. And
           | knowing that all the GPT competitors are battling for users,
           | so their products are quite certainly tuned to help in
           | this battle - e.g. to appear helpful and all-knowing,
           | rather than factually correct and therefore often
           | admittedly ignorant.
        
             | zamadatix wrote:
             | Whether you train the model to do math internally or
             | tell it to call an external model which only does math,
             | the root problem still exists. It's not as if a model
             | which only does math won't hallucinate how to solve
             | math problems just because it doesn't know about
             | history. For the same number of parameters, it's
             | probably better not to have to duplicate the parts
             | needed to understand the basis of things multiple
             | times.
             | 
             | The root problem is that training models to be
             | uncertain of their answers results in lower benchmarks
             | in every area except hallucinations. It's as if, in a
             | multiple-choice test, instead of picking whichever of
             | answers A-D you thought made the most sense, you picked
             | E, "I don't know". Helpful for the test grader, but a
             | bad bet for a model trying to claim it gets the most
             | answers right compared to other models.
        
         | Yizahi wrote:
         | The so-called AI can't "know". It has no understanding of
         | whether the generated text is an answer or not. You can't
         | force that instruction on a neural network; at best it just
         | adjusts the generated text slightly and you think that it
         | has somehow started understanding.
        
           | berkes wrote:
           | There's a distinction between "a model" and the chain of
           | tools and models you employ when asking something on
           | chatgpt.com or any of the consumer facing alternatives.
           | 
           | The latter is a chain of models, some specialized in
           | dissecting the question, some specialized in choosing the
           | right models and tools (e.g.: there's a calculation in
           | there, let's push that part to a simple Python function
           | that can actually count stuff, and pull the rest through
           | a generic LLM). I experiment with such toolchains myself
           | and it's baffling how quickly the complexity of all this
           | is growing.
           | 
           | A very simple example:
           | 
           |   question -> does_it_want_code_generated.model
           |     -[yes]-> specialized_code_generator.model
           |     -[no]->  specialized_english_generator.model
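           | 
           | A toy version of that router in code (the classifier and
           | the generators are stand-ins, not any real product's
           | internals):
           | 
           |   # Toy router: classify the question, then dispatch it
           |   # to a specialised generator. All "models" are stubs.
           |   def wants_code(question: str) -> bool:
           |       # stand-in for does_it_want_code_generated.model
           |       return "code" in question.lower()
           | 
           |   def generate_code(question: str) -> str:
           |       return f"# (code model would answer: {question})"
           | 
           |   def generate_english(question: str) -> str:
           |       return f"(text model would answer: {question})"
           | 
           |   def route(question: str) -> str:
           |       if wants_code(question):
           |           return generate_code(question)
           |       return generate_english(question)
           | 
           |   print(route("Write code to reverse a list"))
           |   print(route("Why is the sky blue?"))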
           | 
           | So, sure: a model has no "knowledge", and nor does a chain of
           | tools. But having e.g. a model specialized (i.e. trained on or
           | enriched with) all scientific papers ever, or maybe even a
           | vector DB with all that data, somewhere in the toolchain that
           | is in charge of either finding the "very likely references"
           | or denying an answer would help a lot. It would for me.
        
             | Yizahi wrote:
             | Sure, chains of networks can guess at the "passable" answer
             | much better/faster/cheaper etc. But that doesn't remove the
             | core issue, that none of the sub-networks or decision trees
             | can understand what it generates, and so it can't abort its
             | work and output "no answer" or something similar.
             | 
             | The whole premise of the original request was that the
             | user gives the NN a task which has a (maybe partially)
             | verifiable answer. He sees an incorrect answer and
             | wishes that a "failure" were displayed instead. But the
             | NN can't verify the correctness of its output. After
             | all, the G in GPT stands for Generative.
        
               | berkes wrote:
                | My simple RAG setup has steps that will return "We
               | don't have this information" if e.g. our vector DB
               | returns entries with far too low relevancy scores or if
               | the response from the LLM fails to add certain attributes
               | in its answer and so on.
               | 
                | Edit: TBC: these "steps" aren't LLMs or other models.
               | They're simple code with simple if/elses and an
               | accidental regex.
               | 
               | Again: an LLM/NN indeed has no "understanding" of what it
               | creates. Especially the LLMs that are "just" statistical
               | models. But the tooling around it, the entire chain can
               | very well handle this.
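                | 
                | Roughly what such a gate looks like (thresholds and
                | field names here are made up):
                | 
                |   # Plain-code gate around a RAG answer; no model
                |   # involved, just thresholds. Values are made up.
                |   MIN_SCORE = 0.75
                | 
                |   def answer(hits, llm_reply):
                |       # hits: (score, text) pairs from the vector DB
                |       if not hits:
                |           return "We don't have this information"
                |       if max(s for s, _ in hits) < MIN_SCORE:
                |           return "We don't have this information"
                |       if "source:" not in llm_reply:
                |           # reply lacks a required attribute
                |           return "We don't have this information"
                |       return llm_reply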
        
           | baq wrote:
           | How confident can you be in this? Have you analyzed what
           | exactly the billions of weights do?
           | 
           | I've got my opinions about what LLMs are and what they
           | aren't, but I don't confidently claim that they must be such.
           | There's a lot of stuff in those weights.
        
             | Q6T46nT668w6i3m wrote:
             | I'm confident that there's no magic and I've spent years
             | understanding "what the weights do." You're describing
             | weights as magic and they are not.
        
           | hatthew wrote:
           | Can you clarify what definition of "understanding" you're
           | using here?
        
         | netdevphoenix wrote:
         | > known for being brutally blunt, rude, honest, and pushing
         | back.
         | 
         | That's a different perspective. Dutch people don't see
         | themselves as rude. A Dutch person could say that Americans
         | are known for being dishonest and not truly conveying what
         | they mean. Yet Americans won't see themselves that way. You
         | can replace Dutch and American with any other nationalities.
        
           | berkes wrote:
           | I am Dutch, have lived in many countries in several
           | continents. I do see myself as rude. But, being Dutch, I
           | don't give a ** ;).
        
         | lifestyleguru wrote:
         | > "Dutch" famously known for being brutally blunt, rude,
         | honest, and pushing back.
         | 
         | The Dutch will never bluntly push back if you plan to set up
         | a tax evasion scheme in their country. Being vicious
         | assholes in daily stuff, especially towards strangers?
         | That's hardly something deserving praise.
        
           | eszed wrote:
           | To be fair, that's consequent to the Netherlands' well-known
           | love of soda-bread sandwiches.
        
             | lifestyleguru wrote:
             | What do you mean, some Irish reference? Oh I see, I
             | answered my own question ;)
        
               | eszed wrote:
               | :-)
               | 
               | I was aiming for juuuust subtle enough for the joke to
               | land, if you know the reference. Now I know it did, here
               | the rest of y'all go:
               | 
               | https://en.m.wikipedia.org/wiki/Dutch_Sandwich
        
           | msm_ wrote:
           | That's... a surprisingly crass thing to say. I would play it
           | off as a joke, if not for the second part of your post. Dutch
           | people are not "vicious assholes", they have a culture of
           | direct communication. Assuming that only your culture's
           | communication patterns are "correct" is xenophobic and close-
           | minded.
           | 
           | And connecting all people in the country with "tax evasion
           | schemes" is rude, if that was not actually a joke.
        
             | lifestyleguru wrote:
             | I'm just being brutally blunt. It goes both ways. The scale
             | of these evasion schemes is monstrous, not a joke at
             | all.
        
           | theshackleford wrote:
           | > Being vicious assholes in daily stuff especially towards
           | strangers? That's hardly something deserving praise.
           | 
           | I'll take it over the fake American politeness any day, 100
           | times over.
        
         | OutOfHere wrote:
         | I have seen the other side where a configured AI responds "I
         | don't know" far too much, often when it shouldn't. There is
         | nothing more useless than it. Certainly we need an accurate
         | balance.
        
         | jyounker wrote:
         | One of my current models for LLMs is that they're compression
         | algorithms. They compress a large amount of training data into
         | a set of weights. A query is a key into that compression space.
         | Hallucinations happen when you supply a key that corresponds to
         | something that wasn't in the training set.
        
           | threeducks wrote:
           | The nice thing about LLMs is that they can answer some
           | questions which were not in the training set. Unfortunately,
           | it is not easy to tell when that is the case.
        
         | janalsncm wrote:
         | One approach does use this. You can ask an LLM to explicitly
         | check its own answers by outputting thinking tokens, generating
         | a reward signal if it gets the right answer, and directly
         | updating based on the reward signals. That's a part of how
         | DeepSeek R1 was trained. It's better but not perfect, because
         | the thinking process is imperfect. Ultimately the LLM might not
         | know what it doesn't know.
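         | 
         | A toy version of that reward signal (the answer format is
         | an assumption; real setups like R1's are far more
         | involved):
         | 
         |   import re
         | 
         |   def reward(gen: str, truth: str) -> float:
         |       """1.0 iff the final answer line matches the truth."""
         |       # assumes the model ends with "ANSWER: <value>"
         |       m = re.search(r"ANSWER:\s*(.+?)\s*$", gen)
         |       if m is None:
         |           return 0.0
         |       return 1.0 if m.group(1) == truth else 0.0
         | 
         |   out = "12*12 is 144, so... ANSWER: 144"
         |   print(reward(out, "144"))  # 1.0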
        
         | neom wrote:
         | I take my final thoughts out of the LLM and into two other
         | new convos: I give both of them the same conversation, but I
         | ask one to steelman it and the other to strawman it. I find
         | it's a decent way to look for nuances you're missing.
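         | 
         | A minimal way to script that, assuming the OpenAI Python
         | client again (model name is a placeholder):
         | 
         |   from openai import OpenAI
         | 
         |   client = OpenAI()
         | 
         |   def critique(conversation: str, stance: str) -> str:
         |       resp = client.chat.completions.create(
         |           model="gpt-4o",  # placeholder model name
         |           messages=[{"role": "user", "content":
         |                      f"{stance} the conclusion below.\n\n"
         |                      + conversation}],
         |       )
         |       return resp.choices[0].message.content
         | 
         |   convo = "...final thoughts pasted here..."
         |   steel = critique(convo, "Steelman")
         |   straw = critique(convo, "Strawman")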
        
         | KoolKat23 wrote:
         | Gemini can be quite direct if it's adamant that it is correct.
        
       | OtherShrezzing wrote:
       | >We're currently building very obedient students, not
       | revolutionaries. This is perfect for today's main goal in the
       | field of creating great assistants and overly compliant helpers.
       | But until we find a way to incentivize them to question their
       | knowledge and propose ideas that potentially go against past
       | training data, they won't give us scientific revolutions yet.
       | 
       | This would definitely be an interesting future. I wonder what
       | it'd do to all of the work in alignment & safety if we started
       | encouraging AIs to go a bit rogue in some domains.
        
       | TeMPOraL wrote:
       | > _If something was not written in a book I could not invent it
       | unless it was a rather useless variation of a known theory.
       | __More annoyingly, I found it very hard to challenge the status-
       | quo__, to question what I had learned._
       | 
       | (__emphasis__ mine)
       | 
       | As if "challenging the status-quo" was the goal in the first
       | place. You ain't gonna get any Einstein by asking people to think
       | inside the "outside the box" box. "Status quo" isn't the enemy,
       | and defying it isn't the path to genius; if you're measuring your
       | own intellectual capacity by proxy of how much you question, you
       | ain't gonna get anywhere useful. After all, questioning
       | everything is _easy_ , and doesn't require any particular skill.
       | 
       | The hard thing is to be _right_ , despite both the status-quo and
       | the "question the status-quo" memes.
       | 
       | (It also helps being in the right time and place, to have access
       | to the results of previous work that is required to make that
       | next increment - that's another, oft forgotten factor.)
        
       | mentalgear wrote:
       | BlueSky version:
       | https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...
       | 
       | ---
       | 
       | Quite interesting post that asks the right question about "asking
       | the right questions". Yet one aspect I felt missing (which might
       | automatically solve this) is first-principles-based causal
       | reasoning.
       | 
       | A truly intelligent system -- one that reasons from first
       | principles by running its own simulations and physical
       | experiments -- would notice if something doesn't align with the
       | "textbook version".
       | 
       | It would recognize when reality deviates from expectations and
       | ask follow-up questions, naturally leading to deeper insights and
       | the right questions - and answers.
       | 
       | Fascinating in this space is the new "Reasoning-Prior" approach
       | (MIT Lab & Harvard), which trains reasoning capabilities learned
       | from the physical world as a foundation for new models (before
        | even learning about text).
       | 
       | Relevant paper: "General Reasoning Requires Learning to Reason
       | from the Get-go."
        
         | mentalgear wrote:
         | PS: great explainer video
         | https://www.youtube.com/watch?v=seTdudcs-ws&t=180s
        
       | moralestapia wrote:
       | >Just consider the crazy paradigm shift of special relativity and
       | the guts it took to formulate a first axiom like "let's assume
       | the speed of light is constant in all frames of reference"
       | defying the common sense of these days (and even of today...)
       | 
       | I'm not an expert on this. Wasn't this an observed phenomenon
       | before Albert put together his theory?
        
         | tim333 wrote:
         | It was an observed phenomenon -
         | https://en.wikipedia.org/wiki/Michelson%E2%80%93Morley_exper...
         | 
         | Einstein's more impressive stuff was explaining that by time
         | passing at different rates for different observers.
        
         | zesterer wrote:
         | Weird problems with physics were everywhere before Einstein.
         | Maxwell comes _painfully_ close to discovering GR in some of
         | his musings on black body radiation.
         | 
         | Noticing that there was a problem was not the breakthrough:
         | trying something bizarre and counter-cultural - like assuming
         | light speed is invariant for every observer - just to see if
         | anything interesting drops out was the breakthrough.
        
       | tim333 wrote:
       | >I'm afraid AI won't give us a "compressed 21st century".
       | 
       | There's no mention of exponential growth, which seems a major
       | omission when you are talking about centuries. Computers have
       | kept improving in a Moore's-law-like way in terms of compute per
       | dollar and no doubt will keep on like that for a while yet. Give
       | it a few years and AI tech will be way better than what we have
       | now. I don't know about exact timings like 5-10 years but in a
       | while.
        
         | dimitri-vs wrote:
         | What exponential growth? By all accounts things are slowing
         | down: sonnet3.7 is not exponentially better, neither is
         | gpt4.5, and grok3 is just catching up. I'm still using
         | sonnet3.5 for a lot
         | of coding because IMO it's better than 3.7.
        
           | tim333 wrote:
           | Exponential growth of computing power, which will lead to
           | a gradual increase in AI performance. I think the oldest
           | LLM you mention there is nine months old, which is not
           | very long in the scheme of things, but give it a couple of
           | years and you'll probably see a good improvement.
        
         | zesterer wrote:
         | The whole point of this post is that the things AI isn't good
         | at and has never been good at will be the _limit_ to otherwise-
         | exponential growth.
        
           | tim333 wrote:
           | Well, yeah, the post kind of tries to argue that, but it
           | is also talking about how we don't have an Einstein- or
           | Newton-like AI. Those two are outliers, thought of as some
           | of the smartest scientists ever to have lived, and so are
           | a bit of an unrealistic target just now.
           | 
           | As to whether AI can go beyond doing what it's told and
           | make new discoveries, we've sort of seen that a bit with,
           | for example, the AlphaGo-type programs coming up with
           | modes of play humans hadn't thought of. I guess I don't
           | buy the hypothesis that if you had an AI smarter than
           | Einstein it wouldn't be able to make Einstein-like
           | discoveries due to not being a rebel.
        
       | rcarmo wrote:
       | He means YMaaS, no? Might as well coin the acronym early.
        
       | dang wrote:
       | (Most comments here were posted to
       | https://news.ycombinator.com/item?id=43317269 and then moved
       | hither.)
        
       | ilaksh wrote:
       | I think it's more of a social phenomenon than an intellectual
       | characteristic. I guess these days people would just assume that
       | outlier ideas come from autism, but I think that isn't
       | necessarily true.
       | 
       | But maybe it helps to be socially isolated or just stubborn.
       | People do not want to accept new approaches.
       | 
       | Clearly they do eventually, but there is always some friction.
       | 
       | But I think that it's been shown that through prompting and
       | various types of training or tuning, LLMs can be configured to
       | be non-sycophantic. It's just that humans don't want to be
       | contradicted, so that tendency gets trained out of the models
       | during reinforcement.
       | 
       | Along with the training process just generally being aimed at
       | producing expected rather than unexpected answers.
        
       | randomNumber7 wrote:
       | The thing about the Einstein example is that it was already
       | known that the speed of light is constant.
       | 
       | The question he asked was just why this fact was not
       | compatible with the Maxwell equations.
        
       | systemstops wrote:
       | Wouldn't the ability to "ask the right questions" require that AI
       | could update its own weights, as those weights determine which
       | questions can be asked?
        
       | ypeterholmes wrote:
       | Hey look, the goalposts are being moved again. This time it's
       | from top end researcher to generational genius. Question: what
       | evidence is there that this benchmark will not be reached also?
       | Time and again these essays make the mistake of assuming AI is a
       | static thing, and refuse to acknowledge the inexorable march
       | forward we are witnessing. As humans, we cling to our own fragile
       | superiority. Even on this thread: "I thought Hinton said the
       | world would be transformed by now." That's NOT what was
       | claimed. We are like three years in! Posts like this will be
       | laughable in 10
       | years.
        
         | nl wrote:
         | > Hey look, the goalposts are being moved again.
         | 
         | Typically the "moving goalpost" posts are "we don't have AI
         | because ....". That's not what this post is doing - it's
         | pointing out a genuine weakness and a way forward.
        
       | janalsncm wrote:
       | > Many have been proposing "move 37" as evidence that AI has
       | already reached Einstein-level intelligence
       | 
       | I don't think this example applies in the ways we care about.
       | Sure, in the domain of go we have incredibly powerful engines.
       | Poker too, which is an imperfect information game which you could
       | argue is more similar to life in that regard.
       | 
       | But life has far more degrees of freedom than go or poker, and
       | the "value" of any one action is impossible to calculate due to
       | imperfect information. And unlike in poker, where probabilities
       | can be calculated, we don't even have the probability
       | distribution for most events, even if we could enumerate them.
        
         | haswell wrote:
         | I didn't interpret the mention of move 37 in the way I think
         | you are here.
         | 
         | The author brought it up specifically to highlight that they
         | don't believe move 37 signifies what many people think it does,
         | and that while impressive, it's not general enough to indicate
         | what some people seem to believe it indicates.
         | 
           | In essence, I think they said the same thing you are, using
         | different words.
        
           | janalsncm wrote:
           | I don't disagree with the author, I just think their argument
           | isn't as strong as it could be. Excelling in a constrained
           | decision space like go is fundamentally less difficult than
           | doing the same in the real world. It's a categorical
           | difference that the author didn't mention.
           | 
           | I'm also not even convinced move 37 was properly explained as
           | a "straight A student" behavior. AlphaGo did bootstrap by
           | studying human games but it also learned more fundamental
           | value functions via self play.
        
       | phillipcarter wrote:
       | A way I've been thinking about this today is:
       | 
       | We can't distinguish between a truly novel response from an
       | LLM and a hallucination.
       | 
       | We can get some of the way there, such as if we know what the
       | outcome to a problem should look like, and are seeking a better
       | function to achieve that outcome. Certainly at small scales and
       | in environments where there are minimal consequences for failure,
       | this could work.
       | 
       | But this breaks down as things get more complicated. We won't
       | be able to test the effectiveness of 100 million potential
       | solutions for eradicating brain tumors at once, even if we
       | somehow manage to guarantee that every unforeseen consequence
       | is accounted for when specifying the goals and constraints of
       | the problem. We simply don't have the logistics to run 100
       | million clinical trials in which we also know how to account
       | for countless confounding effects (let alone obtain consent!).
        
       | tyronehed wrote:
       | The first thing you need to understand is that no current
       | LLM-based, transformer-architected AI is going to get to AGI.
       | The design is, in essence, not capable of that kind of
       | creativity. In fact, no AI that has at its root statistical
       | analysis or probabilistic correlation will get us past the
       | glorified Google parlor trick that is the modern LLM in every
       | form.
       | 
       | The solution to this problem - and the architecture that will
       | be contained in the ultimate AGI solution that emerges - is a
       | great leap in IP, but unfortunately too important to blab
       | about widely.
        
       | hackerknew wrote:
       | Could we train an AI model on the corpus of physics knowledge up
       | to the year 1905 and then see if we can adjust the prompt to get
       | it to output the theory of relativity?
       | 
       | This would be an interesting experiment for other historical
       | discoveries too. I'm now curious if anybody has created a model
       | with "old data" like documents and books from hundreds of years
       | ago, to see if it comes up with the same conclusions as the
       | researchers and scientists of the past.
       | 
       | Would AI have been able to predict the effectiveness of vaccines,
       | insulin, other medical discoveries?
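       | 
       | Building the pre-1905 corpus would be the hard part; the
       | cutoff step itself is trivial. A sketch (the document fields
       | are made up):
       | 
       |   # Keep only documents published before the discovery you
       |   # want the model to "re-derive".
       |   CUTOFF_YEAR = 1905
       | 
       |   def pre_cutoff(docs):
       |       # docs: iterable of {"year": int, "text": str}
       |       for doc in docs:
       |           if doc["year"] < CUTOFF_YEAR:
       |               yield doc["text"]
       | 
       |   corpus = [
       |       {"year": 1687, "text": "Principia ..."},
       |       {"year": 1865, "text": "Maxwell's treatise ..."},
       |       {"year": 1916, "text": "General relativity ..."},
       |   ]
       |   train_texts = list(pre_cutoff(corpus))  # first two only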
        
         | knowaveragejoe wrote:
         | Now that would be interesting!
        
       | nahuel0x wrote:
       | We saw algorithms designing circuits that no human engineer
       | would design even before LLMs (using genetic algorithms). So
       | out-of-the-box thinking may also be more reachable than this
       | author thinks.
        
       ___________________________________________________________________
       (page generated 2025-03-10 23:00 UTC)