[HN Gopher] The Einstein AI Model
___________________________________________________________________
The Einstein AI Model
Author : 9woc
Score : 86 points
Date : 2025-03-08 14:14 UTC (2 days ago)
(HTM) web link (thomwolf.io)
(TXT) w3m dump (thomwolf.io)
| Agingcoder wrote:
| The author seems to assume that conjuring up a conjecture is the
| hard part - yet the proof will be filled with the same standard
| mathematics (granted, sometimes wrapped up as new tools, and the
| proof ends up being as important as the result), often at great
| cost.
|
| Having powerful assistants that let people try out crazy
| mathematical ideas without fear of risking their careers, or
| just have fun with ideas, is likely to have an outsized impact
| anyway, I think.
| timewizard wrote:
| New things AI will magically fix by existing: The completely
| broken university career publishing pipeline. *fingers crossed*
| kristianc wrote:
| As Isaac Newton himself put it, "if I have seen further it is
| by standing on the shoulders of Giants." It was ever thus.
| aleksiy123 wrote:
| The Bitter Lesson seems relevant here again.
| http://www.incompleteideas.net/IncIdeas/BitterLesson.html
|
| I think I read somewhere about Erdos having this somewhat brute
| force approach. Whenever fresh techniques were developed (by
| himself or others), he would go back to see if they could be
| used on one of his long-standing open questions.
| jrimbault wrote:
| What about the non-LLM work?
|
| I know barely anything about it but it seems some people are
| interested and excited about protein engineering powered by
| neural networks.
| tim333 wrote:
| DeepMind is working on simulating a whole cell, which will be
| interesting and potentially useful.
| neilv wrote:
| A nice post (that should be somewhere smarter than contemporary
| Twitter/X).
|
| > _PS: You might be wondering what such a benchmark could look
| like. Evaluating it could involve testing a model on some recent
| discovery it should not know yet (a modern equivalent of special
| relativity) and explore how the model might start asking the
| right questions on a topic it has no exposure to the answers or
| conceptual framework of. This is challenging because most models
| are trained on virtually all human knowledge available today but
| it seems essential if we want to benchmark these behaviors.
| Overall this is really an open question and I'll be happy to hear
| your insightful thoughts._
|
| Why benchmarks?
|
| A genius (human or AI) could produce _novel_ insights, some of
| which could practically be tested in the real world.
|
| "We can gene-edit using such-and-such approach" => Go try it.
|
| No sales brochure claims, research paper comparison charts to
| show incremental improvement, individual KPIs/OKRs to hit, nor
| promotion packets required.
| vessenes wrote:
| The reason you'd have a benchmark is that you want to be able
| to check in on your model programmatically. DNA wet-lab work is
| slow and expensive. While you're absolutely right that
| benchmarks aren't the best thing ever and that they are used
| for marketing and sales purposes, they also do seem to
| generally create capability momentum in the market. For
| instance, nobody running local LLMs right now would prefer a
| 12-month-old model to one of the top models today at the same
| size - they are significantly more capable, and many
| researchers believe that training on new and harder benchmarks
| has been a way to increase that capability.
| internet_points wrote:
| If an LLM is trained on knowledge up until, say, September
| 2023, could you use a corpus of interesting/insightful
| scientific discoveries and new methods developed after that
| date to evaluate/tune it? (Though I fear it would be a small
| corpus.)
| kingkongjaffa wrote:
| Get a research paper and look at the references. Give an LLM
| all of the references but not the current paper. See if it can
| conclude something like the current paper, or at least design
| the same experiment as detailed in the paper.
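| A rough sketch of that evaluation loop, in Python. Everything
| here is a placeholder: call_llm stands in for whatever chat API
| you use, and the dataset format is made up.
|
|     # Hold out a paper, give the model only its references, and
|     # ask it to propose the paper's contribution.
|     def call_llm(prompt: str) -> str:
|         raise NotImplementedError("plug in your model client")
|
|     def evaluate_paper(paper: dict) -> dict:
|         refs = "\n\n".join(paper["reference_abstracts"])
|         proposal = call_llm(
|             "Here are the abstracts of every reference of an "
|             "unpublished paper:\n\n" + refs + "\n\nWhat conclusion "
|             "or experiment would you propose building on these? "
|             "Be specific."
|         )
|         # Grading is the hard part; here a second call judges how
|         # close the proposal is to the held-out paper's abstract.
|         grade = call_llm(
|             "Rate 0-10 how close this proposal is to the actual "
|             "paper.\nProposal:\n" + proposal +
|             "\nActual abstract:\n" + paper["abstract"]
|         )
|         return {"title": paper["title"], "proposal": proposal,
|                 "grade": grade}
|
|     # fill with held-out papers published after the training cutoff
|     papers: list[dict] = []
|     results = [evaluate_paper(p) for p in papers]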
| Yizahi wrote:
| One of the problems would be acquiring said corpus. NN
| corporations got away with scraping all human-made content for
| free (arguably stealing it all), but no one can really prove
| that their specific content was taken without asking, so no
| lawsuits. The NYT tried, but that was a workaround and I don't
| know the status of that case. But if an NN corporation came out
| and explicitly said "here, we are using a Nature journal dump
| from 2024", then the Nature journal would come to them and say
| "oh, really?".
| wewewedxfgdf wrote:
| I'm still waiting for the end of the world caused by AI as
| predicted by a very large number of prominent figures such as Sam
| Altman, Hinton, Musk, signers of the Center for AI Safety
| statement, Shane Legg, Marvin Minsky, and Eliezer Yudkowsky.
|
| No sign yet.
|
| On the other hand, LLMs are writing code which I can debug and
| eventually get to work in a real code base - and script writers
| everywhere are writing scripts more quickly, marketing people are
| writing better ad copy, employers are writing better job ads and
| real estate agents writing better ads for houses.
| lionkor wrote:
| Prominent figure says baseless thing to boost stock prices,
| more news at 6
| godelski wrote:
| The oddity isn't that people lie; the oddity is that people
| continue to believe those who lie. They even give more trust
| to those who constantly lie. This is certainly odd.
| TeMPOraL wrote:
| Yes, except half of the list isn't made of prominent people.
| Whose stock price was Eliezer boosting when he was talking
| about these things _15 years ago_?
|
| Nah, it's more that the masses got exposed to those ideas
| recently - ideas which existed long ago, in obscurity - and
| of course _now_ everyone is a fucking expert in this New
| Thing No One Talked About Before ChatGPT.
|
| Even the list GP gave, the specific names on it - the only
| thing that this particular grouping communicates is that its
| author has not the first clue what they're talking about.
| empiko wrote:
| I am still waiting to see the impact on GDP or any other
| economic measure.
| netdevphoenix wrote:
| Fact is, even if the world were to end, finding the causes
| would be extremely difficult because... well, the world would
| have ended.
| berkes wrote:
| I've had some luck instructing AI to "Don't make up anything. If
| there's no answer, say I don't know".
|
| Which made me think that AI would be far more useful (for me?) if
| it was tuned to "Dutchness" rather than "Americanness".
|
| "Dutch" famously known for being brutally blunt, rude, honest,
| and pushing back.
|
| Yet we seem to have "American" AI, tuned to "the customer is
| always right", inventing stuff just to not let you down, always
| willing to help even if that makes things worse.
|
| Not "critical thinking" or "revolutionary" yet. Just less polite
| and less willing to always please you. In human interaction, the
| Dutch bluntness and honesty can be very off-putting, but It is
| quite efficient and effective. Two traits I very much prefer my
| software to have. I don't need my software to be polite or to not
| hurt my feelings. It's just a tool!
| XCabbage wrote:
| Obvious thought that I haven't tested: can you literally
| achieve this by getting it to answer in Dutch, or training an
| AI on Dutch text? Plausibly* Dutch-language training data will
| reflect this cultural difference by virtue of being written
| primarily by Dutch people.
|
| * (though not necessarily, since the Internet is its own
| country with its own culture, and much training data comes from
| the Internet)
| zoover2020 wrote:
| I've tried Dutch answers and it is more than happy to
| hallucinate and give me answers that are very "American". It
| doesn't help that our culture has been heavily inspired by US
| pop culture ever since the internet, either.
|
| Haven't tried prompt engineering with the Dutch stereotype,
| though.
| berkes wrote:
| That hardly works. Though from my limited experiments, Claude's
| models are better at this than OpenAI's. OpenAI will, quite
| often, come up with suggestions that are literal translations
| of "anglicist" phrases.
|
| Such as "Ik hoop dat deze email u gezond vindt" (I hope this
| email finds you well), which is so wrong that not even "simple"
| translation tools would suggest it.
|
| Seeing that OpenAI's models can (could? This is from a large
| test we did months ago) not even use proper localized phrases
| but fall back on American ones, I highly doubt they can or will
| respond by refusing to answer when they have no answer based on
| the training data.
| asddubs wrote:
| I suspect it's a balancing act between the AI being generally
| willing to help and avoiding responses like this, e.g.:
|
| https://www.sandraandwoo.com/wp-content/uploads/2024/02/twit...
|
| or it just telling you to google it
| shreyshnaccount wrote:
| What (hypothetically) happens when the cost to run the next
| giant LLM exceeds the cost to hire a person for tasks like
| this?
| EGreg wrote:
| the R&D continues
| Rescis wrote:
| Given current models can accomplish this task quite
| successfully and cheaply, I'd say that if/when that happens
| it would be a failure of the user (or the provider) for not
| routing the request to the smaller, cheaper model.
|
| Similar to how it would be the failure of the user/provider
| if someone thought it was too expensive to order food in,
| but the reason they thought that was that they were looking
| at the cost of chartering a helicopter from the restaurant
| to their house.
| vlovich123 wrote:
| Realtime LLM generation runs ~$15/million "words". By
| comparison, a human writer at the beginning of a career
| typically earns ~$50k/million words, up to ~$1 million/million
| words for experienced writers. That's roughly 3-5 orders of
| magnitude.
|
| Inference costs have many orders of magnitude to go before
| they approach raw human costs, and there's always going to be
| innovation driving the cost of inference down further. This
| also ignores that humans aren't available 24/7, have varying
| quality of output depending on what's going on in their
| personal lives (and that LLMs can respond quicker than humans,
| reducing the time a task takes), and require more laborious
| editing than might be needed with an LLM. Basically, the
| hypothetical case seems unlikely to ever become reality unless
| you've got a supercomputer AI doing things no human possibly
| could because of the amount of data it's operating on (at
| which point it might exceed the cost, but a competitive human
| wouldn't exist).
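| Back-of-the-envelope check of those ratios in Python (the
| dollar figures are this comment's rough assumptions, not
| measured data):
|
|     import math
|
|     llm = 15             # ~$15 per million "words" of LLM output
|     junior = 50_000      # ~$50k per million words, junior writer
|     senior = 1_000_000   # ~$1M per million words, experienced
|
|     for label, human in [("junior", junior), ("senior", senior)]:
|         ratio = human / llm
|         print(f"{label}: {ratio:,.0f}x "
|               f"(~{math.log10(ratio):.1f} orders of magnitude)")
|     # junior: 3,333x (~3.5 orders)
|     # senior: 66,667x (~4.8 orders)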
| j45 wrote:
| Also, the more accuracy that is put into the prompt and its
| attachments, the more accurate the processing is.
| Kabukks wrote:
| I suspect instructing the model to respond with "I don't know"
| more readily will result in more of those responses even though
| there are other options that seem viable according to the
| training data / model.
|
| Remember, LLMs are just statistical sentence completion
| machines. So telling it what to respond with will increase the
| likelihood of that happening, even if there are other options
| that are viable.
|
| But since you can't blindly trust LLM output anyway, I guess
| increasing "I don't know" responses is a good way of reducing
| incorrect responses (which will still happen frequently enough)
| at the cost of missing some correct ones.
| berkes wrote:
| > Remember, LLMs are just statistical sentence completion
| machines. So telling it what to respond with will increase
| the likelihood of that happening, even if there are other
| options that are viable.
|
| Obviously. When I say "tuned" I don't mean adding stuff to a
| prompt. I mean tuning in the way models are also tuned to be
| more or less professional, or tuned to defer certain tasks to
| other models (e.g. counting or math, something statistical
| models are almost unable to do), and so on.
|
| I am almost certain that the chain of models we use on
| chatgpt.com is "tuned" to always give an answer, and not to
| answer with "I am just a model, I don't have information on
| this". Early models and early toolchains did this far more
| often, but today they are quite probably tuned to "always be
| of service".
|
| "Quite probably" because I have no proof, other than that it
| will gladly hallucinate, invent urls and references, etc. And
| knowing that all the GPT competitors are battling for users,
| so their products quite certainly tuned to help in this
| battle - e.g. appear to be helpful and all-knowing, rather
| than factual correct and therefore often admittedly ignorant.
| zamadatix wrote:
| Whether you train the model to do math internally or tell it
| to call an external model which only does math, the root
| problem still exists. It's not as if a model which only does
| math won't hallucinate how to solve math problems just because
| it doesn't know about history; for the same number of
| parameters it's probably better not to have to duplicate the
| parts needed to understand the basis of things multiple times.
|
| The root problem is that training models to be uncertain of
| their answers results in lower benchmark scores in every area
| except hallucination. It's as if, in a multiple-choice test,
| instead of picking whichever of answers A-D you thought made
| more sense, you picked E, "I don't know". Helpful for the test
| grader, but a bad bet for a model trying to claim it gets the
| most answers right compared to other models.
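| A toy calculation makes the incentive concrete (the scoring
| schemes below are made up for illustration):
|
|     # Expected score of blind guessing vs. abstaining on a
|     # 4-choice question, under two grading schemes.
|     p = 0.25  # chance a blind guess on A-D is right
|
|     # Scheme 1: +1 right, 0 wrong, 0 for "I don't know"
|     guess = p * 1 + (1 - p) * 0        # 0.25
|     abstain = 0.0
|
|     # Scheme 2: +1 right, -1 wrong, 0 for "I don't know"
|     guess_pen = p * 1 + (1 - p) * -1   # -0.5
|     abstain_pen = 0.0
|
|     print(guess, abstain)          # 0.25 vs 0.0 -> always guess
|     print(guess_pen, abstain_pen)  # -0.5 vs 0.0 -> abstaining wins
|
| Unless the benchmark penalizes confident wrong answers, "I
| don't know" is always the losing move for the model.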
| Yizahi wrote:
| The so-called AI can't "know". It has no understanding of
| whether the generated text is an answer or whether it isn't.
| You can't force that instruction on a neural network; at best
| it just adjusts the generated text slightly and you think that
| it somehow started understanding.
| berkes wrote:
| There's a distinction between "a model" and the chain of
| tools and models you employ when asking something on
| chatgpt.com or any of the consumer facing alternatives.
|
| The latter is a chain of models, some specialized in dissecting
| the question, some specialized in choosing the right models and
| tools (e.g.: there's a calculation in there, let's push that
| part to a simple Python function that can actually count stuff,
| and pull the rest through a generic LLM). I experiment with
| such toolchains myself and it's baffling how fast the
| complexity of all this is growing.
|
| A very simple example would be: "question" ->
| does_it_want_code_generated.model -[yes]->
| specialized_code_generator.model | -[no]->
| specialized_english_generator.model
|
| So, sure: a model has no "knowledge", and nor does a chain of
| tools. But having, e.g., a model specialized in (i.e. trained
| on or enriched with) all scientific papers ever, or maybe even
| a vector DB with all that data, somewhere in the toolchain that
| is in charge of either finding the "very likely references" or
| denying an answer would help a lot. It would for me.
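| In code, the router above is not much more than the following
| sketch. The classifier and the two downstream generators are
| stand-ins; in a real chain each would be its own model call.
|
|     def wants_code(question: str) -> bool:
|         # stand-in for does_it_want_code_generated.model: a dumb
|         # keyword check here, a small classifier model in practice
|         keywords = ("code", "function", "script", "regex")
|         return any(w in question.lower() for w in keywords)
|
|     def code_generator(question: str) -> str:
|         return "<answer from specialized_code_generator.model>"
|
|     def english_generator(question: str) -> str:
|         return "<answer from specialized_english_generator.model>"
|
|     def answer(question: str) -> str:
|         route = (code_generator if wants_code(question)
|                  else english_generator)
|         return route(question)
|
|     print(answer("write a function that sums a list"))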
| Yizahi wrote:
| Sure, chains of networks can guess at the "passable" answer
| much better/faster/cheaper etc. But that doesn't remove the
| core issue: none of the sub-networks or decision trees can
| understand what they generate, and so they can't abort their
| work and output "no answer" or something similar.
|
| The whole premise of the original request was that the user
| gives the NN a task which has a verifiable (maybe partially)
| answer. He sees an incorrect answer and wishes that a "failure"
| were displayed instead. But the NN can't verify the correctness
| of its output. After all, the G in GPT stands for Generative.
| berkes wrote:
| My simple RAG setup has steps that will return "We don't have
| this information" if, e.g., our vector DB returns entries with
| far too low relevancy scores, or if the response from the LLM
| fails to include certain attributes in its answer, and so on.
|
| Edit: TBC: these "steps" aren't LLMs or other models. They're
| simple code with simple if/elses and the occasional regex.
|
| Again: an LLM/NN indeed has no "understanding" of what it
| creates - especially the LLMs that are "just" statistical
| models. But the tooling around it, the entire chain, can very
| well handle this.
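| For the curious, the gate itself can be as small as the sketch
| below. The threshold, field names and vector_db/llm interfaces
| are made up, not a particular library's API.
|
|     MIN_RELEVANCY = 0.75
|     NO_ANSWER = "We don't have this information."
|
|     def answer_with_gate(question, vector_db, llm):
|         hits = vector_db.search(question, top_k=5)
|         hits = [h for h in hits if h["score"] >= MIN_RELEVANCY]
|         if not hits:
|             return NO_ANSWER   # gate 1: retrieval too weak
|         context = "\n\n".join(h["text"] for h in hits)
|         reply = llm("Answer using only this context:\n"
|                     + context + "\n\nQ: " + question)
|         if "SOURCE:" not in reply:
|             return NO_ANSWER   # gate 2: required attribute missing
|         return reply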
| baq wrote:
| How confident can you be in this? Have you analyzed what
| exactly the billions of weights do?
|
| I've got my opinions about what LLMs are and what they
| aren't, but I don't confidently claim that they must be such.
| There's a lot of stuff in those weights.
| Q6T46nT668w6i3m wrote:
| I'm confident that there's no magic and I've spent years
| understanding "what the weights do." You're describing
| weights as magic and they are not.
| hatthew wrote:
| Can you clarify what definition of "understanding" you're
| using here?
| netdevphoenix wrote:
| > known for being brutally blunt, rude, honest, and pushing
| back.
|
| That's a different perspective. Dutch people don't see
| themselves as rude. A Dutch person could say that Americans are
| known for being dishonest and not truly conveying what they
| mean, yet Americans won't see themselves that way. You can
| replace Dutch and American with any other nationalities.
| berkes wrote:
| I am Dutch, have lived in many countries in several
| continents. I do see myself as rude. But, being Dutch, I
| don't give a ** ;).
| lifestyleguru wrote:
| > "Dutch" famously known for being brutally blunt, rude,
| honest, and pushing back.
|
| The Dutch will never bluntly push back if you plan to set up a
| tax evasion scheme in their country. Being vicious assholes in
| daily stuff, especially towards strangers? That's hardly
| something deserving praise.
| eszed wrote:
| To be fair, that's consequent to the Netherlands' well-known
| love of soda-bread sandwiches.
| lifestyleguru wrote:
| What do you mean, some Irish reference? Oh I see, I answered
| it myself ;)
| eszed wrote:
| :-)
|
| I was aiming for juuuust subtle enough for the joke to
| land, if you know the reference. Now I know it did, here
| the rest of y'all go:
|
| https://en.m.wikipedia.org/wiki/Dutch_Sandwich
| msm_ wrote:
| That's... a surprisingly crass thing to say. I would play it
| off as a joke, if not for the second part of your post. Dutch
| people are not "vicious assholes"; they have a culture of
| direct communication. Assuming that only your own culture's
| communication patterns are "correct" is xenophobic and close-
| minded.
|
| And connecting all the people in a country with "tax evasion
| schemes" is rude, if that was not actually a joke.
| lifestyleguru wrote:
| I'm just being brutally blunt. It goes both ways. The scale
| of these evasion schemes is monstrous, not a joke at all.
| theshackleford wrote:
| > Being vicious assholes in daily stuff especially towards
| strangers? That's hardly something deserving praise.
|
| I'll take it over the fake American politeness any day, 100
| times over.
| OutOfHere wrote:
| I have seen the other side, where a configured AI responds "I
| don't know" far too much, often when it shouldn't. There is
| nothing more useless than that. Certainly we need the right
| balance.
| jyounker wrote:
| One of my current models for LLMs is that they're compression
| algorithms. They compress a large amount of training data into
| a set of weights. A query is a key into that compression space.
| Hallucinations happen when you supply a key that corresponds to
| something that wasn't in the training set.
| threeducks wrote:
| The nice thing about LLMs is that they can answer some
| questions which were not in the training set. Unfortunately,
| it is not easy to tell when that is the case.
| janalsncm wrote:
| One approach does use this. You can ask an LLM to explicitly
| check its own answers by outputting thinking tokens, generate a
| reward signal if it gets the right answer, and update the model
| directly based on those reward signals. That's part of how
| DeepSeek R1 was trained. It's better but not perfect, because
| the thinking process is imperfect. Ultimately the LLM might not
| know what it doesn't know.
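| The verifiable-reward part is conceptually tiny - something
| like the toy sketch below, which is not the actual R1 recipe;
| the "ANSWER:" extraction rule is an assumption.
|
|     import re
|
|     def reward(model_output: str, ground_truth: str) -> float:
|         # assume the model is told to end with "ANSWER: <value>"
|         m = re.search(r"ANSWER:\s*(.+)\s*$", model_output.strip())
|         if not m:
|             return 0.0
|         ok = m.group(1).strip() == ground_truth.strip()
|         return 1.0 if ok else 0.0
|
|     print(reward("...thinking...\nANSWER: 42", "42"))  # 1.0
|     print(reward("...thinking...\nANSWER: 41", "42"))  # 0.0
|
| It only works for tasks with checkable answers (math, code with
| tests), which is why the "doesn't know what it doesn't know"
| problem remains everywhere else.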
| neom wrote:
| I take my final thoughts out of the LLM and into two other new
| convos. I give both of them the same conversation, but I ask
| one to steelman it and the other to strawman it. I find it's a
| decent way to look for nuances you're missing.
| KoolKat23 wrote:
| Gemini can be quite direct if it's adamant that it is correct.
| OtherShrezzing wrote:
| >We're currently building very obedient students, not
| revolutionaries. This is perfect for today's main goal in the
| field of creating great assistants and overly compliant helpers.
| But until we find a way to incentivize them to question their
| knowledge and propose ideas that potentially go against past
| training data, they won't give us scientific revolutions yet.
|
| This would definitely be an interesting future. I wonder what
| it'd do to all of the work in alignment & safety if we started
| encouraging AIs to go a bit rogue in some domains.
| TeMPOraL wrote:
| > _If something was not written in a book I could not invent it
| unless it was a rather useless variation of a known theory.
| __More annoyingly, I found it very hard to challenge the status-
| quo__, to question what I had learned._
|
| (__emphasis__ mine)
|
| As if "challenging the status-quo" was the goal in the first
| place. You ain't gonna get any Einstein by asking people to think
| inside the "outside the box" box. "Status quo" isn't the enemy,
| and defying it isn't the path to genius; if you're measuring your
| own intellectual capacity by proxy of how much you question, you
| ain't gonna get anywhere useful. After all, questioning
| everything is _easy_ , and doesn't require any particular skill.
|
| The hard thing is to be _right_ , despite both the status-quo and
| the "question the status-quo" memes.
|
| (It also helps being in the right time and place, to have access
| to the results of previous work that is required to make that
| next increment - that's another, oft forgotten factor.)
| mentalgear wrote:
| BlueSky version:
| https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...
|
| ---
|
| Quite interesting post that asks the right question about
| "asking the right questions". Yet one aspect I felt was missing
| (which might automatically solve this) is first-principles-
| based causal reasoning.
|
| A truly intelligent system -- one that reasons from first
| principles by running its own simulations and physical
| experiments -- would notice if something doesn't align with the
| "textbook version".
|
| It would recognize when reality deviates from expectations and
| ask follow-up questions, naturally leading to deeper insights and
| the right questions - and answers.
|
| Fascinating in this space is the new "Reasoning-Prior" approach
| (MIT Lab & Harvard), which trains reasoning capabilities
| learned from the physical world as a foundation for new models
| (before even learning about text).
|
| Relevant paper: "General Reasoning Requires Learning to Reason
| from the Get-go."
| mentalgear wrote:
| PS: great explainer video
| https://www.youtube.com/watch?v=seTdudcs-ws&t=180s
| moralestapia wrote:
| >Just consider the crazy paradigm shift of special relativity and
| the guts it took to formulate a first axiom like "let's assume
| the speed of light is constant in all frames of reference"
| defying the common sense of these days (and even of today...)
|
| I'm not an expert on this. Wasn't this an observed phenomenon
| before Albert put together his theory?
| tim333 wrote:
| It was an observed phenomenon -
| https://en.wikipedia.org/wiki/Michelson%E2%80%93Morley_exper...
|
| Einstein's more impressive contribution was explaining that by
| having time pass at different rates for different observers.
| zesterer wrote:
| Weird problems with physics were everywhere before Einstein.
| Maxwell comes _painfully_ close to discovering GR in some of
| his musings on black body radiation.
|
| Noticing that there was a problem was not the breakthrough:
| trying something bizarre and counter-cultural - like assuming
| the speed of light is invariant across observers - just to see
| if anything interesting drops out was the breakthrough.
| tim333 wrote:
| >I'm afraid AI won't give us a "compressed 21st century".
|
| There's no mention of exponential growth, which seems a major
| omission when you are talking about centuries. Computers have
| kept improving in a Moore's-law-like way in terms of compute
| per dollar, and will no doubt keep doing so for a while yet.
| Give it a few years and AI tech will be way better than what we
| have now. I don't know about exact timings like 5-10 years, but
| it will come in a while.
| dimitri-vs wrote:
| What exponential growth? By all accounts things are slowing
| down: Sonnet 3.7 is not exponentially better, neither is
| GPT-4.5, and Grok 3 is just catching up. I'm still using Sonnet
| 3.5 for a lot of coding because IMO it's better than 3.7.
| tim333 wrote:
| Exponential growth of computing power, which will lead to a
| gradual increase in AI performance. I think the oldest LLM you
| mention there is nine months old, which is not very long in the
| scheme of things; give it a couple of years and you'll probably
| see a good improvement.
| zesterer wrote:
| The whole point of this post is that the things AI isn't good
| at and has never been good at will be the _limit_ to otherwise-
| exponential growth.
| tim333 wrote:
| Well, yeah, the post kind of tries to argue that, but it is
| also talking about how we don't have an Einstein- or Newton-
| like AI. Those two are outliers, thought of as some of the
| smartest scientists ever to have lived, and so are a bit of an
| unrealistic target just now.
|
| As to whether AI can go beyond doing what it's told and make
| new discoveries, we've sort of seen that a bit with, for
| example, the AlphaGo-type programs coming up with modes of
| play humans hadn't thought of. I guess I don't buy the
| hypothesis that if you had an AI smarter than Einstein it
| wouldn't be able to make Einstein-like discoveries due to not
| being a rebel.
| rcarmo wrote:
| He means YMaaS, no? Might as well coin the acronym early.
| dang wrote:
| (Most comments here were posted to
| https://news.ycombinator.com/item?id=43317269 and then moved
| hither.)
| ilaksh wrote:
| I think it's more of a social phenomenon than an intellectual
| characteristic. I guess these days people would just assume that
| outlier ideas come from autism, but I think that isn't
| necessarily true.
|
| But maybe it helps to be socially isolated or just stubborn.
| People do not want to accept new approaches.
|
| Clearly they do eventually, but there is always some friction.
|
| But I think it's been shown that, through prompting and
| various types of training or tuning, LLMs can be configured to
| be non-sycophantic. It's just that humans don't want to be
| contradicted, so that behavior can be trained out of the models
| during reinforcement.
|
| Along with the training process just generally being aimed at
| producing expected rather than unexpected answers.
| randomNumber7 wrote:
| The thing about the Einstein example is that it was already
| known that the speed of light is constant.
|
| The question he asked was just why this fact was not compatible
| with the Maxwell equations.
| systemstops wrote:
| Wouldn't the ability to "ask the right questions" require that AI
| could update its own weights, as those weights determine which
| questions can be asked?
| ypeterholmes wrote:
| Hey look, the goalposts are being moved again. This time it's
| from top-end researcher to generational genius. Question: what
| evidence is there that this benchmark will not be reached as
| well? Time and again these essays make the mistake of assuming
| AI is a static thing, and refuse to acknowledge the inexorable
| march forward we are witnessing. As humans, we cling to our own
| fragile superiority. Even on this thread: "I thought Hinton
| said the world would be transformed by now." That's NOT what
| was claimed. We are like three years in! Posts like this will
| be laughable in 10 years.
| nl wrote:
| > Hey look, the goalposts are being moved again.
|
| Typically the "moving goalpost" posts are "we don't have AI
| because ....". That's not what this post is doing - it's
| pointing out a genuine weakness and a way forward.
| janalsncm wrote:
| > Many have been proposing "move 37" as evidence that AI has
| already reached Einstein-level intelligence
|
| I don't think this example applies in the ways we care about.
| Sure, in the domain of go we have incredibly powerful engines.
| Poker too, which is an imperfect information game which you could
| argue is more similar to life in that regard.
|
| But life has far more degrees of freedom than go or poker, and
| the "value" of any one action is impossible to calculate due to
| imperfect information. And unlike in poker, where probabilities
| can be calculated, we don't even have the probability
| distribution for most events, even if we could enumerate them.
| haswell wrote:
| I didn't interpret the mention of move 37 in the way I think
| you are here.
|
| The author brought it up specifically to highlight that they
| don't believe move 37 signifies what many people think it does,
| and that while impressive, it's not general enough to indicate
| what some people seem to believe it indicates.
|
| In essence, I think they said the same thing you are, using
| different words.
| janalsncm wrote:
| I don't disagree with the author, I just think their argument
| isn't as strong as it could be. Excelling in a constrained
| decision space like go is fundamentally less difficult than
| doing the same in the real world. It's a categorical
| difference that the author didn't mention.
|
| I'm also not even convinced move 37 was properly explained as
| "straight-A student" behavior. AlphaGo did bootstrap by
| studying human games, but it also learned more fundamental
| value functions via self-play.
| phillipcarter wrote:
| A way I've been thinking about this today is:
|
| We can't distinguish between a truly novel response from an LLM
| and a hallucination.
|
| We can get some of the way there, such as if we know what the
| outcome to a problem should look like, and are seeking a better
| function to achieve that outcome. Certainly at small scales and
| in environments where there are minimal consequences for failure,
| this could work.
|
| But this breaks down as things get more complicated. We won't
| be able to test the effectiveness of 100 million potential
| solutions to eradicating brain tumors at once, even if we
| somehow manage to guarantee that every unforeseen consequence
| is accounted for when specifying the goals and constraints of
| the problem. We simply don't have the logistics to run 100
| million clinical trials where we also know how to account for
| countless confounding effects (let alone consent!).
| tyronehed wrote:
| The first thing you need to understand is that no current
| LLM-based, transformer-architected AI is going to get to AGI.
| The design, in essence, is not capable of that kind of
| creativity. In fact, no AI that has at its root statistical
| analysis or probabilistic correlation will get us past the
| glorified Google parlor trick that is the modern LLM in every
| form.
|
| A great leap in IP - unfortunately too important to blab about
| widely - is the solution to this problem, and that architecture
| will be contained in the ultimate AGI solution that emerges.
| hackerknew wrote:
| Could we train an AI model on the corpus of physics knowledge
| up to the year 1905 and then see if we can adjust the prompt to
| get it to output the theory of relativity?
|
| This would be an interesting experiment for other historical
| discoveries too. I'm now curious whether anybody has created a
| model with "old data", like documents and books from hundreds
| of years ago, to see if it comes up with the same conclusions
| as the researchers and scientists of the past.
|
| Would AI have been able to predict the effectiveness of
| vaccines, insulin, or other medical discoveries?
| knowaveragejoe wrote:
| Now that would be interesting!
| nahuel0x wrote:
| We saw algorithms designing circuits that no human engineer
| would design even before LLMs (using genetic algorithms). So
| out-of-the-box thinking may also be more reachable than this
| author thinks.
___________________________________________________________________
(page generated 2025-03-10 23:00 UTC)