[HN Gopher] LLMs Will Always Hallucinate, and We Need to Live wi...
___________________________________________________________________
LLMs Will Always Hallucinate, and We Need to Live with This
Author : Anon84
Score : 173 points
Date : 2024-09-14 17:02 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| zyklonix wrote:
| We might as well embrace them:
| https://github.com/DivergentAI/dreamGPT
| bicx wrote:
| I treat LLMs like a fallible being, the same way I treat humans.
| I don't just trust output implicitly, and I accept help with
| tasks knowing I am taking a certain degree of risk. Mostly, my
| experience has been very positive with GPT-4o / ChatGPT and
| GitHub copilot with that in mind. I use each constantly
| throughout the day.
| ulbu wrote:
| I treat them as always hallucinating and it just so happens
| that, by accident, they sometimes produce results that resemble
| intentional, considered or otherwise veritable to the human
| observer. The accident is sometimes of a high probability, but
| still an accident. Humans are similar, but we are the standard
| for what's to be considered a hallucination, clinically
| speaking. For us, a hallucination is a cause for concern. For
| LLMs, it's just one of all the possible results. Monkeys
| bashing on a typewriter. This difference is an essential one,
| imo.
| everdrive wrote:
| One big difference is that at least some people have a healthy
| sense for when they may be wrong. This sort of meta-cognitive
| introspection is currently not possible for an LLM. For
| instance, let's say I asked someone "do you know the first 10
| elements of the periodic table of elements?" Most people would
| be able to accurately say "honestly I'm not sure what comes
| after Helium." But an LLM will just make up some bullshit, at
| least some of the time.
| TheGeminon wrote:
| There are ways to gauge the confidence of the LLM (token
| probabilities over the response, generating multiple outputs
| and checking consistency), but yeah that's outside the LLM
| itself. You could feed the info back to the LLM as a
| status/message I suppose
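|
| A minimal sketch of the token-probability signal, assuming the
| HuggingFace transformers generate() API (a recent version with
| compute_transition_scores; the model name is just a placeholder):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     inputs = tok("The first 10 elements of the periodic table are",
|                  return_tensors="pt")
|     out = model.generate(**inputs, max_new_tokens=30, do_sample=False,
|                          return_dict_in_generate=True, output_scores=True)
|
|     # Log-probability of each generated token under the model.
|     scores = model.compute_transition_scores(
|         out.sequences, out.scores, normalize_logits=True)
|
|     # Average log-prob as a crude confidence signal; a low value could
|     # be fed back to the model as a status message or trigger a retry.
|     print(scores.mean().item())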
| christianqchung wrote:
| The problem is also that the model may have a very high
| confidence in token probability and still be wrong, but I'm
| sure it could help in some cases.
| Der_Einzige wrote:
| The idea of hooking LLMs back up to themselves, i.e. giving
| them token prob information somehow or even giving them
| control over the settings they use to prompt themselves is
| AWESOME and I cannot believe that no one has seriously done
| this yet.
|
| I've done it in some jupyter notebooks and the results are
| really neat, especially since with a tiny bit of extra code
| LLMs can generate a context "timer" that they wait on before
| they _prompt themselves_ to respond, creating a
| proper conversational agent system (i.e. not the walkie
| talkie systems of today)
|
| I wrote a paper that mentioned doing things like this for
| having LLMs act as AI art directors:
| https://arxiv.org/abs/2311.03716
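|
| For anyone curious, the self-prompting loop is mostly glue code. A
| toy sketch with a stand-in llm() function (swap in whatever backend
| you like; the "timer" is just a model-chosen delay before it decides
| whether to speak):
|
|     import time
|
|     def llm(prompt: str) -> str:
|         # Stand-in for a real model call; returns the model's reply.
|         return "PASS"
|
|     history = ["user: hello"]
|     for turn in range(5):
|         # Let the model pick its own "thinking" delay for this turn.
|         wait = llm("Reply with a number of seconds to wait before "
|                    "speaking:\n" + "\n".join(history))
|         try:
|             time.sleep(min(float(wait), 10.0))
|         except ValueError:
|             pass  # non-numeric reply; don't wait
|
|         # The model prompts itself, seeing its own prior outputs.
|         reply = llm("Continue the conversation, or say PASS to stay "
|                     "silent:\n" + "\n".join(history))
|         if reply.strip() != "PASS":
|             history.append("assistant: " + reply)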
| The_Colonel wrote:
| That's indeed the biggest problem, because it limits its
| usefulness to questions for which you can verify the
| correctness. (Don't get me wrong, you can still get a lot of
| utility out of that, since for many problems finding the
| [candidate] solutions is much more difficult than verifying
| them)
|
| OT, but this also reminds me how much I despise bullshitters.
| Sometimes right, sometimes wrong, but always confident. In
| the end, you can't trust what they say.
| giantrobot wrote:
| > I treat LLMs like a fallible being, the same way I treat
| humans.
|
| The issue is LLMs are not marketed in this way. They're
| marketed as all knowing oracles to people that have been
| conditioned to just accept the first result Google gives them.
| mxwsn wrote:
| OK - there's always a nonzero chance of hallucination. There's
| also a non-zero chance that macroscale objects can do quantum
| tunnelling, but no one is arguing that we "need to live with
| this" fact. A theoretical proof of the impossibility of reaching
| 0% probability of some event is nice, but in practice it says
| little about whether we can decrease the probability of it
| happening exponentially, enough to effectively mitigate the risk.
| panarky wrote:
| Exactly.
|
| LLMs will sometimes be inaccurate. So are humans. When LLMs are
| clearly better than humans for specific use cases, we don't
| need 100% perfection.
|
| Autonomous cars will sometimes cause accidents. So do humans.
| When AVs are clearly safer than humans for specific driving
| scenarios, we don't need 100% perfection.
| talldayo wrote:
| > When AVs are clearly safer than humans for specific driving
| scenarios, we don't need 100% perfection.
|
| People didn't stop refining the calculator once it was fast
| enough to beat a human. It's reasonable to expect absolute
| idempotent perfection from a robot designed to manufacture
| text.
| wkat4242 wrote:
| Yeah if the computer had been wrong 3 out of 10 times it
| never would have been a thing.
| kfarr wrote:
| Not sure absolute perfection is a concept that can exist in
| the universe of words.
| tourmalinetaco wrote:
| Maybe, down the line. The calculator went through a long
| period of refinement until it became as powerful as it is
| today. It's only natural LLMs will also take time. And
| much like calculators moving from stepped drums, to vacuum
| tubes, to finally transistors, the way we build LLMs is
| sure to change. Although I'm not quite sure idempotence is
| something LLMs are capable of.
| anthk wrote:
| A scientific HP calculator from the late '80s was powerful
| enough to cover most engineering classes.
| krapp wrote:
| If we only used LLMs for use cases where they exceed human
| ability, that would be great. But we don't. We use them to
| replace human beings in the general case, and many people
| believe that they exceed human ability in every relevant
| factor. Yet if human beings failed as often as LLMs do at the
| tasks for which LLMs are employed, those humans would be
| fired, sued and probably committed.
|
| Yet any arbitrary degree of error can be dismissed in LLMs
| because "humans do it too." It's weird.
| SpicyLemonZest wrote:
| I don't think it's true that modern LLMs are used to
| replace human beings in the general case, or that any
| significant number of people believe they exceed human
| ability in every relevant factor.
| digger495 wrote:
| LLMs will always have some degree of inaccuracy.
|
| FtFY.
| unshavedyak wrote:
| Plus, why do we care about that degree? If we could make it so
| humans don't hallucinate too that would be great, but it ain't
| happening. Human memory gets polluted the moment you feed people
| new information, as evidenced by how much care we have to take
| when trying to extract information when it matters, as in law
| enforcement.
|
| People rag on LLMs constantly and i get it, but they then give
| humans way too much credit imo. The primary difference i feel
| like we see with LLMs vs Humans is complexity. No, i don't
| personally believe LLMs can scale to human "intelligence".
| However, atm it feels like comparing a worm brain to human
| intelligence and saying that's evidence that neurons can't
| reach human intelligence level... despite the worm being a
| fraction of the underlying complexity.
| threeseed wrote:
| Humans have two qualities that make them infinitely superior
| to LLMs for similar tasks.
|
| a) They don't give detailed answers for questions they have
| no knowledge about.
|
| b) They learn from their mistakes.
| amelius wrote:
| > there's always a nonzero chance of hallucination. There's
| also a non-zero chance that macroscale objects can do quantum
| tunnelling, but no one is arguing that we "need to live with
| this" fact.
|
| True, but it is defeatist and goes against a good
| engineering/scientific mindset.
|
| With this attitude we'd still be practicing alchemy.
| simonw wrote:
| A key skill necessary to work effectively with LLMs is learning
| how to use technology that is fundamentally unreliable and non-
| deterministic.
|
| A lot of people appear to find this hurdle almost impossible to
| overcome.
| jampekka wrote:
| LLMs are not fundamentally non-deterministic. It's trivial to
| generate deterministically, e.g. with greedy (argmax) decoding.
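|
| Concretely, greedy decoding just takes the argmax at every step, so
| the same weights and prompt give identical output on every run. A
| toy illustration with stand-in probabilities (not a real model):
|
|     import random
|
|     next_token_probs = {"Paris": 0.62, "Lyon": 0.21, "London": 0.17}
|
|     def greedy(probs):
|         return max(probs, key=probs.get)  # deterministic argmax
|
|     def sample(probs):
|         # Temperature-1 sampling: this is where run-to-run variation
|         # comes from, and it is an optional decoding choice.
|         return random.choices(list(probs), list(probs.values()))[0]
|
|     assert all(greedy(next_token_probs) == "Paris" for _ in range(1000))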
| threeseed wrote:
| Honesty and accuracy builds trust.
|
| And when you trust something it reduces the cognitive load
| because you don't have to build a mental model of the different
| ways it could be deceiving you and how to handle it.
|
| Which is why for me at least when I use LLMs I find them useful
| but stressful.
| jampekka wrote:
| I'm of the opinion that the current architectures are
| fundamentally ridden with "hallucinations" that will severely
| limit their practical usage (including very much what the hype
| thinks they could do). But this article sets an impossible
| standard for what it would mean to "not hallucinate".
|
| It essentially restates well known fundamental limitations of
| formal systems and mechanistic computation and then presents the
| trivial result that LLMs also share these limitations.
|
| Unless some dualism or speculative supercomputational quantum
| stuff is invoked, this applies just as much to humans too.
| diggan wrote:
| > fundamentally ridden with "hallucinations" that will severely
| limit their practical usage
|
| On the other hand, an LLM that got rid of "hallucinations" is
| basically just a thing that copy-pastes at that point. The
| interesting properties of LLMs come from the fact that they
| can kind of make things up but still make them believable.
| jampekka wrote:
| As per this article, even copy-paste hallucinates, e.g. because
| there are no infinite datasets.
| SoftTalker wrote:
| > can kind of make things up but still make them believable
|
| This is the definition of a _bullshitter_, by the way.
| Kerb_ wrote:
| Or, you know, fiction writer. Some of us like little
| stories.
| User23 wrote:
| C.S. Peirce, who is known for characterizing abductive
| reasoning and had a considerable influence on John Sowa's old school AI
| work, had an interesting take on this. I can't fully do it
| justice, but essentially he held that both matter and mind are
| real, but aren't dual. Rather, there is a smooth and continuous
| transition between the two.
|
| However, whatever the nature of mind and matter really is, we
| have convincing evidence of human beings creating meaning in
| symbols by a process Peirce called semiosis. We lack a properly
| formally described semiotic, although much interesting
| mathematical applied philosophy has been done in the space (and
| frankly a ton of bullshit in the academy calls itself semiotic
| too). Until we can do that, we will probably have great
| difficulty producing an automaton that can perform semiosis.
| So, for now, there certainly remains a qualitative difference
| between the capabilities of humans and LLMs.
| wrs wrote:
| I don't know the technical philosophy terms for this, but my
| simplistic way of thinking about it is that when I'm
| "seriously" talking (not just emitting thoughtless cliche
| phrases), I'm talking _about_ something. And this is
| observable because sometimes I have an idea that I have
| trouble expressing in words, where I know that the words I'm
| saying are not properly expressing the idea that I have. (I
| mean -- that's happening right now!)
|
| I don't see how that could ever happen for an LLM, because
| _all it does_ is express things in words, and all it knows is
| the words that people expressed things with. We know for
| sure that's just what the code _does_; there's no question
| about the underlying mechanism, like there is with humans.
| User23 wrote:
| > I have an idea that I have trouble expressing in words,
| where I know that the words I'm saying are not properly
| expressing the idea that I have. (I mean -- that's
| happening right now!)
|
| That's a quality insight. Which, come to think of it, is an
| interestingly constructed word given what you just said.
| gradus_ad wrote:
| Producing text is only the visible end product. The LLM is
| doing a whole lot behind the scenes, which is conceivably
| analogous to the thought space from which our own words
| flow.
| smokedetector1 wrote:
| You can simulate a NAND gate using balls rolling down a
| specially designed wood board. In theory you could
| construct a giant wood board with billions and billions
| of balls that would implement the inference step of an
| LLM. Do you see these balls rolling down a wood board as
| a form of interiority/subjective experience? If not, then
| why do you give it to electric currents in silicon? Just
| because it's faster?
| User23 wrote:
| Tangentially, Peirce has a diagrammatic logical system
| that's built entirely on conjunction and negation which
| is isomorphic to propositional logic. He also defined
| extensions for what we now call predicate logic and modal
| logic.
|
| John Sowa annotated Peirce's tutorial and it's quite
| interesting[1].
|
| [1] https://www.jfsowa.com/pubs/egtut.pdf
| RealityVoid wrote:
| Your point of disagreement is the _medium_ of
| computation? The same point can be made about neurons.
|
| Do you think you could have the same kind of cognitive
| processes you have now if you were thinking 1000x slower
| than you do? Speed of processing matters, especially when
| you have time bounds on reaction, such as in real life.
|
| Another problem with balls would be the necessity of
| perception, which you can't really do with balls alone;
| you need a different kind of medium for perception and
| interaction, which humans (and computers) do have.
| User23 wrote:
| Are you familiar with Searle's work[1] on the subject?
| It's fun how topical it is here. Anyhow maybe the medium
| doesn't matter, but the burden of proof for that claim is
| on you, because it's contrary to experience, intuition,
| and thought experiment.
|
| [1] https://plato.stanford.edu/entries/chinese-room/
| wrs wrote:
| Yes, but that space is entirely derived from human
| expressions, in words, of their own thought space. The
| LLM has no direct training access to the humans' thoughts
| like it does to their words. So if it does have
| comparable thought space, that would imply such a space
| can be reconstructed accurately after passing through
| expression in words, which seems like an unsupported claim
| based on millennia of humans having trouble understanding
| each others' thoughts based on verbal communication, and
| students writing essays that are superficially similar to
| the texts they've read, but clearly indicate they haven't
| internalized the concepts they were supposedly learning.
|
| It's not to say there couldn't be a highly multimodal and
| self-training model that developed a similar thought
| space, which would be very interesting to study. It just
| seems like LLMs aren't enough.
| o11c wrote:
| What impresses me is frankly how _bad_ it is.
|
| I can't claim to have tried every model out there, but most
| models very quickly fail when asked to do something along the
| lines of "describe the interaction of 3 entities." They can
| usually handle 2 (up to the point where they inevitably start
| talking in circles - often repeating entire chunks verbatim in
| many models), but 3 seems utterly beyond them.
|
| LLMs might have a role in the field of "burn money to generate
| usually-wrong ideas that are cheap enough to check in case
| there's actually a good one" though.
| treebeard901 wrote:
| If everyone else can hallucinate along with it then problem
| solved.
| jmakov wrote:
| Maybe it's time to get synth minds from guessing to reasoning.
| feverzsj wrote:
| Maybe, it's time for the bubble to burst.
| rvz wrote:
| But before that we need to achieve what we call "AGI"
| first.
|
| Even before that we need to define it first, and the reality is
| _no-one_ knows what "AGI" even is. Thus it could be anything.
|
| The fact that Sam doesn't believe that AGI has been "achieved"
| yet even after GPT-3.5, ChatGPT, GPT-4 (multi-modal), and with
| o1 (Strawberry) suggests that what AGI _really_ means is to
| capture the creation and work of billions, raise hundreds of
| billions of dollars and for everyone to be on their UBI based
| scheme, whilst they enrich themselves as the bubble continues.
|
| Seems like the hallucinations are an excuse to say that AGI has
| not yet been achieved. So it's time to raise billions more for
| training and for inference energy costs, all for it to continue
| to hallucinate.
|
| Once all the value has been captured by OpenAI and the insiders
| cash out, THEN they would want the bubble to burst with 95% of
| AI startups disappearing (Except OpenAI).
| lolinder wrote:
| > By establishing the mathematical certainty of hallucinations,
| we challenge the prevailing notion that they can be fully
| mitigated
|
| Having a mathematical proof is nice, but honestly this whole
| misunderstanding could have been avoided if we'd just picked a
| different name for the concept of "producing false information in
| the course of generating probabilistic text".
|
| "Hallucination" makes it sound like something is going awry in
| the normal functioning of the model, which subtly suggests that
| if we could just identify what went awry we could get rid of the
| problem and restore normal cognitive function to the LLM. The
| trouble is that the normal functioning of the model is simply to
| produce plausible-sounding text.
|
| A "hallucination" is not a malfunction of the model, it's a value
| judgement we assign to the resulting text. All it says is that
| the text produced is not fit for purpose. Seen through that lens
| it's obvious that mitigating hallucinations and creating
| "alignment" are actually identical problems, and we won't solve
| one without the other.
| gwervc wrote:
| This comment should be pinned at the top of any LLM-related
| comment section.
| Moru wrote:
| It should be part of every AI related story on the news. Just
| like they keep saying "X, formerly Twitter".
| elif wrote:
| Nah it's quite pedantic to say that 'this neologism does not
| encapsulate the meaning it's meant to'
|
| This is the nature of language evolution. Everyone knows what
| hallucination means with respect to AI, without needing to
| load its definition with the baggage of a term used for
| centuries in human psychology.
| lolinder wrote:
| Neologism undersells what this term is being used for. It's
| a technical term of art that's created its own semantic
| category in LLM research that separates "text generated
| that is factually inaccurate according to ${sources}" from
| "text generated that is morally repugnant to
| ${individuals}" or "text generated that ${governments} want
| to censor".
|
| These three categories are entirely identical at a
| technological level, so I think it's entirely reasonable to
| flag that serious LLM researchers are treating them as
| distinct categories of problems when they're fundamentally
| not at all distinct. This isn't just a case of linguistic
| pedantry, this is a case of the language actively impeding
| a proper understanding of the problem by the researchers
| who are working on that problem.
| wrs wrote:
| Yes, exactly, it's a post-facto value judgment, not a precise
| term. If I understand the meaning of the word, "hallucination"
| is _all the model does_. If it happens to hallucinate something
| we think is objectively true, we just decide not to call that a
| "hallucination". But there's literally no functional difference
| between that case and the case of the model saying something
| that's objectively false, or something whose objective truth is
| unknown or undefinable.
|
| I haven't read the paper yet, but if they resolve this
| definition usefully, that would be a good contribution.
| diputsmonro wrote:
| Exactly this, I've been saying this since the beginning.
| _Every_ response is a hallucination - a probabilistic string
| of words divorced from any concept of truth or reality.
|
| By total coincidence, some hallucinations happen to reflect
| the truth, but only because the training data happened to
| generally be truthful sentences. Therefore, creating
| something that imitates a truthful sentence will often happen
| to also be truthful, but there is absolutely no guarantee or
| any function that even attempts to enforce that.
|
| All responses are hallucinations. Some hallucinations happen
| to overlap the truth.
| fumeux_fume wrote:
| Ok, but I think it would be more productive to educate
| people that LLMs have no concept of truth rather than
| insist they use the term "hallucinate" in an unintuitive
| way.
| lolinder wrote:
| I don't know about OP, but I'm suggesting that the term
| 'hallucinate' be abolished entirely as applies to LLMs,
| not redefined. It draws an arbitrary line in the middle
| of the set of problems that all amount to "how do we make
| sure that the output of an LLM is consistently
| acceptable" and will all be solved using the same
| techniques if at all.
| aeternum wrote:
| LLMs do now have a concept of truth since much of the
| RLHF is focused on making them more accurate and true.
|
| I think the problem is that humanity has a poor concept
| of truth. We think of most things as true or not true
| when much of our reality is uncertain due to fundamental
| limitations or because we often just don't know yet.
| During covid for example humanity collectively
| hallucinated the importance of disinfecting groceries for
| awhile.
| jbm wrote:
| > humanity collectively hallucinated the importance of
| disinfecting groceries for awhile
|
| I reject this history.
|
| I homeschooled my kids during covid due to uncertainty
| and even I didn't reach that level, and nor did anyone I
| knew in person.
|
| A very tiny number who were egged on by some YouTubers
| did this, including one person I knew remotely.
| Unsurprisingly that person was based in SV.
| plaidfuji wrote:
| In other words, all models are wrong, but some are useful.
| TeMPOraL wrote:
| I think you're going too far here.
|
| > _By total coincidence, some hallucinations happen to
| reflect the truth, but only because the training data
| happened to generally be truthful sentences._
|
| It's not a "total coincidence". It's the default. Thus, the
| model's responses aren't "divorced from any concept of
| truth or reality" - the whole distribution from which those
| responses are pulled is strongly aligned with reality.
|
| (Which is why people started using the term
| "hallucinations" to describe the failure mode, instead of
| "fishing a coherent and true sentence out of line noise" to
| describe the success mode - because success mode
| dominates.)
|
| Humans didn't invent language for no reason. They don't
| communicate to entertain themselves with meaningless
| noises. Most of communication - whether spoken or written -
| is deeply connected to reality. Language itself is deeply
| connected to reality. Even the most blatant lies, even all
| of fiction writing, they're all incorrect or fabricated
| only at the surface level - the whole thing, accounting for
| the utterance, what it is about, the meanings, the words,
| the grammar - is strongly correlated with truth and
| reality.
|
| So there's absolutely no coincidence that LLMs get things
| right more often than not. Truth is thoroughly baked into
| the training data, simply because it's a data set of real
| human communication, instead of randomly generated
| sentences.
| ahepp wrote:
| maybe hallucination is all cognition is, and humans are just
| really good at it?
| theturtle32 wrote:
| In my experience, humans are _at least_ as bad at it as
| GPT-4, if not _far worse_. In terms, specifically, of being
| "factually accurate" and grounded in absolute reality.
| Humans operate entirely in the probabilistic realm of what
| seems right to us based on how we were educated, the values
| we were raised with, our religious beliefs, etc. -- Human
| beings are all over the map with this.
| ruthmarx wrote:
| > In my experience, humans are at least as bad at it as
| GPT-4, if not far worse.
|
| I had an argument with a former friend recently, because
| he read some comments on YouTube and was convinced a
| raccoon raped a cat and produced some kind of hybrid
| offspring that was terrorizing a neighborhood. Trying to
| explain that different species can't procreate like that
| resulted in him pointing to the fact that other people
| believed it in the comments as proof.
|
| Say what you will about LLMs, but they seem to have a
| better basic education than an awful lot of adults, and
| certainly significantly better basic reasoning
| capabilities.
| Terr_ wrote:
| "So we built a blind-guessing machine, but how can we tweak it
| so that its blind guesses always happen to be good?"
| durumu wrote:
| I think there's a useful distinction between plausible-seeming
| text that is wrong in some subtle way, vs text that is
| completely fabricated to match a superficial output format, and
| the latter is what I wish people used "hallucination" to mean.
| A clear example of this is when you ask an LLM for some
| sources, with ISBNs, and it just makes up random titles and
| ISBNs that it knows full well do not correspond with reality.
| If you ask "Did you just make that up?" the LLM will respond
| with something like "Sorry, yes, I made that up, I actually
| just remembered I can't cite direct sources." I wonder if this
| is because RLHF teaches the LLM that humans in practice prefer
| properly formatted fake output over truthful refusals?
| majormajor wrote:
| How does a model "know full well" that it output a fake ISBN?
|
| It's been trained that sources look like plausible-titles +
| random numbers.
|
| It's been trained that when challenged it should say "oh
| sorry I can't do this."
|
| Are those things actually distinct?
| rerdavies wrote:
| In fairness, they will also admit they were wrong even if
| they were right.
| paulddraper wrote:
| It's an inaccuracy.
| renjimen wrote:
| Maybe with vanilla LLMs, but new LLM training paradigms include
| post-training with the explicit goal of avoiding over-confident
| answers to questions the LLM should not be confident about
| answering. So hallucination is a malfunction, just like any
| overconfident incorrect prediction by a model.
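|
| A toy version of the incentive those post-training setups try to
| encode (purely illustrative scoring, not any lab's actual recipe):
| a confident wrong answer should rank below an honest abstention.
|
|     def score(answer: str, is_correct: bool) -> float:
|         if answer.strip().lower() == "i don't know":
|             return 0.0                        # abstaining is neutral
|         return 1.0 if is_correct else -2.0    # confidently wrong is worst
|
|     # Preference pairs built from this ranking then push the model
|     # toward abstaining when its internal confidence is low.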
| aprilthird2021 wrote:
| I still think OP has a point. After public release, LLMs evolved
| to be positioned as oracles that hold vast knowledge.
| They were always probabilistic content generators, but people
| use them the way they use search engines: to retrieve info
| they know exists but don't know exactly.
|
| Since LLMs aren't designed for this there's a whole post
| process to try to make them amenable to this use case, but it
| will never plug that gap
| ta8645 wrote:
| > but it will never plug that gap
|
| They don't have to be perfect, they just have to be better
| than humans. And that seems very likely to be achievable
| eventually.
| AlexandrB wrote:
| To be better than humans they have to be able to confidently
| say "I don't know" when the correct answer is not
| available[1]. To me this sounds like a totally different
| type of "knowledge" than stringing words together based
| on a training set.
|
| [1] LLMs are already better than humans in terms of
| breadth, and sometimes depth, of knowledge. So it's not a
| problem of the AI knowing more facts.
| aprilthird2021 wrote:
| Umm, is this true? Tons of worthless technology is better
| than humans at something. It has to be better than humans
| AND better than existing technology.
| tsimionescu wrote:
| The only time the LLM can be somewhat confident of its answer
| is when it is reproducing verbatim text from its training
| set. In any other circumstance, it has no way of knowing if
| the text it produced is true or not, because fundamentally it
| only knows if it's a likely completion of its input.
| renjimen wrote:
| Post training includes mechanisms to allow LLMs to
| understand areas that they should exercise caution in
| answering. It's not as simple as you say anymore.
| BurningFrog wrote:
| "Hallucinations" just means that occasionally the LLM is wrong.
|
| The same is true of people, and I still find people extremely
| helpful.
| leptons wrote:
| Except the LLM didn't deliver a "wrong" result, it delivered
| text that is human readable and makes grammatical sense.
| Whether or not the information contained in the text is
| "wrong" is subjective, and the reader gets to decide if it's
| factual or not. If the LLM delivered unreadable gibberish,
| then that could be considered "wrong", but there is no
| "hallicinating" going on with LLMs. That's an
| anthropomorphism that is divorced from the reality of what an
| LLM does. Whoever called it "hallucinating" in the context of
| an LLM should have their nerd credentials revoked.
| imchillyb wrote:
| Human beings have a tendency to prefer comforting lies over
| uncomfortable truths.
|
| "The truth may set you free, but first it's really gonna
| piss you off." -G.S.
| AnimalMuppet wrote:
| Human readable, makes grammatical sense, and _wrong_. And
| no, that's often not subjective.
| lolinder wrote:
| People constantly make this mistake, so just to clarify:
| absolutely nothing about what I just said implies that llms
| are not helpful.
|
| Having an accurate mental model for what a tool is doing does
| not preclude seeing its value, but it does preclude getting
| caught up in unrealistic hype.
| nativeit wrote:
| I don't know who/how the term was initially coined in this
| context, but I'm concerned that the things that make it
| inaccurate are also, perhaps counterintuitively, things that
| serve the interests of those who would overstate the
| capabilities of LLMs, and seek to cloud their true nature
| (along with inherent limitations) to investors and potential
| buyers. As you already pointed out, the term implies that the
| problems it describes are temporary "bugs" rather than
| symptoms of the underlying nature of the technology itself.
| rerdavies wrote:
| How different things would be if the phenomenon had been called
| "makin' stuff up" instead. Humans make stuff up all the time,
| and make up far more outrageous things than AIs make up. One
| has to ask whether humans are really intelligent /not entirely
| sarcasm.
| TeMPOraL wrote:
| I'd prefer the phenomenon be called "saying the first thing
| that comes to your mind" instead, because humans do that a
| lot as well, and _that_ happens to produce pretty much the
| same failures as LLMs do.
|
| IOW, humans "hallucinate" _exactly_ the same way LLMs do -
| they just usually don't say those things out loud, but
| rather it's a part of the thinking process.
|
| See also: people who are somewhat drunk, or very excited,
| tend to lose inhibitions around speaking, and end up
| frequently just blurting whatever comes to their mind
| verbatim (including apologizing and backtracking and "it's
| not what I meant" when someone points out the nonsense).
| pointlessone wrote:
| Confabulation is the term I've seen used a few times. I think
| it reflects what's going on in LLMs better.
| calf wrote:
| Your argument makes several mistakes.
|
| First, you have just punted the validation problem of what a
| Normal LLM Model ought to be doing. You rhetorically declared
| hallucinations to be part of the normal functioning (i.e., the
| word "Normal" is already a value judgement). But we don't even
| know that - we would need theoretical proof that ALL
| theoretical LLMs (or neural networks as a more general
| argument) cannot EVER attain a certain probabilistic
| distribution. This is a theoretical computer science problem
| and remains an open problem.
|
| So the second mistake is your probabilistic reductionism. It is
| true that LLMs, neural nets, and human brains alike are based
| on probabilistic computations. But the reasonable definition of
| a Hallucination is stronger than that - it needs to capture the
| notion that the probabilistic errors are way too extreme
| compared to the space of possible correct answers. An example
| of this is that Humans and LLMs get Right Answers and Wrong
| Answers in qualitatively very different ways. A concrete
| example of that is that Humans can demonstrate correctly the
| sequence of a power set (an EXP-TIME problem), but LLMs
| theoretically cannot ever do so. Yet both Humans and LLMs are
| probabilistic, we are made of chemicals and atoms.
|
| Thirdly, the authors' thesis is that mitigation is impossible.
| It is not some "lens" where mitigation is equal to alignment,
| in fact one should use their thesis to debunk the notion that
| Alignment is an attainable problem at all. It is formally
| unsolvable and should be regarded as being as absurd as someone
| claiming prima facie that the Halting Problem is solvable.
|
| Finally, the meta issue is that the AI field is full of people
| who know zip about theoretical computer science. The vast
| majority of CS graduates have had maybe 1-2 weeks on Turing
| machines; an actual year-long course at the sophomore-senior
| level on theoretical computer science is Optional and for
| mathematically mature students who wish to concentrate in it.
| So the problem that arises is a matter of a language and
| conceptual gap between two subdisciplines, the AI community and
| the TCS community. So you see lots of people believing in very
| simplistic arguments for or against some AI issue without the
| strong theoretical grounding that CS itself has, but which is
| not taught to undergraduates by default.
| Terr_ wrote:
| > You rhetorically declared hallucinations to be part of the
| normal functioning (i.e., the word "Normal" is already a
| value judgement).
|
| No they aren't: When you flip a coin, it landing to display
| heads or tails is "normal". That's no value judgement, it's
| just a way to characterize what is common in the mechanics.
|
| If it landed perfectly on its edge or was snatched out of the
| air by a hawk, that would not be "normal", but--to introduce
| a value judgement--it'd be pretty dang cool.
| inglor_cz wrote:
| If the model starts concocting _nonexistent sources_, like
| "articles from serious newspapers that just never existed", it
| is definitely a malfunction for me. AFAIK this is what happened
| in the Jonathan Turley case.
| wordofx wrote:
| Instead of hallucination we could call it the Kamala effect.
| mlindner wrote:
| I agreed with you until your last sentence. Solving alignment
| is not a necessity for solving hallucinations even though
| solving hallucinations is a necessity for solving alignment.
|
| Put another way, you can have a hypothetical model that doesn't
| have hallucinations and still has no alignment but you can't
| have alignment if you have hallucinations. Alignment is about
| skillful lying/refusing to answer questions and is a more
| complex task than simply telling no lies. (My personal opinion
| is that trying to solve alignment is a dystopian action and
| should not be attempted.)
| lolinder wrote:
| My point is that eliminating hallucinations is just a special
| case of alignment: the case where we want to bound the
| possible text outputs to be constrained by the truth (for a
| value of truth defined by $SOMEONE).
|
| Other alignment issues have a problem statement that is
| effectively identical, but s/truth/morals/ or
| s/truth/politics/ or s/truth/safety/. It's all the same
| problem: how do we get probabilistic text to match our
| expectations of what should be outputted while still allowing
| it to be useful sometimes?
|
| As for whether we should be solving alignment, I'm inclined
| to agree that we shouldn't, but by extension I'd apply that
| to hallucinations. Truth, like morality, is much harder to
| define than we instinctively think it is, and any effort to
| eliminate hallucinations will run up against the problem of
| how we define truth.
| advael wrote:
| It's crazy to me that we managed to get such an exciting
| technology both theoretically and as a practical tool and still
| managed to make it into a bubbly hype wave because business
| people want it to be an automation technology, which is just a
| poor fit for what they actually do
|
| It's kind of cool that we can make mathematical arguments for
| this, but the idea that generative models can function as
| universal automation is a fiction mostly being pushed by non-
| technical business and finance people, and it's a good
| demonstration of how we've let such people drive the priorities
| of technological development and adoption for far too long
|
| A common argument I see folks make is that humans are fallible
| too. Yes, no shit. No automation even close to as fallible as a
| human at its task could function as an automation. When we
| automate, we remove human accountability and human versatility
| from the equation entirely, and can scale the error accumulation
| far beyond human capability. Thus, an automation that actually
| works needs drastically superhuman reliability, which is why
| functioning automations are usually narrow-domain machines
| akira2501 wrote:
| > want it to be an automation technology
|
| They want it to be a wage reduction technology. Everything else
| you've noticed is a direct consequence of this, and only this,
| so the analysis doesn't actually need to be any deeper than
| that.
| advael wrote:
| Business culture's tribal knowledge about technology seems to
| have by and large devolved into wanting only bossware, that
| all gizmos should function as a means of controlling their
| suppliers, their labor force, or their customers. I think
| this is a good sign that the current economic incentives
| aren't functioning in a desirable way for most people and
| this needs significant intervention
| seydor wrote:
| Isn't that obvious without invoking Gödel's theorem etc?
| zer00eyz wrote:
| Shakes fist at clouds... Back in my day we called these "bugs"
| and if you didn't fix them your program didn't work.
|
| Jest aside, there is a long list of "flaws" in LLMS that no one
| seems to be addressing. Hallucinations, Cut off dates, Lack of
| true reasoning (the parlor tricks to get there don't cut it),
| size/cost constraints...
|
| LLMs face the same issues as expert systems: without the
| constant input of (subject matter) experts, your LLM quickly
| becomes outdated and useless for all but the most trivial of
| tasks.
| OutOfHere wrote:
| This seems to miss the point, which is how to minimize
| hallucinations to a desirable level. Good prompts refined over
| time can minimize hallucinations by a significant degree, but
| they cannot fully eliminate them.
| danenania wrote:
| Perplexity does a pretty good job on this. I find myself reaching
| for it first when looking for a factual answer or doing research.
| It can still make mistakes but the hallucination rate is very
| low. It feels comparable to a google search in terms of accuracy.
|
| Pure LLMs are better for brainstorming or thinking through a
| task.
| ninetyninenine wrote:
| Incomplete training data is kind of a pointless thing to measure.
|
| Isn't incomplete data the whole point of learning in general? The
| reason why we have machine learning is because data was
| incomplete. If we had complete data we wouldn't need ML. We'd just
| build a function that maps the input to output based off the
| complete data. Machine learning is about filling in the gaps
| based off of a prediction.
|
| In fact this is what learning in general is doing. It means this
| whole thing about incomplete data applies to human intelligence
| and learning as well.
|
| Everything this theory is going after basically applies to
| learning and intelligence in general.
|
| So sure you can say that LLMs will always hallucinate. But humans
| will also always hallucinate.
|
| The real problem that needs to be solved is: how do we get LLMs
| to hallucinate in the same way humans hallucinate?
| skzv wrote:
| Yes, but it also makes a huge difference whether we are asking
| the model to interpolate or extrapolate.
|
| Generally speaking, models perform much better on the former
| task, and have big problems with the latter.
| abernard1 wrote:
| > Machine learning is about filling in the gaps based off of a
| prediction.
|
| I think this is a generous interpretation of network-based ML.
| ML was designed to solve problems. We had lots of data, and we
| knew large amounts of data could derive functions (networks) as
| opposed to deliberate construction of algorithms with GOFAI.
|
| But "intelligence" with ML as it stands now is not how humans
| think. Humans do not need millions of examples of cats to know
| what a cat is. They might need two or three, and they can
| permanently identify them later. Moreover, they don't need to
| see all sorts of "representative" cats. A human could see a
| single instance of a black cat and identify all other types of
| house cats _as cats_ correctly. (And they do: just observe
| children).
|
| Intelligence is the ability to come up with a solution without
| previous knowledge. The more intelligent an entity is, the
| _less_ data it needs. As we approach more intelligent systems,
| they will need less data to be effective, not more.
| imoverclocked wrote:
| > Humans do not need millions of examples of cats to know
| what a cat is.
|
| We have evolved over time to recognize things in our
| environment. We also don't need to be told that snakes are
| dangerous as many humans have an innate understanding of
| that. Our training data is partially inherited.
| reilly3000 wrote:
| Better give them some dried frog pills.
| leobg wrote:
| Isn't hallucination just the result of speaking out loud the
| first possible answer to the question you've been asked?
|
| A human does not do this.
|
| First of all, most questions we have been asked before. We have
| made mistakes in answering them before, and we remember these, so
| we don't repeat them.
|
| Secondly, we (at least some of us) think before we speak. We have
| an initial reaction to the question, and before expressing it, we
| relate that thought to other things we know. We may do "sanity
| checks" internally, often habitually without even realizing it.
|
| Therefore, we should not expect an LLM to generate the correct
| answer immediately without giving it space for reflection.
|
| In fact, if you observe your thinking, you might notice that your
| thought process often takes on different roles and personas.
| Rarely do you answer a question from just one persona. Instead,
| most of your answers are the result of internal discussion and
| compromise.
|
| We also create additional context, such as imagining the
| consequences of saying the answer we have in mind. Thoughts like
| that are only possible once an initial "draft" answer is formed
| in your head.
|
| So, to evaluate the intelligence of an LLM based on its first
| "gut reaction" to a prompt is probably misguided.
|
| Let me know if you need any further revisions!
| nickpsecurity wrote:
| Our brains also seem to tie our thoughts to observed reality in
| some way. The parts that do sensing and reasoning interact with
| the parts that handle memory. Different types of memory exist
| to handle trade offs. Memory of what makes sense also grows in
| strength compared to random things we observed.
|
| The LLM's don't seem to be doing these things. Their design is
| weaker than the brain on mitigating hallucinations.
|
| For brain-inspired research, I'd look at portions of the brain
| that seem to be abnormal in people with hallucinations. Then,
| models of how they work. Then, see if we can apply that to
| LLM's.
|
| My other idea was models of things like the hippocampus applied
| to NN's. That's already being done by a number of researchers,
| though.
| soared wrote:
| I also like comparing a human thought experiment, like Einstein
| (?) would do, to forcing an LLM to write code to answer a
| question. Yes, you can make a good guess, but making many
| smaller obvious decisions that lead to an answer is a stronger
| process.
| gus_massa wrote:
| > _A human does not do this._
|
| You obviously have never asked me anything. (Especially tech
| questions while drinking a cup of coffee.) If I had a cent for
| every wrong answer, I'd already be a millionaire.
| NemoNobody wrote:
| Why?? To defend AI you used yourself as an example of how we
| can also be that dumb too.
|
| I don't understand. Your example isn't true - what the OP
| posted is the human condition regarding this particular
| topic. You, as a human being, obviously know better than to
| blurt out the first thing that pops into your head - you even
| have different preset iterations of acceptable things to
| blurt in certain situations solely to avoid saying the wrong
| thing, like "I'm sorry for your loss", "Thoughts and prayers",
| and stuff like "Yes, Boss", or all the many rules of
| politeness; all of that is second nature to you, and prevents
| you from blurting shit out.
|
| Lastly, how do dumb questions in the mornings with coffee at
| a tech meeting in any way compare to an AI hallucination??
|
| Did you ever reply with information that you completely made
| up, that has seemingly little to do with the question and doesn't
| appear to make any logical or reasonable sense as to why
| that's your answer or how you even got there??
|
| That's clearly not the behavior of an "awake" or sentient
| thing. That is perhaps the simplest way for normal people to
| "get it" is by realizing what a hallucination is and that
| their toddler is likely more capable of comprehending
| context.
|
| You dismissed a plainly stated and correct position with
| self-deprecating humor - why?
| paulddraper wrote:
| > A human does not do this
|
| Have you met humans?
| aprilthird2021 wrote:
| I have never had a human give me some of the answers an LLM
| has given me, and I've met humans who can't find basically
| any country on a map, including the one they live in
| dsjoerg wrote:
| > Let me know if you need any further revisions!
|
| Fun, you got me :)
| burnte wrote:
| > So, to evaluate the intelligence of an LLM based on its first
| "gut reaction" to a prompt is probably misguided.
|
| There's no intelligence to evaluate. They're not intelligent.
| There's no logic or cogitation in them.
| JieJie wrote:
| The US had a president for eight years who was re-elected on
| his ability to act on his "gut reaction"s.
|
| Not saying this is ideal, just that it isn't the showstopper
| you present it as. In fact, when people talk about "human
| values", it might be worth reflecting on whether this a thing
| we're supposed to be protecting or expunging?
|
| "I'm not a textbook player, I'm a gut player." --President
| George W. Bush.
|
| https://www.heraldtribune.com/story/news/2003/01/12/going-to...
| jrflowers wrote:
| > Isn't hallucination just the result of speaking out loud the
| first possible answer to the question you've been asked?
|
| No.
|
| > In fact, if you observe your thinking...
|
| There is no reason to believe that LLMs should be compared to
| human minds other than our bad and irrational tendency towards
| anthropomorphizing everything.
|
| > So, to evaluate the intelligence of an LLM based on its first
| "gut reaction" to a prompt is probably misguided.
|
| LLMs do not have guts and do not experience time. They are not
| some nervous kid randomly filling in a scantron before the
| clock runs out. They are the product of software developers
| abandoning the half-century+ long tradition of making computers
| output correct answers and chasing vibes instead
| pessimizer wrote:
| > LLMs do not have guts
|
| Just going to ignore the scare quotes then?
|
| > do not experience time
|
| None of us experience time. Time is a way to describe cause
| and effect, and change. LLMs have a time when they have been
| invoked with a prompt, and a time when they have generated
| output based on that prompt. LLMs _don't experience
| anything,_ they're computer programs, but we certainly
| experience LLMs taking time. When we run multiple stages and
| techniques, each depending on the output of a previous stage,
| those are time.
|
| So when somebody says "gut reaction" they're trying to get
| you to compare the straight probabilistic generation of text
| to your instinctive reaction to something. They're asking you
| to use introspection and ask yourself if you review that
| first instinctive reaction i.e. have another stage afterwards
| that relies on the result of the instinctive reaction. If you
| do, then asking for LLMs to do well in one pass, rather than
| using the first pass to guide the next passes, is asking for
| superhuman performance.
|
| I feel like this is too obvious to be explaining.
| Anthropomorphizing things is worth bitching about, but
| anthropomorphizing human languages and human language output
| is necessary and not wrong. You don't have to think computer
| programs have souls to believe that running algorithms over
| human languages to produce free output that is comprehensible
| and convincing to humans _requires_ comparisons to humans.
| Otherwise, you might as well be lossy compressing music
| without referring to ears, or video without referring to
| eyes.
| jrflowers wrote:
| > Just going to ignore the scare quotes then?
|
| Yep. The analogy is bad even with that punctuation.
|
| > None of us experience time.
|
| That is not true and would only be worthy of discussion if
| we had agreed that comparing human experience to LLMs
| predicting tokens was worthwhile (which I emphatically have
| not done)
|
| > You don't have to think computer programs have souls to
| believe that running algorithms over human languages to
| produce free output that is comprehensible and convincing
| to humans requires comparisons to humans.
|
| This is true. You also don't have to think that comparing
| this software to humans is required. That's a belief that a
| person can hold, but holding it strongly does not make it
| an immutable truth.
| kristiandupont wrote:
| >> Isn't hallucination just the result of speaking out loud
| the first possible answer to the question you've been asked?
|
| >No.
|
| Not literally, but it's certainly comparable.
|
| >There is no reason to believe that LLMs should be compared
| to human minds
|
| There is plenty of reason to do that. They are not the same,
| but that doesn't mean it's useless to look at the
| similarities that do exist.
| NemoNobody wrote:
| Very well said!
| AlexandrB wrote:
| > In fact, if you observe your thinking, you might notice that
| your thought process often takes on different roles and
| personas.
|
| I don't think it's possible to actually observe one's own
| thinking. A lot of the "eureka" moments one has in the shower,
| for example, were probably being thought about _somewhere_ in
| your head but that process is completely hidden from your
| conscious mind.
| ein0p wrote:
| Humans totally do this if their prefrontal cortex shuts down
| due to fight or flight response. See eg stage fright or giving
| bullshit answers in leetcode style interviews.
| GuB-42 wrote:
| No, if I ask a human about something he doesn't know, the first
| thing he will think about is not a made up answer, it is "I
| don't know". It actually takes effort to make up a story, and
| without training we tend to be pretty bad at it. Some people do
| it naturally, but it is considered a disorder.
|
| For LLMs, there is no concept of "not knowing", they will just
| write something that best matches their training data, and
| since there is not much "I don't know" in their training data,
| it is not a natural answer.
|
| For example, I asked for a list of bars in a small city the LLM
| clearly didn't know much about, and it gave me a nice list with
| names, addresses, phone numbers, etc... all hallucinated. Try
| to ask a normal human to give you a list of bars in a city he
| doesn't know well enough, and force him to answer something
| plausible, no "I don't know". Eventually, especially if he
| knows a lot about bars, you will get an answer, but it
| absolutely won't be his first thought, he will probably need to
| think hard about it.
| pessimizer wrote:
| > No, if I ask a human about something he doesn't know, the
| first thing he will think about is not a made up answer, it
| is "I don't know".
|
| You've just made this up, though. It's not what happens. How
| would somebody even know that they didn't know without trying
| to come up with an answer?
|
| But maybe more convincingly, people who have brain injuries
| that cause them to neglect a side (i.e. not see the left or
| right side of things) often don't realize (without a lot of
| convincing) the extent to which this is happening. If you ask
| them to explain their unexplainable behaviors, they'll
| spontaneously concoct the most convincing explanation that
| they can.
|
| https://en.wikipedia.org/wiki/Hemispatial_neglect
|
| https://en.wikipedia.org/wiki/Anosognosia
|
| People try to make things make sense. LLMs try to minimize a
| loss function.
| nailuj wrote:
| To draw a comparison to human thinking, you can conceive of it as
| hallucinations too, we just have another layer behind the
| hallucinations that evaluates each one and tries to integrate
| them with what we believe to be true. You can observe this when
| you're about to fall asleep or are snoozing, sometimes you go
| down wild thought paths until the critical thinking part of your
| brain kicks in with "everything you've been thinking about these
| past 10 seconds is total incoherent nonsense". Dream logic.
|
| In that sense, a hallucinating system seems like a promising step
| towards stronger AI. AI systems simply are lacking a way to test
| their beliefs against a real world in the way we can, so natural
| laws, historical information, art and fiction exist on the same
| epistemological level. This is a problem when integrating them
| into a useful theory because there is no cost to getting the
| fundamentals wrong.
| rw_panic0_0 wrote:
| since it doesn't have emotions I believe
| davesque wrote:
| The way that LLMs hallucinate now seems to have everything to do
| with the way in which they represent knowledge. Just look at the
| cost function. It's called log likelihood for a reason. The only
| real goal is to produce a sequence of tokens that are plausible
| in the most abstract sense, not consistent with concepts in a
| sound model of reality.
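|
| To make that concrete: the per-token training loss is just
| -log p(observed next token), so a plausible-but-false continuation
| in the training text is rewarded exactly as if it were true. A toy
| sketch (illustrative numbers only):
|
|     import math
|
|     # Model's predicted distribution for the token after
|     # "The capital of Australia is":
|     p = {"Sydney": 0.55, "Canberra": 0.35, "Melbourne": 0.10}
|
|     # Whichever token the training text contains is the one whose
|     # log-probability gets maximized; truth never enters into it.
|     loss_if_text_says_sydney = -math.log(p["Sydney"])      # ~0.60
|     loss_if_text_says_canberra = -math.log(p["Canberra"])  # ~1.05
|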
|
| Consider that when models hallucinate, they are still doing what
| we trained them to do quite well, which is to at least produce a
| text that is likely. So they implicitly fall back onto more
| general patterns in the training data i.e. grammar and simple
| word choice.
|
| I have to imagine that the right architectural changes could
| still completely or mostly solve the hallucination problem. But
| it still seems like an open question as to whether we could make
| those changes and still get a model that can be trained
| efficiently.
|
| _Update:_ I took out the first sentence where I said "I don't
| agree" because I don't feel that I've given the paper a careful
| enough read to determine if the authors aren't in fact agreeing
| with me.
| yencabulator wrote:
| I posit that when someone figures out those architectural
| changes, the result won't be called an LLM anymore, and the
| paper will be correct.
| davesque wrote:
| Yep, could be.
| nybsjytm wrote:
| Does it matter that, like so much in the Math for AI sphere, core
| details seem to be totally bungled? e.g. see various comments in
| this thread
| https://x.com/waltonstevenj/status/1834327595862950207
| gdiamos wrote:
| Disagree - https://arxiv.org/abs/2406.17642
|
| We cover halting problem and intractable problems in the related
| work.
|
| Of course LLMs cannot give answers to intractable problems.
|
| I also don't see why you should call an answer of "I cannot
| compute that" to a halting problem question a hallucination.
| lsy wrote:
| LLM and other generative output can only be useful for a purpose
| or not useful. Creating a generative model that only produces
| absolute truths (as if this was possible, or there even were such
| a thing) would make them useless for creative pursuits, jokes,
| and many of the other purposes to which people want to put them.
| You can't generate a cowboy frog emoji with a perfectly reality-
| faithful model.
|
| To me this means two things:
|
| 1. Generative models can only be helpful for tasks where the user
| can already decide whether the output is useful. Retrieving a
| fact the user doesn't already know is not one of those use cases.
| Making memes or emojis or stories that the user finds enjoyable
| might be. Writing pro forma texts that the user can proofread
| also might be.
|
| 2. There's probably no successful business model for LLMs or
| generative models that is not already possible with the current
| generation of models. If you haven't figured out a business model
| for an LLM that is "60% accurate" on some benchmark, there won't
| be anything acceptable for an LLM that is "90% accurate", so
| boiling yet another ocean to get there is not the golden path to
| profit. Rather, it will be up to companies and startups to create
| features that leverage the existing models and profit that way
| rather than investing in compute, etc.
| renjimen wrote:
| Models are often wrong but sometimes useful. Models that provide
| answers couched in a certain level of confidence are
| miscalibrated when all answers are given confidently. New
| training paradigms attempt to better calibrate model confidence
| in post-training, but clearly there are competing incentives to
| give answers confidently given the economics of the AI arms race.
| TMWNN wrote:
| How goes the research on whether hallucinations are the AI
| equivalent of human imagination, or daydreaming?
| pkphilip wrote:
| Hallucinations in LLMs will severely affect their usage in scenarios
| where such hallucinations are completely unacceptable - and there
| are many such scenarios. This is a good thing because it will
| mean that human intelligence and oversight will continue to be
| needed.
| reliableturing wrote:
| I'm not sure what this paper is supposed to prove and find it
| rather trivial.
|
| > All of the LLMs knowledge comes from data. Therefore,... a
| larger more complete dataset is a solution for hallucination.
|
| Not being able to include everything in the training data is the
| whole point of intelligence. This also holds for humans. If
| sufficiently intelligent it should be able to infer new
| knowledge, refuting the very first assumption at the core of the
| work.
| badsandwitch wrote:
| Due to the limitations of gradient descent and training data we
| are limited in the architectures that are viable. All the top
| LLM's are decoder-only for efficiency reasons and all models
| train on the production of text because we are not able to train
| on the thoughts behind the text.
|
| Something that often gives me pause is the consideration that it
| is actually possible to come up with an architecture which has a
| good chance of being capable of being an AGI (RNNs, transformers
| etc as dynamical systems) but the model weights that would allow
| it to happen cannot be found because gradient descent will fail
| or not even be viable.
| mrkramer wrote:
| So hallucinations are something like cancer: they will happen
| sooner or later. In other words, they are inevitable.
| willcipriano wrote:
| When will I see AI dialogue in video games? Imagine a RPG where
| instead of picking from a series of pre recorded dialogues, you
| could just talk to that villager. If it worked it would be mind
| blowing. The first studio to really pull it off in the AAA game
| would rake in the cash.
|
| That seems like the lowest hanging fruit to me, like we would do
| that long before we have AI going over someone's medical records.
|
| If the major game studios aren't confident enough in the tech to
| have it write dialogue for a Disney character for fear of it
| saying the wrong thing, I'm not ready for it to do anything in the
| real world.
| fsndz wrote:
| We can't get rid of hallucinations. Hallucinations are a feature
| not a bug. A recent study by researchers Jim Waldo and Soline
| Boussard highlights the risks associated with this limitation. In
| their analysis, they tested several prominent models, including
| ChatGPT-3.5, ChatGPT-4, Llama, and Google's Gemini. The
| researchers found that while the models performed well on well-
| known topics with a large body of available data, they often
| struggled with subjects that had limited or contentious
| information, resulting in inconsistencies and errors.
|
| This challenge is particularly concerning in fields where
| accuracy is critical, such as scientific research, politics, or
| legal matters. For instance, the study noted that LLMs could
| produce inaccurate citations, misattribute quotes, or provide
| factually wrong information that might appear convincing but
| lacks a solid foundation. Such errors can lead to real-world
| consequences, as seen in cases where professionals have relied on
| LLM-generated content for tasks like legal research or coding,
| only to discover later that the information was incorrect.
| https://www.lycee.ai/blog/llm-hallucinations-report
| ndespres wrote:
| We don't need to "live with this". We can just not use them,
| ignore them, or argue against their proliferation and acceptance,
| as I will continue doing.
| CatWChainsaw wrote:
| This is "anti-progress", and we must always pursue progress
| even if it leads us to a self-made reality-melting hellmouth.
| Onward to Wonderland, I say!
| inglor_cz wrote:
| Technically, you are right. Donald Knuth still doesn't use
| e-mail, after all.
|
| But for the global "we" entity, it is almost certain that it is
| not going to heed your call.
| rapatel0 wrote:
| Been saying this from the beginning. Let's look at the comparator
| of a human result.
|
| What is the likelihood that a junior college student with access
| to google will generate a "hallucination" after reading a
| textbook and doing some basic research on a given topic. Probably
| pretty high.
|
| In our culture, we're often told to fake it till you make it. How
| many of us are probabilistically hallucinating knowledge we've
| regurgitated from other sources?
| fny wrote:
| If you ask a student to solve a problem while admitting when
| they don't know the answer, they will stop rather than generate
| gobbledygook for an answer.
|
| LLMs, on the other hand, regularly spew bogus answers with high
| confidence.
___________________________________________________________________
(page generated 2024-09-14 23:00 UTC)