[HN Gopher] LLMs Will Always Hallucinate, and We Need to Live wi...
       ___________________________________________________________________
        
       LLMs Will Always Hallucinate, and We Need to Live with This
        
       Author : Anon84
       Score  : 173 points
       Date   : 2024-09-14 17:02 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | zyklonix wrote:
       | We might as well embrace them:
       | https://github.com/DivergentAI/dreamGPT
        
       | bicx wrote:
       | I treat LLMs like a fallible being, the same way I treat humans.
       | I don't just trust output implicitly, and I accept help with
       | tasks knowing I am taking a certain degree of risk. Mostly, my
       | experience has been very positive with GPT-4o / ChatGPT and
       | GitHub copilot with that in mind. I use each constantly
       | throughout the day.
        
         | ulbu wrote:
         | I treat them as always hallucinating and it just so happens
          | that, by accident, they sometimes produce results that look
          | intentional, considered, or otherwise credible to the human
          | observer. The accident is sometimes of a high probability, but
         | still an accident. Humans are similar, but we are the standard
         | for what's to be considered a hallucination, clinically
          | speaking. For us, a hallucination is a cause for concern. For
         | llms, it's just one of all the possible results. Monkeys
         | bashing on a typewriter. This difference is an essential one,
         | imo.
        
         | everdrive wrote:
         | One big difference is that at least some people have a healthy
         | sense for when they may be wrong. This sort of meta-cognitive
         | introspection is currently not possible for an LLM. For
         | instance, let's say I asked someone "do you know the first 10
         | elements of the periodic table of elements?" Most people would
         | be able to accurately say "honestly I'm not sure what comes
         | after Helium." But an LLM will just make up some bullshit, at
         | least some of the time.
        
           | TheGeminon wrote:
           | There are ways to gauge the confidence of the LLM (token
           | probabilities over the response, generating multiple outputs
           | and checking consistency), but yeah that's outside the LLM
           | itself. You could feed the info back to the LLM as a
           | status/message I suppose
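            | 
            | A minimal sketch of both signals, assuming the Hugging Face
            | transformers library and a small model such as "gpt2" (both
            | are illustrative choices, not anything from the paper):
            | 
            |     from collections import Counter
            |     from transformers import AutoModelForCausalLM, AutoTokenizer
            | 
            |     tok = AutoTokenizer.from_pretrained("gpt2")
            |     model = AutoModelForCausalLM.from_pretrained("gpt2")
            |     prompt = "The first ten elements of the periodic table are"
            |     inputs = tok(prompt, return_tensors="pt")
            | 
            |     out = model.generate(**inputs, do_sample=True,
            |                          max_new_tokens=20,
            |                          num_return_sequences=5,
            |                          output_scores=True,
            |                          return_dict_in_generate=True,
            |                          pad_token_id=tok.eos_token_id)
            | 
            |     # Signal 1: average log-probability of each sampled answer
            |     # (a low value suggests the model is guessing; padding of
            |     # shorter samples is ignored in this sketch).
            |     scores = model.compute_transition_scores(
            |         out.sequences, out.scores, normalize_logits=True)
            |     n = inputs["input_ids"].shape[1]
            |     answers = [tok.decode(s[n:], skip_special_tokens=True)
            |                for s in out.sequences]
            |     for text, sc in zip(answers, scores):
            |         print(f"{sc.mean().item():7.3f}  {text!r}")
            | 
            |     # Signal 2: self-consistency -- do the samples agree?
            |     print(Counter(answers).most_common(1))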
        
             | christianqchung wrote:
              | The problem is also that the model may have very high
              | confidence in its token probabilities and still be wrong,
              | but I'm sure it could help in some cases.
        
             | Der_Einzige wrote:
             | The idea of hooking LLMs back up to themselves, i.e. giving
             | them token prob information somehow or even giving them
             | control over the settings they use to prompt themselves is
             | AWESOME and I cannot believe that no one has seriously done
             | this yet.
             | 
             | I've done it in some jupyter notebooks and the results are
              | really neat, especially since LLMs can be made, with a
              | tiny bit of extra code, to generate a context "timer" that
              | they wait on before they _prompt themselves_ to respond,
              | creating a proper conversational agent system (i.e. not the
              | walkie-talkie systems of today)
             | 
             | I wrote a paper that mentioned doing things like this for
             | having LLMs act as AI art directors:
             | https://arxiv.org/abs/2311.03716
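              | 
              | A toy sketch of that loop, with a self-chosen wait and the
              | model's confidence fed back as a status line; `ask_model`
              | is a hypothetical stand-in for whatever backend is used:
              | 
              |     import time
              |     from typing import Callable, Tuple
              | 
              |     def self_prompt_loop(
              |             ask_model: Callable[[str], Tuple[str, float]],
              |             turns: int = 3) -> str:
              |         ctx = "You are talking to yourself. Think out loud."
              |         for _ in range(turns):
              |             # ask_model returns (reply, avg token probability)
              |             reply, conf = ask_model(ctx)
              |             # feed the confidence back as a status message
              |             ctx += f"\n[status] avg token prob: {conf:.2f}"
              |             ctx += f"\n{reply}"
              |             # the context "timer": wait before self-prompting
              |             time.sleep(1.0)
              |         return ctx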
        
           | The_Colonel wrote:
           | That's indeed the biggest problem, because it limits its
           | usefulness to questions for which you can verify the
           | correctness. (Don't get me wrong, you can still get a lot of
           | utility out of that, since for many problems finding the
           | [candidate] solutions is much more difficult than verifying
           | them)
           | 
           | OT, but this also reminds me how much I despise bullshitters.
            | Sometimes right, sometimes wrong, but always confident. In
           | the end, you can't trust what they say.
        
         | giantrobot wrote:
         | > I treat LLMs like a fallible being, the same way I treat
         | humans.
         | 
         | The issue is LLMs are not marketed in this way. They're
          | marketed as all-knowing oracles to people who have been
         | conditioned to just accept the first result Google gives them.
        
       | mxwsn wrote:
       | OK - there's always a nonzero chance of hallucination. There's
       | also a non-zero chance that macroscale objects can do quantum
       | tunnelling, but no one is arguing that we "need to live with
       | this" fact. A theoretical proof of the impossibility of reaching
       | 0% probability of some event is nice, but in practice it says
        | little about whether or not we can exponentially decrease the
        | probability of it happening enough to effectively mitigate risk.
        
         | panarky wrote:
         | Exactly.
         | 
         | LLMs will sometimes be inaccurate. So are humans. When LLMs are
         | clearly better than humans for specific use cases, we don't
         | need 100% perfection.
         | 
         | Autonomous cars will sometimes cause accidents. So do humans.
         | When AVs are clearly safer than humans for specific driving
         | scenarios, we don't need 100% perfection.
        
           | talldayo wrote:
           | > When AVs are clearly safer than humans for specific driving
           | scenarios, we don't need 100% perfection.
           | 
           | People didn't stop refining the calculator once it was fast
           | enough to beat a human. It's reasonable to expect absolute
           | idempotent perfection from a robot designed to manufacture
           | text.
        
             | wkat4242 wrote:
             | Yeah if the computer had been wrong 3 out of 10 times it
             | never would have been a thing.
        
             | kfarr wrote:
             | Not sure absolute perfection is a concept that can exist in
             | the universe of words.
        
             | tourmalinetaco wrote:
             | Maybe, down the line. The calculator went through a long
              | period of refinement until it became as powerful as it is
              | today. It's only natural LLMs will also take time. And
             | much like calculators moving from stepped drums, to vacuum
              | tubes, to finally transistors, the way we build LLMs is
             | sure to change. Although I'm not quite sure idempotence is
             | something LLMs are capable of.
        
               | anthk wrote:
                | A scientific HP calculator from the late '80s was
                | powerful enough to cover most engineering classes.
        
           | krapp wrote:
           | If we only used LLMs for use cases where they exceed human
           | ability, that would be great. But we don't. We use them to
           | replace human beings in the general case, and many people
           | believe that they exceed human ability in every relevant
           | factor. Yet if human beings failed as often as LLMs do at the
           | tasks for which LLMs are employed, those humans would be
           | fired, sued and probably committed.
           | 
           | Yet any arbitrary degree of error can be dismissed in LLMs
           | because "humans do it too." It's weird.
        
             | SpicyLemonZest wrote:
             | I don't think it's true that modern LLMs are used to
             | replace human beings in the general case, or that any
             | significant number of people believe they exceed human
             | ability in every relevant factor.
        
           | digger495 wrote:
           | LLMs will always have some degree of inaccuracy.
           | 
           | FtFY.
        
         | unshavedyak wrote:
         | Plus, why do we care about that degree? If we could make it so
         | humans don't hallucinate too that would be great, but it ain't
          | happening. Humans' memory gets polluted the moment you feed
          | them new information, as evidenced by how much care we have to
          | take when trying to extract information when it matters, as in
          | law enforcement.
         | 
          | People rag on LLMs constantly and I get it, but they then give
          | humans way too much credit imo. The primary difference I feel
          | like we see with LLMs vs humans is complexity. No, I don't
         | personally believe LLMs can scale to human "intelligence".
          | However, atm it feels like comparing a worm brain to human
          | intelligence and saying that's evidence that neurons can't
          | reach human intelligence level... despite the worm being a
          | fraction of the underlying complexity.
        
           | threeseed wrote:
           | Humans have two qualities that make them infinitely superior
           | to LLMs for similar tasks.
           | 
           | a) They don't give detailed answers for questions they have
           | no knowledge about.
           | 
           | b) They learn from their mistakes.
        
         | amelius wrote:
         | > there's always a nonzero chance of hallucination. There's
         | also a non-zero chance that macroscale objects can do quantum
         | tunnelling, but no one is arguing that we "need to live with
         | this" fact.
         | 
         | True, but it is defeatist and goes against a good
         | engineering/scientific mindset.
         | 
         | With this attitude we'd still be practicing alchemy.
        
       | simonw wrote:
       | A key skill necessary to work effectively with LLMs is learning
       | how to use technology that is fundamentally unreliable and non-
       | deterministic.
       | 
       | A lot of people appear to find this hurdle almost impossible to
       | overcome.
        
         | jampekka wrote:
         | LLMs are not fundamentally non-deterministic. It's trivial to
          | do e.g. greedy decoding.
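          | 
          | A minimal sketch, assuming the Hugging Face transformers
          | library and "gpt2" as an illustrative model: with
          | do_sample=False decoding is greedy, so the same prompt always
          | yields the same output.
          | 
          |     from transformers import AutoModelForCausalLM, AutoTokenizer
          | 
          |     tok = AutoTokenizer.from_pretrained("gpt2")
          |     model = AutoModelForCausalLM.from_pretrained("gpt2")
          |     inputs = tok("The capital of France is", return_tensors="pt")
          |     # greedy decoding: deterministic, no sampling involved
          |     out = model.generate(**inputs, do_sample=False,
          |                          max_new_tokens=10,
          |                          pad_token_id=tok.eos_token_id)
          |     print(tok.decode(out[0], skip_special_tokens=True))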
        
         | threeseed wrote:
         | Honesty and accuracy builds trust.
         | 
         | And when you trust something it reduces the cognitive load
         | because you don't have to build a mental model of the different
         | ways it could be deceiving you and how to handle it.
         | 
         | Which is why for me at least when I use LLMs I find them useful
         | but stressful.
        
       | jampekka wrote:
       | I'm of the opinion that the current architectures are
        | fundamentally riddled with "hallucinations" that will severely
        | limit their practical usage (including very much what the hype
        | thinks they could do). But this article sets an impossible
        | standard for what it means to "not hallucinate".
       | 
       | It essentially restates well known fundamental limitations of
       | formal systems and mechanistic computation and then presents the
       | trivial result that LLMs also share these limitations.
       | 
       | Unless some dualism or speculative supercomputational quantum
        | stuff is invoked, this very much holds for humans too.
        
         | diggan wrote:
         | > fundamentally ridden with "hallucinations" that will severely
         | limit their practical usage
         | 
          | On the other hand, an LLM that got rid of "hallucinations"
          | would basically just be copy-paste at that point. The
          | interesting properties of LLMs come from the fact that they
          | can kind of make things up but still make them believable.
        
           | jampekka wrote:
            | As per this article, even copy-paste hallucinates, e.g.
            | because there are no infinite datasets.
        
           | SoftTalker wrote:
           | > can kind of make things up but still make them believable
           | 
            | This is the definition of a _bullshitter_, by the way.
        
             | Kerb_ wrote:
             | Or, you know, fiction writer. Some of us like little
             | stories.
        
         | User23 wrote:
         | C.S. Peirce, who is known for characterizing abductive
          | reasoning and had a considerable influence on John Sowa's old
          | school AI work, had an interesting take on this. I can't fully
          | do it justice, but essentially he held that both matter and
          | mind are real, but aren't dual. Rather, there is a smooth and
          | continuous transition between the two.
         | 
         | However, whatever the nature of mind and matter really is, we
         | have convincing evidence of human beings creating meaning in
         | symbols by a process Peirce called semiosis. We lack a properly
         | formally described semiotic, although much interesting
         | mathematical applied philosophy has been done in the space (and
         | frankly a ton of bullshit in the academy calls itself semiotic
         | too). Until we can do that, we will probably have great
         | difficulty producing an automaton that can perform semiosis.
         | So, for now, there certainly remains a qualitative difference
         | between the capabilities of humans and LLMs.
        
           | wrs wrote:
           | I don't know the technical philosophy terms for this, but my
           | simplistic way of thinking about it is that when I'm
           | "seriously" talking (not just emitting thoughtless cliche
           | phrases), I'm talking _about_ something. And this is
           | observable because sometimes I have an idea that I have
           | trouble expressing in words, where I know that the words I'm
           | saying are not properly expressing the idea that I have. (I
           | mean -- that's happening right now!)
           | 
           | I don't see how that could ever happen for an LLM, because
           | _all it does_ is express things in words, and all it knows is
           | the words that people expressed things with. We know for
            | sure, that's just what the code _does_; there's no question
           | about the underlying mechanism, like there is with humans.
        
             | User23 wrote:
             | > I have an idea that I have trouble expressing in words,
             | where I know that the words I'm saying are not properly
             | expressing the idea that I have. (I mean -- that's
             | happening right now!)
             | 
             | That's a quality insight. Which, come to think of it, is an
             | interestingly constructed word given what you just said.
        
             | gradus_ad wrote:
             | Producing text is only the visible end product. The LLM is
             | doing a whole lot behind the scenes, which is conceivably
             | analogous to the thought space from which our own words
             | flow.
        
               | smokedetector1 wrote:
               | You can simulate a NAND gate using balls rolling down a
               | specially designed wood board. In theory you could
               | construct a giant wood board with billions and billions
               | of balls that would implement the inference step of an
               | LLM. Do you see these balls rolling down a wood board as
               | a form of interiority/subjective experience? If not, then
               | why do you give it to electric currents in silicon? Just
               | because it's faster?
        
               | User23 wrote:
               | Tangentially, Peirce has a diagrammatic logical system
               | that's built entirely on conjunction and negation which
               | is isomorphic to propositional logic. He also defined
               | extensions for what we now call predicate logic and modal
               | logic.
               | 
               | John Sowa annotated Peirce's tutorial and it's quite
               | interesting[1].
               | 
               | [1] https://www.jfsowa.com/pubs/egtut.pdf
        
               | RealityVoid wrote:
               | Your point of disagreement is the _medium_ of
               | computation? The same point can be made about neurons.
               | 
               | Do you think you could have the same kind of cognitive
               | processes you have now if you were thinking 1000x slower
               | than you do? Speed of processing matters, especially when
                | you have time bounds on reaction, such as in real life.
               | 
               | Another problem with balls would be the necessity of
                | perception, which you can't really do with balls alone;
                | you need a different kind of medium for perception and
                | interaction, which humans (and computers) do have.
        
               | User23 wrote:
               | Are you familiar with Searle's work[1] on the subject?
               | It's fun how topical it is here. Anyhow maybe the medium
               | doesn't matter, but the burden of proof for that claim is
               | on you, because it's contrary to experience, intuition,
               | and thought experiment.
               | 
               | [1] https://plato.stanford.edu/entries/chinese-room/
        
               | wrs wrote:
               | Yes, but that space is entirely derived from human
               | expressions, in words, of their own thought space. The
               | LLM has no direct training access to the humans' thoughts
               | like it does to their words. So if it does have
               | comparable thought space, that would imply such a space
               | can be reconstructed accurately after passing through
                | expression in words, which seems like an unsupported claim
               | based on millennia of humans having trouble understanding
               | each others' thoughts based on verbal communication, and
               | students writing essays that are superficially similar to
               | the texts they've read, but clearly indicate they haven't
               | internalized the concepts they were supposedly learning.
               | 
               | It's not to say there couldn't be a highly multimodal and
               | self-training model that developed a similar thought
               | space, which would be very interesting to study. It just
               | seems like LLMs aren't enough.
        
         | o11c wrote:
         | What impresses me is frankly how _bad_ it is.
         | 
         | I can't claim to have tried every model out there, but most
         | models very quickly fail when asked to do something along the
         | lines of "describe the interaction of 3 entities." They can
         | usually handle 2 (up to the point where they inevitably start
         | talking in circles - often repeating entire chunks verbatim in
         | many models), but 3 seems utterly beyond them.
         | 
         | LLMs might have a role in the field of "burn money to generate
         | usually-wrong ideas that are cheap enough to check in case
         | there's actually a good one" though.
        
       | treebeard901 wrote:
       | If everyone else can hallucinate along with it then problem
       | solved.
        
       | jmakov wrote:
        | Maybe time for getting synth minds from guessing to reasoning.
        
       | feverzsj wrote:
       | Maybe, it's time for the bubble to burst.
        
         | rvz wrote:
          | But before that we need to achieve what we call "AGI" first.
          | 
          | Even before that we need to define it, and the reality is that
          | _no-one_ knows what "AGI" even is. Thus it could be anything.
         | 
         | The fact that Sam doesn't believe that AGI has been "achieved"
         | yet even after GPT-3.5, ChatGPT, GPT-4 (multi-modal), and with
         | o1 (Strawberry) suggests that what AGI _really_ means is to
         | capture the creation and work of billions, raise hundreds of
         | billions of dollars and for everyone to be on their UBI based
         | scheme, whilst they enrich themselves as the bubble continues.
         | 
         | Seems like the hallucinations are an excuse to say that AGI has
          | not yet been achieved. So it's time to raise billions more
          | for training and inference energy costs, all for it to
          | continue to hallucinate.
         | 
         | Once all the value has been captured by OpenAI and the insiders
         | cash out, THEN they would want the bubble to burst with 95% of
         | AI startups disappearing (Except OpenAI).
        
       | lolinder wrote:
       | > By establishing the mathematical certainty of hallucinations,
       | we challenge the prevailing notion that they can be fully
       | mitigated
       | 
       | Having a mathematical proof is nice, but honestly this whole
       | misunderstanding could have been avoided if we'd just picked a
       | different name for the concept of "producing false information in
       | the course of generating probabilistic text".
       | 
       | "Hallucination" makes it sound like something is going awry in
       | the normal functioning of the model, which subtly suggests that
       | if we could just identify what went awry we could get rid of the
       | problem and restore normal cognitive function to the LLM. The
       | trouble is that the normal functioning of the model is simply to
       | produce plausible-sounding text.
       | 
       | A "hallucination" is not a malfunction of the model, it's a value
       | judgement we assign to the resulting text. All it says is that
       | the text produced is not fit for purpose. Seen through that lens
       | it's obvious that mitigating hallucinations and creating
       | "alignment" are actually identical problems, and we won't solve
       | one without the other.
        
         | gwervc wrote:
         | This comment should be pinned at the top of any LLM-related
         | comment section.
        
           | Moru wrote:
           | It should be part of every AI related story on the news. Just
           | like they keep saying "X, formerly Twitter".
        
           | elif wrote:
           | Nah it's quite pedantic to say that 'this neologism does not
           | encapsulate the meaning it's meant to'
           | 
           | This is the nature of language evolution. Everyone knows what
           | hallucination means with respect to AI, without trying to
           | confer to its definition the baggage of a term used for
           | centuries as a human psychology term.
        
             | lolinder wrote:
             | Neologism undersells what this term is being used for. It's
             | a technical term of art that's created its own semantic
             | category in LLM research that separates "text generated
             | that is factually inaccurate according to ${sources}" from
             | "text generated that is morally repugnant to
             | ${individuals}" or "text generated that ${governments} want
             | to censor".
             | 
             | These three categories are entirely identical at a
             | technological level, so I think it's entirely reasonable to
             | flag that serious LLM researchers are treating them as
             | distinct categories of problems when they're fundamentally
             | not at all distinct. This isn't just a case of linguistic
             | pedantry, this is a case of the language actively impeding
             | a proper understanding of the problem by the researchers
             | who are working on that problem.
        
         | wrs wrote:
         | Yes, exactly, it's a post-facto value judgment, not a precise
         | term. If I understand the meaning of the word, "hallucination"
         | is _all the model does_. If it happens to hallucinate something
         | we think is objectively true, we just decide not to call that a
         | "hallucination". But there's literally no functional difference
         | between that case and the case of the model saying something
         | that's objectively false, or something whose objective truth is
         | unknown or undefinable.
         | 
         | I haven't read the paper yet, but if they resolve this
         | definition usefully, that would be a good contribution.
        
           | diputsmonro wrote:
           | Exactly this, I've been saying this since the beginning.
           | _Every_ response is a hallucination - a probabilistic string
           | of words divorced from any concept of truth or reality.
           | 
           | By total coincidence, some hallucinations happen to reflect
           | the truth, but only because the training data happened to
           | generally be truthful sentences. Therefore, creating
           | something that imitates a truthful sentence will often happen
           | to also be truthful, but there is absolutely no guarantee or
           | any function that even attempts to enforce that.
           | 
           | All responses are hallucinations. Some hallucinations happen
           | to overlap the truth.
        
             | fumeux_fume wrote:
             | Ok, but I think it would be more productive to educate
             | people that LLMs have no concept of truth rather than
             | insist they use the term "hallucinate" in an unintuitive
             | way.
        
               | lolinder wrote:
               | I don't know about OP, but I'm suggesting that the term
               | 'hallucinate' be abolished entirely as applies to LLMs,
               | not redefined. It draws an arbitrary line in the middle
               | of the set of problems that all amount to "how do we make
               | sure that the output of an LLM is consistently
               | acceptable" and will all be solved using the same
               | techniques if at all.
        
               | aeternum wrote:
                | LLMs do have a concept of truth now, since much of the
                | RLHF is focused on making them more accurate and truthful.
               | 
               | I think the problem is that humanity has a poor concept
               | of truth. We think of most things as true or not true
               | when much of our reality is uncertain due to fundamental
               | limitations or because we often just don't know yet.
               | During covid for example humanity collectively
               | hallucinated the importance of disinfecting groceries for
               | awhile.
        
               | jbm wrote:
               | > humanity collectively hallucinated the importance of
               | disinfecting groceries for awhile
               | 
               | I reject this history.
               | 
               | I homeschooled my kids during covid due to uncertainty
                | and even I didn't reach that level, nor did anyone I
               | knew in person.
               | 
               | A very tiny number who were egged on by some YouTubers
               | did this, including one person I knew remotely.
               | Unsurprisingly that person was based in SV.
        
             | plaidfuji wrote:
             | In other words, all models are wrong, but some are useful.
        
             | TeMPOraL wrote:
             | I think you're going too far here.
             | 
             | > _By total coincidence, some hallucinations happen to
             | reflect the truth, but only because the training data
             | happened to generally be truthful sentences._
             | 
             | It's not a "total coincidence". It's the default. Thus, the
             | model's responses aren't "divorced from any concept of
             | truth or reality" - the whole distribution from which those
             | responses are pulled is strongly aligned with reality.
             | 
             | (Which is why people started using the term
             | "hallucinations" to describe the failure mode, instead of
             | "fishing a coherent and true sentence out of line noise" to
             | describe the success mode - because success mode
             | dominates.)
             | 
             | Humans didn't invent language for no reason. They don't
             | communicate to entertain themselves with meaningless
             | noises. Most of communication - whether spoken or written -
             | is deeply connected to reality. Language itself is deeply
             | connected to reality. Even the most blatant lies, even all
             | of fiction writing, they're all incorrect or fabricated
             | only at the surface level - the whole thing, accounting for
             | the utterance, what it is about, the meanings, the words,
             | the grammar - is strongly correlated with truth and
             | reality.
             | 
             | So there's absolutely no coincidence that LLMs get things
             | right more often than not. Truth is thoroughly baked into
             | the training data, simply because it's a data set of real
             | human communication, instead of randomly generated
             | sentences.
        
           | ahepp wrote:
           | maybe hallucination is all cognition is, and humans are just
           | really good at it?
        
             | theturtle32 wrote:
             | In my experience, humans are _at least_ as bad at it as
             | GPT-4, if not _far worse_. In terms, specifically, of being
             | "factually accurate" and grounded in absolute reality.
             | Humans operate entirely in the probabilistic realm of what
             | seems right to us based on how we were educated, the values
             | we were raised with, our religious beliefs, etc. -- Human
             | beings are all over the map with this.
        
               | ruthmarx wrote:
               | > In my experience, humans are at least as bad at it as
               | GPT-4, if not far worse.
               | 
               | I had an argument with a former friend recently, because
               | he read some comments on YouTube and was convinced a
               | racoon raped a cat and produced some kind of hybrid
               | offspring that was terrorizing a neighborhood. Trying to
               | explain that different species can't procreate like that
               | resulted in him pointing to the fact that other people
               | believed it in the comments as proof.
               | 
               | Say what you will about LLMs, but they seem to have a
               | better basic education than an awful lot of adults, and
               | certainly significantly better basic reasoning
               | capabilities.
        
         | Terr_ wrote:
         | "So we built a blind-guessing machine, but how can we tweak it
         | so that its blind guesses always happen to be good?"
        
         | durumu wrote:
         | I think there's a useful distinction between plausible-seeming
         | text that is wrong in some subtle way, vs text that is
         | completely fabricated to match a superficial output format, and
         | the latter is what I wish people used "hallucination" to mean.
         | A clear example of this is when you ask an LLM for some
         | sources, with ISBNs, and it just makes up random titles and
         | ISBNs that it knows full well do not correspond with reality.
         | If you ask "Did you just make that up?" the LLM will respond
         | with something like "Sorry, yes, I made that up, I actually
         | just remembered I can't cite direct sources." I wonder if this
         | is because RLHF teaches the LLM that humans in practice prefer
         | properly formatted fake output over truthful refusals?
        
           | majormajor wrote:
           | How does a model "know full well" that it output a fake ISBN?
           | 
           | It's been trained that sources look like plausible-titles +
           | random numbers.
           | 
           | It's been trained that when challenged it should say "oh
           | sorry I can't do this."
           | 
           | Are those things actually distinct?
        
           | rerdavies wrote:
           | In fairness, they will also admit they were wrong even if
           | they were right.
        
         | paulddraper wrote:
         | It's an inaccuracy.
        
         | renjimen wrote:
         | Maybe with vanilla LLMs, but new LLM training paradigms include
         | post-training with the explicit goal of avoiding over-confident
         | answers to questions the LLM should not be confident about
         | answering. So hallucination is a malfunction, just like any
         | overconfident incorrect prediction by a model.
        
           | aprilthird2021 wrote:
            | I still think OP has a point. After public release, LLMs
            | evolved to be positioned as oracles with vast knowledge.
           | They were always probabilistic content generators, but people
           | use them the way they use search engines, to retrieve info
           | they know exists but don't exactly know.
           | 
            | Since LLMs aren't designed for this, there's a whole post-hoc
            | process to try to make them amenable to this use case, but it
            | will never plug that gap.
        
             | ta8645 wrote:
             | > but it will never plug that gap
             | 
             | They don't have to be perfect, they just have to be better
             | than humans. And that seems very likely to be achievable
             | eventually.
        
               | AlexandrB wrote:
                | To be better than humans they have to be able to
                | confidently say "I don't know" when the correct answer is
                | not available[1]. To me this sounds like a totally
                | different type of "knowledge" than stringing words
                | together based on a training set.
               | 
               | [1] LLMs are already better than humans in terms of
               | breadth, and sometimes depth, of knowledge. So it's not a
               | problem of the AI knowing more facts.
        
               | aprilthird2021 wrote:
               | Umm, is this true? Tons of worthless technology is better
               | than humans at something. It has to be better than humans
               | AND better than existing technology.
        
           | tsimionescu wrote:
           | The only time the LLM can be somewhat confident of its answer
           | is when it is reproducing verbatim text from its training
           | set. In any other circumstance, it has no way of knowing if
           | the text it produced is true or not, because fundamentally it
           | only knows if it's a likely completion of its input.
        
             | renjimen wrote:
             | Post training includes mechanisms to allow LLMs to
             | understand areas that they should exercise caution in
             | answering. It's not as simple as you say anymore.
        
         | BurningFrog wrote:
         | "Hallucinations" just means that occasionally the LLM is wrong.
         | 
         | The same is true of people, and I still find people extremely
         | helpful.
        
           | leptons wrote:
           | Except the LLM didn't deliver a "wrong" result, it delivered
           | text that is human readable and makes grammatical sense.
           | Whether or not the information contained in the text is
           | "wrong" is subjective, and the reader gets to decide if it's
           | factual or not. If the LLM delivered unreadable gibberish,
           | then that could be considered "wrong", but there is no
           | "hallicinating" going on with LLMs. That's an
           | anthropomorphism that is divorced from the reality of what an
           | LLM does. Whoever called it "hallucinating" in the context of
           | an LLM should have their nerd credentials revoked.
        
             | imchillyb wrote:
             | Human beings have a tendency to prefer comforting lies over
             | uncomfortable truths.
             | 
             | "The truth may set you free, but first it's really gonna
             | piss you off." -G.S.
        
             | AnimalMuppet wrote:
             | Human readable, makes grammatical sense, and _wrong_. And
              | no, that's often not subjective.
        
           | lolinder wrote:
           | People constantly make this mistake, so just to clarify:
           | absolutely nothing about what I just said implies that llms
           | are not helpful.
           | 
           | Having an accurate mental model for what a tool is doing does
           | not preclude seeing its value, but it does preclude getting
           | caught up in unrealistic hype.
        
         | nativeit wrote:
         | I don't know who/how the term was initially coined in this
         | context, but I'm concerned that the things that make it
         | inaccurate are also, perhaps counterintuitively, things that
         | serve the interests of those who would overstate the
         | capabilities of LLMs, and seek to cloud their true nature
         | (along with inherent limitations) to investors and potential
         | buyers. As you already pointed out, the term implies that the
         | problems that are represented are temporary "bugs", rather than
         | symptoms of the underlying nature of the technology itself.
        
         | rerdavies wrote:
         | How different things would be if the phenomenon had been called
         | "makin' stuff up" instead. Humans make stuff up all the time,
         | and make up far more outrageous things than AIs make up. One
         | has to ask whether humans are really intelligent /not entirely
         | sarcasm.
        
           | TeMPOraL wrote:
           | I'd prefer the phenomenon be called "saying the first thing
           | that comes to your mind" instead, because humans do that a
           | lot as well, and _that_ happens to produce pretty much the
           | same failures as LLMs do.
           | 
           | IOW, humans "hallucinate" _exactly_ the same way LLMs do -
            | they just usually don't say those things out loud, but
           | rather it's a part of the thinking process.
           | 
           | See also: people who are somewhat drunk, or very excited,
           | tend to lose inhibitions around speaking, and end up
           | frequently just blurting whatever comes to their mind
           | verbatim (including apologizing and backtracking and "it's
           | not what I meant" when someone points out the nonsense).
        
         | pointlessone wrote:
         | Confabulation is the term I've seen used a few times. I think
         | it reflects what's going on in LLMs better.
        
         | calf wrote:
         | Your argument makes several mistakes.
         | 
         | First, you have just punted the validation problem of what a
         | Normal LLM Model ought to be doing. You rhetorically declared
         | hallucinations to be part of the normal functioning (i.e., the
         | word "Normal" is already a value judgement). But we don't even
         | know that - we would need theoretical proof that ALL
         | theoretical LLMs (or neural networks as a more general
         | argument) cannot EVER attain a certain probabilistic
         | distribution. This is a theoretical computer science problem
         | and remains an open problem.
         | 
         | So the second mistake is your probabilistic reductionism. It is
         | true that LLMs, neural nets, and human brains alike are based
         | on probabilistic computations. But the reasonable definition of
         | a Hallucination is stronger than that - it needs to capture the
         | notion that the probabilistic errors are way too extreme
         | compared to the space of possible correct answers. An example
         | of this is that Humans and LLMs get Right Answers and Wrong
         | Answers in qualitatively very different ways. A concrete
          | example of that is that Humans can correctly enumerate the
          | elements of a power set (an EXP-TIME problem), but LLMs
          | theoretically cannot ever do so. Yet both Humans and LLMs are
         | probabilistic, we are made of chemicals and atoms.
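          | 
          | (For reference, a small illustration of the power-set point:
          | enumerating the power set of an n-element set takes 2^n steps,
          | so the output grows exponentially with n.)
          | 
          |     from itertools import chain, combinations
          | 
          |     def power_set(items):
          |         items = list(items)
          |         return chain.from_iterable(
          |             combinations(items, r)
          |             for r in range(len(items) + 1))
          | 
          |     for n in range(1, 6):
          |         # prints 1 2, 2 4, 3 8, 4 16, 5 32
          |         print(n, sum(1 for _ in power_set(range(n))))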
         | 
         | Thirdly, the authors' thesis is that mitigation is impossible.
         | It is not some "lens" where mitigation is equal to alignment,
         | in fact one should use their thesis to debunk the notion that
          | Alignment is an attainable problem at all. It is formally
          | unsolvable and should be regarded as absurd as someone
         | claiming prima facie that the Halting Problem is solvable.
         | 
         | Finally, the meta issue is that the AI field is full of people
         | who know zip about theoretical computer science. The vast
         | majority of CS graduates have had maybe 1-2 weeks on Turing
         | machines; an actual year-long course at the sophomore-senior
         | level on theoretical computer science is Optional and for
         | mathematically mature students who wish to concentrate in it.
          | So the problem is a matter of a language and conceptual gap
          | between two subdisciplines, the AI community and the TCS
          | community. So you see lots of people believing in very
          | simplistic arguments for or against some AI issue without the
          | strong theoretical grounding that CS itself has, but which is
          | not taught to undergraduates by default.
        
           | Terr_ wrote:
           | > You rhetorically declared hallucinations to be part of the
           | normal functioning (i.e., the word "Normal" is already a
           | value judgement).
           | 
           | No they aren't: When you flip a coin, it landing to display
           | heads or tails is "normal". That's no value judgement, it's
           | just a way to characterize what is common in the mechanics.
           | 
           | If it landed perfectly on its edge or was snatched out of the
           | air by a hawk, that would not be "normal", but--to introduce
           | a value judgement--it'd be pretty dang cool.
        
         | inglor_cz wrote:
          | If the model starts concocting _nonexistent sources_, like
         | "articles from serious newspapers that just never existed", it
         | is definitely a malfunction for me. AFAIK this is what happened
         | in the Jonathan Turley case.
        
         | wordofx wrote:
         | Instead of hallucination we could call it the Kamala effect.
        
         | mlindner wrote:
         | I agreed with you until your last sentence. Solving alignment
         | is not a necessity for solving hallucinations even though
         | solving hallucinations is a necessity for solving alignment.
         | 
         | Put another way, you can have a hypothetical model that doesn't
         | have hallucinations and still has no alignment but you can't
         | have alignment if you have hallucinations. Alignment is about
         | skillful lying/refusing to answer questions and is a more
         | complex task than simply telling no lies. (My personal opinion
         | is that trying to solve alignment is a dystopian action and
         | should not be attempted.)
        
           | lolinder wrote:
           | My point is that eliminating hallucinations is just a special
           | case of alignment: the case where we want to bound the
           | possible text outputs to be constrained by the truth (for a
           | value of truth defined by $SOMEONE).
           | 
           | Other alignment issues have a problem statement that is
           | effectively identical, but s/truth/morals/ or
           | s/truth/politics/ or s/truth/safety/. It's all the same
           | problem: how do we get probabilistic text to match our
           | expectations of what should be outputted while still allowing
           | it to be useful sometimes?
           | 
           | As for whether we should be solving alignment, I'm inclined
           | to agree that we shouldn't, but by extension I'd apply that
           | to hallucinations. Truth, like morality, is much harder to
           | define than we instinctively think it is, and any effort to
           | eliminate hallucinations will run up against the problem of
           | how we define truth.
        
       | advael wrote:
       | It's crazy to me that we managed to get such an exciting
       | technology both theoretically and as a practical tool and still
       | managed to make it into a bubbly hype wave because business
       | people want it to be an automation technology, which is just a
       | poor fit for what they actually do
       | 
       | It's kind of cool that we can make mathematical arguments for
       | this, but the idea that generative models can function as
       | universal automation is a fiction mostly being pushed by non-
       | technical business and finance people, and it's a good
       | demonstration of how we've let such people drive the priorities
       | of technological development and adoption for far too long
       | 
       | A common argument I see folks make is that humans are fallible
       | too. Yes, no shit. No automation even close to as fallible as a
       | human at its task could function as an automation. When we
       | automate, we remove human accountability and human versatility
       | from the equation entirely, and can scale the error accumulation
       | far beyond human capability. Thus, an automation that actually
       | works needs drastically superhuman reliability, which is why
       | functioning automations are usually narrow-domain machines
        
         | akira2501 wrote:
         | > want it to be an automation technology
         | 
         | They want it to be a wage reduction technology. Everything else
         | you've noticed is a direct consequence of this, and only this,
         | so the analysis doesn't actually need to be any deeper than
         | that.
        
           | advael wrote:
           | Business culture's tribal knowledge about technology seems to
           | have by and large devolved into wanting only bossware, that
           | all gizmos should function as a means of controlling their
           | suppliers, their labor force, or their customers. I think
           | this is a good sign that the current economic incentives
           | aren't functioning in a desirable way for most people and
           | this needs significant intervention
        
       | seydor wrote:
        | Isn't that obvious without invoking Godel's theorem, etc.?
        
       | zer00eyz wrote:
       | Shakes fist at clouds... Back in my day we called these "bugs"
       | and if you didn't fix them your program didn't work.
       | 
        | Jest aside, there is a long list of "flaws" in LLMs that no one
        | seems to be addressing: hallucinations, cut-off dates, lack of
       | true reasoning (the parlor tricks to get there don't cut it),
       | size/cost constraints...
       | 
        | LLMs face the same issues as expert systems: without the
        | constant input of (subject matter) experts your LLM quickly
        | becomes outdated and useless for all but the most trivial of
        | tasks.
        
       | OutOfHere wrote:
       | This seems to miss the point, which is how to minimize
        | hallucinations to an acceptable level. Good prompts refined over
        | time can reduce hallucinations to a significant degree, but
       | they cannot fully eliminate them.
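        | 
        | A minimal sketch of that kind of prompt-level mitigation,
        | assuming the openai Python client and an API key in the
        | environment (model name and wording are illustrative, not a
        | recommendation):
        | 
        |     from openai import OpenAI
        | 
        |     client = OpenAI()  # reads OPENAI_API_KEY from the environment
        |     SYSTEM = ("Answer only from well-established facts. If you "
        |               "are not sure, reply exactly: I don't know. Never "
        |               "invent citations, ISBNs, or numbers.")
        |     resp = client.chat.completions.create(
        |         model="gpt-4o-mini",  # illustrative model choice
        |         temperature=0,
        |         messages=[
        |             {"role": "system", "content": SYSTEM},
        |             {"role": "user",
        |              "content": "Which paper introduced the transformer?"},
        |         ],
        |     )
        |     print(resp.choices[0].message.content)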
        
       | danenania wrote:
       | Perplexity does a pretty good job on this. I find myself reaching
       | for it first when looking for a factual answer or doing research.
       | It can still make mistakes but the hallucination rate is very
       | low. It feels comparable to a google search in terms of accuracy.
       | 
       | Pure LLMs are better for brainstorming or thinking through a
       | task.
        
       | ninetyninenine wrote:
       | Incomplete training data is kind of a pointless thing to measure.
       | 
       | Isn't incomplete data the whole point of learning in general? The
       | reason why we have machine learning is because data was
        | incomplete. If we had complete data, we wouldn't need ML. We'd
       | build a function that maps the input to output based off the
       | complete data. Machine learning is about filling in the gaps
       | based off of a prediction.
       | 
       | In fact this is what learning in general is doing. It means this
       | whole thing about incomplete data applies to human intelligence
       | and learning as well.
       | 
        | Everything this theory is going after basically applies to
        | learning and intelligence in general.
       | 
       | So sure you can say that LLMs will always hallucinate. But humans
       | will also always hallucinate.
       | 
       | The real problem that needs to be solved is: how do we get LLMs
       | to hallucinate in the same way humans hallucinate?
        
         | skzv wrote:
         | Yes, but it also makes a huge difference whether we are asking
         | the model to interpolate or extrapolate.
         | 
         | Generally speaking, models perform much better on the former
         | task, and have big problems with the latter.
        
         | abernard1 wrote:
         | > Machine learning is about filling in the gaps based off of a
         | prediction.
         | 
         | I think this is a generous interpretation of network-based ML.
         | ML was designed to solve problems. We had lots of data, and we
         | knew large amounts of data could derive functions (networks) as
         | opposed to deliberate construction of algorithms with GOFAI.
         | 
         | But "intelligence" with ML as it stands now is not how humans
         | think. Humans do not need millions of examples of cats to know
         | what a cat is. They might need two or three, and they can
         | permanently identify them later. Moreover, they don't need to
         | see all sorts of "representative" cats. A human could see a
          | single instance of a black cat and identify all other types of
         | house cats _as cats_ correctly. (And they do: just observe
         | children).
         | 
         | Intelligence is the ability to come up with a solution without
         | previous knowledge. The more intelligent an entity is, the
         | _less_ data it needs. As we approach more intelligent systems,
         | they will need less data to be effective, not more.
        
           | imoverclocked wrote:
           | > Humans do not need millions of examples of cats to know
           | what a cat is.
           | 
           | We have evolved over time to recognize things in our
           | environment. We also don't need to be told that snakes are
           | dangerous as many humans have an innate understanding of
           | that. Our training data is partially inherited.
        
       | reilly3000 wrote:
        | Better give them some dried frog pills.
        
       | leobg wrote:
       | Isn't hallucination just the result of speaking out loud the
       | first possible answer to the question you've been asked?
       | 
       | A human does not do this.
       | 
       | First of all, most questions we have been asked before. We have
       | made mistakes in answering them before, and we remember these, so
       | we don't repeat them.
       | 
       | Secondly, we (at least some of us) think before we speak. We have
       | an initial reaction to the question, and before expressing it, we
       | relate that thought to other things we know. We may do "sanity
       | checks" internally, often habitually without even realizing it.
       | 
       | Therefore, we should not expect an LLM to generate the correct
       | answer immediately without giving it space for reflection.
       | 
       | In fact, if you observe your thinking, you might notice that your
       | thought process often takes on different roles and personas.
       | Rarely do you answer a question from just one persona. Instead,
       | most of your answers are the result of internal discussion and
       | compromise.
       | 
       | We also create additional context, such as imagining the
       | consequences of saying the answer we have in mind. Thoughts like
       | that are only possible once an initial "draft" answer is formed
       | in your head.
       | 
       | So, to evaluate the intelligence of an LLM based on its first
       | "gut reaction" to a prompt is probably misguided.
       | 
       | Let me know if you need any further revisions!
        
         | nickpsecurity wrote:
         | Our brains also seem to tie our thoughts to observed reality in
         | some way. The parts that do sensing and reasoning interact with
         | the parts that handle memory. Different types of memory exist
          | to handle trade-offs. Memory of what makes sense also grows in
         | strength compared to random things we observed.
         | 
         | The LLM's don't seem to be doing these things. Their design is
         | weaker than the brain on mitigating hallucinations.
         | 
         | For brain-inspired research, I'd look at portions of the brain
         | that seem to be abnormal in people with hallucinations. Then,
         | models of how they work. Then, see if we can apply that to
         | LLM's.
         | 
         | My other idea was models of things like the hippocampus applied
         | to NN's. That's already being done by a number of researchers,
         | though.
        
         | soared wrote:
         | I also like comparing a human thought experiment like Einstein
         | (?) would do to forcing an llm to write code to answer a
         | question. Yes you can make a good guess, but making many
         | smaller obvious decisions that lead to an answer is a stronger
         | process.
        
         | gus_massa wrote:
         | > _A human does not do this._
         | 
          | You obviously have never asked me anything. (Especially tech
          | questions while drinking a cup of coffee.) If I had a cent for
          | every wrong answer, I'd already be a millionaire.
        
           | NemoNobody wrote:
           | Why?? To defend AI you used yourself as an example of how we
            | can be that dumb too.
           | 
           | I don't understand. Your example isn't true - what the OP
           | posted is the human condition regarding this particular
            | topic. You, as a human being, obviously know better than to
            | blurt out the first thing that pops into your head - you even
            | have different preset iterations of acceptable things to
            | blurt in certain situations solely to avoid saying the wrong
            | thing, like "I'm sorry for your loss", "Thoughts and prayers"
            | and stuff like "Yes, Boss", or all the many rules of
            | politeness; all of that is second nature to you, and prevents
            | you from blurting shit out.
           | 
           | Lastly, how do dumb questions in the mornings with coffee at
           | a tech meeting in any way compare to an AI hallucination??
           | 
           | Did you ever reply with information that you completely made
           | up, that has seemingly little to do with the question, and
           | that doesn't appear to make any logical or reasonable sense
           | as to why that's your answer or how you even got there??
           | 
           | That's clearly not the behavior of an "awake" or sentient
           | thing. Perhaps the simplest way for normal people to "get it"
           | is by realizing what a hallucination is and that their
           | toddler is likely more capable of comprehending context.
           | 
           | You dismissed a plainly stated and correct position with
           | self-deprecating humor - why?
        
         | paulddraper wrote:
         | > A human does not do this
         | 
         | Have you met humans?
        
           | aprilthird2021 wrote:
           | I have never had a human give me some of the answers an LLM
            | has given me, and I've met humans who can't find basically
            | any country on a map, including the one they live in.
        
         | dsjoerg wrote:
         | > Let me know if you need any further revisions!
         | 
         | Fun, you got me :)
        
         | burnte wrote:
         | > So, to evaluate the intelligence of an LLM based on its first
         | "gut reaction" to a prompt is probably misguided.
         | 
         | There's no intelligence to evaluate. They're not intelligent.
         | There's no logic or cogitation in them.
        
         | JieJie wrote:
         | The US had a president for eight years who was re-elected on
         | his ability to act on his "gut reactions."
         | 
         | Not saying this is ideal, just that it isn't the showstopper
         | you present it as. In fact, when people talk about "human
         | values", it might be worth reflecting on whether this is a thing
         | we're supposed to be protecting or expunging?
         | 
         | "I'm not a textbook player, I'm a gut player." --President
         | George W. Bush.
         | 
         | https://www.heraldtribune.com/story/news/2003/01/12/going-to...
        
         | jrflowers wrote:
         | > Isn't hallucination just the result of speaking out loud the
         | first possible answer to the question you've been asked?
         | 
         | No.
         | 
         | > In fact, if you observe your thinking...
         | 
         | There is no reason to believe that LLMs should be compared to
         | human minds other than our bad and irrational tendency towards
         | anthropomorphizing everything.
         | 
         | > So, to evaluate the intelligence of an LLM based on its first
         | "gut reaction" to a prompt is probably misguided.
         | 
         | LLMs do not have guts and do not experience time. They are not
         | some nervous kid randomly filling in a scantron before the
         | clock runs out. They are the product of software developers
         | abandoning the half-century+ long tradition of making computers
         | output correct answers and chasing vibes instead.
        
           | pessimizer wrote:
           | > LLMs do not have guts
           | 
           | Just going to ignore the scare quotes then?
           | 
           | > do not experience time
           | 
           | None of us experience time. Time is a way to describe cause
           | and effect, and change. LLMs have a time when they have been
           | invoked with a prompt, and a time when they have generated
            | output based on that prompt. LLMs _don't experience
           | anything,_ they're computer programs, but we certainly
           | experience LLMs taking time. When we run multiple stages and
           | techniques, each depending on the output of a previous stage,
           | those are time.
           | 
           | So when somebody says "gut reaction" they're trying to get
           | you to compare the straight probabilistic generation of text
           | to your instinctive reaction to something. They're asking you
           | to use introspection and ask yourself if you review that
           | first instinctive reaction i.e. have another stage afterwards
           | that relies on the result of the instinctive reaction. If you
           | do, then asking for LLMs to do well in one pass, rather than
           | using the first pass to guide the next passes, is asking for
           | superhuman performance.
           | 
           | I feel like this is too obvious to be explaining.
           | Anthropomorphizing things is worth bitching about, but
           | anthropomorphizing human languages and human language output
           | is necessary and not wrong. You don't have to think computer
           | programs have souls to believe that running algorithms over
           | human languages to produce free output that is comprehensible
           | and convincing to humans _requires_ comparisons to humans.
           | Otherwise, you might as well be lossy compressing music
           | without referring to ears, or video without referring to
           | eyes.
        
             | jrflowers wrote:
             | > Just going to ignore the scare quotes then?
             | 
             | Yep. The analogy is bad even with that punctuation.
             | 
             | > None of us experience time.
             | 
             | That is not true and would only be worthy of discussion if
             | we had agreed that comparing human experience to LLMs
             | predicting tokens was worthwhile (which I emphatically have
             | not done)
             | 
             | > You don't have to think computer programs have souls to
             | believe that running algorithms over human languages to
             | produce free output that is comprehensible and convincing
             | to humans requires comparisons to humans.
             | 
             | This is true. You also don't have to think that comparing
             | this software to humans is required. That's a belief that a
             | person can hold, but holding it strongly does not make it
             | an immutable truth.
        
           | kristiandupont wrote:
           | >> Isn't hallucination just the result of speaking out loud
           | the first possible answer to the question you've been asked?
           | 
           | >No.
           | 
           | Not literally, but it's certainly comparable.
           | 
           | >There is no reason to believe that LLMs should be compared
           | to human minds
           | 
           | There is plenty of reason to do that. They are not the same,
           | but that doesn't mean it's useless to look at the
           | similarities that do exist.
        
         | NemoNobody wrote:
         | Very well said!
        
         | AlexandrB wrote:
         | > In fact, if you observe your thinking, you might notice that
         | your thought process often takes on different roles and
         | personas.
         | 
         | I don't think it's possible to actually observe one's own
         | thinking. A lot of the "eureka" moments one has in the shower,
         | for example, were probably being thought about _somewhere_ in
         | your head but that process is completely hidden from your
         | conscious mind.
        
         | ein0p wrote:
         | Humans totally do this if their prefrontal cortex shuts down
         | due to a fight-or-flight response. See, e.g., stage fright or
         | giving bullshit answers in leetcode-style interviews.
        
         | GuB-42 wrote:
         | No, if I ask a human about something he doesn't know, the first
         | thing he will think about is not a made-up answer; it is "I
         | don't know". It actually takes effort to make up a story, and
         | without training we tend to be pretty bad at it. Some people do
         | it naturally, but it is considered a disorder.
         | 
         | For LLMs, there is no concept of "not knowing", they will just
         | write something that best matches their training data, and
         | since there is not much "I don't know" in their training data,
         | it is not a natural answer.
         | 
         | For example, I asked for a list of bars in a small city the LLM
         | clearly didn't know much about, and it gave me a nice list with
         | names, addresses, phone numbers, etc... all hallucinated. Try
         | to ask a normal human to give you a list of bars in a city he
         | doesn't know well enough, and force him to answer something
         | plausible, no "I don't know". Eventually, especially if he
         | knows a lot about bars, you will get an answer, but it
         | absolutely won't be his first thought, he will probably need to
         | think hard about it.
        
           | pessimizer wrote:
           | > No, if I ask a human about something he doesn't know, the
           | first thing he will think about is not a made up answer, it
           | is "I don't know".
           | 
            | You've just made this up, though. It's not what happens. How
           | would somebody even know that they didn't know without trying
           | to come up with an answer?
           | 
           | But maybe more convincingly, people who have brain injuries
           | that cause them to neglect a side (i.e. not see the left or
           | right side of things) often don't realize (without a lot of
           | convincing) the extent to which this is happening. If you ask
           | them to explain their unexplainable behaviors, they'll
           | spontaneously concoct the most convincing explanation that
           | they can.
           | 
           | https://en.wikipedia.org/wiki/Hemispatial_neglect
           | 
           | https://en.wikipedia.org/wiki/Anosognosia
           | 
           | People try to make things make sense. LLMs try to minimize a
           | loss function.
        
       | nailuj wrote:
        | To stretch the comparison to human thinking, you can conceive of
        | it as hallucinations too; we just have another layer behind the
        | hallucinations that evaluates each one and tries to integrate
        | them with what we believe to be true. You can observe this when
        | you're about to fall asleep or are snoozing: sometimes you go
        | down wild thought paths until the critical thinking part of your
        | brain kicks in with "everything you've been thinking about these
        | past 10 seconds is total incoherent nonsense". Dream logic.
       | 
       | In that sense, a hallucinating system seems like a promising step
       | towards stronger AI. AI systems simply are lacking a way to test
       | their beliefs against a real world in the way we can, so natural
       | laws, historical information, art and fiction exist on the same
       | epistemological level. This is a problem when integrating them
       | into a useful theory because there is no cost to getting the
       | fundamentals wrong.
        
       | rw_panic0_0 wrote:
       | since it doesn't have emotions I believe
        
       | davesque wrote:
       | The way that LLMs hallucinate now seems to have everything to do
       | with the way in which they represent knowledge. Just look at the
       | cost function. It's called log likelihood for a reason. The only
       | real goal is to produce a sequence of tokens that are plausible
       | in the most abstract sense, not consistent with concepts in a
       | sound model of reality.
       | 
       | Consider that when models hallucinate, they are still doing what
       | we trained them to do quite well, which is to at least produce a
       | text that is likely. So they implicitly fall back onto more
       | general patterns in the training data i.e. grammar and simple
       | word choice.
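       | 
       | As a rough sketch of what that objective looks like (assuming a
       | PyTorch-style setup; this is purely illustrative, not the
       | training code of any particular model):
       | 
       |     import torch
       |     import torch.nn.functional as F
       | 
       |     # Toy next-token objective: the "model" emits logits over a
       |     # 5-token vocabulary at each position, and the loss is the
       |     # average negative log likelihood of the observed next token.
       |     # Nothing here rewards factual accuracy, only plausibility of
       |     # the token sequence.
       |     vocab_size = 5
       |     logits = torch.randn(1, 4, vocab_size)  # (batch, seq, vocab)
       |     targets = torch.tensor([[1, 3, 0, 2]])  # observed next tokens
       | 
       |     loss = F.cross_entropy(
       |         logits.view(-1, vocab_size), targets.view(-1))
       |     print(loss)  # lower loss = likelier text, not truer text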
       | 
       | I have to imagine that the right architectural changes could
       | still completely or mostly solve the hallucination problem. But
       | it still seems like an open question as to whether we could make
       | those changes and still get a model that can be trained
       | efficiently.
       | 
       |  _Update:_ I took out the first sentence where I said  "I don't
       | agree" because I don't feel that I've given the paper a careful
       | enough read to determine if the authors aren't in fact agreeing
       | with me.
        
         | yencabulator wrote:
         | I posit that when someone figures out those architectural
         | changes, the result won't be called an LLM anymore, and the
         | paper will be correct.
        
           | davesque wrote:
           | Yep, could be.
        
       | nybsjytm wrote:
       | Does it matter that, like so much in the Math for AI sphere, core
       | details seem to be totally bungled? e.g. see various comments in
       | this thread
       | https://x.com/waltonstevenj/status/1834327595862950207
        
       | gdiamos wrote:
       | Disagree - https://arxiv.org/abs/2406.17642
       | 
        | We cover the halting problem and intractable problems in the
        | related work.
       | 
       | Of course LLMs cannot give answers to intractable problems.
       | 
       | I also don't see why you should call an answer of "I cannot
       | compute that" to a halting problem question a hallucination.
        
       | lsy wrote:
        | Output from LLMs and other generative models can only be useful
        | for a purpose or not useful. Creating a generative model that
        | only produces absolute truths (as if this were possible, or
        | there even were such a thing) would make them useless for
        | creative pursuits, jokes,
       | and many of the other purposes to which people want to put them.
       | You can't generate a cowboy frog emoji with a perfectly reality-
       | faithful model.
       | 
       | To me this means two things:
       | 
       | 1. Generative models can only be helpful for tasks where the user
       | can already decide whether the output is useful. Retrieving a
       | fact the user doesn't already know is not one of those use cases.
       | Making memes or emojis or stories that the user finds enjoyable
       | might be. Writing pro forma texts that the user can proofread
       | also might be.
       | 
       | 2. There's probably no successful business model for LLMs or
       | generative models that is not already possible with the current
       | generation of models. If you haven't figured out a business model
       | for an LLM that is "60% accurate" on some benchmark, there won't
       | be anything acceptable for an LLM that is "90% accurate", so
       | boiling yet another ocean to get there is not the golden path to
       | profit. Rather, it will be up to companies and startups to create
       | features that leverage the existing models and profit that way
       | rather than investing in compute, etc.
        
       | renjimen wrote:
       | Models are often wrong but sometimes useful. Models that provide
       | answers couched in a certain level of confidence are
       | miscalibrated when all answers are given confidently. New
       | training paradigms attempt to better calibrate model confidence
       | in post-training, but clearly there are competing incentives to
       | give answers confidently given the economics of the AI arms race.
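       | 
       | To make "miscalibrated" concrete, here is a toy check with
       | made-up numbers (not measurements of any real model): bucket
       | answers by the confidence the model expresses and compare that to
       | observed accuracy.
       | 
       |     import numpy as np
       | 
       |     # Hypothetical per-answer confidences and correctness flags.
       |     conf = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.55])
       |     correct = np.array([1, 1, 1, 0, 1, 0])
       | 
       |     # A well-calibrated model's 80%-confidence answers are right
       |     # about 80% of the time; an overconfident one falls short.
       |     bins = np.linspace(0.5, 1.0, 6)
       |     for lo, hi in zip(bins[:-1], bins[1:]):
       |         mask = (conf >= lo) & (conf < hi)
       |         if mask.any():
       |             print(f"stated {conf[mask].mean():.2f} -> "
       |                   f"observed {correct[mask].mean():.2f}")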
        
       | TMWNN wrote:
       | How goes the research on whether hallucinations are the AI
       | equivalent of human imagination, or daydreaming?
        
       | pkphilip wrote:
        | Hallucinations in LLMs will severely affect their usage in
        | scenarios where such hallucinations are completely unacceptable
        | - and there are many such scenarios. This is a good thing
        | because it will mean that human intelligence and oversight will
        | continue to be needed.
        
       | reliableturing wrote:
       | I'm not sure what this paper is supposed to prove and find it
       | rather trivial.
       | 
       | > All of the LLMs knowledge comes from data. Therefore,... a
       | larger more complete dataset is a solution for hallucination.
       | 
       | Not being able to include everything in the training data is the
       | whole point of intelligence. This also holds for humans. If
        | sufficiently intelligent, it should be able to infer new
       | knowledge, refuting the very first assumption at the core of the
       | work.
        
       | badsandwitch wrote:
        | Due to the limitations of gradient descent and training data, we
        | are limited in the architectures that are viable. All the top
        | LLMs are decoder-only for efficiency reasons, and all models
        | train on the production of text because we are not able to train
        | on the thoughts behind the text.
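       | 
       | For concreteness, "decoder-only" in practice means causal
       | self-attention: each token attends only to earlier positions, so
       | the whole sequence can be trained in one parallel pass on
       | next-token prediction. A minimal sketch of the mask, purely for
       | illustration:
       | 
       |     import torch
       | 
       |     # Lower-triangular causal mask: position i may attend to
       |     # positions 0..i only.
       |     seq_len = 5
       |     ones = torch.ones(seq_len, seq_len, dtype=torch.bool)
       |     mask = torch.tril(ones)
       |     print(mask)
       | 
       |     # In each attention layer the raw scores would be masked with
       |     # something like scores.masked_fill(~mask, float("-inf"))
       |     # before the softmax; that restriction is what "decoder-only"
       |     # amounts to.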
       | 
        | Something that often gives me pause is the thought that it may
        | actually be possible to come up with an architecture capable in
        | principle of AGI (RNNs, transformers, etc. as dynamical systems),
        | but the model weights that would allow it to happen cannot be
        | found because gradient descent will fail or not even be viable.
        
       | mrkramer wrote:
        | So hallucinations are something like cancer: they will show up
        | sooner or later; in other words, they are inevitable.
        
       | willcipriano wrote:
        | When will I see AI dialogue in video games? Imagine an RPG where
        | instead of picking from a series of pre-recorded dialogues, you
        | could just talk to that villager. If it worked, it would be
        | mind-blowing. The first studio to really pull it off in a AAA
        | game would rake in the cash.
       | 
       | That seems like the lowest hanging fruit to me, like we would do
       | that long before we have AI going over someone's medical records.
       | 
       | If the major game studios aren't confident enough in the tech to
       | have it write dialogue for a Disney character for fear of it
        | saying the wrong thing, I'm not ready for it to do anything in the
       | real world.
        
       | fsndz wrote:
        | We can't get rid of hallucinations. Hallucinations are a feature,
        | not a bug. A recent study by researchers Jim Waldo and Soline
       | Boussard highlights the risks associated with this limitation. In
       | their analysis, they tested several prominent models, including
       | ChatGPT-3.5, ChatGPT-4, Llama, and Google's Gemini. The
       | researchers found that while the models performed well on well-
       | known topics with a large body of available data, they often
       | struggled with subjects that had limited or contentious
       | information, resulting in inconsistencies and errors.
       | 
       | This challenge is particularly concerning in fields where
       | accuracy is critical, such as scientific research, politics, or
       | legal matters. For instance, the study noted that LLMs could
       | produce inaccurate citations, misattribute quotes, or provide
       | factually wrong information that might appear convincing but
       | lacks a solid foundation. Such errors can lead to real-world
       | consequences, as seen in cases where professionals have relied on
       | LLM-generated content for tasks like legal research or coding,
       | only to discover later that the information was incorrect.
       | https://www.lycee.ai/blog/llm-hallucinations-report
        
       | ndespres wrote:
       | We don't need to "live with this". We can just not use them,
       | ignore them, or argue against their proliferation and acceptance,
       | as I will continue doing.
        
         | CatWChainsaw wrote:
         | This is "anti-progress", and we must always pursue progress
         | even if it leads us to a self-made reality-melting hellmouth.
         | Onward to Wonderland, I say!
        
         | inglor_cz wrote:
         | Technically, you are right. Donald Knuth still doesn't use
         | e-mail, after all.
         | 
         | But the global "we" entity is almost certainly not going to
         | heed your call.
        
       | rapatel0 wrote:
        | Been saying this from the beginning. Let's look at the
        | comparator: a human result.
       | 
       | What is the likelihood that a junior college student with access
       | to google will generate a "hallucination" after reading a
        | textbook and doing some basic research on a given topic? Probably
       | pretty high.
       | 
       | In our culture, we're often told to fake it till you make it. How
        | many of us are probabilistically hallucinating knowledge we've
        | regurgitated from other sources?
        
         | fny wrote:
         | If you ask a student to solve a problem while admitting when
         | they don't know the answer, they will stop rather than generate
         | junk for an answer.
         | 
         | LLMs, on the other hand, regularly spew bogus answers with high
         | confidence.
        
       ___________________________________________________________________
       (page generated 2024-09-14 23:00 UTC)