[HN Gopher] AI Hallucination Cases Database
___________________________________________________________________
AI Hallucination Cases Database
Author : Tomte
Score : 60 points
Date : 2025-05-25 16:05 UTC (6 hours ago)
(HTM) web link (www.damiencharlotin.com)
(TXT) w3m dump (www.damiencharlotin.com)
| irrational wrote:
| I still think confabulation is a better term for what LLMs do
| than hallucination.
|
| Hallucination - A hallucination is a false perception where a
| person senses something that isn't actually there, affecting any
| of the five senses: sight, sound, smell, touch, or taste. These
| experiences can seem very real to the person experiencing them,
| even though they are not based on external stimuli.
|
| Confabulation - Confabulation is a memory error consisting of the
| production of fabricated, distorted, or misinterpreted memories
| about oneself or the world. It is generally associated with
| certain types of brain damage or a specific subset of dementias.
| bluefirebrand wrote:
| You're not wrong in a strict sense, but you have to remember
| that most people aren't that strict about language
|
| I would bet that most people define the words like this:
|
| Hallucination - something that isn't real
|
| Confabulation - a word that they have never heard of
| static_void wrote:
| We should not bend over backwards to use language the way
| ignorant people do.
| add-sub-mul-div wrote:
| "Bending over backwards" is a pretty ignorant metaphor for
| this situation: it describes explicit activity, whereas
| letting people use metaphor loosely only requires
| passivity.
| furyofantares wrote:
| I like communicating with people using a shared
| understanding of the words being used, even if I have an
| additional, different understanding of the words, which I
| can use with other people.
|
| That's what words are, anyway.
| dingnuts wrote:
| I like calling it bullshit[0] because it's the most
| accurate, most understandable, and the most fun to use
| with a footnote
|
| 0 (featured previously on HN)
| https://link.springer.com/article/10.1007/s10676-024-09775-5
| rad_gruchalski wrote:
| Ignorance is easy to hide behind many words.
| static_void wrote:
| I'm glad we can agree. I also like communicating with
| people using a shared understanding of the words being
| used, i.e. their definitions.
| AllegedAlec wrote:
| We should not bend over backwards to use language the way
| anally retentive people demand we do.
| rad_gruchalski wrote:
| Ignorance clusters easily. You'll have no problem finding
| the like-minded.
| vkou wrote:
| > Ignorance clusters easily.
|
| So does pedantry and prickliness.
|
| Intelligence is knowing that a tomato is a fruit, wisdom
| is not putting it in a fruit salad. It's fine to want to
| do your part to steer language, but this is not one of
| those cases where it's important enough for anyone to be
| an asshole about it.
| rad_gruchalski wrote:
| It also becomes apparent that ignorance leads to a weird
| aggressive asshole fetish.
|
| Hey... here's a fruit salad with tomatoes:
| https://www.spoonabilities.com/stone-fruit-caprese-salad/
| AllegedAlec wrote:
| Sure bud.
| blooalien wrote:
| Problem is that in some fields of study / work, and in
| some other situations, absolute clarity and accuracy are
| _super important_ to avoid dangerous or harmful mistakes.
| Many of the sciences are that way, and A.I. is absolutely
| one of those sciences where communicating accurately can
| matter quite a lot. Otherwise you end up with massive
| misunderstandings about the technology being spread
| around as gospel truth by people who are quite simply
| misinformed (like you see happening right now with all
| the A.I. hype).
| static_void wrote:
| Just in case you're talking about descriptivism vs.
| prescriptivism:
|
| I'm a descriptivist. I don't believe language should have
| arbitrary rules, like which kinds of words you're allowed
| to end a sentence with.
|
| However, to be an honest descriptivist, you must
| acknowledge that words are used in certain ways more
| frequently than others. Definitions attempt to capture
| the canonical usage of a word.
|
| Therefore, if you want to communicate clearly, you should
| use words the way they are commonly understood to be
| used.
| resonious wrote:
| I would go one step further and suppose that a lot of people
| just don't know what confabulation means.
| maxbond wrote:
| I think "apophenia" (attributing meaning to spurious
| connections) or "pareidolia" (the form of apophenia where
| we see faces where there are none) would have been good
| choices, as well.
| cratermoon wrote:
| anthropoglossic systems.
| Terr_ wrote:
| Largely Logorrhea Models.
| rollcat wrote:
| There's a simpler word for that: lying.
|
| It's also equally wrong. Lying implies intent. Stop
| anthropomorphising language models.
| sorcerer-mar wrote:
| Lying is different from confabulation. As you say, lying
| implies intent. Confabulation does not necessarily, ergo it's
| a far better word than either lying or hallucinating.
|
| A person with dementia confabulates a lot, which entails
| describing reality "incorrectly", but it's not quite fair
| to describe it as lying.
| bandrami wrote:
| A liar seeks to hide the truth; a confabulator is indifferent
| to the truth entirely. It's an important distinction. True
| statements can still be confabulations.
| matkoniecz wrote:
| And why is confabulation the better one of those?
| bee_rider wrote:
| It seems like these are all anthropomorphic euphemisms for
| things that would otherwise be described as bugs, errors (in
| the "broken program" sense), or error (in the "accumulation of
| numerical error" sense), if LLMs didn't have the easy-to-
| anthropomorphize chat interface.
| diggan wrote:
| Imagine you have a function called "is_true" but it only
| gets it right 60% of the time. We're doing this within
| CS/ML, so let's call that "correctness" or something
| fancier. For that function to be valuable, would we need
| to hit 100% correctness? Probably most of the time, yeah.
| But sometimes, maybe even rarely, we're fine with it
| being less than 100%, but still as high as possible.
|
| So from this point of view, it's not a bug or error that
| it currently sits at 60%; if we manage to find a way to
| hit 70%, that would be better. But in order to reason
| about this, we need to call this "correct for the most
| part, but could be better" concept something. So we look
| at what we already know and are familiar with, and try to
| draw parallels, maybe even borrow some names/words.
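|
| A minimal sketch of that framing, in plain Python (the
| judge and the evaluation set are invented for
| illustration, not any real model):
|
|     import random
|
|     def is_true(claim: str) -> bool:
|         # Stand-in for a fallible judge: ignores the claim and is
|         # right about 60% of the time by construction.
|         return random.random() < 0.6
|
|     # Hypothetical evaluation set where every claim happens to be true.
|     claims = ["water boils at 100 C at sea level"] * 1000
|
|     correctness = sum(is_true(c) for c in claims) / len(claims)
|     print(f"correctness: {correctness:.0%}")
|     # ~60%: not a broken function, just the current score;
|     # pushing it to 70% would be "better", not "fixed".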
| bee_rider wrote:
| This doesn't seem too different from my third thing, error
| (in the "accumulation of numerical error" sense).
| timewizard wrote:
| > but if we manage to find a way to hit 70%, it would be
| better.
|
| Yet still absolutely worthless.
|
| > "correct for the most part, but could be better"
| concept something.
|
| When humans do that we just call it "an error."
|
| > so let's call that "correctness" or something
|
| The appropriate term is "confidence." These LLM tools
| could all give you a confidence rating with each and
| every "fact" they attempt to relay to you. Of course they
| don't actually do that, because no one would use a tool
| that confidently gives you answers based on a 70% self-
| confidence rating.
|
| We can quibble over terms but more appropriately this is
| just "garbage." It's a giant waste of energy and resources
| that produces flawed results. All of that money and effort
| could be better used elsewhere.
| vrighter wrote:
| and even those confidence ratings are useless, imo. If
| trained with wrong data, it will report high confidence
| for the wrong answer. And curating a dataset is a black
| art in the first place
| furyofantares wrote:
| > These LLM tools all could give you a confidence rating
| with each and every "fact" it attempts to relay to you.
| Of course they don't actually do that because no one
| would use a tool that confidently gives you answers based
| on a 70% self confidence rating.
|
| Why do you believe they could give you a confidence
| rating? They can't, at least not a meaningful one.
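|
| The closest thing the models expose is per-token
| probability, which is easy to roll up into a number that
| looks like confidence but measures fluency, not truth. A
| rough sketch with invented numbers:
|
|     import math
|
|     # Hypothetical per-token log-probabilities for the generated
|     # answer "The capital of Australia is Sydney" (numbers made up).
|     token_logprobs = [-0.1, -0.2, -0.05, -0.3, -0.15, -0.4]
|
|     avg_logprob = sum(token_logprobs) / len(token_logprobs)
|     pseudo_confidence = math.exp(avg_logprob)  # geometric-mean token probability
|
|     print(f"'confidence': {pseudo_confidence:.0%}")
|     # ~82%, yet the statement is false: a high token probability says
|     # the wording is typical, not that the claim is a calibrated fact.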
| georgemcbay wrote:
| They aren't really bugs in the traditional sense, though,
| because all LLMs ever do is "hallucinate"; seeing what we
| call a hallucination as something fundamentally different
| from what we consider a correct response is further
| anthropomorphising the LLM.
|
| We just label it with that word when the model
| statistically generates something we know to be wrong,
| but functionally what it did in that case is no different
| from when it statistically generated something we know to
| be correct.
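|
| Put differently, the code path is identical either way; a
| toy next-token sampler (the vocabulary and probabilities
| are invented):
|
|     import random
|
|     # Toy distribution after the prompt "The capital of Australia is".
|     next_token_probs = {"Canberra": 0.55, "Sydney": 0.40, "Melbourne": 0.05}
|
|     def sample_next_token(probs: dict[str, float]) -> str:
|         # The same sampling step runs whether the drawn token turns out
|         # to be factually right or wrong; "hallucination" is a label we
|         # attach afterwards, not a different mode of operation.
|         tokens, weights = zip(*probs.items())
|         return random.choices(tokens, weights=weights, k=1)[0]
|
|     print(sample_next_token(next_token_probs))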
| skybrian wrote:
| It's a metaphor. A hardware "bug" is occasionally due to an
| actual insect in the machinery, but usually it isn't, and for
| software bugs it couldn't be.
|
| The word "hallucination" was pretty appropriate for images
| made by DeepDream.
|
| https://en.m.wikipedia.org/wiki/DeepDream
| anshumankmr wrote:
| Can we submit ChatGPT convo histories??
| Flemlo wrote:
| So what's the number of cases where it was wrong but no
| one checked?
| add-sub-mul-div wrote:
| Good point. People putting the least amount of effort into
| their job that they can get away with is universal, judges are
| no more immune to it than lawyers.
| mullingitover wrote:
| This seems like a perfect use case for a legal MCP server that
| can provide grounding for citations. Protomated already has
| one[1].
|
| [1] https://github.com/protomated/legal-context
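|
| The core of such a check can be very small: refuse to pass
| along any citation the trusted database doesn't recognize.
| A hypothetical sketch (not the linked project's API; the
| lookup table is a stand-in for a real citator):
|
|     # Hypothetical grounding check for model-supplied citations.
|     KNOWN_CASES = {
|         "347 U.S. 483": "Brown v. Board of Education",
|         "410 U.S. 113": "Roe v. Wade",
|     }
|
|     def verify_citation(reporter_cite: str, case_name: str) -> bool:
|         """Accept a citation only if it matches the trusted source."""
|         return KNOWN_CASES.get(reporter_cite) == case_name
|
|     drafts = [
|         ("347 U.S. 483", "Brown v. Board of Education"),  # real case
|         ("512 F.3d 999", "Smith v. Imaginary Airlines"),  # fabricated
|     ]
|     for cite, name in drafts:
|         ok = verify_citation(cite, name)
|         print(f"{name}, {cite}: {'ok' if ok else 'NOT FOUND - do not file'}")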
| 0xDEAFBEAD wrote:
| These penalties need to be larger. Think of all the hours of work
| that using ChatGPT could save a lawyer. An occasional $2500 fine
| will not deter the behavior.
|
| And this matters, because this database contains only the
| fabrications which _got caught_. What happens when a
| decision is formulated
| based on AI-fabricated evidence, and that decision becomes
| precedent?
|
| Here in the US, our legal system is already having its legitimacy
| assailed on multiple fronts. We don't need additional legitimacy
| challenges.
|
| How about disbarring lawyers who present confabulated evidence?
___________________________________________________________________
(page generated 2025-05-25 23:01 UTC)