[HN Gopher] AI Hallucination Cases Database
___________________________________________________________________
AI Hallucination Cases Database
Author : Tomte
Score : 60 points
Date : 2025-05-25 16:05 UTC (6 hours ago)
(HTM) web link (www.damiencharlotin.com)
(TXT) w3m dump (www.damiencharlotin.com)
| irrational wrote:
| I still think confabulation is a better term for what LLMs do
| than hallucination.
|
| Hallucination - A hallucination is a false perception where a
| person senses something that isn't actually there, affecting any
| of the five senses: sight, sound, smell, touch, or taste. These
| experiences can seem very real to the person experiencing them,
| even though they are not based on external stimuli.
|
| Confabulation - Confabulation is a memory error consisting of the
| production of fabricated, distorted, or misinterpreted memories
| about oneself or the world. It is generally associated with
| certain types of brain damage or a specific subset of dementias.
| bluefirebrand wrote:
| You're not wrong in a strict sense, but you have to remember
| that most people aren't that strict about language
|
| I would bet that most people define the words like this:
|
| Hallucination - something that isn't real
|
| Confabulation - a word that they have never heard of
| static_void wrote:
| We should not bend over backwards to use language the way
| ignorant people do.
| add-sub-mul-div wrote:
| "Bending over backwards" is a pretty ignorant metaphor for
| this situation: it describes explicit activity, whereas
| letting people use metaphor loosely only requires
| passivity.
| furyofantares wrote:
| I like communicating with people using a shared
| understanding of the words being used, even if I have an
| additional, different understanding of the words, which I
| can use with other people.
|
| That's what words are, anyway.
| dingnuts wrote:
| I like calling it bullshit[0] because it's the most
| accurate, most understandable, and the most fun to use
| with a footnote
|
| 0 (featured previously on HN)
| https://link.springer.com/article/10.1007/s10676-024-09775-5
| rad_gruchalski wrote:
| Ignorance is easy to hide behind many words.
| static_void wrote:
| I'm glad we can agree. I also like communicating with
| people using a shared understanding of the words being
| used, i.e. their definitions.
| AllegedAlec wrote:
| We should not bend over backwards to use language the way
| anally retentive people demand we do.
| rad_gruchalski wrote:
| Ignorance clusters easily. You'll have no problem finding
| the like-minded.
| vkou wrote:
| > Ignorance clusters easily.
|
| So does pedantry and prickliness.
|
| Intelligence is knowing that a tomato is a fruit, wisdom
| is not putting it in a fruit salad. It's fine to want to
| do your part to steer language, but this is not one of
| those cases where it's important enough for anyone to be
| an asshole about it.
| rad_gruchalski wrote:
| It also becomes apparent that ignorance leads to a weird
| aggressive asshole fetish.
|
| Hey... here's a fruit salad with tomatoes:
| https://www.spoonabilities.com/stone-fruit-caprese-salad/
| AllegedAlec wrote:
| Sure bud.
| blooalien wrote:
| Problem is that in some fields of study / work, and in
| some other situations, absolute clarity and accuracy are
| _super important_ to avoid dangerous or harmful mistakes.
| Many of the sciences are that way, and A.I. is absolutely
| one of those sciences where communicating accurately can
| matter quite a lot. Otherwise you end up with massive
| misunderstandings about the technology being spread
| around as gospel truth by people who are quite simply
| misinformed (like you see happening right now with all
| the A.I. hype).
| static_void wrote:
| Just in case you're talking about descriptivism vs.
| prescriptivism:
|
| I'm a descriptivist. I don't believe language should have
| arbitrary rules, like which kinds of words you're allowed
| to end a sentence with.
|
| However, to be an honest descriptivist, you must
| acknowledge that words are used in certain ways more
| frequently than others. Definitions attempt to capture
| the canonical usage of a word.
|
| Therefore, if you want to communicate clearly, you should
| use words the way they are commonly understood to be
| used.
| resonious wrote:
| I would go one step further and suppose that a lot of people
| just don't know what confabulation means.
| maxbond wrote:
| I think "apophenia" (attributing meaning to spurious
| connections) or "pareidolia" (the form of apophenia where
| we see faces where there are none) would have been good
| choices, as well.
| cratermoon wrote:
| anthropoglossic systems.
| Terr_ wrote:
| Largely Logorrhea Models.
| rollcat wrote:
| There's a simpler word for that: lying.
|
| It's also equally wrong. Lying implies intent. Stop
| anthropomorphising language models.
| sorcerer-mar wrote:
| Lying is different from confabulation. As you say, lying
| implies intent. Confabulation does not necessarily, ergo it's
| a far better word than either lying or hallucinating.
|
| A person with dementia confabulates a lot, which entails
| describing reality "incorrectly", but it's not quite fair
| to describe it as lying.
| bandrami wrote:
| A liar seeks to hide the truth; a confabulator is indifferent
| to the truth entirely. It's an important distinction. True
| statements can still be confabulations.
| matkoniecz wrote:
| And why is confabulation the better one of those?
| bee_rider wrote:
| It seems like these are all anthropomorphic euphemisms for
| things that would otherwise be described as bugs, errors (in
| the "broken program" sense), or error (in the "accumulation of
| numerical error" sense), if LLMs didn't have the easy-to-
| anthropomorphize chat interface.
| diggan wrote:
| Imagine you have a function called "is_true" but it only
| gets it right 60% of the time. We're doing this within
| CS/ML, so let's call that "correctness" or something
| fancier. For that function to be valuable, would we need
| to hit 100% correctness? Probably most of the time, yeah.
| But sometimes, maybe even rarely, we're fine with it
| being less than 100%, but still as high as possible.
|
| So from this point of view, it's not a bug or error that
| it currently sits at 60%; if we manage to find a way to
| hit 70%, that would be better. But in order to reason
| about this, we need to call this "correct for the most
| part, but could be better" concept something. So we look
| at what we already know and are familiar with, and try to
| draw parallels, maybe even borrow some names/words.
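|
| A minimal sketch of that framing, in plain Python (the
| judge and the evaluation set are invented for
| illustration, not any real model):
|
|     import random
|
|     def is_true(claim: str) -> bool:
|         # Stand-in for a fallible judge: ignores the claim and is
|         # right about 60% of the time by construction.
|         return random.random() < 0.6
|
|     # Hypothetical evaluation set where every claim happens to be true.
|     claims = ["water boils at 100 C at sea level"] * 1000
|
|     correctness = sum(is_true(c) for c in claims) / len(claims)
|     print(f"correctness: {correctness:.0%}")
|     # ~60%: not a broken function, just the current score;
|     # pushing it to 70% would be "better", not "fixed".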
| bee_rider wrote:
| This doesn't seem too different from my third thing, error
| (in the "accumulation of numerical error" sense).
| timewizard wrote:
| > but if we manage to find a way to hit 70%, it would be
| better.
|
| Yet still absolutely worthless.
|
| > "correct for the most part, but could be better"
| concept something.
|
| When humans do that we just call it "an error."
|
| > so let's call that "correctness" or something
|
| The appropriate term is "confidence." These LLM tools
| could all give you a confidence rating with each and
| every "fact" they attempt to relay to you. Of course they
| don't actually do that, because no one would use a tool
| that confidently gives you answers based on a 70% self-
| confidence rating.
|
| We can quibble over terms but more appropriately this is
| just "garbage." It's a giant waste of energy and resources
| that produces flawed results. All of that money and effort
| could be better used elsewhere.
| vrighter wrote:
| and even those confidence ratings are useless, imo. If
| trained with wrong data, it will report high confidence
| for the wrong answer. And curating a dataset is a black
| art in the first place
| furyofantares wrote:
| > These LLM tools all could give you a confidence rating
| with each and every "fact" it attempts to relay to you.
| Of course they don't actually do that because no one
| would use a tool that confidently gives you answers based
| on a 70% self confidence rating.
|
| Why do you believe they could give you a confidence
| rating? They can't, at least not a meaningful one.
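|
| The closest thing the models expose is per-token
| probability, which is easy to roll up into a number that
| looks like confidence but measures fluency, not truth. A
| rough sketch with invented numbers:
|
|     import math
|
|     # Hypothetical per-token log-probabilities for the generated
|     # answer "The capital of Australia is Sydney" (numbers made up).
|     token_logprobs = [-0.1, -0.2, -0.05, -0.3, -0.15, -0.4]
|
|     avg_logprob = sum(token_logprobs) / len(token_logprobs)
|     pseudo_confidence = math.exp(avg_logprob)  # geometric-mean token probability
|
|     print(f"'confidence': {pseudo_confidence:.0%}")
|     # ~82%, yet the statement is false: a high token probability says
|     # the wording is typical, not that the claim is a calibrated fact.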
| georgemcbay wrote:
| They aren't really bugs in the traditional sense, though,
| because all LLMs ever do is "hallucinate"; seeing what we
| call a hallucination as something fundamentally different
| from what we consider a correct response is further
| anthropomorphising the LLM.
|
| We just label it with that word when the model
| statistically generates something we know to be wrong,
| but functionally what it did in that case is no different
| from when it statistically generated something we know to
| be correct.
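|
| Put differently, the code path is identical either way; a
| toy next-token sampler (the vocabulary and probabilities
| are invented):
|
|     import random
|
|     # Toy distribution after the prompt "The capital of Australia is".
|     next_token_probs = {"Canberra": 0.55, "Sydney": 0.40, "Melbourne": 0.05}
|
|     def sample_next_token(probs: dict[str, float]) -> str:
|         # The same sampling step runs whether the drawn token turns out
|         # to be factually right or wrong; "hallucination" is a label we
|         # attach afterwards, not a different mode of operation.
|         tokens, weights = zip(*probs.items())
|         return random.choices(tokens, weights=weights, k=1)[0]
|
|     print(sample_next_token(next_token_probs))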
| skybrian wrote:
| It's a metaphor. A hardware "bug" is occasionally due to an
| actual insect in the machinery, but usually it isn't, and for
| software bugs it couldn't be.
|
| The word "hallucination" was pretty appropriate for images
| made by DeepDream.
|
| https://en.m.wikipedia.org/wiki/DeepDream
| anshumankmr wrote:
| Can we submit ChatGPT convo histories??
| Flemlo wrote:
| So what's the number of cases where it was wrong but no
| one checked?
| add-sub-mul-div wrote:
| Good point. People putting the least amount of effort into
| their job that they can get away with is universal, judges are
| no more immune to it than lawyers.
| mullingitover wrote:
| This seems like a perfect use case for a legal MCP server that
| can provide grounding for citations. Protomated already has
| one[1].
|
| [1] https://github.com/protomated/legal-context
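|
| The core of such a check can be very small: refuse to pass
| along any citation the trusted database doesn't recognize.
| A hypothetical sketch (not the linked project's API; the
| lookup table is a stand-in for a real citator):
|
|     # Hypothetical grounding check for model-supplied citations.
|     KNOWN_CASES = {
|         "347 U.S. 483": "Brown v. Board of Education",
|         "410 U.S. 113": "Roe v. Wade",
|     }
|
|     def verify_citation(reporter_cite: str, case_name: str) -> bool:
|         """Accept a citation only if it matches the trusted source."""
|         return KNOWN_CASES.get(reporter_cite) == case_name
|
|     drafts = [
|         ("347 U.S. 483", "Brown v. Board of Education"),  # real case
|         ("512 F.3d 999", "Smith v. Imaginary Airlines"),  # fabricated
|     ]
|     for cite, name in drafts:
|         ok = verify_citation(cite, name)
|         print(f"{name}, {cite}: {'ok' if ok else 'NOT FOUND - do not file'}")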
| 0xDEAFBEAD wrote:
| These penalties need to be larger. Think of all the hours of work
| that using ChatGPT could save a lawyer. An occasional $2500 fine
| will not deter the behavior.
|
| And this matters, because this database contains only the
| fabrications which _got caught_. What happens when a
| decision is formulated
| based on AI-fabricated evidence, and that decision becomes
| precedent?
|
| Here in the US, our legal system is already having its legitimacy
| assailed on multiple fronts. We don't need additional legitimacy
| challenges.
|
| How about disbarring lawyers who present confabulated evidence?
___________________________________________________________________
(page generated 2025-05-25 23:01 UTC)