[HN Gopher] LLMs Don't Know What They Don't Know-and That's a Pr...
___________________________________________________________________
LLMs Don't Know What They Don't Know-and That's a Problem
Author : ColinEberhardt
Score : 38 points
Date : 2025-03-06 19:36 UTC (3 hours ago)
(HTM) web link (blog.scottlogic.com)
(TXT) w3m dump (blog.scottlogic.com)
| corytheboyd wrote:
| The day can't come fast enough when we just see things like this
| as trivial misuse of the tool, like using a hammer to drive in a
| screw. We use the hammer for nails and the screwdriver for
| screws. We use LLMs for exploring data with language, and our
| brains for reasoning.
| bluefirebrand wrote:
| > We use LLM for exploring data with language, and our brains
| for reasoning
|
| I know this is very cynical of me, but I am becoming very
| convinced that this is actually the biggest draw to AI for a
| lot of people
|
| They don't want to use their brains
|
| People talk about how it can summarize long text for them. This
| is framed as a time saver, but I'm positive that a lot of
| people just don't want to read the full text
|
| It can generate images for people who don't want to learn how
| to draw, or save them money by not hiring artists
| hooverd wrote:
| If you're generating long text to send to people and
| summarizing long text that people send to you, you're just
| wasting other people's time.
|
| They also talk about democratizing art. Are they using LLMs'
| presumably vast corpus of art feedback to improve their own
| work? Well, no.
| K0balt wrote:
| Yes. Please. Stop. AI summaries are often terrible and miss
| anything resembling a subtle point. Generating long form
| from a summary is just literally packing the summary with
| obvious information and made up bullshit.
|
| LLMs generate text, not knowledge. They are great for
| parsing human culture... but not good at thinking.
| ColinEberhardt wrote:
| > We use the hammer for nails and the screwdriver for screws
|
| The difference is that hammers and screwdrivers perform a
| single task, and have been designed and optimised for that
| specific task.
|
| LLMs are much more versatile and capable of performing a wide
| range of tasks. Yet, at the same time, their capabilities are
| ill-defined.
| corytheboyd wrote:
| I know my example is very contrived; I wasn't trying very
| hard, just went with the first thing that came to mind.
|
| > LLMs are much more versatile and capable of performing a
| wide range of tasks. Yet, at the same time, their
| capabilities are ill defined.
|
| That's my point, I want to skip to the part where we know
| what LLMs are good for, what they are bad for, and just
| consider them another tool at our disposal. We're still in
| the phase of throwing shit at the wall to see what sticks,
| and it is exhausting more often than not.
| ColinEberhardt wrote:
| > That's my point, I want to skip to the part where we know
| what LLMs are good for, what they are bad for, and just
| consider them another tool at our disposal.
|
| Totally agree with that.
| TZubiri wrote:
| My take is:
|
| GOOD: Language parsing.
|
| BAD: Information retrieval.
|
| We are now seeing the LLM used to parse the question and to
| retrieve information from elsewhere.
|
| Before you would ask the LLM who the president of the US
| was and the LLM would autocomplete. Now the LLM constructs
| a query through a tool and searches the internet for an
| answer.
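|
| A rough sketch of that flow, with call_llm and web_search as
| hypothetical stubbed-out placeholders rather than any specific
| vendor's API:
|
|   # Rough sketch: the LLM parses the question and drafts a search
|   # query; the answer comes from retrieved documents, not from the
|   # model's memory. call_llm and web_search are hypothetical
|   # placeholders, stubbed so this runs as-is; swap in a real LLM
|   # API and search tool to use it.
|   def call_llm(prompt: str) -> str:
|       return "stub response for: " + prompt  # placeholder model call
|
|   def web_search(query: str) -> list[str]:
|       return ["stub document about " + query]  # placeholder search API
|
|   def answer(question: str) -> str:
|       # 1. Use the LLM only to parse the question into a search query.
|       query = call_llm("Rewrite as a web search query: " + question)
|       # 2. Retrieve the facts from elsewhere (live web, a database, ...).
|       docs = web_search(query)
|       # 3. Use the LLM again to phrase an answer grounded in what was found.
|       context = "\n".join(docs)
|       return call_llm("Answer from this context only:\n"
|                       + context + "\nQ: " + question)
|
|   print(answer("Who is the president of the US?"))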
|
| It parsed the entire internet to have enough data to learn
| about language, but you don't necessarily want to depend on
| what it learned, other than to parse the syntax of the
| user.
| TZubiri wrote:
| Hammers are NOT designed with a specific purpose; they are
| just big heavy metal things with a handle for leverage.
|
| Similarly, LLMs are a thing that turned out to be useful, and
| we end up looking for use cases for them.
|
| Similar to the YC analogy of the company that discovers a
| brick and has to find useful ways to use it: to put
| out fires, to hit people on the head, etc.
| K0balt wrote:
| I just think of LLMs as "what if my uncle Steve went to
| college", because it's like that. And if I'm using quants, it's
| q5kM = 1 beer, Q4 = 6 beers.
|
| Still, drunk, educated uncle Steve is pretty handy sometimes.
| drewcoo wrote:
| > We use LLM for exploring data with language
|
| That seems problematic, too.
|
| https://en.wikipedia.org/wiki/HARKing
| TZubiri wrote:
| I find most articles of the sort "LLMs have this flaw" to be a
| kind of cynical one-upmanship.
|
| "If you say please LLMs think you are a grandma". Well then don't
| say you are a grandma. At this point we have a rough idea of what
| these things are and what their limitations are. People are using
| them to great effect in very different areas; their objective is
| usually to hack the LLM into doing useful stuff, while the
| article writers are hacking the LLM into doing stuff that is
| wrong.
|
| If a group of guys is making applications with an LLM and another
| dude is making shit applications with the LLM, am I supposed to
| be surprised at the latter instead of the former? Anyone can make
| an LLM do weird shit; the skill and the area of interest are in
| the former.
| spwa4 wrote:
| LLMs learn from the internet and refuse to admit they don't know
| something. I have to admit I'm not entirely surprised by this.
| ColinEberhardt wrote:
| No, I'm not surprised either.
|
| In fact, I'm much more surprised at just how capable they are
| across such a wide range of tasks, given that they have just
| 'learnt from the internet'!
| rickydroll wrote:
| I'm not surprised either. I see this as another example of
| LLMs emulating human behavior. I've met way too many people
| who refuse to admit they don't know something (he says while
| looking in the mirror).
| johnisgood wrote:
| Claude does ask clarifying questions, though, or asks me to provide
| something it does not know; at least, that has happened to me many
| times. At other times I have to ask whether it needs X or Y to be
| able to answer more accurately, although this may be the same with
| other LLMs, too. The former, though, was quite a surprise to me,
| coming from GPT.
| ColinEberhardt wrote:
| Ah, interesting - I've not had much experience with Claude,
| will give it a go. Thanks.
| zamalek wrote:
| I am working on a pet project using tactile "premium" 4/5-way
| switches in a super-ergonomic form-factor keyboard (initially
| shaped like the Logitech vertical mouse, but that turned out
| awful). The only model not to get hung up on Cherry MX and
| hallucinate 4-way Cherry switches has been Claude (the others did
| make attempts at other manufacturers, but hallucinated part
| numbers). It is significantly ahead of the competition.
| jug wrote:
| On this topic, the SimpleQA benchmark has a component measuring
| hallucination rate, i.e. "know" vs "don't know". OpenAI models
| have often been more troubled than the rest. See also, from the
| paper: https://imgur.com/7NDZ0ON (you want a low "Incorrect"
| score, since that counts answers that were attempted but wrong)
|
| I wish hallucination benchmarks were far more popular.
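|
| A minimal sketch of how a SimpleQA-style tally separates
| attempted-but-wrong answers from honest abstentions (the grade
| labels are illustrative, not the benchmark's actual schema):
|
|   from collections import Counter
|
|   def tally(grades):
|       # grades: "correct", "incorrect" (attempted but wrong),
|       # or "not_attempted" (the model said it didn't know).
|       counts = Counter(grades)
|       total = sum(counts.values()) or 1
|       attempted = counts["correct"] + counts["incorrect"]
|       return {
|           # the hallucination-like number: wrong *and* attempted
|           "incorrect_rate": counts["incorrect"] / total,
|           # honest "I don't know"
|           "not_attempted_rate": counts["not_attempted"] / total,
|           "accuracy_when_attempted":
|               counts["correct"] / attempted if attempted else 0.0,
|       }
|
|   print(tally(["correct", "incorrect", "not_attempted",
|                "correct", "incorrect"]))
|
| Two models with the same share of correct answers can look very
| different here: the one with the higher not_attempted_rate
| hallucinates less.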
| nottorp wrote:
| LLMs don't know, period. They can be useful for summarizing well
| and redundantly publicized information, but they don't "know" even
| that.
| rowanseymour wrote:
| I use copilot every day and every day I'm more and more convinced
| that LLMs aren't going to rule the world but will continue to be
| "just" neat autocomplete tools whose utility degrades the more
| you expect from them.
| delichon wrote:
| Here's an actual sentence I typed yesterday: "the previous
| three answers you gave me were hallucinations and i'm
| skeptical, so confirm that this answer is not another one." But
| then it actually gave me a different (5th) answer that was
| useful, and it's not clear that reading the docs would have
| been faster.
| nh23423fefe wrote:
| Same. I was trying to do something random with Java generics
| today.
|
| I got 3 wrong answers in a row (that I could easily confirm
| were wrong by compiling).
|
| Then the 4th worked. It was much faster than reading the JVM
| spec about the wildcard generic subtyping relation (something
| I've read before but couldn't quote), and it taught me something
| I didn't know even though it was wrong.
| zamadatix wrote:
| I wonder how much of this is an inherent problem that is hard to
| work a solution into, versus "confidently guessing the answer every
| time yields a +x% gain for a model on all of the other benchmark
| results, so nobody wants to reward the opposite of that".
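|
| A toy expected-score calculation (not from the thread, just to
| illustrate the incentive): under plain accuracy scoring, "I don't
| know" earns the same zero as a wrong guess, so guessing with any
| nonzero chance of being right always wins; only a wrong-answer
| penalty ever makes abstaining pay off.
|
|   # Toy illustration: expected score for guessing vs 0 for abstaining.
|   # The scoring rules here are hypothetical, not a real benchmark's.
|   def expected_guess_score(p_correct, wrong_penalty=0.0):
|       # 1 point if correct, -wrong_penalty if incorrect.
|       return p_correct - (1.0 - p_correct) * wrong_penalty
|
|   for p in (0.1, 0.3, 0.6):
|       plain = expected_guess_score(p)                   # accuracy-only
|       pen = expected_guess_score(p, wrong_penalty=1.0)  # wrong costs 1
|       better = "guess" if pen > 0 else "abstain"
|       print(f"p={p:.1f}: accuracy-only guess {plain:+.2f} vs abstain 0; "
|             f"with penalty, {better} wins ({pen:+.2f} vs 0)")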
___________________________________________________________________
(page generated 2025-03-06 23:01 UTC)