[HN Gopher] LLMs Don't Know What They Don't Know-and That's a Problem
       ___________________________________________________________________
        
       LLMs Don't Know What They Don't Know-and That's a Problem
        
       Author : ColinEberhardt
       Score  : 38 points
       Date   : 2025-03-06 19:36 UTC (3 hours ago)
        
 (HTM) web link (blog.scottlogic.com)
 (TXT) w3m dump (blog.scottlogic.com)
        
       | corytheboyd wrote:
        | The day can't come fast enough when we just see things like this
        | as trivial misuse of the tool -- like using a hammer to drive in
        | a screw. We use the hammer for nails and the screwdriver for
        | screws. We use LLMs for exploring data with language, and our
        | brains for reasoning.
        
         | bluefirebrand wrote:
          | > We use LLMs for exploring data with language, and our brains
          | for reasoning
         | 
         | I know this is very cynical of me, but I am becoming very
         | convinced that this is actually the biggest draw to AI for a
         | lot of people
         | 
         | They don't want to use their brains
         | 
          | People talk about how it can summarize long text for them. This
          | is framed as being a time saver, but I'm positive that for a
          | lot of people they just don't want to read the full text.
          | 
          | It can generate images for people who don't want to learn how
          | to draw, or save them money by not hiring artists.
        
           | hooverd wrote:
           | If you're generating long text to send to people and
           | summarizing long text that people send to you, you're just
           | wasting other people's time.
           | 
            | They also talk about democratizing art -- are they using
            | LLMs' probably vast corpus of art feedback to improve their
            | own work? Well, no.
        
             | K0balt wrote:
             | Yes. Please. Stop. AI summaries are often terrible and miss
             | anything resembling a subtle point. Generating long form
             | from a summary is just literally packing the summary with
             | obvious information and made up bullshit.
             | 
             | LLMs generate text, not knowledge. They are great for
             | parsing human culture... but not good at thinking.
        
         | ColinEberhardt wrote:
         | > We use the hammer for nails and the screwdriver for screws
         | 
          | The difference is that hammers and screwdrivers perform a
          | single task, and have been designed and optimised for that
          | specific task.
          | 
          | LLMs are much more versatile, capable of performing a wide
          | range of tasks. Yet, at the same time, their capabilities are
          | ill-defined.
        
           | corytheboyd wrote:
            | I know my example is very contrived; I wasn't trying very
            | hard, just went with the first thing that came to mind.
           | 
           | > LLMs are much more versatile and capable of performing a
           | wide range of tasks. Yet, at the same time, their
           | capabilities are ill defined.
           | 
           | That's my point, I want to skip to the part where we know
           | what LLMs are good for, what they are bad for, and just
           | consider them another tool at our disposal. We're still in
           | the phase of throwing shit at the wall to see what sticks,
           | and it is exhausting more often than not.
        
             | ColinEberhardt wrote:
             | > That's my point, I want to skip to the part where we know
             | what LLMs are good for, what they are bad for, and just
             | consider them another tool at our disposal.
             | 
             | Totally agree with that.
        
             | TZubiri wrote:
             | My take is:
             | 
             | GOOD: Language parsing.
             | 
             | BAD: Information retrieval.
             | 
              | We are now seeing the LLM being used to parse the question
              | and retrieve information from elsewhere.
              | 
              | Before, you would ask the LLM who the president of the US
              | was and the LLM would autocomplete. Now, the LLM constructs
              | a query through a tool and searches the internet for an
              | answer.
              | 
              | It parsed the entire internet to have enough data to learn
              | about language, but you don't necessarily want to depend on
              | what it learned, other than to parse the syntax of the
              | user.
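
        A minimal sketch of the "parse with the LLM, retrieve elsewhere"
        pattern described above. llm_extract_query and web_search are
        hypothetical stand-ins (not any particular vendor's API) for a real
        model call and a real search backend:

            # The LLM's job is language: turn the user's question into a query.
            def llm_extract_query(user_message: str) -> str:
                # Stand-in for a chat-completion / tool-call request; here we
                # just strip filler words to mimic "parsing the syntax".
                stop_words = {"who", "is", "the", "of", "a", "what"}
                return " ".join(w for w in user_message.lower().split()
                                if w not in stop_words)

            # The facts come from an external source, not the model's weights.
            def web_search(query: str) -> str:
                fake_index = {"president us": "Looked up from the indexed source."}
                return fake_index.get(query, "no result found")

            def answer(user_message: str) -> str:
                return web_search(llm_extract_query(user_message))

            print(answer("Who is the president of the US"))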
        
           | TZubiri wrote:
            | Hammers are NOT designed with a specific purpose; they are
            | just big heavy metal things with a handle for leverage.
            | 
            | Similarly, LLMs are a thing that turned out to be useful, and
            | we end up looking for use cases for them.
            | 
            | It's like the YC analogy of the company that discovers a
            | brick and has to find useful ways to use it: to put out
            | fires, to hit people in the head, etc.
        
         | K0balt wrote:
          | I just think of LLMs as "what if my uncle Steve went to
          | college", because it's like that. And if I'm using quants:
          | Q5_K_M = 1 beer, Q4 = 6 beers.
          | 
          | Still, drunk, educated uncle Steve is pretty handy sometimes.
        
         | drewcoo wrote:
          | > We use LLMs for exploring data with language
         | 
         | That seems problematic, too.
         | 
         | https://en.wikipedia.org/wiki/HARKing
        
       | TZubiri wrote:
        | I find most articles of the sort "LLMs have this flaw" to be of a
        | cynical, one-upmanship kind.
        | 
        | "If you say please, LLMs think you are a grandma." Well then,
        | don't say you are a grandma. At this point we have a rough idea
        | of what these things are and what their limitations are. People
        | are using them to great effect in very different areas; their
        | objective is usually to hack the LLM into doing useful stuff,
        | while the article writers are hacking the LLM into doing stuff
        | that is wrong.
        | 
        | If a group of guys is making applications with an LLM and another
        | dude is making shit applications with the LLM, am I supposed to
        | be surprised at the latter instead of the former? Anyone can make
        | an LLM do weird shit; the skill and the area of interest are in
        | the former.
        
       | spwa4 wrote:
        | LLMs learn from the internet, and they refuse to admit they
        | don't know something. I have to admit I'm not entirely surprised
        | by this.
        
         | ColinEberhardt wrote:
         | No, I'm not surprised either.
         | 
          | In fact, I'm much more surprised at just how capable they are
          | across such a wide range of tasks, given that they have just
          | 'learnt from the internet'!
        
         | rickydroll wrote:
          | I'm not surprised either. I see this as another example of
          | LLMs emulating human behavior. I've met way too many people
          | who refuse to admit they didn't know something (he says while
          | looking in the mirror).
        
       | johnisgood wrote:
        | Claude does ask questions for clarification, or asks me to
        | provide something it does not know; at least, that has happened
        | many times to me. At other times I have to ask whether it needs X
        | or Y to be able to answer more accurately, although this may be
        | the case with other LLMs too. The former, though, was quite a
        | surprise to me, coming from GPT.
        
         | ColinEberhardt wrote:
         | Ah, interesting - I've not had much experience with Claude,
         | will give it a go. Thanks.
        
         | zamalek wrote:
          | I am working on a pet project, using tactile "premium" 4/5-way
          | switches in a super-ergonomic form-factor keyboard (initially
          | shaped like the Logitech vertical mouse, but that turned out
          | awful). The only model not to get hung up on Cherry MX and
          | hallucinate 4-way Cherry switches has been Claude (the others
          | did make attempts at other manufacturers, but hallucinated part
          | numbers). It is significantly ahead of the competition.
        
         | jug wrote:
          | On this topic, the SimpleQA benchmark has a component measuring
          | hallucination rate, i.e. "know" vs "don't know". OpenAI models
          | have often been more troubled than the rest. See also, from the
          | paper: https://imgur.com/7NDZ0ON (you want a low "Incorrect"
          | score, since that means an attempted answer that is wrong).
         | 
         | I wish hallucination benchmarks were far more popular.
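
        A toy sketch of the grading split mentioned above: SimpleQA-style
        scoring buckets answers into "correct", "incorrect" (attempted but
        wrong, i.e. the hallucination case), and "not attempted" (the model
        declines). The grade function here is a hypothetical stand-in; the
        real benchmark grades free-form answers against references with a
        model-based judge:

            from collections import Counter

            # Hypothetical stand-in grader, purely for illustration.
            def grade(model_answer: str, reference: str) -> str:
                a = model_answer.strip().lower()
                if a in {"i don't know", "not sure"}:
                    return "not attempted"
                return "correct" if a == reference.lower() else "incorrect"

            answers = [
                ("paris", "Paris"),         # correct
                ("lyon", "Paris"),          # incorrect: attempted, but wrong
                ("i don't know", "Paris"),  # not attempted: admits ignorance
            ]

            counts = Counter(grade(a, ref) for a, ref in answers)
            for bucket in ("correct", "incorrect", "not attempted"):
                print(f"{bucket}: {counts[bucket] / len(answers):.0%}")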
        
       | nottorp wrote:
        | LLMs don't know, period. They can be useful for summarizing well
        | and redundantly publicized information, but they don't "know"
        | even that.
        
       | rowanseymour wrote:
       | I use copilot every day and every day I'm more and more convinced
       | that LLMs aren't going to rule the world but will continue to be
       | "just" neat autocomplete tools whose utility degrades the more
       | you expect from them.
        
         | delichon wrote:
         | Here's an actual sentence I typed yesterday: "the previous
         | three answers you gave me were hallucinations and i'm
         | skeptical, so confirm that this answer is not another one." But
         | then it actually gave me a different (5th) answer that was
         | useful, and it's not clear that reading the docs would have
         | been faster.
        
           | nh23423fefe wrote:
            | Same. I was trying to do something random with Java generics
            | today.
            | 
            | I got 3 wrong answers in a row (that I could easily confirm
            | were wrong by compiling).
            | 
            | Then the 4th worked. It was much faster than reading the JVM
            | spec about the wildcard generic subtyping relation (something
            | I've read before but couldn't quote), and it taught me
            | something I didn't know even though it was wrong.
        
       | zamadatix wrote:
        | I wonder how much of this is an inherent problem that is hard to
        | engineer a solution for, vs "confidently guessing the answer
        | every time yields a +x% gain for a model on all of the other
        | benchmark results, so nobody wants to reward the opposite of
        | that".
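
        A back-of-the-envelope illustration of the incentive described
        above, assuming a benchmark that awards one point per correct
        answer and nothing otherwise (illustrative numbers, not from any
        real leaderboard):

            # Toy scoring: 1 point for a correct answer, 0 for a wrong answer
            # or for "I don't know". Hit rate is an assumed example value.
            p_correct_when_guessing = 0.3   # model's hit rate when it guesses

            expected_score_guessing = p_correct_when_guessing * 1 \
                + (1 - p_correct_when_guessing) * 0
            expected_score_abstaining = 0   # "I don't know" earns nothing

            print(expected_score_guessing, expected_score_abstaining)  # 0.3 vs 0
            # With no penalty for wrong answers, guessing dominates abstaining,
            # so a model tuned for the leaderboard learns to always answer.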
        
       ___________________________________________________________________
       (page generated 2025-03-06 23:01 UTC)