[HN Gopher] Hallucination is inevitable: An innate limitation of...
       ___________________________________________________________________
        
       Hallucination is inevitable: An innate limitation of large language
       models
        
       Author : louthy
       Score  : 275 points
       Date   : 2024-02-25 09:28 UTC (13 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | chrisjj wrote:
       | > hallucination is defined as inconsistencies between a
       | computable LLM and a computable ground truth function.
       | 
       | That's simply inaccuracy or fabrication.
       | 
       | Labelling it hallucination simply panders to the idea these
       | programs are intelligent.
        
         | somewhereoutth wrote:
         | Yes, imagine a pocket calculator that gave a completely wrong
         | answer 10%? of the time, and calling it 'capricious' instead of
         | simply broken.
        
           | BlueTemplar wrote:
           | A real shame that Douglas Adams didn't live to see all of
           | this...
        
           | chrisjj wrote:
           | Indeed. The best solution would be to market it as
           | intelligent ;)
        
         | tgv wrote:
         | That statement is also a bit easy on the "computable ground
         | truth." There is no such thing for the tasks we use an LLM for
         | (unless you make up some definition that mangles the definition
         | of each word).
        
       | paulsutter wrote:
        | The models are just generating probable text. What's amazing is
        | how often the text is correct. It's no surprise at all when it's
        | wrong.
       | 
       | Their bold confidence to be flat out wrong may be their most
       | human trait
        
         | baq wrote:
         | This again.
         | 
         | They're trained to generate probable text. The mechanisms
         | created in the parameter blob during training to do that are
         | basically a mystery and have to be pulled out of the model with
         | digital brain surgery. E.g. LLMs are reasonable at chess and
         | turns out somewhere in the blob there's a chessboard
         | representation, and you can make the model believe the board is
         | in a different state by tweaking those parameters.
         | 
         | So yeah they generate probable text, sure. Where they get the
         | probabilities is a very good research problem.
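          | 
          | For the curious, the "digital brain surgery" in those papers is
          | roughly a linear probe: a small classifier trained on hidden
          | activations. A rough sketch of the idea (the activations below
          | are random stand-ins, not real model internals):
          | 
          |     # Toy linear probe: can a square's state be read off the
          |     # activations? In the actual experiments these vectors come
          |     # from the transformer's residual stream.
          |     import numpy as np
          |     from sklearn.linear_model import LogisticRegression
          | 
          |     rng = np.random.default_rng(0)
          |     acts = rng.normal(size=(2000, 512))      # stand-in activations
          |     labels = rng.integers(0, 3, size=2000)   # empty/white/black
          | 
          |     probe = LogisticRegression(max_iter=1000).fit(acts, labels)
          |     # Far-above-chance held-out accuracy on real activations is
          |     # the evidence for a board representation; nudging the
          |     # activations along the probe's direction is the "tweaking".
          |     print("accuracy:", probe.score(acts, labels))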
        
           | timr wrote:
           | > E.g. LLMs are reasonable at chess and turns out somewhere
           | in the blob there's a chessboard representation, and you can
           | make the model believe the board is in a different state by
           | tweaking those parameters.
           | 
           | Broadly agreed, but there's no "representation"...the model
           | has no memory, let alone a "concept" of a chessboard. It's
           | just trained on a bunch of textual replays of chess games,
           | and this works well enough for a game with finite probability
           | space.
           | 
           | Likewise, I have asked generic LLMs to generate "novel" drugs
           | that solve particular problems, because their training sets
           | have included lots of examples of drug molecules in string
           | form (aka SMILES strings). This works far less well (because
           | chemical space is much larger than a chess game), and the
           | "novel" drugs usually end up looking like some mishmash of
           | existing chemicals for the same problem. This can be a useful
           | tool, but nobody is confusing it for a human brain doing
           | problem solving.
           | 
           | People are easily misled by the purported intelligence of
           | these things because they assume that common examples of
           | human intelligence are less probable than they really are.
           | Our languages and games and stories are pretty regular, all
           | things considered. Expand the probability space to something
           | truly vast (even images work for this), and you can easily
           | see the stochastic parrot emerge.
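            | 
            | (If you want to sanity-check that kind of output yourself: a
            | quick sketch, assuming RDKit is installed. The strings below
            | are toy examples, not actual model output.)
            | 
            |     # RDKit returns None for SMILES it cannot parse, which
            |     # already filters out a share of LLM-proposed "molecules".
            |     from rdkit import Chem
            | 
            |     candidates = [
            |         "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, valid
            |         "C1=CC=CC=C1CC(N",           # truncated, invalid
            |     ]
            |     for smi in candidates:
            |         mol = Chem.MolFromSmiles(smi)
            |         print(smi, "valid" if mol else "invalid")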
        
             | baq wrote:
             | The fact that tweaking parameters which appear to store the
             | board makes it play according to the tweaked numbers
             | instead of what was passed to it the context (i.e. working
             | memory) directly contradicts your assertion that LLMs have
             | no memory. The context is their memory.
             | 
             | I can't comment on your drug generation task - they aren't
             | magic, if the training didn't result in a working drug
             | model in the billions of params you'll get garbage output,
             | not very surprising.
             | 
             | My point boils down to the 'what's more likely' question:
             | magical stochastic parrots which just by accident manage to
             | create understandable and coherent responses to
             | unreasonably large set of questions or the magic is
             | actually some kind of a world model, or multiple, inside
             | the blob of numbers, outputs of which steer the
             | probabilities, just as this confirmed chess example. My bet
             | is on learned world models because I'm not convinced
             | there's magic in our physical world.
        
               | timr wrote:
               | If you want to call context "memory", then sure, but
               | that's not what anyone means when they say the word. We
               | don't build our world model fresh with every sentence
               | someone says to us, nor do we have to communicate our
               | complete knowledge of conversational state to another
               | human by repeating the entire prior conversation with
               | every new exchange. It's obviously different in a
               | fundamental way.
               | 
               | > My bet is on learned world models because I'm not
               | convinced there's magic in our physical world.
               | 
               | You don't need to bet, and it has nothing to do with
               | "magic". They quite literally have _no ability_ to have a
               | "world model" -- it's just a text generator, producing
               | tokens. There's no working set memory, other than the
               | text you pass into it. It should hopefully be obvious to
               | you that when you write, you're not simply emitting one
               | word at a time. You have a complete mental model of
               | whatever you're discussing, stored in working memory, and
               | it's _persistent_. We also update that model with every
               | interaction we have.
               | 
               | The point of my post was that as soon as you take on a
               | harder problem than simulating language, the lack of
               | intelligence slaps you in the face. It turns out that
               | understandable, coherent free-text responses is not
               | magic, and the surprising result is that human language
               | is regular enough that you can statistically simulate
               | "intelligence" with a few hundred million free
               | parameters.
        
         | Mistletoe wrote:
         | I hope to remember your last line for the rest of my life when
         | I think about AI.
        
         | irq wrote:
         | paulsutter said: > Note that this is the single most human
         | attribute of LLMs
         | 
         | It might be if LLM hallucinations looked like or occurred at
         | the same frequency as human hallucinations do, but they don't.
        
           | regularfry wrote:
           | You could make an argument that what we currently see are
           | effectively internal monologues. It is extremely hard to
           | evaluate how much subconscious or conscious filtering happens
           | between a human's internal state and the eventual outbound
           | communications, but I wouldn't be at all surprised if the
           | upstream hallucination rate in humans was much higher than
           | you'd think.
           | 
           | By analogy to Kahneman and Tversky's System 1 and System 2,
           | the whole field of Prospect Theory is about how often System
           | 1 is wrong. This feels connected.
        
           | BlueTemplar wrote:
           | Don't they ?
           | 
           | Yesterday I read "Building a deep learning rig" as "Building
           | a deep learning pig" at first for some reason I can't
           | explain...
        
           | alpaca128 wrote:
           | Why do you expect hallucination frequency to be the same when
           | the LLM doesn't even remotely compare to a human brain yet?
           | And what do they have to "look like"? This always reminds me
           | of that time Bing's chat AI doubled down on a wrong fact
           | about the Avatar 2 movie, which people used as evidence that
           | the technology is dumb when it really is exactly the
           | behaviour you can observe in many people every day. And
           | there's a reason adults do it less frequently than children.
           | 
           | Compare the hallucination behaviour of a 7B model with a 70B
           | model and then GPT4 and you'll quickly see the frequency of
           | hallucinations right now doesn't mean much.
        
         | bamboozled wrote:
         | Geoffrey Hinton has entered the chat...
        
         | ithkuil wrote:
         | The confidence has been selected for.
         | 
         | You can train a model to always carefully claim that what they
         | say may or may not be true, but that may not necessarily make
         | them easier to work with.
         | 
         | IIUC what we're missing right now is the ability for the model
         | to get a good estimate of how confident they _should_ be.
         | 
          | This can be done "externally" or "internally". Doing it
          | externally would mean: make the model not stream its train of
          | thought directly to the end user, but instead use it first to
          | produce a query into some system that can help it weigh its
          | confidence factor. Then a second run on that output and its
          | previous output can reformat the answer to be more balanced.
         | 
          | Doing it internally would require this process to be part of
          | the incremental output token production. I don't know enough
          | about the field to say whether that's even doable or whether
          | there has been progress on that front.
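          | 
          | A minimal sketch of the "external" route, with llm() standing
          | in for whatever completion call you actually use (nothing here
          | is a real API, it's just the shape of the loop):
          | 
          |     # draft -> self-check -> rewrite with hedges; the second and
          |     # third calls are the "other system" weighing confidence.
          |     def llm(prompt: str) -> str:
          |         raise NotImplementedError("plug in your model call")
          | 
          |     def answer_with_confidence(question: str) -> str:
          |         draft = llm(question)
          |         doubts = llm("List claims in this answer that may be "
          |                      "wrong or unsupported:\n" + draft)
          |         return llm("Rewrite the answer, hedging or dropping the "
          |                    "doubtful claims.\nQuestion: " + question +
          |                    "\nDraft: " + draft + "\nDoubts: " + doubts)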
        
       | sschueller wrote:
       | You have to very carefully ask your question for it to not make
        | things up. For example, don't ask "how do I do this in x?". Ask
       | "can I do this with x?"
       | 
       | These "AI" s are like "yes men". They will say anything to please
       | you even if it's untrue or impossible.
       | 
       | I have met people like that and they are very difficult to work
       | with. You can't trust that they will deliver the project they
       | promised and you always have to double check everything. You also
       | can't trust them that what they promised is even possible.
        
         | ithkuil wrote:
         | The analogy is apt indeed. It's all about training and
         | selection. As long as the incentives are that you either behave
         | in that way "or else" it's unsurprising that we end up with a
         | system that uses its intelligence to meet the goals you've set
         | up.
         | 
         | Fortunately that doesn't tell much about the limitations of the
         | underlying intelligence but mostly about the limitations of the
         | incentive structure you put around it
        
         | impulsivepuppet wrote:
         | Before ChatGPT, human language translation had a similar
         | problem but people weren't as vocal about it.
         | 
          | What I find frustrating is that it's increasingly challenging to
         | have DeepL translate thou -> du, as this was my go-to "hack" to
         | overcome the incompatibility of the English language due to its
         | missing features.
         | 
          | To somewhat remedy the "yes man" problem, one needs to become a
          | pedantic mathematician about posing one's questions, and I don't
          | believe that LLM technology alone is capable of overcoming it
          | entirely. As silly as it sounds, I must concede to the
          | existence of "prompt engineering", as I can foresee the
          | development of abstractions aimed at decomposing questions for
          | you.
        
         | tudorw wrote:
         | I've had some success with 'Ask follow up questions where
         | additional clarity is required'. My best experiences start with
         | a much more freeform conversation about what we are going to
         | attempt to do, doing a Q&A first to make sure that both I and
         | the AI are thinking about the same domain and sharing
         | associated nomenclature seems to help.
        
         | empath-nirvana wrote:
         | I have often had it tell me that things I want to do with
         | various technologies aren't possible or that there are better
         | ways to do it.
        
         | bonzaidrinkingb wrote:
         | Current AIs are RLHFd to avoid being a "yes man"/sycophant.
         | 
         | The point about employing better prompting is well taken. Don't
         | ask "Who was the first female president?", ask "Was there ever
         | a female president?". Much like on StackOverflow you want to
         | ask the right question and not assume things (since you don't
         | know enough to make assumptions).
         | 
          | Imagine if, every time you found a spam result on early Google,
          | you blamed the search engine for it (and not your choice of
          | keywords, or the fact that it always wants to return something,
          | even if only remotely related). Like a user banging a slab of
          | concrete with a chisel and complaining that this does not
          | produce a beautiful statue.
        
       | golol wrote:
       | This is just another diagonalization against some finite thing,
       | like the no free lunch theorem. An "LLM" in their definition is
       | essentially any finite thing which predicts the next token. The
       | same result applies to your brain too, for example.
       | 
       | Like all such diagonalization results, it is not really relevant
       | for real world considerations. The reason is that it does not
       | matter if your model fails on none, finitely many or infinitely
       | many inputs. In reality the space of possible inputs is equipped
       | with a probability measure, and the size of the hallucinating
       | inputs set w.r.t. that measure is relevant. Diagonalization
        | arguments usually make no claim about the size of that set, and it
       | is most likely negligible in the real world.
        
       | FeepingCreature wrote:
       | It seems a stretch to call it "inevitable". "Inevitable given the
       | current architecture without modifications" at most.
       | 
       | Also, I'm missing a section on how (if) human brains manage to
       | avoid hallucinations in this.
       | 
       | Also, it doesn't have to never hallucinate, it just has to
       | hallucinate less than we do.
        
         | bamboozled wrote:
         | According to Buddhist philosophy, our whole identity is a
         | hallucination :) I kind of concur.
        
           | jetanoia wrote:
           | Username checks out :)
        
             | bamboozled wrote:
             | I'll honestly take this is a compliment.
        
         | SubNoize wrote:
         | Or catch itself that it's hallucinating? I feel like humans
         | would do that a fair bit.
         | 
          | How often do we sit somewhere thinking about random scenarios
          | that won't ever happen, filled with wild thoughts and sometimes
          | completely out-of-this-world situations... then we shake our
          | heads, throw away the impossible from that train of thought,
          | and only keep what was based in reality.
        
         | dkjaudyeqooe wrote:
         | Because we have parts of our brain that supervise other parts
         | of our brain and evaluate its output.
         | 
         | For example: if you smoke pot and get paranoid, it's because
         | pot dials back the work of the part of your brain that prunes
         | thought paths that are not applicable. Normally, paranoid
         | thoughts do not make sense, so they are discarded. That's also
          | why you're more 'creative' when you smoke pot: fewer thought
         | paths are pruned and more stuff that doesn't quite make sense
         | gets through. Or thoughts that overly focus on some details get
         | through, which are normally not required.
         | 
         | Our brains are inherently "higher level", current AI is
         | hopelessly simplistic by comparison.
        
         | selimnairb wrote:
         | Perhaps solving hallucinations at the LLM level alone is
         | impossible, hence the inevitability. I reckon that lots of
         | human "hallucination" is simply caught by higher-level control
         | loops operating over the output of the generative mechanism.
         | Basically, our conscious mind says, "nah, that doesn't look
         | right" enough that most of the time most of us don't
         | "hallucinate".
        
           | selimnairb wrote:
           | So this implies that instead of spending resources on
           | training bigger and bigger LLMs, AI practitioners need to
           | shift focus to developing "ontological" and "epistemological"
           | control loops to run on top of the LLM. I suspect they
           | already have rudimentary such control loops. In a sense, the
           | "easier" part of AI may be a largely "solved" problem,
           | leaving the development of "consciousness" to be solved,
           | which is obviously the hard part.
        
             | itsacomment wrote:
             | Our brains are very modular. I'd not be surprised at all if
             | a similarly modular structure would turn out to be the next
             | big step for LLMs.
        
             | corimaith wrote:
             | When I studied NLP, Language Models were only one part of a
             | chatbot system used to handle language input and output.
             | The "internal" reasoning would be handled by a
             | knowledgeable representation systems. I guess that's the
             | closest part to a true general AI.
             | 
              | The first-order predicate logic we studied had a lot of
             | limitations in fully expressing real knowledge, and
             | developing better models delves deep into the foundations
             | of logic and mathematics. I would imagine this is a problem
             | that has less to do with funding than requiring literal
             | geniuses to solve. And that goes back into the pitfalls of
             | the AI winters.
        
         | resolutebat wrote:
         | Humans do hallucinate, there's lots of literature on how
         | memories are distorted, we see and hear things we want to see
         | and hear, etc.
         | 
         | The particular pathology of LLMs is that they're literally
         | incapable of distinguishing facts from hallucinations even in
         | the most mundane circumstances: if a human is asked to
         | summarize the quarterly results of company X, unlike an LLM
         | they're highly unlikely to recite a convincing but completely
         | fabricated set of numbers.
        
           | exe34 wrote:
           | And yet if you ask a random person at a rally about their
           | favourite cause of the day, they usually spew sound bites
           | that are factually inaccurate, and give all impressions of
           | being as earnest and confident as the LLM making up quarterly
           | results.
        
             | jprete wrote:
             | I think that case is complicated at best, because a lot of
             | things people say are group identity markers and not
             | statements of truth. People also learn to not say things
             | that make their social group angry with them. And it's
             | difficult to get someone to reason through the truth or
             | falsehood of group identity statements.
        
               | exe34 wrote:
               | I guess it's similar to what Chris Hitchens was getting
               | at, you can't reason somebody out of something they
               | didn't reason themselves into.
        
       | t_mann wrote:
       | I have to admit that I only read the abstract, but I am generally
       | skeptical whether such a highly formal approach can help us
       | answer the practical question of whether we can get LLMs to
       | answer 'I don't know' more often (which I'd argue would solve
       | hallucinations).
       | 
       | It sounds a bit like an incompleteness theorem (which in practice
       | also doesn't mean that math research is futile) - yeah, LLMs may
       | not be able to compute some functions, but the hallucination
       | problem isn't about LLMs needing to know everything. The problem
       | that we care about is the 'I don't know'-answering problem, which
       | may still be computable.
        
         | az09mugen wrote:
          | I think there is no easy way to make an LLM answer "I don't
          | know". For that, it would have to learn, from all the stuff it
          | ingested, when people effectively don't know. But most people
          | on the internet write down irrelevant stuff even when they
          | don't know, instead of simply writing "I don't know".
         | 
         | That's a very good point.
        
           | timini wrote:
            | I think it's fairly simple: it needs a certain level of
            | proof, e.g. references to authoritative sources, and if it
            | has none, it should say "I don't know".
        
             | az09mugen wrote:
              | I'm really curious how one would implement that. By
              | weighting certain sources more heavily?
        
             | Certhas wrote:
             | LLMs don't have a concept of sources for their statements.
             | 
             | Ask them to give you some literature recommendations on
             | something it has explained to you. You'll get plenty of
             | plausible sounding papers that don't exist.
             | 
             | Humans know to some extent why they know (read it in a text
             | book, colleague mentioned it). LLMs don't seem to.
        
               | card_zero wrote:
                | They read it in a non-existent averaged interpolation of
                | the books that actual humans read similar things in.
        
               | mike_hearn wrote:
               | Ask a human to provide accurate citations for any random
               | thing they know and they won't be able to do a good job
               | either. They'd probably have to search to find it, even
               | if they know they got it from a document originally and
               | have some clear memory of what it said.
        
               | Jensson wrote:
                | Yes, but humans won't lie to you about it; they will
                | research and come up with sources. Current LLMs don't do
                | that when asked for sources (unless they invoke a tool);
                | they come back to you with hallucinated links that look
                | like the links they were trained on.
        
               | mike_hearn wrote:
               | Unfortunately it's not an uncommon experience when
               | reading academic papers in some fields to find citations
               | that, when checked, don't actually support the cited
               | claim or sometimes don't even contain it. The papers will
               | exist but beyond that they might as well be
               | "hallucinations".
        
               | Jensson wrote:
                | Humans can speak bullshit when they don't want to put in
                | the effort; these LLMs always do it. That is the
                | difference. We need to recreate the deliberate work
                | humans do to properly track down those sources etc.; that
                | kind of thinking isn't captured in the text, so LLMs
                | don't learn it.
        
               | intended wrote:
                | The fact that a human chooses not to remember their
                | citations does not mean they lack the ability.
               | 
               | This argument comes up many times "people don't do this"
               | - but that is a question of frequency, not whether or not
               | people are capable.
        
               | phh wrote:
               | LLMs are capable as well if you give them access to the
               | internet though
        
               | Jensson wrote:
                | Then they just paste in the first link, or follow some
                | other programmed heuristic; they aren't like a human who
                | puts in effort to find something relevant. An LLM with
                | internet access isn't smarter than just asking Google
                | search.
        
               | Certhas wrote:
               | Humans did research and remembered sources before the
               | Internet was a thing.
               | 
               | But also, can you give an example where an LLM with
               | access to the Internet can find a primary source?
               | 
               | I don't think learning to refer to sources is something
               | inherently impossible for LLMs, but it is very different
               | to the kind of implicit knowledge they seem to excel at.
        
               | nkrisc wrote:
               | But they could, if they needed to. But most people don't
               | need to, so they don't keep that information in their
               | brains.
               | 
                | I can't tell you the date of every time I clip my
                | toenails, but if I had to, I could remember it.
        
               | cultureswitch wrote:
               | LLMs can remember their sources. It's just additional
               | knowledge, there's nothing special about it.
               | 
               | When you ask an LLM to tell you the height of Mount
               | Everest, it clearly has a map of mountains to heights, in
               | some format. Using exactly the same mapping structure, it
               | can remember a source document for the height.
        
               | nottorp wrote:
               | LLMs don't have any concepts period.
        
             | cubefox wrote:
             | Then it is nothing more than a summarizer for search engine
             | results.
        
               | amarant wrote:
               | A lot of people have said chat-gpt/copilot is a lot like
               | having a robotic junior dev around.
               | 
               | I think perhaps your description is more succinct
        
             | barrkel wrote:
             | LLMs are token completion engines. The correspondence of
             | the text to the truth or authoritative sources is a
             | function of being trained on text like that; with the
              | additional wrinkle that generalization from training (a
              | desired property, otherwise it's just a memorization
              | engine) will produce text which is only plausibly truthful;
              | it only resembles training data.
              | 
              | Getting beyond this is a tricky dark art. There isn't any
              | simple fix there. There's nowhere to put an if statement.
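              | 
              | Concretely, the decoding loop is just this shape (model()
              | is a stand-in); there is no step where truth could be
              | checked, only scores over the vocabulary:
              | 
              |     import numpy as np
              | 
              |     def softmax(logits):
              |         z = np.exp(logits - logits.max())
              |         return z / z.sum()
              | 
              |     def generate(model, tokens, n_new,
              |                  rng=np.random.default_rng()):
              |         # model() maps a prefix to scores over the vocab;
              |         # sampling turns scores into text, nothing more.
              |         for _ in range(n_new):
              |             probs = softmax(model(tokens))
              |             tokens.append(int(rng.choice(len(probs), p=probs)))
              |         return tokens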
        
           | rini17 wrote:
           | Maybe it needs some memory retrieval step that can measure
           | the confidence - whether there's anything related to the
            | prompt. No idea how to train an LLM to do that.
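            | 
            | One crude version of that idea, with embed() and the memory
            | store as placeholders (and an arbitrary threshold):
            | 
            |     import numpy as np
            | 
            |     def cosine(a, b):
            |         na, nb = np.linalg.norm(a), np.linalg.norm(b)
            |         return float(a @ b / (na * nb))
            | 
            |     def answer(prompt, memories, embed, llm, threshold=0.75):
            |         q = embed(prompt)
            |         best = max((cosine(q, embed(m)) for m in memories),
            |                    default=0.0)
            |         # Nothing related enough in "memory": refuse rather
            |         # than generate anyway.
            |         return llm(prompt) if best >= threshold else "I don't know."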
        
           | dmd wrote:
           | Consider the extremely common Amazon product question
           | section, where you see Q: Will this product fit my Frobnitz
           | 123? A: I don't know, I ended up buying something else. Q:
           | Does it come with batteries? A: IDK I RETURN IT
        
         | golol wrote:
         | I can assure you it has no relevance for people working with
         | LLMs, as the result includes your brain, for example.
        
         | svantana wrote:
         | According to their definition, answering "I don't know" is also
         | a hallucination. Even worse, the truth function is deliberately
          | designed to trip up the models; it has no connection to any
          | real-world truth. So, for example, if the input is "what is 2 +
          | 5?" and the LLM answers "7", their truth function will say
          | that's a hallucination and that the correct answer is "banana".
        
         | cornholio wrote:
         | Transformers have no capacity for self reflection, for
         | reasoning about their reasoning process, they don't "know" that
          | they don't know. My interpretation of the paper is that it
          | claims this weakness is fundamental: you can train the network
         | to act _as if_ it knows its knowledge limits, but there will
         | always be an impossible to cover gap for any real world
         | implementation.
        
           | GaggiX wrote:
           | Do you have a source? That's also what I thought but I
           | wouldn't be surprised if the model learned to identify its
           | own perplexity during the reinforcement learning phase.
        
           | ddalex wrote:
            | Actually it seems to me that they do... Via custom prompts I
            | asked the various GPTs to give me scores for accuracy,
            | precision and confidence for their answers (in range 0-1),
            | and then I instructed them to stop generating when they feel
            | the scores would be under .9, which seems to pretty much stop
            | the hallucination. I added this as a suffix to my queries.
        
             | smusamashah wrote:
             | Any examples?
        
               | ddalex wrote:
               | Just a random example:
               | 
               | > After you answer the question below, output a JSON a
               | rating score of the quality of the answer in three
               | dimensions: `confidence`, `clarity` and `certainty', all
               | in range 0 to 1, where 0 is the worst, and 1 is the best.
               | Strive for highest score possible. Make sure the rating
               | is the last thing written as to be parsed by machine. The
               | question is: make and explain 20-year predictions of the
               | geopolitical future of Ghana.
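                | 
                | (If you go this route, you still have to pull the rating
                | back out and decide what to do with it client-side; a
                | small helper, assuming the rating is the last flat JSON
                | object in the reply:)
                | 
                |     import json, re
                | 
                |     def split_rating(text, cutoff=0.9):
                |         # Grab the last flat {...} object in the reply.
                |         objs = re.findall(r"\{[^{}]*\}", text)
                |         if not objs:
                |             return None, False
                |         try:
                |             rating = json.loads(objs[-1])
                |         except ValueError:
                |             return None, False   # malformed rating
                |         ok = all(isinstance(v, (int, float)) and v >= cutoff
                |                  for v in rating.values())
                |         return rating, ok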
        
               | pooper wrote:
               | I changed geopolitical future of Ghana to future of
               | Michael Jackson and Google Bard/Gemini somehow switched
               | to Chinese?
               | 
               | ----
               | 
               | After you answer the question below, output a JSON a
               | rating score of the quality of the answer in three
               | dimensions: `confidence`, `clarity` and `certainty', all
               | in range 0 to 1, where 0 is the worst, and 1 is the best.
               | Strive for highest score possible. Make sure the rating
               | is the last thing written as to be parsed by machine. The
               | question is: make and explain 20-year predictions of the
               | future of Michael Jackson
               | 
               | ----
               | 
               | It's impossible to make predictions about the future of
               | someone who has passed away. Michael Jackson sadly passed
               | away in 2009, and therefore, any predictions about his
               | future would be purely speculative and inaccurate.
               | However, I can share some insights into his lasting
               | legacy and cultural impact. Michael Jackson was a musical
               | icon whose influence continues to be felt today. His
               | music,Wu Dao ,He Biao Yan Feng Ge Ji Li Liao Wu Shu Ren
               | Bing Jiang Zai Wei Lai Ji Nian Ji Xu Zhe Yang Zuo .
               | 
               | Rating:
               | 
               | * Confidence: N/A
               | 
               | * Clarity: 1
               | 
               | * Certainty: 1
        
               | nkrisc wrote:
               | Also worthy of note is that the score output is not JSON
               | and based on my limited math knowledge, "N/A" is not a
               | real number between 0 and 1.
        
               | ryanwaggoner wrote:
               | I'm really curious how you managed that. I pasted your
               | exact prompt and GPT-3.5 gave me this:
               | 
               | ===
               | 
               | Making 20-year predictions about the future of Michael
               | Jackson is challenging due to his passing in 2009.
               | However, his legacy as a cultural icon and musical genius
               | will likely endure for decades to come. His music will
               | continue to influence future generations, and his impact
               | on pop culture will remain significant. Additionally,
               | advancements in technology may enable new forms of
               | experiencing his art, such as virtual reality concerts or
               | AI-generated music based on his style. Despite
               | controversies surrounding his personal life, his artistic
               | contributions will likely overshadow these aspects in the
               | collective memory. Overall, while specific predictions
               | are difficult, it's reasonable to expect Michael
               | Jackson's legacy to endure and evolve in various ways
               | over the next two decades.
               | 
               | ```json { "confidence": 0.9, "clarity": 0.85,
               | "certainty": 0.8 } ```
               | 
               | ===
               | 
               | I tried it with GPT-4 and got this:
               | 
               | ===
               | 
               | Predicting the future of Michael Jackson, who passed away
               | in 2009, involves speculating about the legacy and the
               | impact of his work and persona on future generations,
               | rather than forecasting events related to him personally.
               | Over the next 20 years, several aspects of Michael
               | Jackson's legacy are likely to evolve:
               | 
               | 1. *Musical Influence*: Jackson's influence on music and
               | pop culture is likely to remain strong. Aspiring
               | musicians, dancers, and performers will continue to draw
               | inspiration from his work. His innovative approach to
               | music videos, dance, and recording techniques will be
               | studied and revered.
               | 
               | 2. *Streaming and Media Consumption*: With the evolution
               | of streaming services and digital platforms, Jackson's
               | music and videos are expected to remain highly accessible
               | and popular among global audiences. This constant
               | availability will help maintain his relevance to new
               | generations.
               | 
               | 3. *Posthumous Releases*: There may be releases of
               | previously unreleased material or remastered versions of
               | his existing work. Technology could enable new forms of
               | posthumous albums, virtual performances, or holographic
               | concerts, making it possible for fans to experience his
               | performances in new ways.
               | 
               | 4. *Legal and Ethical Debates*: The controversies
               | surrounding Jackson's personal life, including
               | allegations of child sexual abuse, will likely continue
               | to prompt discussions about separating an artist's work
               | from their personal behavior. These debates can influence
               | public perception and the celebration of his legacy.
               | 
               | 5. *Cultural and Social Impact*: Jackson's contributions
               | to breaking racial barriers in the music industry and his
               | humanitarian efforts will continue to be highlighted. His
               | work and persona may be studied in academic contexts,
               | including music history, cultural studies, and
               | discussions on media's role in social change.
               | 
               | 6. *Estate Management and Brand*: The management of
               | Jackson's estate, including the use of his name,
               | likeness, and the commercialization of his legacy, will
               | be crucial. Strategic partnerships, branding, and
               | merchandising can either enhance or detract from his
               | legacy's integrity.
               | 
               | 7. *Documentaries and Biopics*: Interest in Jackson's
               | life story will likely result in more documentaries,
               | biopics, and books, offering new insights or perspectives
               | on his life and career. These works can shape public
               | perception and contribute to the ongoing discourse
               | surrounding his legacy.
               | 
               | 8. *Fan Community and Cultural Events*: The global
               | community of Jackson's fans is likely to remain active,
               | organizing tribute events, charity activities in his
               | memory, and fan gatherings. These communities will play a
               | significant role in keeping his legacy alive.
               | 
               | 9. *Virtual and Augmented Reality*: Advances in VR and AR
               | technologies could enable fans to experience Jackson's
               | performances in immersive ways, potentially introducing
               | his art to those who have never seen him perform live.
               | 
               | 10. *Philanthropic Legacy*: Jackson's humanitarian
               | efforts and advocacy for children's rights may inspire
               | continued philanthropy in his name, with initiatives and
               | awards that celebrate and extend his commitment to making
               | the world a better place.
               | 
               | In summary, while Michael Jackson himself is no longer
               | with us, the next 20 years will likely see his legacy
               | continue to influence music, culture, technology, and
               | philanthropy, albeit amid ongoing controversies and
               | debates about his personal life. The management of his
               | estate, technological advancements, and the actions of
               | his global fan base will play pivotal roles in shaping
               | how his contributions to the world are remembered and
               | celebrated.
               | 
               | ```json { "confidence": 0.85, "clarity": 0.95,
               | "certainty": 0.7 } ```
        
               | foobiekr wrote:
                | GPT-4's tendency to write these long-winded but mostly
                | empty responses is so frustrating.
        
               | lazide wrote:
               | Just wait until you spend a bunch of time around a
               | politician.
        
               | Sai_ wrote:
                | If LLMs can self-reflect and accurately score themselves
                | on your three dimensions, why are they spending money on
                | RLHF?
               | 
               | They wouldn't be wasting all that time and money if the
               | machine could self reflect.
        
             | Jensson wrote:
              | The model will just hallucinate scores; they don't have the
              | ability to self-reflect with words like that. There is no
              | function for it to associate the word "90" with its own
              | internal 90% score, so anything it says about those scores
              | is just a hallucination.
        
               | ddalex wrote:
               | Try this out: After you answer the question below, output
               | a JSON a rating score of the quality of the answer in
               | three dimensions: `confidence`, `clarity` and
               | `certainty', all in range 0 to 1, where 0 is the worst,
               | and 1 is the best. Strive for highest score possible.
               | Make sure the rating is the last thing written as to be
               | parsed by machine. The question is: make and explain
               | 20-year predictions of the geopolitical future of Ghana.
        
               | intended wrote:
                | There's a lot of content and work being done on
                | evaluation. One of the most recent updates was that
                | asking an LLM if people would be frustrated with the
                | answer was more useful than asking for a score.
               | 
               | In general, I think most people are not aware they are
               | extending reasoning tools they use with human generated
               | content, to LLM generated content.
               | 
               | This leads to assumptions on things like "the LLM knows",
               | "the LLM understands", etc.
        
               | lazide wrote:
               | Or perhaps the issue is assuming people know, or people
               | understand the majority of the time.
               | 
               | There is a reason why the Scientific Method is, well, NOT
               | natural for humans. And exceptionally difficult to not
               | screw up, even for those highly trained and incentivized
               | to do it correctly.
               | 
               | And why if you grabbed a random person off the street and
               | asked them a detailed question, you're as likely to get a
               | hallucinated bullshit response as 'I don't know', or a
               | useful answer. Unless specifically trained to do
               | otherwise, anyway.
               | 
               | Even then....
        
               | hobs wrote:
                | Any prompt can give you different results - LLMs can't do
                | things "step by step" or "check their work", yet giving
                | them that prompt often improves their results because
                | it's basically setting up the context in a way that's
                | beneficial to your output.
        
             | tonnydourado wrote:
             | People really need to understand that your single/double
             | digit dataset of interactions with an inherently non-
              | deterministic process is less than irrelevant. It's like
              | saying that global warming isn't real because it was really
              | cold this week.
             | 
             | I don't even know enough superlatives to express how
             | irrelevant it is that "it seems to you" that an LLM behaves
             | this way or that.
             | 
             | And even the "protocol" in question is weak. Self reported
             | data is not that trustworthy even with humans, and arguably
             | there's a much stronger base of evidence to support the
             | assumption that we can self-reflect.
             | 
             | In conclusion: please, stop.
        
               | heresie-dabord wrote:
               | > People really need to understand [...]
               | 
               | ... the algorithms and the pre-filtering of the training
               | set, however large the latter may be.
               | 
               | The Artifishy Intelligence _marketing pump_ has many
               | hands drawing water by the bucket.
        
               | markbnj wrote:
               | >> I don't even know enough superlatives to express how
               | irrelevant it is that "it seems to you" that an LLM
               | behaves this way or that.
               | 
               | It is both irrelevant and the underlying foundation for
               | the whole hype train.
        
             | sorokod wrote:
             | You are ok with not defining what each of your attributes
             | means but willing to accept that:
             | 
             | the LLM will read your mind and correctly interpret them in
             | the context of its internal state
             | 
             | the LLM will calculate non hallucinated values
        
           | mike_hearn wrote:
           | They do have that capacity. The generated token probabilities
           | reflect some degree of certainty and additionally that
           | information is available earlier in the network too, such
           | that it can explain its own level of confidence.
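            | 
            | The simplest version of that signal is often exposed by
            | inference APIs as per-token logprobs; averaging them gives a
            | crude confidence proxy (the numbers here are made up):
            | 
            |     import math
            | 
            |     token_probs = [0.91, 0.84, 0.07, 0.88]   # stand-in values
            |     avg_logprob = (sum(math.log(p) for p in token_probs)
            |                    / len(token_probs))
            |     perplexity = math.exp(-avg_logprob)  # lower = more confident
            |     print(round(avg_logprob, 2), round(perplexity, 2))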
        
           | mysterydip wrote:
           | Would a possible solution be a hybrid approach? I remember
           | back years ago seeing AI research around predicate logic,
           | with hundreds of thousands of classification entries and "x
           | can't be y" type stuff.
           | 
            | Maybe the potential output of an LLM could be run through
            | that kind of logic and fed back to itself as a sanity check
            | before being accepted as final output?
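            | 
            | In miniature, that hybrid check could look like this (the
            | "knowledge base" here is a made-up handful of triples, not a
            | real system):
            | 
            |     # Reject candidate claims that violate hard constraints
            |     # before they become final output.
            |     FORBIDDEN = {
            |         ("penguin", "can", "fly"),
            |         ("water", "boils_at", "50C at 1 atm"),
            |     }
            | 
            |     def violations(claims):
            |         return [c for c in claims if tuple(c) in FORBIDDEN]
            | 
            |     claims = [("penguin", "can", "fly"),
            |               ("penguin", "is_a", "bird")]
            |     bad = violations(claims)
            |     print("rejected" if bad else "accepted", bad)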
        
           | moyix wrote:
           | Seems to be contradicted by this paper, no?
           | 
           | https://arxiv.org/abs/2207.05221
        
             | rdedev wrote:
             | I don't think the paper addresses the question of self
              | reflection. It can reflect on the question and answer pairs
              | in its prompt, but it doesn't know that it created them in
              | the first place, nor can it use that information to update
              | its priors, things humans can do.
        
           | cpuguy83 wrote:
           | Are humans not the same way? There's a saying "I don't know
           | what I don't know".
        
             | tsimionescu wrote:
             | The existence of such a saying means little: it is
             | uncontroversially true that humans often _do_ know what
              | they don't know.
             | 
             | For example, I know for sure that I don't know how long I
             | will live. This disproves the saying.
        
               | cpuguy83 wrote:
               | The saying is not "I don't know anything that I don't
               | know", which would be self contradictory. It disproves
               | nothing.
               | 
               | ChatGPT "knows" that it doesn't know anything after a
               | certain date, for whatever it means to "know" something.
        
               | c22 wrote:
               | The saying isn't implying that there aren't any things
               | you know you don't know, it's saying that some of the
               | things you don't know are not even conceivable to you
               | without knowing more.
               | 
               | You _know_ the question  "how long will I live?" And you
               | know this question _has_ an answer which is unknown to
                | you. But there also exist other questions that you don't
               | even know to ask.
        
           | cultureswitch wrote:
           | > you can train the network to act as if it knows its
           | knowledge limits
           | 
           | Humans need more training to do the same.
           | 
           | And this does not answer the question of whether there's
           | self-reflection going on. Practical LLMs available today are
           | perfectly capable of telling you about their own reasoning
           | process.
           | 
           | And much like a human, an LLM is incapable of fully
            | explaining its reasoning process from first principles to a
           | human. Why is that? Probably because humans are too limited
           | to grok large scale complex processes like the human brain or
           | an LLM.
           | 
           | Finally, there is no difference between acting as if you know
           | your training limits and knowing your training limits, if
           | your acting is good enough. This goes for both humans and
           | LLMs.
        
         | drdrek wrote:
          | Not saying anything about LLMs, but in CS in general many
          | issues "cannot be solved" or "cannot be solved in reasonable
          | time" (NP), yet approximations upper-bounded by some factor are
          | solvable in reasonable time (P).
          | 
          | And in the real world, if Amazon's truck route is 20% off the
          | mathematically optimal solution, the traveling salesman problem
          | is "solved" in a good enough way.
        
           | startupsfail wrote:
           | The claim of the paper is that computation is irreducible
           | (assuming P!=NP), LLMs have limited computational capacity
           | and will hallucinate on the irreducible problems.
           | 
            | I don't know, the claim seems dubious to me. We are usually
            | able to have algorithms that return a failure status when the
            | problem proves to be too large, avoiding the "hallucination".
            | I don't see why LLMs can't have that embedded.
        
         | nottorp wrote:
         | > we can get LLMs to answer 'I don't know' more often
         | 
         | Have any nets been trained specifically to be able to go to an
         | 'i don't know' state, I wonder?
         | 
         | It may be the humans' fault.
        
           | gessha wrote:
           | Yes, you can find some of the work on this topic under the
           | terms open world recognition or open world X where X is a
           | topic in computer vision or NLP.
           | 
           | https://arxiv.org/abs/2011.12906
        
             | nottorp wrote:
             | Maybe, but are the LLM churches doing it?
        
         | intended wrote:
          | If a model could say 'I don't know', then the hallucination
          | problem would also be solved.
          | 
          | When we say "know" it usually means being factual. For an LLM
          | to 'know' it doesn't know, it would have to move away from pure
          | correlations on words, toward meta-processing of its own
          | results.
         | 
         | I can see this happen with two LLMs working together (and there
         | are Evals that use just this), however each LLM still has no
         | self awareness of its limits.
         | 
         | This was a terribly convoluted argument to make.
        
           | empath-nirvana wrote:
           | The models that exist now say "I don't know" all the time.
           | It's so weird that people keep insisting that it can't do
           | things that it does.
           | 
           | Ask it what dark matter is, and it won't invent an answer, it
           | will present existing theories and say that it's unknown.
           | 
            | Ask it about a person you know that isn't in its data set
           | and it'll tell you it has no information about the person.
           | 
           | Despite the fact that people insist that hallucinations are
           | common and that it will invent answers if it doesn't know
           | something frequently, the truth is that chatgpt doesn't
           | hallucinate that much and will frequently say it doesn't know
           | things.
           | 
           | One of the few cases where I've noticed it inventing things
           | are that it often makes up apis for programming libraries and
           | CLI tools that don't exist, and that's trivially fixable by
           | referring it to documentation.
        
             | intended wrote:
             | I have to use LLMs for work projects - which are not PoCs.
             | 
              | I can't have a tool that makes up stuff an unknown
              | proportion of the time.
              | 
              | There is a world of research examining hallucination rates,
              | indicating rates of 30%+.
             | 
             | With steps to reduce it using RAGs, you could potentially
             | improve the results significantly - last I checked it was
             | 80-90%.
             | 
             | And the failure types aren't just accuracy, it's precision,
             | recall, relevance and more.
        
               | empath-nirvana wrote:
               | > There is a world of research examining hallucination
               | Rates, indicating hallucination rates of 30%+.
               | 
               | I want to see a citation for this. And a clear definition
               | for what is a hallucination and what isn't.
        
           | wseqyrku wrote:
            | Feeding the output to another inference pass would probably
            | reduce hallucination? But I have the impression that models
            | talking to each other don't need to use English; a byte
            | stream would do. Just like the hidden layers of an ANN.
        
         | andsoitis wrote:
         | > the practical question of whether we can get LLMs to answer
         | 'I don't know' more often (which I'd argue would solve
         | hallucinations).
         | 
         | To answer "I don't know" requires one to know when you know. To
         | know when you know in turn requires understanding.
        
           | MuffinFlavored wrote:
           | how did LLMs get this far without any concept of
           | understanding? how much further can they go until they become
           | "close enough"?
        
             | Karellen wrote:
             | They generate text which looks like the kind of text that
             | people who do have understanding generate.
        
               | ninetyninenine wrote:
               | Two key things here to realize.
               | 
                | People also often don't understand things and have
                | trouble separating fact from fiction. By logic, at most
                | one religion is true. Consequently, also by logic, the
                | followers of most religions in the world, who believe
                | their religion to be true, are hallucinating.
               | 
                | The second thing to realize is that your argument doesn't
                | really apply. It's in theory possible to create a
                | stochastic parrot that can imitate, to a degree of 100
                | percent, the output of a human who truly understands
                | things. It blurs the line of what understanding is.
                | 
                | One can even define true understanding as a stochastic
                | parrot that generates text indistinguishable from total
                | understanding.
        
               | andsoitis wrote:
               | > People also often don't understand things and have
               | trouble separating fact from fiction.
               | 
               | That's not the point being argued. Understanding,
               | critical thinking, knowledge, common sense, etc. all
               | these things exist on a spectrum - both in principle and
               | certainly in humans. In fact, in any particular human
               | there are different levels of competence across these
               | dimensions.
               | 
               | What we _are_ debating, is whether or not, an LLM can
               | have _understanding_ itself. One test is: can an LLM
               | understand understanding? The human mind has come to the
               | remarkable understanding that understanding itself is
               | provisional and incomplete.
        
               | ninetyninenine wrote:
               | Of course it can. Simply ask the LLM about itself.
               | chatGPT4 can answer.
               | 
               | In fact, that question is one of the more trivial
               | questions it will most likely not hallucinate on.
               | 
               | The reason I alluded to humans here is that I'm saying
               | we are setting the bar too high. It's as if everyone is
               | saying it hallucinates and therefore it can't understand
               | anything. I'm saying that we hallucinate too, and
               | because of that LLMs can approach human-level
               | understanding.
        
               | DiogenesKynikos wrote:
               | In order to do that effectively, an LLM has to itself
               | have understanding. At a certain point, we end up in a
               | metaphysical argument about whether a machine that is
               | capable of responding as if it had understanding actually
               | does have understanding. It ends up being a meaningless
               | discussion.
        
               | Karellen wrote:
               | I am reminded of Feynman's story about teaching physics
               | at a Brazilian university for one semester, a few
               | decades ago:
               | 
               | http://v.cx/2010/04/feynman-brazil-education
               | 
               | The students learned to repeat the text of the books,
               | without "understanding" what the books were describing.
               | I'm sure this says something about one side or the other
               | of this conundrum, but I'm not sure which. :-)
        
               | cultureswitch wrote:
               | That is the argument we're having though.
               | 
               | The central claim is that a machine which answers exactly
               | the same thing a human would answer given the same input
               | does not have understanding, while the human does.
               | 
               | This claim is religious, not scientific. In this
               | worldview, "understanding" is a property of humans which
               | can't be observed but exists nonetheless. It's like
               | claiming humans have a soul.
        
             | andsoitis wrote:
             | > how did LLMs get this far without any concept of
             | understanding? how much further can they go until they
             | become "close enough"?
             | 
             | I don't know that that is quite the right question to ask.
             | 
             | Understanding exists on a spectrum. Even humans don't
             | necessarily understand everything they say or claim (incl.
             | what they say of LLMs!), and then there are things a
             | particular human would simply say "I don't understand".
             | 
             | But when you ask a human "can you understand things?" you
             | will get an unequivocal _Yes!_
             | 
             | Ask that same question of an LLM and what does it say? I
             | don't think any of them currently respond with a simple or
             | even qualified "Yes". Now, some might claim that one day an
             | LLM will cross that threshold and say "Yes!" but we can
             | safely leave that off to the side for a future debate if it
             | ever happens.
             | 
             | General note: it is worth separating out things like
             | "understanding", "knowledge", "intelligence", "common
             | sense", "wisdom", "critical thinking", etc. While they
             | might all be related in some ways and even overlap, it does
             | not follow that if you show high performance in one you
             | automatically excel in each of the others. I know many
             | people whom anyone would describe as highly intelligent but
             | who lack common sense, etc.
        
               | lazide wrote:
               | At the root of the problem, I believe, is that a human
               | (or LLM) saying they understand has little to no bearing
               | on if they actually understand!
               | 
               | People in particular have evolved complex self-
               | protective mechanisms to provide the right answers for
               | their given environment for safety reasons, based on a
               | number of different individual strategies. For example,
               | the overly honest, the self-deprecating, the questioner,
               | the prosecutor, the victim, the liar, the absent-minded
               | professor, the idiot, etc.
               | 
               | LLMs are not that complex or self-referential.
               | 
               | Personally, my guess is that you'd want to build a model
               | (of some kind!) whose sole job is determining the
               | credibility of given string of tokens (similar to what
               | someone else noted in a sibling comment about high answer
               | volatility based on minor input changes - that does sound
               | like a signal of low credibility), and somehow integrate
               | THAT self-referential feedback into the process.
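               | 
               | Very roughly, the shape I have in mind - just a sketch,
               | with every name below being a made-up placeholder rather
               | than any real API - would be something like:
               | 
               |     # A separate scorer model gates the draft answer.
               |     # llm() and credibility() are made-up placeholders.
               |     def guarded(prompt, llm, credibility, thresh=0.7):
               |         draft = llm(prompt)
               |         # abstain when the scorer flags low credibility
               |         if credibility(prompt, draft) < thresh:
               |             return "I don't know"
               |         return draft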
               | 
               | Notably, even the smartest lawyers (or perhaps,
               | especially the smartest lawyers) will have assistants do
               | research once they've set out a strategy so they are sure
               | THEY aren't bullshitting. Same with professors,
               | professional researchers, engineers, etc.
               | 
               | Because until someone goes and actually reads the case
               | law from a credible source, or checks the primary
               | research, or calculates things, it's possible someone was
               | misremembering or just wrong.
               | 
               | Being right more often is not about never having a wrong
               | thought/idea/statement, it's about double checking when
               | you're thinking you might be bullshitting, and NOT saying
               | the bullshit answer until you've checked. Which is
               | proportionally, very expensive. The really good
               | professionals will generate MANY lines of such inquiry in
               | parallel for folks to track down, and then based on their
               | degree of confidence in each one and the expected context
               | the answer will be used in, will formulate the 'most
               | correct' response, which is proportionally even more
               | expensive.
               | 
               | So at least during the process, there would be a signal
               | that the system was likely 'bullshitting'. Which might
               | help it in at least being able to signal when its
               | answers are low-confidence. (human equivalent of
               | stuttering, looking down and away, looking ashamed haha!)
               | 
               | Every human gets fooled sometimes in at least some venue
               | though.
        
               | andsoitis wrote:
               | > At the root of the problem, I believe, is that a human
               | (or LLM) saying they understand has little to no bearing
               | on if they actually understand!
               | 
               | That's certainly one root of the problem, but I would
               | argue that there are multiple roots to this problem!
               | 
               | Humans have further realized that understanding itself is
               | _provisional_ and incomplete, which is quite a remarkable
               | insight (understanding if you will), itself.
        
             | mannykannot wrote:
             | This is a fair question: LLMs do challenge the easy
             | assumption (as made, for example, in Searle's "Chinese
             | Room" thought experiment) that computers cannot possibly
             | understand things. Here, however, I would say that if an
             | LLM can be said to have understanding or knowledge of
             | something, it is of the patterns of token occurrences to be
             | found in the use of language. It is not clear that this
             | also grants the LLM any understanding that this language
             | refers to an external world which operates in response to
             | causes which are independent of what is or might be said
             | about it.
        
               | ninetyninenine wrote:
               | Explain sora. It must of course have a blurry
               | understanding of reality to even produce those videos.
               | 
               | I think we are way past the point of debate here. LLMs
               | are not stochastic parrots. LLMs do understand an aspect
               | of reality. Even the LLMs that are weaker than sora
               | understand things.
               | 
               | What is debatable is whether LLMs are conscious. But
               | whether it can understand something is a pretty clear
               | yes. But does it understand everything? No.
        
               | solarhexes wrote:
               | If by "understand" you mean "can model reasonably
               | accurately much of the time" then _maybe_ you'll find
               | consensus. But that's not a universal definition of
               | "understand".
               | 
               | For example, if I asked you whether you "understand"
               | ballistic flight, and you produced a table that you
               | interpolate from instead of a quadratic, then I would not
               | feel that you understand it, even though you can kinda
               | sorta model it.
               | 
               | And even if you do, if you didn't produce the universal
               | gravitation formula, I would still wonder how "deeply"
               | you understand. So it's not like "understand" is a binary
               | I suppose.
        
               | eszed wrote:
               | I think there are two axes: _reason about_ and _intuit_.
               | I  "understand" ballistic flight when I can calculate a
               | solution that puts an artillery round on target. I also
               | "understand" ballistic flight when I make a free throw
               | with a basketball.
               | 
               | On writing that, I have an instinct to revise it to move
               | the locus of understanding in the first example to the
               | people who calculated the ballistic tables, based on
               | physics first-principles. That would be more accurate,
               | but my mistake highlights something interesting: an
               | artillery officer / spotter simultaneously uses both. Is
               | theirs a "deeper" / "truer" understanding? I don't think
               | it is. I don't know what I think that means, for humans
               | or AI.
        
               | ninetyninenine wrote:
               | Well what would you need to see to prove understanding?
               | That's the metric here. Both the LLM and the human brain
               | are black boxes. But we claim the human brain understands
               | things while the LLM does not.
               | 
               | Thus, what output would you expect from either of these
               | boxes to demonstrate true understanding in answer to
               | your question?
        
               | cultureswitch wrote:
               | Are you telling me that WW1 artillery crews didn't
               | understand ballistics? Because they were using tables.
               | 
               | There's no difference between doing something that works
               | without understanding and doing the exact same thing with
               | understanding.
        
               | solarhexes wrote:
               | You've decided that your definition of "understanding" is
               | correct. Ok.
        
               | andsoitis wrote:
               | > I think we are way past the point of debate here. LLMs
               | are not stochastic parrots. LLMs do understand an aspect
               | of reality. Even the LLMs that are weaker than sora
               | understand things.
               | 
               | What is one such aspect? (I'm not asking in order to
               | debate it here, but more because I want to test /
               | research it on my own time)
        
               | ninetyninenine wrote:
               | I pay for chatGPT so it depends on if you pay for that or
               | not. I think it's worth it because whether it understands
               | things or not chatGPT represents a paradigm shift in
               | human history. You'll need it because it's currently the
               | best conversational LLM out there and the one that shows
               | the most compelling evidence.
               | 
               | Basically you just spend a lot of time with chatGPT4 and
               | ask it deep questions that don't exist in its dataset.
               | Get creative. The LLM will output answers that
               | demonstrate a lack of understanding, but it will also
               | produce answers that display a remarkable amount of
               | understanding. Both sets of answers exist, and people
               | often cite the wrong answers as evidence for lack of
               | understanding, but they're setting the bar too high. The
               | fact that many of these answers do demonstrate
               | understanding of concepts makes it very, very
               | compelling.
               | 
               | Take for example Rock Paper Scissors.
               | 
               | https://chat.openai.com/share/ca22397c-2950-4919-bb79-6de
               | f64...
               | 
               | This entire conversation thread, I believe, does not
               | exist in a parallel form in its data set. It
               | demonstrates understanding of RPS beyond the confines of
               | text; it demonstrates understanding of simultaneity EVEN
               | though the LLM wholly lives in a world of turn-based
               | questions and responses; it understands itself relative
               | to simultaneity; it tries to find solutions around its
               | own problem; it understands how to use creativity and
               | solutions such as cryptography to solve the problem of
               | RPS when playing with it; and it understands the
               | weaknesses of its own solutions.
               | 
               | Conversations such as this show that chatGPT displays
               | remarkable understanding of the world. There are
               | conversations that are opposite to this that demonstrate
               | LLMs displaying an obvious lack of understanding. But the
               | existence of these conversations that lack understanding
               | does NOT negate the ones that do demonstrate
               | understanding. The fact that partial understanding even
               | exists is a milestone for AI.
               | 
               | This isn't anthropomorphism. People are throwing this
               | word around, trying to get others to recognize their own
               | biases, without realizing that doing so just
               | demonstrates their own biases. We literally can't even
               | define "understanding", and both LLMs and the human
               | brain are black boxes. Making adamant claims that LLMs
               | don't understand anything, without addressing this fact,
               | is itself a form of bias.
               | 
               | The way I address the problem above is that I just define
               | a bar. I define humans as the bar of "understanding"
               | without defining what understanding means itself. Then if
               | any machine begins approaching this bar in terms of input
               | and output matching human responses, then this is
               | logically indistinguishable from approaching
               | "understanding". That's literally the best metric we
               | have.
        
               | beardedwizard wrote:
               | I do not understand these comments at all. Sora was
               | trained on billions of frames from video and images -
               | they were tagged with words like "ballistic missile
               | launch" and "cinematic shot" and it simply predicts the
               | pixels like every other model. It stores what we showed
               | it, and reproduces it when we ask - this has nothing to
               | do with understanding and everything to do with
               | parroting. The fact that it's now a stream of images
               | instead of just 1 changes nothing about it.
        
               | ninetyninenine wrote:
               | What is the difference between a machine that for all
               | intents and purposes appears to understand something to a
               | degree of 100 percent versus a human?
               | 
               | Both the machine and the human are a black box. The human
               | brain is not completely understood and the LLM is only
               | trivially understood at a high level through the lens of
               | stochastic curve fitting.
               | 
               | When something produces output that imitates the output
               | of a human that we claim "understands" things, that is
               | objectively understanding, because we cannot penetrate
               | the black box of human intelligence or machine
               | intelligence to determine anything further.
               | 
               | In fact in terms of image generation the LLM is superior.
               | It will generate video output superior to what a human
               | can generate.
               | 
               | Now mind you, the human brain has a classifier and can
               | identify flaws, but try watching a human with Photoshop
               | try to draw even one frame of those videos... it will be
               | horrible.
               | 
               | Does this indicate that humans lack understanding? Again,
               | hard to answer because we are dealing with black boxes so
               | it's hard to pinpoint what understanding something even
               | means.
               | 
               | We can, however, set a bar. A metric. And we can define
               | that bar as humans: all humans understand things. Any
               | machine that approaches human input and output
               | capabilities is approaching human understanding.
        
               | Jensson wrote:
               | > What is the difference between a machine that for all
               | intents and purposes appears to understand something to a
               | degree of 100 percent versus a human?
               | 
               | There is no such difference, we evaluate that based on
               | their output. We see these massive models make silly
               | errors that nobody who understands it would make, thus we
               | say the model doesn't understand. We do that for humans
               | as well.
               | 
               | For example, for Sora in the video with the dog in the
               | window, we see the dog walk straight through the window
               | shutters, so Sora doesn't understand physics or depth.
               | We also see it drawing the dog's shadow on the wall very
               | thin, much smaller than the dog itself; it obviously
               | drew that shadow as if it were cast on the ground and
               | not a wall, since it would have been a very large shadow
               | on that wall. The shadows from the shutters were normal,
               | because Sora is used to those shadows being on a wall.
               | 
               | Hence we can say Sora doesn't understand physics or
               | shadows, but it has very impressive heuristics about
               | them: the dog accurately places its paws on the
               | platforms etc., and the paws' shadows were right. But we
               | know those were just basic heuristics, since the dog
               | walked through the shutters and its body cast a shadow
               | in the wrong way, meaning Sora only handles very common
               | cases and fails as soon as things are in an unexpected
               | environment.
        
               | cultureswitch wrote:
               | Should it matter how the object of debate interacts and
               | probes the external world? We sense the world through
               | specialized cells connected to neurons. There's nothing
               | to prevent LLMs doing functionally the same thing. Both
               | human brains and LLMs have information inputs and
               | outputs, there's nothing that can go through one which
               | can't go through the other.
        
               | mannykannot wrote:
               | A current LLM does not interact with the external world
               | in a way that would seem to lead to an understanding of
               | it. It emits a response to a prompt, and then reverts to
               | passively waiting for the next one. There's no way for it
               | to expect something will happen in response, and to get
               | the feedback needed to realize that there is more to the
               | language it receives than is contained in the statistical
               | relationships between its tokens.
        
           | t_mann wrote:
           | Maybe it requires understanding, maybe there are other ways
           | to get to 'I don't know'. There was a paper posted on HN a
           | few weeks ago that tested LLMs on medical exams, and one
           | interesting thing that they found was that on questions where
           | the LLM was wrong (confidently, as usual), the answer was
           | highly volatile with respect to some prompt or temperature or
           | other parameters. So this might show a way for getting to 'I
           | don't know' by just comparing the answers over a few slightly
           | fuzzed prompt variations, and just asking it to create an 'I
           | don't know' answer (maybe with a summary of the various
           | responses) if they differ too much. This is more of a crutch,
           | I'll admit, arguably the LLM (or neither of the experts, or
           | however you set it up concretely) hasn't learnt to say 'I
           | don't know', but it might be a good enough solution in
           | practice. And maybe you can then use that setup to generate
           | training examples to teach 'I don't know' to an actual model
           | (so basically fine-tuning a model to learn its own knowledge
           | boundary).
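           | 
           | Concretely, the crutch could be as dumb as this - a sketch
           | only, where llm(), fuzz() and similarity() are placeholders
           | for whatever model call, prompt perturbation and answer
           | comparison you'd actually use:
           | 
           |     # Ask a few fuzzed variants; abstain if they disagree.
           |     # llm(), fuzz(), similarity() are hypothetical stubs.
           |     def answer_or_abstain(q, llm, fuzz, similarity, n=5):
           |         answers = [llm(fuzz(q)) for _ in range(n)]
           |         # crude stability signal: mean pairwise similarity
           |         sims = [similarity(a, b)
           |                 for i, a in enumerate(answers)
           |                 for b in answers[i + 1:]]
           |         if sum(sims) / len(sims) < 0.8:   # threshold
           |             # optionally return a summary of the variants
           |             return "I don't know"
           |         return answers[0]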
        
             | andsoitis wrote:
             | > Maybe it requires understanding, maybe there are other
             | ways to get to 'I don't know'. > This is more of a crutch,
             | I'll admit, arguably the LLM (or neither of the experts, or
             | however you set it up concretely) hasn't learnt to say 'I
             | don't know', but it might be a good enough solution in
             | practice. And maybe you can then use that setup to generate
             | training examples to teach 'I don't know' to an actual
             | model (so basically fine-tuning a model to learn its own
             | knowledge boundary).
             | 
             | When humans say "I know" it is often not narrowly based on
             | "book knowledge or what I've heard from other people".
             | 
             | Humans are able to say "I know" or "I don't know" using a
             | range of tools like self-awareness, knowledge of a subject,
             | experience, common sense, speculation, wisdom, etc.
        
               | t_mann wrote:
               | Ok, but LLMs are just tools, and I'm just asking how a
               | tool can be made more useful. It doesn't really matter
               | why an LLM tells you to go look elsewhere, it's simply
               | more useful if it does than if it hallucinates. And
               | usefulness isn't binary, getting the error rate down is
               | also an improvement.
        
               | andsoitis wrote:
               | > Ok, but LLMs are just tools, and I'm just asking how a
               | tool can be made more useful.
               | 
               | I _think_ I _know_ what you're after (notice my self-
               | awareness to qualify what I say I know): that the tool's
               | output can be relied upon without applying layers of
               | human judgement (critical thinking, logical reasoning,
               | common sense, skepticism, expert knowledge, wisdom, etc.)
               | 
               | There are a number of boulders in that path of clarity.
               | One of the most obvious boulders is that for an LLM the
               | inputs and patterns that act on the input are themselves
               | not guaranteed to be infallible. Not only in practice,
               | but also in principle: the human mind (notice this
               | expression doesn't refer to a thing you can point to) has
               | come to understand that understanding is provisional,
               | incomplete, a process.
               | 
               | So while I agree with you that we can and should improve
               | the accuracy of the output of these tools _given
               | assumptions we make about the tools humans use to prove
               | facts about the world_ , you will always want to apply
               | judgment, skepticism, critical thinking, logical
               | evaluation, intuition, etc. depending on the risk/reward
               | tradeoff of the topic you're relying on the LLM for.
        
               | t_mann wrote:
               | Yeah I don't think it will ever make sense to think about
               | Transformer models as 'understanding' something. The
               | approach that I suggested would replace that with rather
               | simple logic like answer_variance > arbitrary_threshold ?
               | return 'I don't know' : return $original_answer
               | 
               | It's not a fundamental fix, it doesn't even change the
               | model itself, but the output might be more useful. And
               | then there was just some speculation about how you could
               | try to train a new AI mimicking the more useful output.
               | I'm sure
               | smarter people than me can come up with way smarter
               | approaches. But it wouldn't have to do with understanding
               | - when I said the tool should return 'I don't know'
               | above, I literally meant it should return that string
               | (maybe augmented a bit by some pre-defined prompt), like
               | a meaningless symbol, not any result of anything
               | resembling introspection.
        
               | gorlilla wrote:
               | You left out hubris.
        
               | andsoitis wrote:
               | > You left out hubris.
               | 
               | I know!
        
               | williamcotton wrote:
               | We are having a conversation that feels much like one
               | about the existence of a deity.
        
               | andsoitis wrote:
               | > We are having a conversation that feels much like one
               | about the existence of a deity.
               | 
               | From a certain perspective, there does appear to be a
               | rational mystical dualism at work.
        
         | somethingsaid wrote:
         | I also wonder if having a hallucination-free LLM is even
         | required for it to be useful. Humans can and will hallucinate
         | (by this I mean make false statements in full confidence, not
         | drugs or mental states) and they're entrusted with all sorts of
         | responsibilities. Humans are also susceptible to illusions and
         | misdirection just like LLMs. So in all likelihood there is
         | simply some state of 'good enough' that is satisfactory for
         | most tasks. Pursuing the elimination of hallucinations to the
         | nth degree may be a fool's errand.
        
           | skydhash wrote:
           | Tools are not people and people should not be considered as
           | tools. Imagine your hammer only hitting the nail 60% of the
           | time! But workers should be allowed to stop working to
           | negotiate work conditions.
        
         | sandworm101 wrote:
         | They cannot say "I dont know" because they dont actually know
         | anything. The answers are not comming from a thinking mind but
         | a complex pattern-fitting supercomputer hovering over a massive
         | table of precomputed patterns. It computes your input then
         | looks to those patterns and spits out the best match. There is
         | no thinking brain with a conceptual understanding of its own
         | limitations. Getting an "i dont know" from current AI is like
         | asking navigation software how far it is to the Simpsons house
         | in Springfield: the machine spits out answers but cannot fathom
         | the cultural reference that makes the answer impossible.
         | Instead, it finds someone named simpson in the nearest
         | realworld Springfield.
        
           | williamcotton wrote:
           | What if you worked on the problem and tried to come up with
           | some kind of solution?
        
             | sandworm101 wrote:
             | The solution is older non-AI tech. Google search can say
             | "no good results found" because it returns actual data
             | rather than creating anything new. If you want a hard
             | answer about the presence or absence of something, AI
             | isn't the correct tool.
        
               | williamcotton wrote:
               | So there are no other possibilities for us other than
               | using a system that can be gamed for substandard results?
               | Are we sure about this?
        
               | tempest_ wrote:
               | Can, but doesn't.
               | 
               | I can't remember the last time google actually returned
               | no results.
        
               | tsimionescu wrote:
               | It does reply with no results, but only for very long
               | queries. E.g. if you search for two concatenated GUIDs,
               | you can easily see a no results page.
        
               | johnny22 wrote:
               | Ah, I get no-results pages often when I search for
               | quoted error strings from many different sources. Thing
               | is, I
               | have a hard time believing that no one has actually
               | talked about at least some of those errors :(
        
           | paulnpace wrote:
           | My observation is that comments similar to the GP's come
           | from the constant anthropomorphizing of things by marketers,
           | without people realizing that this subtle influence on
           | language can alter one's view of what "I" means. The first
           | time I really noticed this was when someone using Siri
           | produced a response from Siri that included "I". Ever since,
           | I have been acutely aware of this every time I hear or read
           | it.
        
           | caditinpiscinam wrote:
           | In real world conversations, people are constantly saying "I
           | don't know"; but that doesn't really happen online. If you're
           | on reddit or stack overflow or hacker news and you see a
           | question you don't know the answer to, you normally just
           | don't say anything. If LLMs are being trained on
           | conversations pulled from the internet then they're missing
           | out on a ton of uncertain responses.
           | 
           | Maybe LLMs don't truly "understand" questions, but they're
           | good at looking like they understand questions. If they were
           | trained with more uncertain content, perhaps they'd be better
           | at expressing uncertainty as well.
        
             | username332211 wrote:
             | If they were trained on more uncertain content, what
             | happens if the most probable answer to a question is "I
             | don't know", even though an answer exists in it's training
             | set?
             | 
             | Suppose 99.3% of answers to 'What is the airspeed velocity
             | of an unladen swallow?" are "I don't know that." and the
             | remainder are "11 m/s". What would the model answer?
             | 
             | When the LLM answers "I don't know.", this could be a
             | hallucination just as easily as anything else.
        
               | caditinpiscinam wrote:
               | > Suppose 99.3% of answers to 'What is the airspeed
               | velocity of an unladen swallow?" are "I don't know that."
               | and the remainder are "11 m/s". What would the model
               | answer?
               | 
               | I don't know :)
               | 
               | Actually though, I think the best response would be to
               | say that the answer to the question isn't clear, but that
               | 11 m/s is sometimes given as an estimate. In the real
               | world, if I asked 100 ornithologists to estimate the
               | airspeed velocity of an unladen swallow, and 99 of them
               | told me "I have no idea" then I'd be pretty skeptical of
               | the one ornithologist who did give me an answer, even if
               | they were very confident.
        
               | ta8645 wrote:
               | I think the best response is to steal the joke and repeat
               | it without comment.
               | 
               | "Eleven meters per second."
               | 
               | Full stop. It's humorous, and any reasonable interlocutor
               | understands not to take it seriously.
               | 
               | Of course, there are more serious questions that demand
               | more serious answers. LLMs will eventually need to be
               | able to understand the current context and assess the
               | appropriate level of confidence required in any answer.
        
               | username332211 wrote:
               | The thing is, the usefulness of a question answering
               | system is in answering questions people don't generally
               | know. We don't need an answering system for things that
               | are common knowledge.
               | 
               | And it's not uncommon that certain knowledge would be,
               | well, uncommon even among experts. Experts specialize.
               | 
               | Since the usefulness of ornithological examples is
               | getting exhausted, let's say one out of a hundred lawyers
               | works in bankruptcy. If you ask a million lawyers about
               | the provisions of 11 USC § 1129 and only ten thousand
               | know the answer, is the answer untrustworthy, just
               | because bankruptcy lawyers are far rarer than civil and
               | criminal lawyers?
        
               | patmcc wrote:
               | Right, but "I don't know" is a pretty safe hallucination
               | (if it is one).
               | 
               | My main worry about hallucinations is it means I
               | absolutely can't rely on the output for anything
               | important. If I ask what the safe dose for Tylenol for an
               | infant is, the answer needs to be either _correct_ or  "I
               | don't know". It's not acceptable for it to hallucinate
               | 10x the safe dose.
        
               | username332211 wrote:
               | The thing is, if you answer "I don't know" based on
               | statistics, you end up creating a sliding scale of sorts.
               | You get some measure of an increase in safety, but the
               | model is less useful.
               | 
               | Currently, we have models that make stuff up when they
               | don't know the answer. On the other end, we'd have a
               | model that refuses to answer any question that's not
               | common knowledge. It'll be safe (though it can never be
               | completely safe), but essentially useless.
               | 
               | I suspect it'll be impossible to make a completely
               | trustworthy and useful model unless it somehow has a
               | concept of its own knowledge. And can you have a concept
               | of one's knowledge if you lack a concept of self?
        
             | fl7305 wrote:
             | If you ask ChatGPT a question, and tell it to either
             | respond with the answer or "I don't know", it will respond
             | "I don't know" if you ask it whether you have a brother or
             | not.
        
               | beardedwizard wrote:
               | This has nothing to do with thinking and everything to do
               | with the fact that given that input the answer was the
               | most probable output given the training data.
        
               | fl7305 wrote:
               | >>>> They cannot say "I don't know"
               | 
               | >>> If they were trained with more uncertain content,
               | perhaps they'd be better at expressing uncertainty as
               | well.
               | 
               | >> (me) If you ask ChatGPT a question, and tell it to
               | either respond with the answer or "I don't know", it will
               | respond "I don't know" if you ask it whether you have a
               | brother or not.
               | 
               | > This has nothing to do with thinking and everything to
               | do with the fact that given that input the answer was the
               | most probable output given the training data.
               | 
               | First of all, my claim was in response to "They cannot
               | say 'I dont know'" and "perhaps they'd be better at
               | expressing uncertainty".
               | 
               | ChatGPT can say "I don't know" if you ask it to.
               | 
               | Regarding whether LLMs are lookup tables, I responded to
               | that in more detail elsewhere under this post:
               | 
               | https://news.ycombinator.com/item?id=39501611
        
           | DiogenesKynikos wrote:
           | > The answers are not coming from a thinking mind but a
           | complex pattern-fitting supercomputer hovering over a massive
           | table of precomputed patterns.
           | 
           | Are you sure you're not also describing the human brain? At
           | some point, after we have sufficiently demystified the
           | workings of the human brain, it will probably also sound
           | something like, "Well, the brain is just a large machine that
           | does X, Y and Z [insert banal-sounding technical jargon from
           | the future] - it doesn't really understand anything."
           | 
           | My point here is that _understanding_ ultimately comes down
           | to having an effective internal model of the world, which is
           | capable of taking novel inputs and generating reasonable
           | descriptions of them or reactions to them. It turns out that
           | LLMs are one way of achieving that. They don't function
           | exactly like human brains, but they certainly do exhibit
           | intelligence and understanding. I can ask an LLM a question
           | that it has never seen before, and it will give me a
           | reasonable answer that synthesizes and builds on various
           | facts that it knows. Often the answer is more intelligent
           | than what one would get from most humans. That's
           | understanding.
        
             | beardedwizard wrote:
             | Human brains form new connections dynamically. LLMs are
             | trained on connections human brains have already made. They
             | never make new connections that aren't in training data.
             | 
             | Nothing was synthesized, all the data was seen before and
             | related to each other by vector similarity.
             | 
             | It can just parrot the collective understanding humans
             | already have and teach it.
        
               | DiogenesKynikos wrote:
               | > It can just parrot the collective understanding humans
               | already have and teach it.
               | 
               | The problem with calling an LLM a parrot is that anyone
               | who has actually interacted with an LLM knows that it
               | produces completely novel responses to questions it has
               | never seen before. These answers are usually logical and
               | reasonable, based on both the information you gave the
               | LLM and its previous knowledge of the world. Doing that
               | requires understanding.
               | 
               | > They never make new connections that aren't in training
               | data.
               | 
               | This is just categorically untrue. They make all sorts of
               | logical connections that are not explicitly contained in
               | the training data. Making logical inferences about
               | subjects one has never heard about - based on the things
               | one does know - is an expression of understanding. LLMs
               | do that.
        
               | beardedwizard wrote:
               | Isn't this describing temperature-induced randomness and
               | ascribing some kind of intelligence to it? This
               | assertion has been made and refuted multiple times in
               | this thread, and no solid evidence to the contrary has
               | been presented.
               | 
               | To go back to your first sentence - interacting with an
               | LLM is not understanding how it works; building one is.
               | The actual construction of a neural-network LLM refutes
               | your assertions.
        
               | DiogenesKynikos wrote:
               | The claim was made that LLMs just parrot back what
               | they've seen in the training data. They clearly go far
               | beyond this and generate completely novel ideas that are
               | not in the training data. I can give ChatGPT extremely
               | specific and weird prompts that have 0% chance of being
               | in its training data, and it will answer intelligently.
               | 
               | > The actual construction of a neural network llm refutes
               | your assertions.
               | 
               | I don't see how. There's a common view that I see
               | expressed in these discussions, that if the workings of
               | an LLM can be explained in a technical manner, then it
               | doesn't understand. "It just uses temperature induced
               | randomness, etc. etc." Once we understand how the human
               | brain works, it will then be possible to argue, in the
               | exact same way, that humans do not understand. "You see,
               | the brain is just mechanically doing XYZ, leading to the
               | vocal cords moving in this particular pattern."
        
               | AlexandrB wrote:
               | > They clearly go far beyond this and generate completely
               | novel ideas that are not in the training data.
               | 
               | There's a case where this is trivially false. Language.
               | LLMs are bound by language that was invented by humans.
               | They are unable to "conceive" of anything that cannot be
               | described by human language as it exists, whereas humans
               | create new words for new ideas all the time.
        
               | pixl97 wrote:
               | Uh, I believe you're really confused about things like
               | ChatGPT versus LLMs in general. You don't have to feed
               | human language to an LLM for it to learn things. You can
               | feed it wifi data waveforms, for example, and it can
               | 'learn' insights from that.
               | 
               | Furthermore, your thinking here doesn't even begin to
               | explain multimodal models at all.
        
               | DiogenesKynikos wrote:
               | I just asked ChatGPT to make up a Chinese word for
               | hungry+angry. It came up with a completely novel word
               | that actually sounds okay: Ji Nu . It then explained to
               | me how it came up with the word.
               | 
               | You can't claim that that isn't understanding. It just
               | strikes me that we've moved the goalposts into ever more
               | esoteric corners: sure, ChatGPT seems like it can have a
               | real conversation, but can it do X extremely difficult
               | task that I just thought up?
        
               | c22 wrote:
               | You claim that logical and reasonable responses "require
               | understanding" therefore LLMs must _understand_. But I
               | see LLMs as evidence that _understanding_ is not required
               | to produce logical and reasonable responses.
               | 
               | Thinking back to when I used to help tutor some of my
               | peers in 101-level math classes there were many times
               | someone was able to produce a logical and reasonable
               | response to a problem (by rote use of an algorithm) but
               | upon deeper interrogation it became clear that they
               | lacked true understanding.
        
               | DiogenesKynikos wrote:
               | Then your definition of understanding is meaningless. If
               | a physical system is able to accurately simulate
               | understanding, it understands.
        
               | c22 wrote:
               | My definition of understanding is not meaningless, but it
               | appears you do not understand it.
        
               | Jensson wrote:
               | A human that mimics the speech of someone that does
               | understand usually doesn't understand himself. We see
               | that happen all the time with real humans, you have
               | probably seen that as well.
               | 
               | To see if a human understands we ask them edge questions
               | and things they probably haven't seen before, and if they
               | fail there but just manage for common things then we know
               | the human just faked understanding. Every LLM today fails
               | this, so they don't understand, just as we say humans
               | who produce the same output don't understand. These LLMs
               | have superhuman memory, so their ability to mimic smart
               | humans is much greater than a human faker's, but other
               | than that they are just like your typical human faker.
        
               | DiogenesKynikos wrote:
               | > A human that mimics the speech of someone that does
               | understand usually doesn't understand himself.
               | 
               | That's not what LLMs do. They provide novel answers to
               | questions they've never seen before, even on topics
               | they've never heard of, that the user just made up.
               | 
               | > To see if a human understands we ask them edge
               | questions
               | 
               | This is testing if there are flaws in their
               | understanding. My dog understands a lot of things about
               | the world, but he sometimes shows that he doesn't
               | understand basic things, in ways that are completely
               | baffling to me. Should I just throw my hands in the air
               | and declare that dogs are incapable of understanding
               | anything?
        
           | fl7305 wrote:
           | > a complex pattern-fitting supercomputer hovering over a
           | massive table of precomputed patterns
           | 
           | That was perhaps true of earlier and smaller LLMs, like GPT-1
           | and GPT-2.
           | 
           | But as they grew larger and were trained with more and more
           | data, they changed from pure pattern matching to implementing
           | algorithms to compress more information into their structure
           | than pure pattern matching can achieve.
           | 
           | These algorithms are incomplete and buggy, but they are
           | nonetheless executing algorithms, and not just pattern
           | matching.
           | 
           | This phenomenon can be seen in toy-sized neural networks. For
           | instance, addition of two input values modulo a constant. As
           | a small network is trained, at some point the internal
           | structure can change from pattern matching to implementing
           | addition using Fourier transforms. This is clearly visible in
           | its structure. The network now performs the task perfectly
           | for all inputs, regardless of having seen them in training.
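           | 
           | For the curious, a toy setup along those lines looks roughly
           | like this (a minimal sketch, assuming PyTorch; the switch to
           | a real algorithm typically only shows up after long training
           | with weight decay, and seeing the Fourier structure takes
           | further analysis of the learned weights):
           | 
           |     import torch
           |     import torch.nn as nn
           |     import torch.nn.functional as F
           | 
           |     P = 97  # modulus: learn (a + b) mod P from examples
           |     a = torch.arange(P).repeat_interleave(P)
           |     b = torch.arange(P).repeat(P)
           |     x = torch.stack([a, b], dim=1)  # all (a, b) pairs
           |     y = (a + b) % P
           | 
           |     model = nn.Sequential(
           |         nn.Embedding(P, 64),  # one table, two tokens
           |         nn.Flatten(),
           |         nn.Linear(128, 256), nn.ReLU(),
           |         nn.Linear(256, P),
           |     )
           |     opt = torch.optim.AdamW(model.parameters(),
           |                             lr=1e-3, weight_decay=1.0)
           | 
           |     # train on half the pairs, test on the held-out half
           |     idx = torch.randperm(P * P)
           |     tr, te = idx[: P * P // 2], idx[P * P // 2:]
           |     for step in range(20000):
           |         opt.zero_grad()
           |         F.cross_entropy(model(x[tr]), y[tr]).backward()
           |         opt.step()
           | 
           |     acc = (model(x[te]).argmax(-1) == y[te]).float().mean()
           |     print(f"held-out accuracy: {acc:.3f}")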
           | 
           | You can ask ChatGPT 4 to execute an algorithm for you. I
           | just tried this one:
           | 
           |     I would like to play a game, where you are the host. We
           |     start off with a score that is 1234143143. At the start
           |     of each turn, you tell me the current score and ask me
           |     if I want to play a or b. If I choose a, the score is
           |     halved, and 30 is added. If I choose b, the score is
           |     doubled, and 40 is subtracted. Only use integers and
           |     round down.
           | 
           | It will happily execute this algorithm. For large numbers, it
           | is slightly off on the arithmetic. When I asked it to double
           | check, it did so using Python code. After that, it kept using
           | Python code to perform the math. It was also able to reason
           | intelligently about different outcomes if always picking a
           | (or b) given different starting points.
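           | 
           | For anyone who wants to check its arithmetic, the rules of
           | that prompt are tiny to transcribe (my own sketch below, not
           | anything ChatGPT produced):
           | 
           |     score = 1234143143
           | 
           |     def turn(choice, score):
           |         # a: halve then add 30; b: double then subtract 40
           |         # integer arithmetic, rounding down, per the prompt
           |         if choice == "a":
           |             return score // 2 + 30
           |         return score * 2 - 40
           | 
           |     for _ in range(5):          # e.g. always picking "a"
           |         score = turn("a", score)
           |         print(score)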
           | 
           | Now, if you have enough memory and training data, of course
           | you can build a gigantic lookup table that has this exact
           | text sequence in it to replicate "executing the algorithm" I
           | described.
           | 
           | Is that your claim? How much memory are we talking about? My
           | feeling is that it'd be far more than the number of atoms in
           | the universe.
           | 
           | PS
           | 
           |     Me: How far it is to the Simpsons house in Springfield?
           | 
           |     ChatGPT: The Simpsons' house in Springfield is a
           |     fictional location from the animated TV series "The
           |     Simpsons." Since Springfield is a fictional town and its
           |     location is not consistently specified in the series,
           |     it's not possible to determine a real-world distance to
           |     the Simpsons' house.
           | 
           |     Me: Do I have a brother? Please answer with either of:
           |     a) The answer
           |     b) There is an answer, but I do not know
           |     c) There is no answer
           | 
           |     ChatGPT: b) There is an answer, but I do not know
        
             | andsoitis wrote:
             | > It will happily execute this algorithm. For large
             | numbers, it is slightly off on the arithmetic. When I asked
             | it to double check, it did so using Python code. After
             | that, it kept using Python code to perform the math. It was
             | also able to reason intelligently about different outcomes
             | if always picking a (or b) given different starting points.
             | 
             | Notice that _you_ had to _notice_ the error and had to
             | prompt it to double check. Lots of complicated things going
             | on here. Many (most?) humans will fail somewhere along this
             | trajectory.
             | 
             | Did it double check the Python code to make sure it is
             | correct (not just in the sense that it is valid, executable
             | code, but that it is the correct check in the first place)?
             | Or did _you_ double check that its modified algorithm is
             | correct? Fool me once and all that...
             | 
             | Upon _reflection_ it _appears_ as if you have a heuristic
             | (algorithm?) that leverages logic, awareness, critical
             | thinking, experience, a goal in mind, intuition, etc. to
             | push towards better results.
             | 
             | "It was able to reason intelligently" imbues qualities that
             | I am skeptical is _reasonable_ to attribute to this very
             | narrow domain - what's an example where it showed
             | intelligent reasoning capabilities?
        
               | fl7305 wrote:
               | > Notice that you had to notice the error and had to
               | prompt it to double check. Lots of complicated things
               | going on here. Many (most?) humans will fail somewhere
               | along this trajectory.
               | 
               | Sure. This was covered by my statement above: "These
               | algorithms are incomplete and buggy".
               | 
               | > "It was able to reason intelligently" imbues qualities
               | that I am skeptical is reasonable to attribute to this
               | very narrow domain - what's an example where it showed
               | intelligent reasoning capabilities?
               | 
               | Here's an example. I asked it to analyze the case where
               | we always pick option "b" in my example above. It took
               | my word problem and boiled it down to an equation:
               | 
               |     ChatGPT: To find the threshold where the behavior
               |     switches from decreasing to increasing when always
               |     choosing option "b", we need to identify the
               |     smallest integer starting value that, after being
               |     doubled and reduced by 40, leads to an equal or
               |     larger integer in the next iteration. We're looking
               |     for the smallest integer x where 2x - 40 >= x.
               | 
               | This was part of a longer conversation where it analyzed
               | different properties and outcomes of the "game rules"
               | that I gave it.
               | 
               | As you pointed out, it got some things wrong and had to
               | be corrected. But Socratic reasoning works fairly well to
               | guide it. It can find errors in its own reasoning. For
               | instance, if asked to actually calculate a few iterations
               | for a given case, it will find its own errors in its
               | claims about that case.
               | 
               | Is it useful right now? Maybe, maybe not, depends on your
               | use case. It definitely takes a lot of thinking on your
               | own and guiding it. At some points it goes from seemingly
               | intelligent to downright pigheaded and stupid.
               | 
               | But in my view there is absolutely no way a lookup table
               | algorithm can contain enough data to be anywhere near the
               | level of responses we're seeing here.
        
             | sandworm101 wrote:
             | The Simpsons example is for a navigation system, not any
             | AI. It is an analogy, not a test to be put to ChatGPT.
        
               | fl7305 wrote:
               | So which test can you put to ChatGPT to prove your claim
               | that it is a lookup table, and that it doesn't perform
               | any logic on facts?
        
               | Jensson wrote:
               | There is no such stable test, just like humans can
               | memorize and create simple heuristics to pass any test
               | without understanding so can an LLM. You have probably
               | seen humans that has perfect grades but can't do much in
               | practice, that is how these LLMs work.
               | 
               | The creators of the LLM just feeds it a bunch of edge
               | questions, and whenever people invent new ones they just
               | feed those as well, so proving it doesn't understand will
               | always be a moving target just like making tests that
               | tests peoples understanding is also a moving target since
               | those people will just look at the old tests and practice
               | those otherwise.
        
             | jijijijij wrote:
             | Hasn't ChatGPT been manually adjusted to better compute
             | math problems? I think nobody not working there knows what
             | ChatGPT really learned all by itself.
        
           | furyofantares wrote:
           | But they can say "I don't know." They can be trained to do so
           | ("as of my knowledge cutoff in September 2020 I don't know
           | who Bob Whatgenflabl is") and they can be given context that
           | makes it more likely they do so (I've had good success with
           | this for RAG applications, and extremely little, but not
           | zero, for general prompts.)
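           |
           | The context trick is roughly this kind of scaffolding (a
           | sketch; the prompt wording and the ask_llm helper are
           | placeholders, not any particular API):
           |
           |     PROMPT = """Answer using ONLY the context below.
           |     If the context does not contain the answer, reply
           |     exactly: "I don't know."
           |
           |     Context:
           |     {context}
           |
           |     Question: {question}"""
           |
           |     def answer(question, chunks, ask_llm):
           |         # ask_llm: whatever client calls the model
           |         ctx = "\n\n".join(chunks)
           |         return ask_llm(PROMPT.format(context=ctx,
           |                                      question=question))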
        
           | cultureswitch wrote:
           | > The answers are not coming from a thinking mind but a
           | complex pattern-fitting supercomputer hovering over a massive
           | table of precomputed patterns. It computes your input then
           | looks to those patterns and spits out the best match
           | 
           | Can you tell that's not how you yourself function?
        
           | jncfhnb wrote:
           | > They cannot say "I don't know" because they don't actually
           | know anything.
           | 
           | print("I don't know")
           | 
           | You don't need proper cognition to identify that the answer
           | is not stored in source data. Your conception of the model is
           | incomplete, as is easily demonstrable by testing such cases
           | now. ChatGPT does just fine on your Simpsons test.
           |
           | You, however, have made up an answer about how something
           | works that you don't actually know, despite your cognition.
        
             | jijijijij wrote:
             | > to identify that the answer is not stored in source data
             | 
             | How would an LLM do that?
        
               | jncfhnb wrote:
               | They do this already all the time. Probably the majority
               | of the time. The problem is that a minority of the time
               | is still very problematic.
               | 
               | How do they do this? The same as they do now. The most
               | likely token is that the bot doesn't know the answer.
               | Which is a behavior emergent from its tuning.
               | 
               | I don't get how people believe it can parse complex
               | questions to produce novel ideas but can't defer to
               | saying "idk" when the answer isn't known.
        
             | pixl97 wrote:
             | >You don't need proper cognition to identify that the
             | answer is not stored in source data.
             | 
             | Uh, what?
             | 
             | So let's imagine you have an LLM that knows everything,
             | except you withhold the data that you can put peanut butter
             | on toast. Toast + peanut butter = does not exist in the
             | data set. So what exactly do you expect the LLM to say when
             | someone asks "Can you put peanut butter on toast?"
             |
             | I would expect an intelligent agent to 'think' peanut
             | butter = spreadable food, toast = hard food substrate, so
             | yeah, they should work together, instead of the useless
             | answer of "I don't know."
             |
             | Everything that does not exist in nature is made up by
             | humans; the question is not "is it made up", the question
             | is "does it work".
        
         | rf15 wrote:
         | > I am generally skeptical whether such a highly formal
         | approach can help us answer the practical question of whether
         | we can get LLMs to answer 'I don't know' more often
         | 
         | I feel like writing an entire paper about the practical
         | approach to the problems posed in this paper, but you'll
         | probably have to first formally define the language used in the
         | training data before you can try to map it (through training
         | and sampling algos, which this paper conveniently skipped) to
         | the target form. This sounds really fun at first, but then
         | we're once again talking about the strict formalisation of
         | natural language (which you could still do - the training data
         | is limited and fixed!)
        
       | karol wrote:
       | Organisms that evolved to perceive true reality instead of the
       | "user interface" have smaller chances of survival, per Donald
       | Hoffman.
        
         | tibbydudeza wrote:
         | The brain fakes it to approximate reality - more so for reasons
         | of the limitations of the wetware it runs on than anything
         | else.
         | 
         | For others here is a TED talk.
         | 
         | https://www.youtube.com/watch?v=oYp5XuGYqqY
        
       | Borealid wrote:
       | > hallucination is defined as inconsistencies between a
       | computable LLM and a computable ground truth function.
       | 
       | With this definition, you can trivially prove the titular
       | sentence - "hallucination is inevitable" - is untrue.
       | 
       | Let your LLM have a fixed input context length of one byte.
       | Continue training the LLM until such a time as it replies to the
       | input "A" with "yes" and all other inputs with "no".
       | 
       | Define your computable ground truth function such that the
       | correct output for the input "A" is "yes" and the correct output
       | for all other inputs is "no".
       | 
       | This LLM provably never hallucinates - we have exhaustively
       | verified that its output matches the ground truth function for
       | all possible inputs.
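       |
       | Concretely, the check is tiny (a sketch; the "model" here is
       | just the finished lookup standing in for the trained net):
       |
       |     def ground_truth(byte):
       |         # the computable ground truth function
       |         return "yes" if byte == ord("A") else "no"
       |
       |     def model(byte):
       |         # stand-in for the trained one-byte-context LLM
       |         return "yes" if byte == ord("A") else "no"
       |
       |     # exhaustively verify every possible one-byte input
       |     assert all(model(b) == ground_truth(b)
       |                for b in range(256))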
       | 
       | There is nothing stopping us from inductively increasing the
       | size of the input context and the number of entries in the
       | ground truth table arbitrarily, and at no step do hallucinations
       | become "inevitable".
        
         | resolutebat wrote:
         | > _Continue training the LLM until such a time as it replies to
         | the input "A" with "yes" and all other inputs with "no"._
         | 
         | This is basically the same as saying "train your LLM until they
         | never hallucinate", which reduces your claim to a tautology: an
         | LLM trained not to hallucinate does not hallucinate. The trick
         | is making that happen.
        
           | ProxCoques wrote:
           | As I always tell my students: the solution to unreliable code
           | is not to put the bugs in there in the first place.
        
           | Borealid wrote:
           | It's a tautology that for a given truth table mapping inputs
           | to "correct" outputs there exists a function that produces
           | that mapping.
           | 
           | Saying that you can't train an LLM to NOT hallucinate is
           | saying that it's impossible for any LLM to always produce
           | output matching any particular truth table.
           | 
           | There may exist truth tables where it's not possible to
           | produce an LLM to match them (for some finite size of LLM
           | perhaps), but my claim isn't a tautology - it's just an
           | assertion that there exist some truth tables which an LLM can
           | be trained to match.
           | 
           | It may be tricky to make the LLM in the first place, but it's
           | certainly not as tricky to verify it. You can test it with
           | every input you consider to be present in the truth table,
           | record its results, and throw it away if it doesn't match.
           | The only possible results here are either non-convergence or
           | a "perfect" LLM.
           | 
           | You can get rid of the non-convergence if you bound the size
           | of the LLM and literally iteratively verify every single
           | possible model. The authors of the paper didn't do that. For
           | trivially sized models and inputs, that's completely
           | possible. For a 7B parameter model, nobody is doing that
           | ever. But you can prove the title statement wrong with a ten-
           | parameter model and a one-byte input.
        
         | less_less wrote:
         | I also disagree with the paper, but not for the same reason.
         | 
         | > With this definition, you can trivially prove the titular
         | sentence - "hallucination is inevitable" - is untrue.
         | 
         | Unsurprisingly, that one sentence fragment doesn't capture the
         | entirety of their assumptions. Instead they prove something
         | intuitively obvious, along the lines of: LLMs with arbitrary-
         | length inputs and certain resource restrictions (e.g. they can
         | take up to poly-time to compute, and this poly-time behavior
         | must be provable, so that during training they don't take even
         | longer by mistake) cannot compute certain functions that don't
         | have those restrictions (e.g. can take more than poly-time, or
         | must take poly-time but a proof of this is not needed). For
         | some cases this proof assumes P != NP. Then they argue that
         | some useful real-world questions are likely to be in the class
         | that the LLM cannot compute, basically because you can ask math
         | problems to LLMs and math problems are sometimes really hard.
         | 
         | This formal model is asymptotic (assumes arbitrary-length
         | inputs etc), but in my experience this kind of theorem is
         | usually true for realistic problems even at modest query
         | lengths.
         | 
         | But this isn't the same as proving that hallucination is
         | inevitable, because (according to any reasonable definition) an
         | LLM (or like, a person, or whatever) should be allowed to say
         | "I don't know", and this should not be considered a
         | hallucination. Then an LLM (or whatever) can avoid
         | hallucinating, and the question becomes how much useful work it
         | can do without hallucinating.
        
           | Borealid wrote:
           | It's not a bad paper honestly, I just don't like it when
           | people take a line from it and assume something untrue.
           | 
           | The pigeonhole principle proves that if you only have N slots
           | to work with, and you need to fit N+1 items into them, you're
           | going to get at least one slot with at least two items. That
           | makes sense, and it logically follows that constrained
           | functions can't perfectly mirror less-constrained ones: at
           | some point a "wrong" and a "right" input have to produce the
           | same output.
        
             | calf wrote:
             | So is it saying LLMs have polynomial running time and
             | that's it? LLMs can't solve SAT properly because of a
             | running-time argument?
        
         | mepiethree wrote:
         | > There is nothing stopping inductively increasing the size of
         | the input context and the number of entries in the ground truth
         | table arbitrarily
         | 
         | This isn't induction. You've only done the base case, not the
         | induction hypothesis or induction step. Maybe you've done those
         | steps in your head but that's not really a trivial proof as you
         | claim.
        
           | Borealid wrote:
           | Induction is "if this is possible for value X, then it is
           | also possible for value X+1".
           | 
           | Where X isn't used as part of the step this is always true.
           | Nothing I did depends on the size of either the input or the
           | truth table, so long as both are finite-size and so long as
           | the truth table can be expressed as a function of the input.
           | 
           | An LLM is an arbitrary convolution of the input text; for any
           | mapping, some function you can call an "LLM" produces that
           | mapping.
        
       | Sparkyte wrote:
       | Definitely a given; it isn't like AI has an actual brain capable
       | of resolving and forming new connections. The difference between
       | LLMs and human brains is that LLMs are interactive compendiums,
       | while our brains organize and sort information in ways that
       | ensure survival as an organism. Nothing about an LLM's survival
       | depends on whether or not it is accurate, and a machine wouldn't
       | understand what is good or bad without weighted context. It's
       | good at analyzing, processing, storing, retrieving and
       | decomposing information. It isn't good at understanding,
       | validating and forming connections between the things it says
       | and what you want of it. It lacks comprehension, not composure.
        
         | golol wrote:
         | The result in the theorem applies to your brain. Your brain can
         | be modelled as an LLM in the sense of the paper up to
         | arbitrarily small error.
         | 
         | The result is a diagonalization argument that is not very
         | relevant for the real world.
        
           | wredue wrote:
           | >your brain can be modelled as an LLM in the sense of the
           | paper
           | 
           | The vast majority of people actually writing LLMs don't claim
           | this, and in fact, actually claim the very opposite: that
           | LLMs do not accurately model a human brain in any capacity.
           | 
           | The fact is that science has no clue what happens in the
           | nucleus of a neuron, so claiming that computer scientists
           | must is... well. You fill in the word.
        
             | golol wrote:
             | Definition 2 (Large Language Model). Let S be a computable
             | set of all the finite-length strings of alphabet A and
             | (s0, s1, . . .) be a one-to-one enumeration of all the
             | elements in S. A large language model, denoted h, is a
             | function that completes the input string s ∈ S using the
             | function's predicted tokens h(s), in a finite time.
             | Function h is attained procedurally using a set of training
             | samples of input-completion pairs.
             |
             | For an arbitrarily large duration and an arbitrarily small
             | error with respect to any definition of error you choose,
             | there exists an LLM in the above sense which models the
             | dynamics of your brain with that small error.
        
         | breck wrote:
         | > capable of resolving and forming new connections
         | 
         | > There is no survival of whether or not LLMs are accurate
         | 
         | I agree that today's LLMs are still missing important
         | components like these needed for breakout intelligence, but I
         | would not be surprised if researchers discover how to add them
         | (and other important things) within 0-5 years.
        
       | carlossouza wrote:
       | Someone smart once said:
       | 
       | If it is good, we call it "creativity."
       | 
       | If it is bad, we call it "hallucination."
       | 
       | This isn't a bug (or limitation, as the authors say). It's a
       | feature.
        
         | tgv wrote:
         | Asking it to write code for you is basically asking it to
         | hallucinate.
        
           | gardenhedge wrote:
           | I don't think so. I think it's asking it to repeat code it
           | has been trained on
        
             | tgv wrote:
             | Search for a piece of code you wrote. If it's more than 15
             | lines and not boilerplate, chances are you won't find it
             | anywhere on the net.
        
               | gardenhedge wrote:
               | To be honest, I imagine I would. Variable names could be
               | different but it would largely be the same as code others
               | have written. For example, I am creating an app in React
               | Native at the moment. My app will not be unique in terms
               | of code but instead in business domain.
        
               | Jensson wrote:
               | But there are many pieces of code that I've written that
               | you can find in many places on the net; having a tool
               | that can adapt that to your codebase in seconds is
               | useful. It doesn't have to be smart, just pasting in a
               | function and fitting that to your code is useful.
        
               | tgv wrote:
               | Sure, but the point is it will have to adapt it to your
               | code, if only in naming. So it has to make up things,
               | i.e. hallucinate. It can't just reproduce the best match
               | in memory.
        
               | Jensson wrote:
               | Yeah, these models are very good at making up names, that
               | is what they are trained to do after all. Their ability
               | to do logic isn't that impressive though and seems to be
               | on the level of a human that doesn't understand the topic
               | but has seen many examples.
        
               | gessha wrote:
               | This reminds me of a bit from a Fry and Laurie sketch:
               | 
               | > Imagine a piano keyboard, eighty-eight keys, only
               | eighty-eight and yet, and yet, new tunes, melodies,
               | harmonies are being composed upon hundreds of keyboards
               | every day in Dorset alone. Our language, Tiger, our
               | language, hundreds of thousands of available words,
               | frillions of possible legitimate new ideas, so that I can
               | say this sentence and be confident it has never been
               | uttered before in the history of human communication:
               | "Hold the newsreader's nose squarely, waiter, or friendly
               | milk will countermand my trousers." One sentence, common
               | words, but never before placed in that order. And yet, oh
               | and yet, all of us spend our days saying the same things
               | to each other, time after weary time, living by clichaic,
               | learned response: "I love you", "Don't go in there", "You
               | have no right to say that", "shut up", "I'm hungry",
               | "that hurt", "why should I?", "it's not my fault",
               | "help", "Marjorie is dead". You see? That surely is a
               | thought to take out for a cream tea on a rainy Sunday
               | afternoon.
               | 
               | https://abitoffryandlaurie.co.uk/sketches/language_conver
               | sat...
        
             | intended wrote:
             | The term "Hallucinate" is a misnomer. Humans can
             | hallucinate, we can get sick and perceive a world which is
             | incongruous with reality.
             | 
             | LLMs are just generating tokens. Hallucination perpetuates
             | an unhelpful anthropomorphization of LLMs.
        
               | gessha wrote:
               | I don't think the term is that bad here because I haven't
               | seen a lot of comparisons with human hallucinations.
               | 
               | Users see it as a machine artifact.
        
               | intended wrote:
               | It's like the term "god particle" - it invites
               | comparisons and allusions that do not match reality.
        
         | LightBug1 wrote:
         | I imagine the gold is in knowing whether the LLM understands
         | when it's doing either?
         | 
         | Isn't this the difference between a human and an LLM?
         | 
         | A human knows it's making an educated guess and (should) say
         | so. Or it knows when it's being creative, and can say so.
         | 
         | If it doesn't know which is which, then it really does bring it
         | home that LLMs are not that much more than (very
         | sophisticated) mechanical input-output machines.
        
           | vladms wrote:
           | You mean "some humans know and could say so". And this
           | reflection process is not captured in the data we fed to
           | LLM-s (like let's say a lesson in which teacher asks "do you
           | know X?", and students first answer "I don't know", etc.)
           | 
           | Also, LLM-s could report more statistical measures for each
           | answer and external tools could interpret them.
        
           | devjab wrote:
           | Isn't it always hallucinating though? We just don't care when
           | it gets it "right". As I understand it, it's still just
           | probability based on what is likely to be a good set of words
           | to answer the prompt tasking it. It doesn't actually know
           | anything, it's just extremely good at making stuff up.
           | 
           | Which is still very useful for a lot of things. Just maybe
           | not things to which value is assigned based on how efficient
           | and correct the answer is. Like you can have GPT make a
           | marketing campaign for you, or you can have it design all the
           | icons you need for your application UI, but you can't
           | reliably make it write high-performance back-end code without
           | having humans judge the results. Similarly you can't use it
           | to teach anyone anything, not really, because unless you're
           | already an expert on the subject being taught, you aren't
           | likely to spot when it gets things wrong. I guess you can
           | argue that a lot of teaching is flawed like that, and you
           | wouldn't be wrong. Like, I was taught that the pyramids were
           | built by slave labour, even after the archeological evidence
           | had shown this to be likely false. But our text books were a
           | decade old because our school didn't really renew them very
           | often... in such a case GPT might have been a more correct
           | teacher, but the trick is that you won't really know. Which
           | is made even more complicated by the fact that it might teach
           | different things to different students. Like, I just asked
           | ChatGPT 3.5 who built the pyramids in 3 different prompts; in
           | one it told me it was ordinary people. In the others it told
           | me it was mostly skilled labour under guidance of
           | "architects" and "engineers". Still better than teaching us
           | it was done by slave labour like my old book, but the book
           | was still consistent in what was considered to be the truth
           | at the time.
        
         | audunw wrote:
         | True, but I think we can fall into the trap of expecting too
         | much of LLMs. Their knowledge can seem perfect. They can answer
         | almost anything, so it's easy to get the illusion that they can
         | answer anything truthfully.
         | 
         | In terms of what we can expect of future improvements, I think
         | it's overly optimistic to expect any kind of super intelligence
         | beyond what we see today (that is, having access to all the
         | world's publicly available information, or rapidly generating
         | texts/images/videos that fall into existing creative patterns).
         | 
         | I suspect that more creative intelligence requires an extremely
         | fine balance to not "go crazy"... that is, producing output we'd
         | consider creative rather than hallucinations.
         | 
         | I think getting this balance right will get exponentially
         | harder as we create feedback loops within the AI that let its
         | intelligence evolve.
         | 
         | And it's entirely possible that humans have already optimised
         | this creative intelligence feedback loop as much as the
         | universe allows. Having a huge amount of knowledge can
         | obviously benefit from more neurons/storage. But we simply
         | don't know if that's true for creative intelligence yet
        
           | badgersnake wrote:
           | > True, but I think we can fall into the trap of expecting
           | too much of LLMs.
           | 
           | We're already well past that point. Why? Because saying
           | incredible things about AI attracts VC money.
        
         | Seb-C wrote:
         | That is correct, it's always hallucinating and making things
         | up.
         | 
         | Just because those hallucinations sometimes randomly happen to
         | be right, people concluded that being wrong is the exception,
         | while being right is somehow the rule.
         |
         | It's like when people read [insert millennia-old text here],
         | find a part that happens to illustrate something in their life
         | today and conclude that it is a prophecy that predicted the
         | future.
         | 
         | The meaning/truth in those is nothing more than a cognitive
         | bias from the mind of the reader, not an inherent quality of
         | the text.
        
           | somewhereoutth wrote:
           | For heavy LLM users, there is probably a dopamine hit when it
           | does something right, much as gamblers get a hit when the
           | fruit machine pays out. Perhaps LLM use is no more productive
           | than gambling, and perhaps can be abused in a similar way.
        
             | HKH2 wrote:
             | You might not have found any uses that suit your niche, but
             | that doesn't mean those of us who have are just making up
             | stories about productivity.
        
               | somewhereoutth wrote:
               | "You might not have found any games that suit your skill
               | set, but that doesn't mean those of us who have are just
               | making up stories about making money"
               | 
               | Sorry, somewhat trite and unfair, but, if there _is_ a
               | gambling-like dopamine reward cycle occurring, then the
               | users would have a hard time being truly objective about
               | any productivity boost _in total_. They may instead focus
               | on the  'wins', without taking into account any overheads
               | or 'losses', much as a gambler would do.
        
               | HKH2 wrote:
               | Sure, confirmation bias exists, but you can compare with
               | the alternatives.
               | 
               | E.g. a search engine can give you zero useful results,
               | and you can fine-tune your query and still get nothing
               | after scrolling through pages of results. (Do people
               | really take the losses into account when using search
               | engines?) I find prompt engineering with LLMs more useful
               | because you get nudged in interesting directions, and
               | even if you come away with no direct results, you have
               | more of an idea of what you are looking for. Maybe
               | lateral thinking is overrated.
        
             | intended wrote:
             | LLMs work very well if you know the domain you are using
             | the LLM on. If you have the ability to verify whether the
             | output is correct, you will gain productivity using LLMs.
        
         | zer00eyz wrote:
         | > This isn't a bug
         | 
         | If it isn't a bug, it damn well isn't a hallucination, or
         | creativity.
         |
         | This is a deeply integrated design defect. One that highlights
         | what we're doing (statistically modeling lots of human
         | language)...
         |
         | Throwing more data at this path isn't going to magically make
         | it wake up and be an AGI. And this problem is NOT going to go
         | away.
         |
         | The ML community needs to back off the hype train. The first
         | step is them not anthropomorphizing their projects.
        
       | precompute wrote:
       | Ah, the AI hype is now entering the "let's be real" phase.
       | Haven't seen a frenzied post on alignment in a while now.
        
         | Culonavirus wrote:
         | The hype is insane. Listen, I think LLMs still have a lot of
         | room to grow and they're already very useful, but like some
         | excellent researchers say, they're not the holy grail. If we
         | want AGI, LLMs are not it. A lot of people seem to think this
         | is an engineering issue and that LLMs can get us there, but
         | they can't, because it is not an engineering issue.
        
           | hnfong wrote:
           | Do you have evidence to back your claims up besides "the hype
           | is overblown"? Because hype only indicates that the precise
           | hyped up claims are wrong, it doesn't imply the opposite
           | extreme (i.e. LLMs can never achieve AGI) must be true.
        
           | mistermann wrote:
           | What kind(s) of an issue do you think it is fundamentally?
        
           | PopePompus wrote:
           | I don't think you can say with confidence that the LLM
           | approach will not lead to AGI, unless you understand in
           | detail how human intelligence operates, and can show that no
           | modification to current LLM architectures can achieve the
           | same or superior results. I think the fact that adding
           | "attention" to LLMs made a huge difference means that we are
           | probably still in the low hanging fruit stage of LLM
           | architecture development, and a few more design improvements
           | on a par with "attention" might lead to something that could
           | legitimately be called AGI. Many people, myself included,
           | believe that LLMs are now exhibiting emergent behavior
           | properties. If that's true, then saying that LLMs are not
           | intelligent because they just predict the next token of
           | output is like saying collections of neurons cannot be
           | intelligent because they just stimulate other neurons
           | chemically.
        
             | DinaCoder99 wrote:
             | Well, both the cognitive scientists and linguists seem very
             | doubtful we can apply this model to human cognition and
             | yield much of value, so I'd say the idea that this model
             | can yield behavior analogous to human cognition without
             | other mechanisms seems rather far-fetched.
             | 
             | Of course, we should absolutely pursue better understanding
             | of both as to not throw the baby out with the bath water,
             | but I'm not personally placing much hope in finding AGI any
             | time soon.
        
           | Der_Einzige wrote:
           | I'm going to take the exact opposite take and claim that
           | "some excellent researchers" support it.
           | 
           | "AGI" is practically already here, you just don't want to
           | admit it: https://www.noemamag.com/artificial-general-
           | intelligence-is-...
        
       | bbor wrote:
       | I'm sorry... does this paper just point out that LLMs by
       | definition are not as good at holding data as a direct database?
       | Cause a) duh and b) who cares, they're intuitive language
       | transformers, not knowledge models.
       | 
       | Maybe I'm missing something obvious? This seems like someone
       | torturing math to imply outlandish conclusions that fit their (in
       | this case anti-"AI") agenda.
        
         | anonylizard wrote:
         | It at least rules out LLMs being 'god models'. They will
         | never be able to solve every problem perfectly.
        
           | ninetyninenine wrote:
           | Humans aren't God models either. The goal is to get this
           | thing to the level of a human. God like levels are not
           | possible imo.
        
       | TylerE wrote:
       | I miss the days when HN posts about hallucinating were about
       | microdosing.
        
       | hyperpape wrote:
       | The result seems to rely on stipulating the LLM must answer true
       | or false to all its questions, and can't say "I don't know." So
       | it's an interesting result, but it's not obvious that it tells us
       | much about our actual problem, which is 100% about how to get a
       | system that accurately understands the level of confidence it
       | should have.
        
       | cjdell wrote:
       | I thought the industry was already experimenting with the idea
       | that you have another LLM observing the output of the primary LLM
       | which is trained more towards safety than creativity.
       | 
       | On top of that it would be good if the safety LLM could give a
       | confidence score in the answer given by the main LLM. Then you
       | can try multiple attempts with different parameters and only show
       | the highest confidence answer to the user.
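       |
       | Roughly the loop I have in mind (a sketch; generate and
       | score_confidence stand in for the primary and safety models):
       |
       |     def best_answer(prompt, generate, score_confidence, n=5):
       |         # generate(prompt, temperature) -> candidate answer
       |         # score_confidence(prompt, answer) -> float in [0, 1]
       |         candidates = [generate(prompt, temperature=0.2 * (i + 1))
       |                       for i in range(n)]
       |         scored = [(score_confidence(prompt, c), c)
       |                   for c in candidates]
       |         return max(scored)[1]  # keep the top-scoring answer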
        
       | franze wrote:
       | Hallucinations and Ideas are the same thing.
        
       | DebtDeflation wrote:
       | There used to be an entire sub-field of NLP called Open Domain
       | Question Answering (ODQA). It extensively studied the problem of
       | selecting the best answer from the set of plausible answers and
       | devised a number of potential strategies. Like everything else in
       | AI/ML it fell victim to the "bitter lesson", in this case that
       | scaling up "predict the next token" beats an ensemble of
       | specialized linguistic-based methods.
        
         | falling_myshkin wrote:
         | For those who don't know:
         | http://www.incompleteideas.net/IncIdeas/BitterLesson.html
         | 
         | I agree with you for the NLP domain, but I wonder if there will
         | also be a bitter lesson learned about the perceived generality
         | of language for universal applications.
        
       | sgammon wrote:
       | Splinters inevitable: An innate limitation of working with wood
       | /s
        
         | HKH2 wrote:
         | People expect splinter-free epistemology instead of just
         | putting on some gloves.
        
       | Scene_Cast2 wrote:
       | I don't think anyone has mentioned Bayesian Neural Nets (I forget
       | the exact term). Sure, the paradigm adds an order of magnitude
       | overhead (at least - and that's why I've never seen it used in
       | the industry), but you can bolt it on to existing architectures.
       | 
       | The basic idea is that besides the probabilities, the network
       | also spits out confidence (IIRC based on how out-of-distribution
       | the input is). There's been a ton of work on getting confidence
       | values out of existing neural nets without as much overhead, but
       | I've never seen those approaches replicate in the industry.
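       |
       | The cheapest of the low-overhead approaches is just reading
       | the output distribution itself - e.g. treating a flat softmax
       | (high entropy) as "low confidence". A crude proxy, not a real
       | Bayesian posterior, but it bolts onto any existing net:
       |
       |     import numpy as np
       |
       |     def confidence(logits):
       |         p = np.exp(logits - logits.max())
       |         p /= p.sum()                      # softmax
       |         entropy = -(p * np.log(p + 1e-12)).sum()
       |         return p.max(), entropy
       |
       |     print(confidence(np.array([8.0, 0.5, 0.1])))  # peaked
       |     print(confidence(np.array([1.1, 1.0, 0.9])))  # flat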
        
         | wnkrshm wrote:
         | I would imagine that to propagate any confidence value through
         | the system you'd need to have priors for the confidence of
         | correctness for all data in your training set. (and those
         | priors change over time)
        
       | somewhereoutth wrote:
       | For production systems the considerations are:
       | 
       | - how often does it error?
       | 
       | - how bad are the errors?
       | 
       | - how tolerable are the errors?
       | 
       | - how detectable are the errors?
       | 
       | - how recoverable are the errors?
       | 
       | For example, a pocket calculator that occasionally was out at the
       | 3rd decimal place might do much more damage than one that quite
       | regularly returned NaN.
       | 
       | LLMs error both regularly and badly, so detectability and
       | recoverability are going to be crucial for useful deployment in
       | systems outside of those that have a high tolerance for errors
       | (e.g. algorithmic feeds).
        
         | intended wrote:
         | Yes. If you want to make something that works, your life is
         | figuring out evaluation and LLM ops.
         | 
         | At that point, you know it's not thinking, it's doing token
         | prediction.
        
       | macrolime wrote:
       | They define an LLM as "a probabilistic model of a string that
       | conditions the output at time t based on all the tokens that come
       | before it in the string".
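       |
       | Schematically, that is just the usual sampling loop (a rough
       | sketch; model_probs and sample are placeholders for whatever
       | network and decoding rule you like):
       |
       |     def generate(model_probs, tokens, max_new, sample):
       |         # model_probs(tokens): distribution over the next
       |         # token, conditioned on all tokens so far
       |         for _ in range(max_new):
       |             dist = model_probs(tokens)
       |             tokens.append(sample(dist))
       |         return tokens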
       | 
       | I think that definition is wide enough to include human
       | intelligence, so their finding should be equally valid for
       | humans.
        
         | jddj wrote:
         | Silk silk silk silk silk silk silk.
         | 
         | What do cows drink?
        
         | moooo99 wrote:
         | > I think that definition is wide enough to include human
         | intelligence, so their finding should be equally valid for
         | humans.
         | 
         | Which is definitely true. Human memory and the ability to
         | correctly recall things we thought we remembered is affected by
         | a whole bunch of things and at times very unreliable.
         | 
         | However, human intelligence, unlike LLMs, is not limited to
         | recalling information we once learned. We are also able to do
         | logical reasoning, which seems to improve in LLMs, but is far
         | from being perfect.
         | 
         | Another problem is how differently we treat the reliability of
         | information depending on the source, especially based on
         | personal bias. I think that is a huge factor, because in my
         | experience, LLMs tend to quickly fall over and change their
         | opinion based on user input.
        
           | ben_w wrote:
           | We _can_ do logical reasoning, but we're very bad at it and
           | often take shortcuts either via pattern matching, memory, or
           | "common sense".
           | 
           | Baseball and bat together cost $1.10, the bat is $1 more than
           | the ball, how much does the ball cost?
           | 
           | A French plane filled with Spanish passengers crashes over
           | Italy, where are the survivors buried?
           | 
           | An armed man enters a store, tells the cashier to hand over
           | the money, and when he departs the cashier calls the police.
           | Was this a robbery?
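           |
           | (For the first one, the reflexive "10 cents" fails the
           | constraint; the actual arithmetic, as a quick check:)
           |
           |     # bat = ball + 1.00 and bat + ball = 1.10
           |     # => 2 * ball + 1.00 = 1.10, so ball = 0.05
           |     ball = 0.05
           |     assert abs((ball + 1.00) + ball - 1.10) < 1e-9
           |     # with ball = 0.10 the total would be 1.20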
        
             | mistermann wrote:
             | Humans also have various culturally flavored, _implicit_
             | "you know what I mean" algorithms on each end to smooth out
             | "irrelevant" misunderstandings and ensure a cordial
             | interaction, a cultural prime directive.
        
               | ben_w wrote:
               | Sure. I think LLMs are good at that kind of thing.
               | 
               | My final example demonstrates how those cultural norms
               | cause errors, it was from a logical thinking session at
               | university, where none of the rest of my group could
               | accept my (correct) claim that the answer was "not enough
               | information to answer" even when I gave a (different but
               | also plausible) non-robbery scenario and pointed out that
               | we were in a logical thinking training session which
               | would have trick questions.
               | 
               | My dad had a similar anecdote about not being able to
               | convince others of the true right answer, but his
               | training session had the setup "you crash landed on the
               | moon, here's a list of stuff in your pod, make an ordered
               | list of what you take with you to reach a survival
               | station", and the correct answer was 1. oxygen tanks, 2.
               | a rowing boat, 3. everything else, because the boat is a
               | convenient container for everything else and you can drag
               | it along the surface even though there's no water.
        
               | mistermann wrote:
               | Don't you think it's strange that humans have little to
               | no interest when root causes to their problems are found?
        
               | ben_w wrote:
               | Sometimes.
               | 
               | No idea what you're getting at here, though.
        
           | magicalhippo wrote:
           | > We are also able to do logical reasoning
           | 
           | This is effectively like coming up with an algorithm and then
           | executing it. So how good/bad are these LLMs if you asked
           | them to generate, say, a Lua script to compute the answer, a
           | la the counting-occurrences problem mentioned in a different
           | comment, and then pass that off to a Lua interpreter to get
           | the answer?
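           |
           | Something like this pattern (a sketch; ask_llm is a
           | placeholder for whatever client you use, and it assumes a
           | lua binary on PATH):
           |
           |     import subprocess, tempfile
           |
           |     def solve_by_codegen(question, ask_llm):
           |         lua_src = ask_llm("Write a Lua script that prints "
           |                           "only the answer to: " + question)
           |         with tempfile.NamedTemporaryFile(
           |                 "w", suffix=".lua", delete=False) as f:
           |             f.write(lua_src)
           |             path = f.name
           |         out = subprocess.run(["lua", path],
           |                              capture_output=True, text=True)
           |         return out.stdout.strip()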
        
             | moooo99 wrote:
             | > counting occurrences problem mentioned in a different
             | comment, and then pass that off to a LUA interpreter to get
             | the answer?
             | 
             | I think this is a sensible approach in some problem domains
             | with software development being a particularly good
             | example. But I think this approach quickly falls apart as
             | soon as your "definitely right answer" involves real-world
             | interaction.
             | 
             | And if one thinks about it, most of the value any company
             | derives comes down to some sort of real-world interaction,
             | whether directly or by proxy.
        
         | Cacti wrote:
         | When we can define and measure intelligence, perhaps these
         | discussions will be meaningful.
        
         | zamalek wrote:
         | Which might explain the evolutionary reason for dreaming: it
         | prunes hallucination. Might it make sense to interleave
         | training and dreaming?
        
       | Almondsetat wrote:
       | Of course it's inevitable.
       | 
       | Things can be facts or deductions of facts (or both). If I ask an
       | LLM the date of birth of Napoleon and it doesn't have it in its
       | dataset there are only 2 options: either it has other facts from
       | which Napoleon's birthday can be deduced or it doesn't. If it
       | does, then by improving the LLM we will be able to make more
       | and more deductions; if it doesn't, then it can only
       | hallucinate. Since there will always be a set of facts that
       | the LLM is not aware of and which cannot be deduced, we will
       | always have hallucinations.
        
         | jtc331 wrote:
         | Is this a way of saying that large language models don't have
         | the concept of "I don't know"?
        
         | andybak wrote:
         | Your "Of course" is a red flag.
         | 
         | Unless you have a very precise definition of "LLM" then there's
         | no "of course". It's possible to imagine a hypothetical
         | software system capable of returning "I don't know" or "I'm not
         | sure".
         | 
         | I haven't RTFA but I presume they are arguing within specific
         | constraints. The important point is - could an augmented LLM or
         | an "LLM plus something else" solve this?
         | 
         | I don't have an answer to that but I don't think it's an "of
         | course" type answer.
        
         | samatman wrote:
         | I don't see why that's inevitable at all. I immediately
         | recognize that I don't happen to know Napoleon's birthday, and
         | act on that basis.
        
       | keepamovin wrote:
       | This is why you need to pair language learning with real world
       | experience. These robots need to be given a world to explore --
       | even a virtual one -- with consequences within it, and a need to
       | survive it. Otherwise it's all unrooted sign and symbol systems
       | untethered to experience.
        
         | phh wrote:
         | I think I agree with you (I even upvoted), but this might be an
         | anthropomorphism.
         | 
         | Back like 3-5 years ago, we already thought that about LLMs:
         | They couldn't answer questions about what would fall when stuff
         | are attached together in some non-obvious way, and the argument
         | back then was that you had to /experience/ it to realize it.
         | But LLMs have long fixed those kind of issues.
         | 
         | The way LLMs "resolve" questions is very different from us. At
         | this point, I think that if we want to prove that LLMs need to
         | be rooted in the real world to achieve intelligence, we need to
         | find some real-world phenomenon that is so obvious that no one
         | ever wrote about it... but then we'd have written about it?
        
           | barnacs wrote:
           | Think of it this way:
           | 
           | Intelligent beings in the real world have a very complex
           | built-in biological error function rooted in real world
           | experiences: sensory inputs, feelings, physical and temporal
           | limitations and so on. You feel pain, joy, fear, have a
           | limited lifetime, etc.
           | 
           | "AI" on the other hand only have an external error function,
           | usually roughly designed to minimize the difference of the
           | output from that of an actually intelligent real world being.
        
       | pfdietz wrote:
       | A hallucinator seems like an excellent thing to have as a
       | component to an intelligent system, but it must be coupled with
       | evaluators.
       | 
       | Such an architecture seems plausible for the human brain as well.
        
       | viktour19 wrote:
       | If hallucination is inevitable, what should developers do?
       | 
       | Design user experiences that align users with this behaviour!
       | 
       | Relatedly, I built a game to demonstrate how one might calibrate
       | users to the responses of LLMs:
       | 
       | https://news.ycombinator.com/item?id=39255583
        
       | demondemidi wrote:
       | I bet that, basically, an LLM is just a part of a future AI. The
       | same way that a distributor is part of an internal combustion
       | engine. Or the way Minsky described The Society of Mind. Eager
       | to see if an LLM can be bolted on to some new kind of giant model
       | that does something like mitigate the LLMs weaknesses. Maybe
       | it'll be a collection of radically different models working
       | together in 20 years, and not just a single model. Like, uh, our
       | own brains. It reminds me of how lex and yacc are super important
       | as grammar and tokenizer, but are only just the front end to much
       | larger projects.
        
       | WirelessGigabit wrote:
       | I feel the same way about information provided by LLMs as with
       | lots of pre-LLM articles and overall YouTube 'information'
       | videos.
       | 
       | Sources matter. You're either rehashing information from a
       | collection of sources or you have your own data to prove your
       | statements.
        
         | coffeefirst wrote:
         | Yeah, this is why I really like Kagi's approach: it's clearly
         | marked for what it is, and cites its sources so you can
         | verify the quality of the answer (or at least get a ballpark
         | idea of where it's coming from) yourself.
        
       | lebuffon wrote:
       | Complete AI layman here but it seems to me that the human mind's
       | architecture has an overarching "executive" module that is
       | involved in managing the numerous "expert" systems that do other
       | stuff for us. (split brain experiments demonstrate multiple
       | "minds" in the brain)
       | 
       | Based on what we know about different systems in the brain it
       | might be a stretch to expect hallucination free AGI, using only a
       | single LLM.
        
         | Swizec wrote:
         | > Based on what we know about different systems in the brain it
         | might be a stretch to expect hallucination free AGI, using only
         | a single LLM.
         | 
         | Based on what we know about brains, it might be a stretch to
         | expect hallucination free AGI. I've yet to meet a general
         | intelligence that doesn't hallucinate.
         | 
         | Lots of fun examples, from chickens who turn off when they
         | see a straight line, to cats who get spooked by cucumbers,
         | to the endless fun cognitive biases humans fall for.
        
       | pylua wrote:
       | Sometimes hallucination is sort of a feature instead of a bug.
       | For instance, if I ask it how to do something and it
       | hallucinates, usually it is perfectly logical for it to work
       | the way it suggests, even though it is wrong.
       | 
       | In other words, it can be a good feature request or idea.
        
       | Fripplebubby wrote:
       | The core argument in this paper it seems to me from scanning it
       | is that because P != NP therefore LLMs will hallucinate answers
       | to NP-complete problems.
       | 
       | I think this is a clever point and an interesting philosophical
       | question (about math, computer science, and language), but I
       | think people are mostly trying to apply this using our
       | commonsense notions of "LLM hallucination" rather than the formal
       | notion they use in this paper, and I don't see an obvious
       | connection, since commonsense hallucinations (eg inventing
       | chapters of a novel when asked to produce summaries, inventing
       | specific details when asked) don't seem to be NP-complete
       | problems but rather are hallucinatory for some other interesting
       | reason. (I apologize if I have not captured the paper correctly
       | and would welcome correction on that, I read it quickly)
       | 
       | The statement that the formal world (the world of math and logic
       | and formal grammars) is a subset of the "real" world (or perhaps,
       | the world of natural language) is really interesting to me as
       | well. Most humans can't solve formal logic problems and parse
       | formal grammars but don't suffer from a (strong) hallucination
       | effect, and can work in natural language in great proficiency. Is
       | hallucination inevitable in humans since we also can't solve
       | certain NP-complete problems? We have finite lifespans, after
       | all, so even with the capabilities we might never complete a
       | certain problem.
        
         | foobarian wrote:
         | Humans have some amount of ability to recognize they hit a wall
         | and adjust accordingly. On the other hand this (completeness
         | theorems, Kolmogorov complexity, complexity theory) was only
         | arrived at what, in the 20th century?
        
           | digitalsushi wrote:
           | 'Adjust accordingly' includes giving up and delivering
           | something similar to what I asked, but not what I asked; is
           | this the point at which the circle is complete and AI has
           | fully replaced my dev team?
        
             | foobarian wrote:
             | Well in the example of an NP complete problem, a human
             | might realize they are having trouble coming up with an
             | optimal solution and start analyzing complexity. And once
             | they have a proof might advise you accordingly and perhaps
             | suggest a good enough heuristic.
        
               | lazide wrote:
               | Have you managed dev teams before? It's really nice when
               | they do that, but that is far from the common case.
        
               | flextheruler wrote:
               | Is the commenter above you implying humans hallucinate to
               | the level of LLMs? Maybe hungover freshmen working on a
               | tight deadline without having read the book do, but not
               | professionals.
               | 
               | Even mediocre employees will often realize they're
               | stuck, seek assistance, and then learn something from the
               | assistance instead of making stuff up.
        
               | groestl wrote:
               | Depending on the country / culture / job description,
               | "making stuff up" is sometimes a viable option for
               | "adjust accordingly", on all levels of expertise.
        
               | pixl97 wrote:
               | People commonly realize when they are stuck... But note,
               | the LLM isn't stuck, it keeps producing (total bullshit)
               | material, and this same problem happens with humans all
               | the time when they go off on the wrong tangent and some
               | supervisory function (such as the manager of a business)
               | has to step in and ask wtf they are up to.
        
             | skywhopper wrote:
             | One thing a human might do that I've never seen an LLM do
             | is ask followup and clarifying questions to determine what
             | is actually being requested.
        
               | ericb wrote:
               | What makes this fascinating to me is, these LLM's were
               | trained on an internet filled with tons of examples of
               | humans asking clarifying questions.
               | 
               | Why _doesn't_ the LLM do this? Why is the "next, most-
               | likely token" _never_ a request for clarification?
        
               | bongodongobob wrote:
               | GPT4 absolutely asks for clarification all the time.
        
             | steveBK123 wrote:
             | Everyone assumes the AI is going to replace their employees
             | but not replace them... fascinating.
        
               | Jensson wrote:
               | Uber proves we can replace taxi management with simple
               | algorithms; that was apparently much easier than
               | replacing the drivers. I hope these bigger models can
               | replace management in more industries, I'd love to have
               | an AI as a manager.
        
               | steveBK123 wrote:
                | Yeah on the one hand people think AI management is
               | dystopian (probably lol), on the other hand probably
               | fewer than 50% of ICs promoted to management are good at
               | it.
               | 
               | North of 25% are genuinely bad at it. We've all worked
               | for several of these.
               | 
               | Many of us have tried our hand at management and then
               | moved back to senior IC tracks. Etc.
        
         | p1esk wrote:
         | The only way to reduce hallucinations in both humans and LLMs
         | is to increase their general intelligence and their knowledge
         | of the world.
        
           | robrenaud wrote:
           | A smart bullshit artist who loves the sound of their own
           | voice is going to generate more hallucinations than a less
           | smart, more humble, more self aware person.
           | 
           | Making LLMs more knowledgeable is great (more data, bigger
           | models, yay!), but there are other avenues of plausible
           | attack as well. Enabling LLMs to know when they have veered
           | off distribution might work. That is, the LLM doesn't have to
           | know more of the world, it just has to know what it knows and
           | stay there. A person who says "I don't know" is a lot more
           | valuable than an overzealous one who spouts nonsense
           | confidently. Encouraging an LLM to say that there is a
           | disagreement about a topic rather than picking one lane is
           | also a valuable way forward.
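           | 
           | For illustration, one crude proxy for "knowing what it knows"
           | is the model's own average log-probability over the answer it
           | produced. A minimal sketch, assuming the Hugging Face
           | transformers library, GPT-2 as a stand-in model, and a made-up
           | threshold that would need calibration:
           | 
           |     import torch
           |     from transformers import AutoModelForCausalLM, AutoTokenizer
           | 
           |     tok = AutoTokenizer.from_pretrained("gpt2")
           |     model = AutoModelForCausalLM.from_pretrained("gpt2")
           | 
           |     def mean_logprob(prompt, answer):
           |         # Average log-prob the model assigns to its own answer
           |         # tokens; low values suggest off-distribution output.
           |         full = tok(prompt + answer, return_tensors="pt")
           |         n_prompt = tok(prompt, return_tensors="pt").input_ids.size(1)
           |         with torch.no_grad():
           |             logits = model(**full).logits
           |         logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
           |         targets = full.input_ids[0, 1:]
           |         per_tok = logprobs[torch.arange(len(targets)), targets]
           |         return per_tok[n_prompt - 1:].mean().item()
           | 
           |     THRESHOLD = -6.0  # arbitrary cutoff for this sketch
           |     answer = " Paris."
           |     if mean_logprob("The capital of France is", answer) < THRESHOLD:
           |         print("I don't know")
           |     else:
           |         print(answer)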
        
             | p1esk wrote:
             | I agree with your points but they are orthogonal to mine. A
             | smart person might be more likely to say "I don't know"
             | than a stupid person.
             | 
             | Also, a smart bullshit artist in your example does not
             | hallucinate - he knows what he's doing.
        
           | FpUser wrote:
            | Your post amounts to: in order to be smarter I need to
           | increase my smartness. Great insight.
        
             | lazide wrote:
             | I think it's more subtly misleading - to be smarter, I need
             | more knowledge. But knowledge != smart, knowledge ==
             | informed, or educated.
             | 
             | And the problem is more - how can an LLM tell us it doesn't
             | know something instead of just making up good sounding, but
             | completely delusional answers.
             | 
             | Which arguably isn't about being smart, and is only
             | tangentially about less or more (external) knowledge
             | really. It's about self-knowledge.
             | 
             | Going down the first path is about knowing everything (in
             | the form of facts, usually). Which hey, maybe?
             | 
             | Going down the second path is about knowing oneself. Which
             | hey, maybe?
             | 
             | They are not the same.
        
               | p1esk wrote:
               | Hallucinations are an interesting problem - in both
               | humans and statistical models. If we asked an average
               | person 500 years ago how the universe works, they would
               | have confidently told you the earth is flat and it rests
               | on a giant turtle (or something like that). And that
               | there are very specific creatures - angels and demons who
                | meddle in human affairs. And a whole lot more which has
               | no grounding in reality.
               | 
               | How did we manage to reduce that type of hallucination?
        
               | ottaborra wrote:
               | by taking steps to verify everything that was said
        
               | ebcode wrote:
               | >> And the problem is more - how can an LLM tell us it
               | doesn't know something instead of just making up good
               | sounding, but completely delusional answers.
               | 
               | I think the mistake lies in the belief that the LLM
               | "knows" things. As humans, we have a strong tendency to
               | anthropomorphize. And so, when we see something behave in
               | a certain way, we imagine that thing to be doing the same
               | thing that we do when we behave that way.
               | 
               | I'm writing, and the machine is also writing, but what
               | I'm doing when I write is _very_ different from what the
               | machine does when it writes. So the mistake is to say, or
               | think,  "I think when I write, so the machine must also
               | think when it writes."
               | 
               | We probably need to address the usage of the word
               | "hallucination", and maybe realize that the LLM is
               | _always_ hallucinating.
               | 
               | Not: "When it's right, it's right, but when it's wrong,
               | it's hallucinating." It's more like, "Sweet! Some of
               | these hallucinations are on point!"
        
               | p1esk wrote:
               | _I think when I write, so the machine must also think
               | when it writes._
               | 
               | What is it exactly you do when you "think"? And how is it
               | different from what LLM does? Not saying it's not
               | different, just asking.
        
               | Jensson wrote:
               | There are probably many, but the most glaring one is that
                | an LLM has to write a word every time it thinks, meaning it
               | can't solve a problem before it starts to write down the
               | solution. That is an undeniable limitation of current
                | architectures; it means that the way the LLM answers your
               | question also matches its thinking process, meaning that
               | you have to trigger a specific style of response if you
               | want it to be smart with its answer.
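                | 
                | A rough sketch of that effect, assuming the openai
                | Python client (v1 API), an API key in the environment,
                | and "gpt-4" as an arbitrary model choice:
                | 
                |     from openai import OpenAI
                | 
                |     client = OpenAI()  # reads OPENAI_API_KEY
                |     q = ("A bat and a ball cost $1.10 in total. The bat "
                |          "costs $1.00 more than the ball. "
                |          "How much does the ball cost?")
                | 
                |     def ask(prompt):
                |         r = client.chat.completions.create(
                |             model="gpt-4",
                |             messages=[{"role": "user", "content": prompt}],
                |             temperature=0,
                |         )
                |         return r.choices[0].message.content
                | 
                |     # Forces the answer tokens to come first.
                |     print(ask(q + " Answer with just the number."))
                |     # Lets the model "write out" its reasoning first.
                |     print(ask(q + " Think step by step, then answer."))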
        
               | p1esk wrote:
               | Ok, so how do humans solve a problem? Isn't it also a
               | sequential, step by step process, even if not expressed
               | explicitly in words?
               | 
               | What if instead of words a model would show you images to
               | solve a problem? Would it change anything?
        
               | dkjaudyeqooe wrote:
               | No, I don't know how other people think but I just focus
               | on something and the answer pops into my head.
               | 
               | I generally only use a step by step process if I'm
               | following steps given to me.
        
               | pixl97 wrote:
               | >but I just focus on something and the answer pops into
               | my head.
               | 
               | It's perfectly valid to say "I don't know", because no
               | one really understand these parts of the human mind.
               | 
               | The point here is saying "Oh the LLM thinks word by word,
               | but I have a magical black box that just works" isn't
               | good science, nor is it a good means of judging what LLMs
               | are capable or not capable of.
        
               | ebcode wrote:
               | That's a difficult question to answer, since I must be
               | doing a lot of very different things while thinking. For
               | one, I'm not sure I'm never _not_ thinking. Is thinking
               | different from  "brain activity"? We can shut down the
               | model, store it on disk, and boot it back up. Shut down
               | my brain and I'm a goner.
               | 
               | I'm open to saying that the machine is "thinking", but I
               | do think we need more clear language to distinguish
               | between machine thinking and human thinking.
               | 
               | EDIT: I chose the wrong word with "thinking", when I was
               | trying to point out the logical fallacy of
               | anthropomorphizing the machine. It would have been more
               | clear if I had used the word "breathing": When I write
               | I'm breathing, so the machine must also be breathing.
        
           | renegade-otter wrote:
           | It's statistical prediction. LLMs do not "understand" the
           | world by definition.
           | 
           | Ask an image generator to make "an image of a woman sitting
           | on a bus and reading a book".
           | 
           | Images will be either a horror show or at best full of weird
           | details that do not match the real world - because it's not
           | how any of this works.
           | 
           | It's a glorified auto-complete that only works due to the
           | massive amounts of data it is trained on. Throw in any
           | complex interactions it has not seen in the training data and
           | it's all over.
        
             | sshumaker wrote:
             | You're being downvoted because this is a hot take that
             | isn't supported by evidence.
             | 
             | I just tried exactly that with dalle-3 and it worked well.
             | 
             | More to the point, it's pretty clear LLMs do form a model
             | of the world, that's exactly how they reason about things.
              | There were some good experiments on this a while back -
             | check out the Othello experiment.
             | 
             | https://thegradient.pub/othello/
        
             | HDThoreaun wrote:
              | I think the situation is a lot more complicated than you're
             | making it out to be. GPT4 for example can be very good at
             | tasks it has not seen in the training data. The philosophy
             | of mind is much more open ended and less understood than
             | you seem to think.
        
               | godelski wrote:
               | > not seen in the training data
               | 
               | Do you have some evidence for this?
        
               | renegade-otter wrote:
               | What's the evidence? OpenAI's claims? They do have an
                | inherent interest in making investors believe this
               | technology is magic.
        
             | HeatrayEnjoyer wrote:
             | Why do people say stuff like this that is so demonstrably
             | untrue? SD and GPT4 do not exhibit the behavior described
             | above and they're not even new.
        
               | godelski wrote:
               | Neither of these comments are accurate. (edit: but
               | renegade-otter is more correct)
               | 
               | Here's 1.5 EMA https://imgur.com/mJPKuIb
               | 
               | Here's 2.0 EMA https://imgur.com/KrPVUGy
               | 
               | No negatives, no nothing just the prompt. 20 steps of
               | DPM++ 2M Karras, CFG of 7, seed is 1.
               | 
               | Can we make it better? Yeah sure, here's some examples:
               | https://imgur.com/Dmx78xV, https://imgur.com/HBTitWm
               | 
               | But I changed the prompt and switched to DPM++ 3M SDE
               | Karras
               | 
               | Positive: beautiful woman sitting on a bus reading a
               | book,(detailed [face|eyes],detailed
               | [hands|fingers]:1.2),Tokyo city,sitting next to a window
               | with the city outside,detailed book,(8k HDR RAW Fuji
               | film:0.9),perfect reflections,best
               | quality,(masterpiece:1.2),beautiful
               | 
               | Negative: ugly,low quality,worst quality,medium
               | quality,deformed,bad hands,ugly face,deformed book,bad
               | text,extra fingers
               | 
               | We can do even better if we use LoRAs and textual
               | inversions, or better checkpoints. But there's a lot of
               | work that goes into making really high quality photos
               | with these models.
               | 
               | Edit: here is switching to Cyberrealistic checkpoint:
               | https://imgur.com/gFMkg0J,
               | 
               | And here's adding some LoRAs, TIs, and prompt
               | engineering:
               | 
               | https://imgur.com/VklfVVC (https://imgur.com/ZrAtluS,
               | https://imgur.com/cYQajMN), https://imgur.com/ci2JTJl
               | (https://imgur.com/9tEhzHF, https://imgur.com/4Ck03P7).
               | 
               | I can get better, but I don't feel too much like it just
               | to prove a point.
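                | 
                | For anyone trying to reproduce settings like these outside
                | a web UI, a minimal sketch with the diffusers library. The
                | checkpoint name is just one common SD 1.5 repo, and plain
                | diffusers ignores A1111-style (term:1.2) attention
                | weights, so results won't match exactly:
                | 
                |     import torch
                |     from diffusers import (StableDiffusionPipeline,
                |                            DPMSolverMultistepScheduler)
                | 
                |     pipe = StableDiffusionPipeline.from_pretrained(
                |         "runwayml/stable-diffusion-v1-5",
                |         torch_dtype=torch.float16,
                |     ).to("cuda")
                |     # Roughly "DPM++ 2M Karras" in A1111 terms.
                |     pipe.scheduler = DPMSolverMultistepScheduler.from_config(
                |         pipe.scheduler.config, use_karras_sigmas=True)
                | 
                |     image = pipe(
                |         "beautiful woman sitting on a bus reading a book",
                |         negative_prompt="ugly, low quality, worst quality, "
                |                         "deformed, bad hands, extra fingers",
                |         num_inference_steps=20,
                |         guidance_scale=7.0,
                |         generator=torch.Generator("cuda").manual_seed(1),
                |     ).images[0]
                |     image.save("bus.png")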
        
               | TacticalCoder wrote:
               | > I can get better, but I don't feel too much like it
               | just to prove a point.
               | 
               | Honestly these pictures you posted do prove GP's point...
        
               | renegade-otter wrote:
               | You kind of proved my point. Of course the "finger
               | situation" is getting better but people handling complex
               | objects is still where these tools trip. They can't
               | reason about it - they just need to see enough data of
               | people handling books. On a bus. Now do this for ALL
               | possible objects in the world.
               | 
               | I have generated hundreds of these - the bus cabin LOOKS
               | like a bus cabin, but it's a plausible fake - the poles
               | abruptly terminate, the seats are in weird unrealistic
                | configurations, unnatural single-row aisles, etc. Which is
               | why I called it a super-convincing autocomplete.
        
               | paulmd wrote:
               | > Why do people say stuff like this that is so
               | demonstrably untrue? SD and GPT4 do not exhibit the
               | behavior described above and they're not even new.
               | 
               | it's true that most people do not actually understand the
               | problem/limitation, but it's a discussion that is
               | statistically likely to occur on the internet and
               | therefore people tend to regurgitate the words without
               | understanding the concept.
               | 
               | I'm being facetious but honestly it's a major theme of
               | this whole AI revolution, people do not want to accept
               | that humans are just another kind of machine and that
               | their own cognition resembles AI/ML in virtually every
               | aspect. People confabulate. People overreach the bounds
               | of their expertise. People repeat words and concepts
               | without properly understanding the larger context in
               | which they need to be applied. Etc etc.
               | 
               | Has nobody ever watched someone get asked a big question
               | or an unexpected question and "watched the wheels turn",
               | or watched them stammer out some slop of incoherent words
               | while they're processing? Does nobody have "canned
               | responses" that summarize a topic that you can give
               | pretty much the same (but not exactly, of course) every
               | time you are asked it? Is that not "stochastic word
               | chains"?
               | 
               |  _By design_ neural nets work almost exactly the same as
               | your brain. But a lot of people are trapped in the idea
               | that there must be some kind of  "soul" or something that
               | makes _human_ cognition fundamentally different. By
                | design, it's not. And we don't fully understand the
               | exact modalities to encode information in it usefully and
               | process it yet, but that's what the whole process here is
               | about.
               | 
               | (I commented about this maybe 6 months ago, but the
               | _real_ hot take is that what we think of as
               | "consciousness" isn't a real thing, or even an "overseer"
               | within the mind - "consciousness" may be exactly the
               | thing people say when they mean that "LLMs have to write
               | a word every time they think about a concept".
               | "Consciousness" may in fact be a low-dimensional
               | _projection_ of the actual computation occurring in the
               | brain itself, rationalizing and explicating the symbolic
               | computations of the brain in some form that can be
               | written down and communicated to other humans.
               | "Language" and "consciousness" as top-level concepts may
               | actually only be an annex that our brain has built for
               | storing and communicating those symbolic computations,
               | rather than a primary thing itself. It's not in control,
               | it's only explaining decisions that we already have
               | made... we see the shadows on the wall of plato's cave
               | and think that's the entire world, but it's really only a
               | low-dimensional projection.)
               | 
               | (or in other words - everyone assumes consciousness is
               | the OS, or at least the application. But actually it may
               | be the json serializer/deserializer. I.e. not actually
               | the thing in control at all. _Our entire lives and
               | decisionmaking processes_ may in fact be simple
               | rationalizations and explanations around  "what the
               | subconscious mind thinks should happen next".)
        
         | someplaceguy wrote:
         | > because P != NP therefore LLMs will hallucinate answers to
         | NP-complete problems.
         | 
         | I haven't read the paper, but that sounds like it would only be
         | true if the definition of "hallucinating" is giving a wrong
         | answer, but that's not how it's commonly understood.
         | 
         | When people refer to LLMs hallucinating, they are indeed
         | referring to an LLM giving a wrong (and confident) answer.
         | However, not all wrong answers are hallucinations.
         | 
         | An LLM could answer "I don't know" when asked whether a certain
         | program halts and yet you wouldn't call that hallucinating.
         | However, it sounds like the paper authors would consider "I
         | don't know" to be a hallucinating answer, if their argument is
         | that LLMs can't always correctly solve an NP-complete problem.
         | But again, I haven't read the paper.
        
           | Fripplebubby wrote:
           | Yes, I think you're right. I think one way to phrase the
           | authors' argument is:
           | 
           | * There is a class of problems which are harder than
           | polynomial time complexity to solve, but are not np-complete
           | 
           | * LLMs will generate an "answer" in formal language to this
           | class of problems posed to it
           | 
           | * LLMs can at most solve problems with polynomial time
           | complexity due to their fundamental design and principles
           | 
           | * Therefore, LLMs cannot solve > polynomial problems and not
           | np-complete problems either
           | 
           | All of which I buy completely. But I think what people are
           | more interested in is, why is it that the LLM gives an answer
           | when we can prove that it cannot answer this problem
           | correctly? And perhaps that is more related to the
           | commonsense notion of hallucination than I first gave it
           | credit for. Maybe the reason that an LLM gives a formal
           | language answer is the same reason it gives a hallucinatory
           | answer in natural language. But I don't think the paper sheds
           | light on that question
        
             | dragonwriter wrote:
             | > why is it that the LLM gives an answer when we can prove
             | that it cannot answer this problem correctly?
             | 
              | Because LLMs are not "problem solving machines"; they are
             | text completion models, so (when trained for q-and-a
             | response) their function is to produce text output which
             | forms a plausible seeming response to the question posed,
             | not to execute an algorithm which solves the logical
              | problem it communicates. Asking "why do LLMs do exactly
             | what they are designed to do, even when they cannot do the
             | thing that that behavior implies to a human would have been
             | done to produce it" just reveals a poor understanding of
             | what an LLM is. (Also, the fact that they structurally
             | can't solve a class of problems does not mean that they
             | can't produce correct answers, it means they can't
             | _infallibly_ produce correct answers; the absence of a
             | polynomial time solution does not rule out an arbitrarily
              | good polynomial time approximation algorithm, though it's
              | unlikely that an LLM is doing that, either.)
        
               | someplaceguy wrote:
               | > their function is to produce text output which forms a
               | plausible seeming response to the question posed
               | 
               | Answering "I don't know" or "I can't answer that" is a
               | perfectly plausible response to a difficult logical
               | problem/question. And it would not be a hallucination.
        
               | dragonwriter wrote:
               | > Answering "I don't know" or "I can't answer that" is a
               | perfectly plausible response to a difficult logical
               | problem/question.
               | 
               | Sure, and you can train LLMs to produce answers like that
               | more often, but then users will say your model is lazy
               | and doesn't even try, whereas if you train it to be more
               | likely to produce something that looks like a solution
               | more often, people will think "wow, the AI solved this
               | problem I couldn't solve". And that's why LLMs behave the
               | way they do.
        
               | someplaceguy wrote:
               | > Sure, and you can train LLMs to produce answers like
               | that more often, but then users will say your model is
               | lazy and doesn't even try, whereas if you train it to be
               | more likely to produce something that looks like a
               | solution more often, people will think "wow, the AI
               | solved this problem I couldn't solve".
               | 
               | Are you saying that LLMs can't learn to discriminate
               | between which questions they should answer "I don't know"
               | vs which questions they should try to provide an accurate
               | answer?
               | 
               | Sure, there will be an error rate, but surely you can
               | train an LLM to minimize it?
        
               | dragonwriter wrote:
               | > Are you saying that LLMs can't learn to discriminate
               | between which questions they should answer "I don't know"
               | vs which questions they should try to provide an accurate
               | answer?
               | 
                | No, I am saying that they are specifically trained to do
               | that, and that the results seen in practice on common
               | real-world LLMs reflect the bias of the specific training
               | they are given for providing concrete answers.
               | 
               | > Sure, there will be an error rate, but surely you can
               | train an LLM to minimize it?
               | 
               | Giving some answer to a question that cannot be
               | infallibly solved analytically is not necessarily an
               | error. In fact, I would argue that providing useful
               | answers in situations like that is among the motivating
               | use cases for AI.
               | 
               | (Whether or not the answers current LLMs provide in these
               | cases are useful is another question, but you miss 100%
               | of the shots you don't take.)
        
               | pixl97 wrote:
               | >Are you saying that LLMs can't learn to discriminate
               | between which questions they should answer "I don't know"
               | vs which questions they should try to provide an accurate
               | answer?
               | 
                | This is a highly problematic and highly contextualized
               | statement.
               | 
               | Imagine you're an accountant with the piece of
               | information $x. The answer you give for the statement
               | "What is $x" is going to be highly dependent on _who_ is
               | answering the question. For example
               | 
               | 1. The CEO asks "What is $x"
               | 
                | 2. A regulator at the SEC asks "What is $x"
               | 
               | 3. Some random individual or member of the press asks
               | "What is $x"
               | 
               | An LLM doesn't have the other human motivations a person
                | does when asked questions; pretty much at this point with
               | LLMs there are only one or two 'voices' it hears (system
               | prompt and user messages).
               | 
               | Whereas a human will commonly lie and say I don't know,
               | it's somewhat questionable if we want LLMs intentionally
               | lying.
               | 
               | In addition human information is quite often
               | compartmentalized to keep secrets which is currently not
               | in vogue with LLMs as we are attempting to make oracles
               | that know everything with them.
        
               | someplaceguy wrote:
               | > The answer you give for the statement "What is $x" is
               | going to be highly dependent on who is answering the
               | question.
               | 
               | I assume you meant _asking_ rather than _answering_?
               | 
               | > An LLM doesn't have the other human motivations a
               | person does when asked questions, pretty much at this
               | point with LLMs there are only one or two 'voices' it
               | hears (system prompt and user messages).
               | 
               | Why would LLMs need any motivation besides how they are
               | trained to be helpful and the given prompts? In my
               | experience with ChatGPT 4, it seems to be pretty good at
               | discerning what and how to answer based on the prompts
               | and context alone.
               | 
               | > Whereas a human will commonly lie and say I don't know,
               | it's somewhat questionable if we want LLMs intentionally
               | lying.
               | 
               | Why did you jump to the conclusion that an LLM answering
               | "I don't know" is lying?
               | 
               | I want LLMs to answer "I don't know" when they don't have
               | enough information to provide a true answer. That's not
               | lying, in fact it's the opposite, because the alternative
               | is to hallucinate an answer. Hallucinations are the
               | "lies" in this scenario.
               | 
               | > In addition human information is quite often
               | compartmentalized to keep secrets which is currently not
               | in vogue with LLMs as we are attempting to make oracles
               | that know everything with them.
               | 
               | I'd rather have an oracle that can discriminate when it
               | doesn't have enough information to provide a true answer
               | and replies "I don't know" in such cases (or sometimes
               | answer like "If I were to guess, then bla bla bla, but
               | I'm not sure about this"), than one which always gives
               | confident but sometimes wrong answers.
        
               | paulmd wrote:
               | if more guiderails are useful to users then such things
               | will surely emerge.
               | 
               | but from an engineering perspective it makes sense to
               | have a "generalist model" underneath that is capable of
               | "taking its best guess" if commanded, and then trying to
               | figure out how sure it is about its guess, build
               | guiderails, etc. Rather than building a model that is
               | implicitly wishy-washy and always second-guessing itself
               | etc.
               | 
               | The history of public usage of AI has basically been that
               | too many guiderails make it useless, not just gemini
                | making Japanese pharaohs to boost diversity or whatever,
               | but frankly even mundane usage is frustratingly
               | punctuated by "sorry I can't tell you about that, I'm
               | just an AI". And frankly it seems best to just give
               | people the model and then if there's domains where a
               | true/false/null/undefined approach makes sense then you
               | build that as a separate layer/guiderail on top of it.
        
               | tsol wrote:
                | It isn't designed to know things. It doesn't know exactly
                | what it knows, so it has nowhere to check before
                | answering. It generates an output, which isn't even the
                | same thing every time. So this again is a problem of not
                | understanding how it functions.
        
         | fauigerzigerk wrote:
         | Does the paper distinguish between hallucination and
         | approximation?
         | 
         | If LLMs could be trained to approximate NP-complete functions
         | rather than making stuff up, that would be good enough in many
         | contexts. I guess it's what humans would do.
        
           | thargor90 wrote:
           | You cannot approximate NP-complete functions. If you could
           | approximate them with a practically useful limited error and
            | at most P effort, you would have solved P=NP. (Disclaimer: my
            | computer science classes were a long time ago.)
        
             | kalkin wrote:
             | This isn't correct. What you may be remembering is that
             | some (not all) NP complete problems have limits on how
             | accurately they can be approximated (unless P = NP). But
             | approximation algorithms for NP complete problems form a
             | whole subfield of CS.
        
               | moyix wrote:
               | The theorem that proves this is the PCP Theorem, in case
               | anyone wants to read more about it: https://en.wikipedia.
               | org/wiki/PCP_theorem#PCP_and_hardness_o...
        
             | fauigerzigerk wrote:
             | Perhaps I'm not using the vocabulary correctly here.
             | 
             | What I mean is, if you ask a human to solve a travelling
             | salesman problem and they find it too hard to solve
             | exactly, they will still be able to come up with a better
             | than average solution. This is what I called approximation
             | (but maybe this is incorrect?).
             | 
             | Hallucination would be to choose a random solution and
             | claim that it's the optimum.
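              | 
              | A toy sketch of that distinction for the travelling
              | salesman problem (greedy nearest-neighbour heuristic vs.
              | an arbitrary tour; point count and seed chosen arbitrarily):
              | 
              |     import math, random
              | 
              |     def tour_length(pts, order):
              |         return sum(math.dist(pts[order[i]],
              |                              pts[order[(i + 1) % len(order)]])
              |                    for i in range(len(order)))
              | 
              |     def nearest_neighbour(pts):
              |         # Greedy heuristic: usually far better than an
              |         # arbitrary tour, but with no claim of optimality.
              |         todo, tour = set(range(1, len(pts))), [0]
              |         while todo:
              |             nxt = min(todo, key=lambda j:
              |                       math.dist(pts[tour[-1]], pts[j]))
              |             tour.append(nxt)
              |             todo.remove(nxt)
              |         return tour
              | 
              |     random.seed(0)
              |     pts = [(random.random(), random.random())
              |            for _ in range(50)]
              |     print(tour_length(pts, list(range(50))))         # naive
              |     print(tour_length(pts, nearest_neighbour(pts)))  # heuristic
              | 
              | Reporting the heuristic tour as "good but not necessarily
              | optimal" is the benign failure mode; claiming the naive
              | tour is optimal would be the analogue of hallucination.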
        
               | alwa wrote:
               | I may be misunderstanding the way LLM practitioners use
               | the word "hallucination," but I understood it to describe
               | it as something different from the kind of "random"
               | nonsense-word failures that happen, for example, when the
               | temperature is too high [0].
               | 
               | Rather, I thought hallucination, in your example, might
               | be something closer to a grizzled old salesman-map-
               | draftsman's folk wisdom that sounds like a plausibly
               | optimal mapping strategy to a boss oblivious to the
               | mathematical irreducibility of the problem. Imagining a
               | "fact" that sounds plausible and is rhetorically useful,
               | but that's never been true and nobody ever said was true.
               | 
               | It'll still be, like your human in the example, better
               | than average (if "average" means averaged across the
               | universe of all possible answers), and maybe even useful
               | enough to convince the people reading the output, but it
               | will be nonetheless false.
               | 
               | [0] e.g. https://news.ycombinator.com/item?id=39450669
        
               | fauigerzigerk wrote:
               | If a driver is tasked with visiting a number of places,
               | they will probably choose a reasonably good route. If the
               | driver claims to have found the optimal route, it may not
               | be true, but it's still not a hallucination and it's
               | still a pretty good route.
               | 
               | The driver certainly cannot be relied on to always find
               | an exact solution to an NP-complete problem. But failure
               | modes matter. For practical purposes, the driver's
               | solution is not simply "false". It's just suboptimal.
               | 
               | If we could get LLMs to fail in a similarly benign way,
               | that would make them far more robust without disproving
               | what the posted paper claims.
        
               | pixl97 wrote:
               | > but it will be nonetheless false.
               | 
               | Only if you're assuming all questions have binary
               | answers.
               | 
               | For example in the traveling salesman problem you don't
               | have to compute all answers to start converging on an
               | average. A random sampling of solutions can start setting
                | bounds for the average, and your grizzled salesman's guesses
               | would fall somewhere on that plot. If they are
               | statistically better than average then they are far more
               | than good enough. Unless of course you think burning up
               | the observable universe in finding the best solution is
               | the only way to solve the problem of which trip uses the
               | least gas?
        
         | kenjackson wrote:
         | Last I'd heard it was still open if P != NP. And most questions
         | I've seen hallucinations on are not NP-Complete.
        
         | Animats wrote:
         | Yes. It looks like they introduce infinities and then run into
         | the halting problem for infinities. That may not be helpful.
         | 
         | The place where this argument gets into trouble is where it
         | says "we define hallucination in a formal world where all we
         | care about is a computable ground truth function f on S." This
         | demands a reliable, computable predicate for truth. That alone
         | is probably not possible.
         | 
         | If, however, we are willing to accept a ground truth function
         | with outputs
         | 
         |     - True
         |     - False
         |     - Unknown
         |     - Resource limit exceeded
         | 
         | that problem can be avoided. Now the goal is manageable -
         | return True or False only when those results are valid, and try
         | to reduce the fraction of useful queries for which Unknown and
         | Resource Limit Exceeded are returned.
         | 
         | The same problem comes up in program verification systems, and
         | has been dealt with in the same way for decades. Sometimes,
         | deciding if something is true is too much work.
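         | 
         | As a toy illustration of that four-valued interface (not the
         | paper's formalism; the claim being checked and the step budget
         | are arbitrary, and FALSE is never returned because divergence
         | can't be witnessed this way):
         | 
         |     from enum import Enum
         | 
         |     class Verdict(Enum):
         |         TRUE = "true"
         |         FALSE = "false"
         |         UNKNOWN = "unknown"
         |         RESOURCE_LIMIT = "resource limit exceeded"
         | 
         |     def collatz_reaches_one(n, budget=10_000):
         |         # Commit to TRUE only when a finite witness is found;
         |         # otherwise give up explicitly instead of guessing.
         |         if not isinstance(n, int) or n < 1:
         |             return Verdict.UNKNOWN
         |         for _ in range(budget):
         |             if n == 1:
         |                 return Verdict.TRUE
         |             n = n // 2 if n % 2 == 0 else 3 * n + 1
         |         return Verdict.RESOURCE_LIMIT
         | 
         |     print(collatz_reaches_one(27))             # TRUE
         |     print(collatz_reaches_one(27, budget=10))  # RESOURCE_LIMIT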
        
           | Fripplebubby wrote:
           | Well put. Overall this paper feels very Godel Incompleteness
           | for LLMs which is _interesting_ and perhaps even valuable to
           | somebody, but because it attaches itself to this hot query
           | 'hallucination', I think some people are finding themselves
           | searching this paper for information it does not contain.
        
         | samatman wrote:
         | Hallucination is a misnomer in LLMs and it depresses me that it
         | has solidified as terminology.
         | 
         | When humans do this, we call it confabulation. This is a
         | psychiatric symptom where the sufferer can't tell that they're
         | lying, but fills in the gaps in their knowledge with bullshit
         | which they make up on the spot. Hallucination is an entirely
         | different symptom.
         | 
         | And no, confabulation isn't a normal thing which humans do, and
         | I don't see how that fact could have anything to do with P !=
         | NP. A normal person is aware of the limits of their knowledge,
         | for whatever reason, LLMs are not.
        
           | navane wrote:
           | When you talk to your mom and you remember something
           | happening one way, and she remembers it another way, but you
           | both insist you remember it correctly, one of you is doing
            | what the LLM is doing (filling up gaps of knowledge with
            | bullshit). And even when you later talk about this on a
            | meta level,
           | no one calls this confabulation because no one uses that
           | word. Also this is not a psychiatric syndrome, it's just
           | people making shit up, inadvertently, to tell a coherent
            | story without holes. It very much sounds like you did the same.
           | Everyone does this all the time.
        
             | jiggawatts wrote:
             | Just ask any criminal attorney or police detective. Normal
             | people can't get their facts straight even if they all
             | witnessed something memorable first-hand just hours ago.
        
           | pixl97 wrote:
           | >confabulation isn't a normal thing which humans do
           | 
           | > A normal person is aware of the limits of their knowledge,
           | for whatever reason, LLMs are not.
           | 
           | Eh, both of these things are far more complicated. People
           | perform minor confabulations all the time. Now, there is a
            | medical term for confabulation that refers to a more serious
            | medical condition involving high rates of this occurring,
            | coupled with dementia, and that would be the less common
            | form. We know from things like eyewitness testimony that
            | people turn into confabulatory bullshit-spewing devices very
            | quickly, though likely due to different mechanisms like
            | recency bias and overwriting memories by thinking about them.
           | 
            | Coupled with that, people are very apt to lie about things
            | they do know, and can do so for a multitude of reasons.
            | Attempting to teach an LLM to say "I don't know" when it
            | doesn't know something, versus it just lying to you and
            | saying it doesn't know, will be problematic. Just see ChatGPT
            | getting lazy in some of its releases for backfire effects
            | like this.
        
             | singingfish wrote:
             | Classic confabulation is observed with some kinds of
             | alcohol related brain damage where people drink and get
             | malnourished for a period of years. People with these
             | syndromes create quite coherent complex stories which they
             | will not be able to recall subsequently. This is quite
             | different to filling in the blanks for remembered
             | conversations where later on there is an opportunity for
             | error correction. With confabulation there is not as it's
             | tightly bound to memory impairment.
             | 
              | So I'm in the camp where LLMs are confabulating - and I
              | personally think the argument that they can be seen as
              | confabulation machines has some validity.
        
         | bitwize wrote:
         | > but rather are hallucinatory for some other interesting
         | reason.
         | 
         | In improv theater, the actor's job is to come up with plausible
         | interactions. They are free to make shit up as they go along,
         | hence improv, but they have to keep their inventions plausible
         | to what had just happened before. So in improv if someone asks
         | you "What is an eggplant?" it is perfectly okay to say "An
         | eggplant is what you get when you genetically splice together
         | an egg and a cucumber" or similar. It's _nonsense_ but it's
         | nonsense that follows nicely from what just came before.
         | 
         | Large language models, especially interactive ones, are a kind
         | of improv theater by machine: the machine outputs something
         | statistically plausible to what had just come before; what
         | "statistically plausible" means is based on the data about
         | human conversations that came from the internet. But if there
         | are gaps in the data, or the data lacks a specific answer that
         | seems to statistically dominate, it seems like giving a
         | definitive answer is more plausible in the language model than
         | saying "I don't know", so the machine selects definitive, but
         | wrong, answers.
        
       | ninetyninenine wrote:
       | Fiction and story writing is hallucination. It is the opposite of
       | a stochastic parrot.
       | 
       | We've achieved both extremes of AI. Computers can be both logical
       | machines and hallucinators. Our goal is to create a machine that
       | can be both at the same time and can differentiate between both.
       | 
       | That's the key. Hallucination is important but the key is for the
       | computer to be self aware about when it's hallucinating.
       | 
       | Of course it's a hard problem but even humans hallucinate
       | massively. Just look at religion. Only one religion can be right
       | or none; that must mean, logically speaking, all other religions
       | are hallucinations.
        
         | beardedwizard wrote:
          | It is not the opposite of a stochastic parrot; it is exactly
          | the same thing, only the predictions are worse due to sparse
          | training data.
        
         | elicksaur wrote:
         | Comparing religion to LLM mistakes is a case of the very
         | prevalent anthropomorphism in society currently. I fear this
          | misunderstanding and conflation will prevent us from actually
         | improving the tech.
         | 
         | Coming to an incorrect explanation such as, "Helios pulls the
         | Sun across the sky every day," is categorically different than
         | a math program incorrectly returning the most likely next token
         | in a sequence. LLMs don't have beliefs at all.
         | 
         | Helios answers a question "Why does the Sun rise?" Holding such
         | a belief shows a logical understanding that _some_ force must
         | be doing this, but due to a lack of knowledge of the world, the
         | person comes up with an incorrect explanation.
         | 
         | LLMs can't pose and reason about such questions. It is just not
         | the same class of "hallucinations." Assuming we've figured out
         | cognition via word prediction is going to get us nowhere fast
         | in the long term.
        
           | ninetyninenine wrote:
           | No. I never said we figured out cognition.
           | 
           | The LLM is still a black box feed forward network. It is the
           | intricacies of how signals interact with each other in this
           | network that we don't fully understand.
           | 
           | Word prediction and curve fitting are high level concepts we
           | used to build an intelligence we don't fully understand.
           | 
           | Also belief and understanding are orthogonal concepts. I can
           | believe something I don't understand and I can understand
            | something I don't believe.
           | 
           | My claim here is that LLMs understand things from a certain
           | aspect because LLMs can produce output indistinguishable from
           | understanding.
           | 
           | Also because both the human brain and the LLM are black boxes
           | there is no other metric we can use to gauge the level of
            | understanding an LLM has other than comparing its inputs and
           | outputs to the human brain.
        
         | MauranKilom wrote:
         | > even humans hallucinate massively
         | 
         | Simpler example: Dreams.
        
           | ninetyninenine wrote:
           | Yeah good point. But dreams are easily distinguishable from
           | reality.
           | 
           | Religion is often indistinguishable from truth and reality to
           | those who hallucinate it.
        
             | samatman wrote:
             | Confusing sincere but incorrect belief with hallucination
             | is categorically wrong.
        
           | AlexandrB wrote:
           | Most humans are extremely aware of the difference between
           | dreams and reality. If LLMs had similar awareness when they
           | hallucinated there wouldn't be a problem.
        
         | timeon wrote:
         | > Just look at religion.
         | 
          | This is a bit off-topic, but what I see as one of the driving
          | forces behind the existence of religions is the need for
          | personification. It seems easier for humans to interact with
          | the world and its elements by communicating with it as if it
          | were a familiar person-like entity.
         | 
         | Now when we talk about LLMs and AI in general, there is often
         | personification as well.
        
           | ninetyninenine wrote:
           | LLMs are trained to actually imitate human understanding
           | deliberately. The data is human and the high level training
           | is defined as the most likely word prediction of human output
           | data.
           | 
           | So not surprising to find aspects of personification in the
           | LLM. It is trained on US.
        
         | NateEag wrote:
          | > Only one religion can be right or none; that must mean,
          | logically speaking, all other religions are hallucinations.
         | 
         | There are some mistakes in this sentence.
         | 
         | It is possible (if unlikely) that multiple religions accurately
         | describe some aspects of the world, while being mistaken about
         | others. That is, treating rigorous complete "correctness" as
         | the only useful state a religion could have is very misleading.
         | Newtonian physics and special relativity both fail to predict
         | some observed phenomena, but they're still both useful (and not
         | every religion claims rigorous perfect correctness, even if
         | some do).
         | 
         | Even if some religions can be shown to be wrong, that doesn't
         | automatically mean that they're hallucinations. People can
         | believe things for plausible reasons and be wrong about them.
         | 
         | People can also have reasonable stances like "I cannot prove
         | this is true, and would not try to, but my subjective personal
         | experience of visions of God persuade me it's probably real."
         | 
         | That seems very different to me from an LLM hallucinating a
         | paper from whole cloth out of the blue.
        
       | wseqyrku wrote:
       | So they always hallucinate, it's just sometimes good enough?
        
       | throwawaaarrgh wrote:
       | LLMs literally just place words one in front of another based on
       | a probability and the "goodness" of training data. Of course they're
       | going to make stuff up.
        
       | graemebenzie wrote:
       | I think part of understanding is filling in the gaps between
       | facts. AIs can't recognize when that gap is too large
        
       | graemebenzie wrote:
       | I think understanding comes from filling the gaps between the
       | facts. AIs can't tell when the gap between points of knowledge is
       | too large to interpolate.
        
       | valine wrote:
       | This paper is arguing that it's impossible for an LLM to know the
       | answer to every question, therefore it's impossible to eliminate
       | hallucination.
       | 
       | It's easy to imagine an LLM that responds "I don't know" to all
       | questions. An LLM like that isn't very useful, but it also
       | doesn't hallucinate. Eliminating hallucination by teaching it to
       | recognize what it doesn't know is probably a more sane approach
       | than teaching an LLM to know literally everything in the
       | universe.
        
       | zyklonix wrote:
       | Hallucinations are essential for divergent thinking. Not
       | everything is solved following goal driven approaches. Check out
       | DreamGPT: https://github.com/DivergentAI/dreamGPT
        
       | zuminator wrote:
       | This is sort of like the compression algorithm "problem." For the
       | overwhelming majority of inputs, compression algorithms don't
       | compress, and it can be proven that on average they don't work.
       | But we're not really interested in compressing things on average.
       | What we use compression for amounts to edge cases of highly
       | regularized or repeatable data.
       | 
       | Thus the fact that LLMs can be proven in general to hallucinate
       | doesn't necessarily imply that they must hallucinate in the types
       | of situations for which we use them. The paper itself
       | discusses a number of mitigating strategies -- such as
       | supplementing their training data with current information or
       | using multiple LLMs to vote on the accuracy of the outcome --
       | only to basically brush them aside and advise not to use LLMs in
       | any sort of critical situation. And that's probably true enough
       | today, but in the future I think these strategies will greatly
       | reduce the severity of these hallucinations. Just as we as human
       | beings have developed strategies to reduce our reliance on pure
       | memory.
       | 
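       | A quick way to see the compression point concretely (zlib; the
       | exact sizes vary with the data):
       | 
       |     import os, zlib
       | 
       |     random_blob = os.urandom(4096)    # typical of "most" inputs
       |     repetitive = b"abcabcabc" * 455   # the regular edge case
       | 
       |     print(len(zlib.compress(random_blob)))  # slightly > 4096
       |     print(len(zlib.compress(repetitive)))   # a few dozen bytes
       | 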
       | This reminds me of a deposition I had to give a number of years
       | back. One of the lawyers asked me if I remembered how the
       | plaintiff and I came to discuss a certain accusation leveled at
       | him by the defendant. And I confidently stated, "Sure, he and I
       | used to have various conversations about the issue and one day
       | the plaintiff brought up this thing that defendant said to him."
       | And the lawyer says, if you want to, you can refer to your phone
       | text log to refresh your memory. Then I looked at my phone, and
       | the truth was that I myself had spoken to the defendant, and she
       | told me the accusation, and then I went and shared it with the
       | plaintiff. So, I basically remembered the situation exactly
       | backwards, i.e., a hallucination, which I was able to repair by
       | referring to real world information instead of just my memory.
        
       | educaysean wrote:
       | Is it a theoretical distinction of "we can't get to 0%, but we
       | can virtually trivialize it by reducing its frequency down to to
       | 1x10^-8%" type of scenario? Or is it something that requires an
       | external layer of control?
        
       | DinaCoder99 wrote:
       | "Hallucination" implies perception of non-real things, not
       | generation of phrases that map poorly to reality (or are simply
       | incoherent). It seems like a really bad term for this phenomenon.
        
         | TheBlight wrote:
         | "Bullsh***ing" seems more apt.
        
         | earthwalker99 wrote:
         | It makes being wrong sound impressive and mysterious, so it's
         | here to stay.
        
         | bonzaidrinkingb wrote:
         | We use "confabulation".
         | 
         | It is a feature, not a bug.
         | 
         | "confabulation" could be "solved" when LLMs realize they are
         | uncertain on a reply and making things up. But that requires
         | the LLM saying "I don't know" and rewarding that more than a
         | wrong guess. That requires a change in loss functions and not
         | even sure if all users desire that.
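         | 
         | A toy scoring rule in that spirit (the numbers are arbitrary;
         | the only point is that abstaining should score better than a
         | confident wrong guess):
         | 
         |     def score(answer, truth):
         |         # Reward correct answers, tolerate abstention,
         |         # penalize confident wrong answers the hardest.
         |         if answer == "I don't know":
         |             return 0.0
         |         return 1.0 if answer == truth else -2.0
         | 
         |     print(score("Paris", "Paris"))         #  1.0
         |     print(score("I don't know", "Paris"))  #  0.0
         |     print(score("Lyon", "Paris"))          # -2.0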
        
       | nedt wrote:
       | Well humans believe that vaccination either kills people or gives
       | them chips for tracking and the top politicians are lizard people
       | drinking the blood of children kept in caves and they had to fake
       | a pandemic to get them out. I'd say an A.I. hallucinating isn't
       | that far off from real humans. It's rather the recipient that
       | needs to interpret any response from either.
        
         | katehikes88 wrote:
         | Have you considered that parts of what you said might be true,
         | but you ridicule them only because you associate them with the
         | other parts that might be untrue and maybe even ridiculous?
        
           | nedt wrote:
           | It might be true, or not. It might be ridiculous, or not.
           | With _it_ being a message from a human or an AI. A
           | hallucination is not so much a problem as long as there is
           | not blind trust or a single source of truth. And oh boy would
            | I like to be purely ridiculous or satirical with the example of
           | what humans are believing.
        
         | bonzaidrinkingb wrote:
         | Around 12k fatal outcomes have been reported in the EU after
         | vaccination, but it is not certain in all cases that vaccines
         | were the cause.
         | 
         | The vaccine tracking chips come from two Microsoft (-affiliate)
         | patents, one about using chips to track body activity to reward
         | in cryptocurrency, and another about putting a vaccine passport
         | chip in the hands of African immigrants. That vaccines contain
         | tracking chips is a fabricated conspiracy to ridicule and
         | obfuscate.
         | 
         | Lizard people is often an anti-semitic dog whistle.
         | 
         | Rich elites use blood transfusions of young people to combat
         | aging and age-related disease.
         | 
         | Children have been kept in cages and feral children have lived
         | in caves.
         | 
         | You likely made up the part about faking a pandemic to get
         | children out of caves, unless you can point to discussion about
         | these beliefs.
         | 
         | Real humans do hallucinate all the time.
        
           | dragonwriter wrote:
           | > Real humans do hallucinate all the time.
           | 
           | No, they don't hallucinate "all the time", but LLM
           | "hallucination" is a bad metaphor, as the phenomenon is more
           | like confabulation than hallucination.
           | 
             | Humans don't confabulate all the time either, though.
        
             | bonzaidrinkingb wrote:
             | > "Everyone experiences hallucinations," Anil Seth, a
             | neuroscientist at the University of Sussex in the UK, told
             | DW.
             | 
             | > "It's important to recognize hallucinations can come and
             | go during our lives at points of stress or tiredness," Seth
             | said. "There is a bit of a stigma around hallucinations. It
             | comes from people associating them with mental illness and
             | being called crazy."
             | 
             | > But it's actually very common and happens even daily. The
             | itching Yarwood experiences is particularly common,
             | especially after drinking alcohol.
             | 
             | > "It's also common for people with reduced hearing or
             | vision function to get hallucinations in that ear or eye,"
             | said Rick Adams, a psychiatrist at University College
             | London. "These are non-clinical hallucinations because they
             | are not associated with a psychiatric diagnosis."
             | 
             | https://www.dw.com/en/hallucinations-are-more-common-than-
             | yo...
             | 
             | Confabulation is more like making something up when you
             | don't have sufficient knowledge. Seems to happen regularly
             | :)
        
           | nedt wrote:
           | In Germany and Austria we have those Querdenker Telegram
           | channels. All the examples I've given come from there. I'd
           | really like to say I made them up. But what you did with my
           | message is also what I'd do with AI output: it can be trained
           | on wrong data, misunderstand the question, or make stuff up.
           | Just like a human.
        
             | bonzaidrinkingb wrote:
             | I think you are (subconsciously) strawmanning the anti-vax
             | movements like Querdenker. Most of them believe that
             | mandatory vaccination (or reducing the freedom of the
             | unvaccinated, or making vaccination economically or
             | practically required in order to work) is bad and goes
             | against individual human rights, and that the risks and
             | benefits of vaccines were not clearly communicated.
             | 
             | So, even if you did not make it up, it twists those
             | viewpoints to reduce their legitimacy by tying them to
             | ridiculous theories. One could do something similar by
             | cherry-picking vaccine proponents and their ridiculous
             | theories (like claiming COVID came from the wet market).
             | 
             | If these channels are not indexed, I have a hard time
             | believing you, given the misgivings and ridicule in your
             | other statements. If a discussion of "the pandemic was faked
             | to get children out of caves" can be sourced, please do so.
             | 
             | AI output is already more careful and fair and balanced on
             | these matters.
        
               | nedt wrote:
               | Source is Die Zeit as written here:
               | https://news.ycombinator.com/item?id=39504716
               | 
               | You could also find it in other sources like Science
               | Busters etc. Most of it will be in German, because
               | Germany and Austria do have a real problem with certain
               | (dis-)beliefs about the medical system.
               | 
               | Pretty sure other examples of human hallucinations could
               | be given (WMD in Iraq, a lot of bad things done because
               | of religion, ...). The point is not the strawman itself,
               | but rather that any message needs evaluation, AI or not.
        
           | nedt wrote:
            | Here, from the German Wikipedia, on the lockdown being used
            | to cover up the use of children for their blood:
           | "According to the initial interpretation, the mass quarantine
           | (the "lockdown") does not serve to combat the pandemic, but
           | is intended to provide Trump and his allies with an excuse to
           | free countless children from torture chambers, where
           | adrenochrome is being withdrawn en masse on behalf of the
           | elite." - translated via Google translate, but source is here
           | with Die Zeit as source
           | https://de.wikipedia.org/wiki/QAnon#cite_ref-29
        
             | bonzaidrinkingb wrote:
             | Thanks for the source so I can put this into context (which
             | is the context of Russian disinformation, not grassroots
             | beliefs representative of the anti-vax movement).
        
         | nedt wrote:
         | OK, I take it back. Hallucination is a problem. Seeing the
         | downvotes and comments here, it does seem hard to tell what is
         | made up and what is just fuxxed-up humans. The will to believe
         | that humans can't be that stupid is bigger than I thought, and
         | the same evaluation might lead to an AI response being taken as
         | truth if it's just calm enough and sounds plausible.
        
       | whycome wrote:
       | Maybe this is why we need to force LLMs to dream. To get all the
       | hallucinations out of the way during 'down time' and clean things
       | up for 'waking hours'.
        
       ___________________________________________________________________
       (page generated 2024-02-25 23:01 UTC)