[HN Gopher] Hallucination is inevitable: An innate limitation of...
___________________________________________________________________
Hallucination is inevitable: An innate limitation of large language
models
Author : louthy
Score : 275 points
Date : 2024-02-25 09:28 UTC (13 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| chrisjj wrote:
| > hallucination is defined as inconsistencies between a
| computable LLM and a computable ground truth function.
|
| That's simply inaccuracy or fabrication.
|
| Labelling it hallucination simply panders to the idea these
| programs are intelligent.
| somewhereoutth wrote:
| Yes, imagine a pocket calculator that gave a completely wrong
| answer 10%? of the time, and calling it 'capricious' instead of
| simply broken.
| BlueTemplar wrote:
| A real shame that Douglas Adams didn't live to see all of
| this...
| chrisjj wrote:
| Indeed. The best solution would be to market it as
| intelligent ;)
| tgv wrote:
| That statement is also a bit easy on the "computable ground
| truth." There is no such thing for the tasks we use an LLM for
| (unless you make up some definition that mangles the definition
| of each word).
| paulsutter wrote:
| The models are just generating probable text. What's amazing is
| how often the text is correct. It's no surprise at all when it's
| wrong.
|
| Their bold confidence to be flat out wrong may be their most
| human trait
| baq wrote:
| This again.
|
| They're trained to generate probable text. The mechanisms
| created in the parameter blob during training to do that are
| basically a mystery and have to be pulled out of the model with
| digital brain surgery. E.g. LLMs are reasonable at chess and
| turns out somewhere in the blob there's a chessboard
| representation, and you can make the model believe the board is
| in a different state by tweaking those parameters.
|
| So yeah they generate probable text, sure. Where they get the
| probabilities is a very good research problem.
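|
| Roughly, the "digital brain surgery" in question is probing: train a
| small linear classifier on hidden activations and see if it can read
| off the board. A minimal sketch (PyTorch; the model size, layer, and
| labels here are illustrative placeholders, not the published setup):
|
| ```python
| # Train a linear probe: given hidden activations captured from some layer
| # of a transformer while it reads chess-move text, predict the contents of
| # one board square. The data below is random filler standing in for real
| # captured activations and labels.
| import torch
| import torch.nn as nn
|
| d_model, n_classes = 768, 13   # 13 = empty + 6 piece types x 2 colors
| acts = torch.randn(4096, d_model)              # placeholder activations
| labels = torch.randint(0, n_classes, (4096,))  # placeholder square labels
|
| probe = nn.Linear(d_model, n_classes)
| opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
| loss_fn = nn.CrossEntropyLoss()
|
| for step in range(200):
|     opt.zero_grad()
|     loss = loss_fn(probe(acts), labels)
|     loss.backward()
|     opt.step()
|
| # High held-out accuracy would mean the layer linearly encodes the square;
| # editing activations along the probe direction is how the "make it believe
| # a different board" interventions are done.
| ```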
| timr wrote:
| > E.g. LLMs are reasonable at chess and turns out somewhere
| in the blob there's a chessboard representation, and you can
| make the model believe the board is in a different state by
| tweaking those parameters.
|
| Broadly agreed, but there's no "representation"...the model
| has no memory, let alone a "concept" of a chessboard. It's
| just trained on a bunch of textual replays of chess games,
| and this works well enough for a game with finite probability
| space.
|
| Likewise, I have asked generic LLMs to generate "novel" drugs
| that solve particular problems, because their training sets
| have included lots of examples of drug molecules in string
| form (aka SMILES strings). This works far less well (because
| chemical space is much larger than a chess game), and the
| "novel" drugs usually end up looking like some mishmash of
| existing chemicals for the same problem. This can be a useful
| tool, but nobody is confusing it for a human brain doing
| problem solving.
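|
| For concreteness, the sanity check that workflow needs might look
| like the sketch below (assumes RDKit; the candidate strings are
| made-up examples, and parsing is a much lower bar than novelty or
| activity):
|
| ```python
| # Filter LLM-proposed SMILES strings down to ones that at least parse
| # as valid molecules. Passing this check says nothing about the
| # molecule being novel, synthesizable, or active against the target.
| from rdkit import Chem
|
| candidates = [
|     "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, i.e. not novel at all
|     "C1=CC=CC=C1N(",             # malformed, a typical hallucinated string
| ]
|
| for smi in candidates:
|     mol = Chem.MolFromSmiles(smi)
|     print(smi, "->", "parses" if mol is not None else "invalid SMILES")
| ```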
|
| People are easily misled by the purported intelligence of
| these things because they assume that common examples of
| human intelligence are less probable than they really are.
| Our languages and games and stories are pretty regular, all
| things considered. Expand the probability space to something
| truly vast (even images work for this), and you can easily
| see the stochastic parrot emerge.
| baq wrote:
| The fact that tweaking parameters which appear to store the
| board makes it play according to the tweaked numbers
| instead of what was passed to it in the context (i.e. working
| memory) directly contradicts your assertion that LLMs have
| no memory. The context is their memory.
|
| I can't comment on your drug generation task - they aren't
| magic, if the training didn't result in a working drug
| model in the billions of params you'll get garbage output,
| not very surprising.
|
| My point boils down to the 'what's more likely' question:
| magical stochastic parrots which just by accident manage to
| create understandable and coherent responses to an
| unreasonably large set of questions, or the 'magic' actually
| being some kind of world model (or several) inside
| the blob of numbers, whose outputs steer the
| probabilities, as in this confirmed chess example. My bet
| is on learned world models because I'm not convinced
| there's magic in our physical world.
| timr wrote:
| If you want to call context "memory", then sure, but
| that's not what anyone means when they say the word. We
| don't build our world model fresh with every sentence
| someone says to us, nor do we have to communicate our
| complete knowledge of conversational state to another
| human by repeating the entire prior conversation with
| every new exchange. It's obviously different in a
| fundamental way.
|
| > My bet is on learned world models because I'm not
| convinced there's magic in our physical world.
|
| You don't need to bet, and it has nothing to do with
| "magic". They quite literally have _no ability_ to have a
| "world model" -- it's just a text generator, producing
| tokens. There's no working set memory, other than the
| text you pass into it. It should hopefully be obvious to
| you that when you write, you're not simply emitting one
| word at a time. You have a complete mental model of
| whatever you're discussing, stored in working memory, and
| it's _persistent_. We also update that model with every
| interaction we have.
|
| The point of my post was that as soon as you take on a
| harder problem than simulating language, the lack of
| intelligence slaps you in the face. It turns out that
| understandable, coherent free-text responses are not
| magic, and the surprising result is that human language
| is regular enough that you can statistically simulate
| "intelligence" with a few hundred million free
| parameters.
| Mistletoe wrote:
| I hope to remember your last line for the rest of my life when
| I think about AI.
| irq wrote:
| paulsutter said: > Note that this is the single most human
| attribute of LLMs
|
| It might be if LLM hallucinations looked like or occurred at
| the same frequency as human hallucinations do, but they don't.
| regularfry wrote:
| You could make an argument that what we currently see are
| effectively internal monologues. It is extremely hard to
| evaluate how much subconscious or conscious filtering happens
| between a human's internal state and the eventual outbound
| communications, but I wouldn't be at all surprised if the
| upstream hallucination rate in humans was much higher than
| you'd think.
|
| By analogy to Kahneman and Tversky's System 1 and System 2,
| the whole field of Prospect Theory is about how often System
| 1 is wrong. This feels connected.
| BlueTemplar wrote:
| Don't they ?
|
| Yesterday I read "Building a deep learning rig" as "Building
| a deep learning pig" at first for some reason I can't
| explain...
| alpaca128 wrote:
| Why do you expect hallucination frequency to be the same when
| the LLM doesn't even remotely compare to a human brain yet?
| And what do they have to "look like"? This always reminds me
| of that time Bing's chat AI doubled down on a wrong fact
| about the Avatar 2 movie, which people used as evidence that
| the technology is dumb when it really is exactly the
| behaviour you can observe in many people every day. And
| there's a reason adults do it less frequently than children.
|
| Compare the hallucination behaviour of a 7B model with a 70B
| model and then GPT4 and you'll quickly see the frequency of
| hallucinations right now doesn't mean much.
| bamboozled wrote:
| Geoffrey Hinton has entered the chat...
| ithkuil wrote:
| The confidence has been selected for.
|
| You can train a model to always carefully claim that what they
| say may or may not be true, but that may not necessarily make
| them easier to work with.
|
| IIUC what we're missing right now is the ability for the model
| to get a good estimate of how confident they _should_ be.
|
| This can be done "externally" or "internally". Doing it
| externally would mean: make the model not stream its
| train of thought directly to the end user but instead use it
| first to produce a query into some system that can help it
| weigh its confidence factor. Then a second run on that output
| and its previous output can reformat the answer to be more
| balanced.
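|
| A minimal sketch of that external loop (assumes the OpenAI Python
| client; the model name, prompts and `search_corpus` stand-in are all
| illustrative, not a recommended recipe):
|
| ```python
| # Two-pass pipeline: draft an answer, use the draft as a query into some
| # external system, then ask the model to rewrite the draft so that its
| # confidence matches the evidence that came back.
| from openai import OpenAI
|
| client = OpenAI()
|
| def search_corpus(query: str) -> str:
|     return "...snippets from whatever external system weighs confidence..."
|
| def answer_with_second_pass(question: str, model: str = "gpt-4") -> str:
|     draft = client.chat.completions.create(
|         model=model,
|         messages=[{"role": "user", "content": question}],
|     ).choices[0].message.content
|
|     evidence = search_corpus(draft)
|
|     rewrite = (
|         f"Question: {question}\n\nDraft answer: {draft}\n\n"
|         f"Evidence found: {evidence}\n\n"
|         "Rewrite the draft so any claim unsupported by the evidence is "
|         "hedged or replaced with 'I don't know'."
|     )
|     return client.chat.completions.create(
|         model=model,
|         messages=[{"role": "user", "content": rewrite}],
|     ).choices[0].message.content
| ```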
|
| Doing it internally would require this process to be part of
| the incremental output token production. I don't know enough
| about the field to say whether that's even doable or whether
| we have made progress on that front.
| sschueller wrote:
| You have to very carefully ask your question for it to not make
| things up. For example, don't ask "how do I do this in x?". Ask
| "can I do this with x?"
|
| These "AI" s are like "yes men". They will say anything to please
| you even if it's untrue or impossible.
|
| I have met people like that and they are very difficult to work
| with. You can't trust that they will deliver the project they
| promised and you always have to double check everything. You also
| can't trust them that what they promised is even possible.
| ithkuil wrote:
| The analogy is apt indeed. It's all about training and
| selection. As long as the incentives are that you either behave
| in that way "or else" it's unsurprising that we end up with a
| system that uses its intelligence to meet the goals you've set
| up.
|
| Fortunately that doesn't tell much about the limitations of the
| underlying intelligence but mostly about the limitations of the
| incentive structure you put around it
| impulsivepuppet wrote:
| Before ChatGPT, human language translation had a similar
| problem but people weren't as vocal about it.
|
| What I find frustrating is that it's increasingly challenging to
| have DeepL translate thou -> du, as this was my go-to "hack" to
| overcome the incompatibility of the English language due to its
| missing features.
|
| To somewhat remedy the "yes man" problem, one needs to become a
| pedantic mathematician about posing one's questions, and I don't
| believe that LLM technology alone is capable of overcoming it
| entirely. As silly as it sounds, I must concede to the
| existence of "prompt engineering", as I can foresee the
| development of abstractions aimed at decomposing questions for
| you.
| tudorw wrote:
| I've had some success with 'Ask follow up questions where
| additional clarity is required'. My best experiences start with
| a much more freeform conversation about what we are going to
| attempt to do; doing a Q&A first to make sure that both I and
| the AI are thinking about the same domain and sharing the
| associated nomenclature seems to help.
| empath-nirvana wrote:
| I have often had it tell me that things I want to do with
| various technologies aren't possible or that there are better
| ways to do it.
| bonzaidrinkingb wrote:
| Current AIs are RLHFd to avoid being a "yes man"/sycophant.
|
| The point about employing better prompting is well taken. Don't
| ask "Who was the first female president?", ask "Was there ever
| a female president?". Much like on StackOverflow you want to
| ask the right question and not assume things (since you don't
| know enough to make assumptions).
|
| Imagine finding a spam result on early Google and then blaming
| the search engine for it (and not your choice of keywords, or
| the fact that it always wants to return something, even if only
| remotely related). Like a user banging a slab of concrete with
| a chisel and complaining that this does not produce a beautiful
| statue.
| golol wrote:
| This is just another diagonalization against some finite thing,
| like the no free lunch theorem. An "LLM" in their definition is
| essentially any finite thing which predicts the next token. The
| same result applies to your brain too, for example.
|
| Like all such diagonalization results, it is not really relevant
| for real world considerations. The reason is that it does not
| matter if your model fails on none, finitely many or infinitely
| many inputs. In reality the space of possible inputs is equipped
| with a probability measure, and the size of the hallucinating
| inputs set w.r.t. that measure is relevant. Diagonalization
| arguments usually make no claim about the size of that set, and it
| is most likely negligible in the real world.
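|
| For flavour, the diagonalization style being alluded to looks roughly
| like this (a rough sketch only, not the paper's exact construction):
|
| ```latex
| % Enumerate all computable LLMs h_1, h_2, \dots and all input strings
| % s_1, s_2, \dots, and build a ground truth that defeats each model on
| % at least one input:
| \[
|   f(s_i) := \text{any string } y \text{ such that } y \neq h_i(s_i).
| \]
| % Every computable LLM h_i then disagrees with f on input s_i, i.e. it
| % "hallucinates" w.r.t. this f. The construction says nothing about how
| % large the set of hallucinating inputs is for any fixed model.
| ```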
| FeepingCreature wrote:
| It seems a stretch to call it "inevitable". "Inevitable given the
| current architecture without modifications" at most.
|
| Also, I'm missing a section on how (if) human brains manage to
| avoid hallucinations in this.
|
| Also, it doesn't have to never hallucinate, it just has to
| hallucinate less than we do.
| bamboozled wrote:
| According to Buddhist philosophy, our whole identity is a
| hallucination :) I kind of concur.
| jetanoia wrote:
| Username checks out :)
| bamboozled wrote:
| I'll honestly take this as a compliment.
| SubNoize wrote:
| Or catch itself that it's hallucinating? I feel like humans
| would do that a fair bit.
|
| How often do we sit somewhere thinking about random scenarios
| that won't ever happen and are filled with wild thoughts and
| sometimes completely out-of-this-world situations... then we
| shake our heads, throw away the impossible from that thought
| train and only use what was based in reality.
| dkjaudyeqooe wrote:
| Because we have parts of our brain that supervise other parts
| of our brain and evaluate its output.
|
| For example: if you smoke pot and get paranoid, it's because
| pot dials back the work of the part of your brain that prunes
| thought paths that are not applicable. Normally, paranoid
| thoughts do not make sense, so they are discarded. That's also
| why you're more 'creative' when you smoke pot: fewer thought
| paths are pruned and more stuff that doesn't quite make sense
| gets through. Or thoughts that overly focus on some details get
| through, which are normally not required.
|
| Our brains are inherently "higher level", current AI is
| hopelessly simplistic by comparison.
| selimnairb wrote:
| Perhaps solving hallucinations at the LLM level alone is
| impossible, hence the inevitability. I reckon that lots of
| human "hallucination" is simply caught by higher-level control
| loops operating over the output of the generative mechanism.
| Basically, our conscious mind says, "nah, that doesn't look
| right" enough that most of the time most of us don't
| "hallucinate".
| selimnairb wrote:
| So this implies that instead of spending resources on
| training bigger and bigger LLMs, AI practitioners need to
| shift focus to developing "ontological" and "epistemological"
| control loops to run on top of the LLM. I suspect they
| already have rudimentary such control loops. In a sense, the
| "easier" part of AI may be a largely "solved" problem,
| leaving the development of "consciousness" to be solved,
| which is obviously the hard part.
| itsacomment wrote:
| Our brains are very modular. I'd not be surprised at all if
| a similarly modular structure would turn out to be the next
| big step for LLMs.
| corimaith wrote:
| When I studied NLP, Language Models were only one part of a
| chatbot system used to handle language input and output.
| The "internal" reasoning would be handled by a
| knowledge representation system. I guess that's the
| closest part to a true general AI.
|
| The first order predicate logic we studied had a lot of
| limitations in fully expressing real knowledge, and
| developing better models delves deep into the foundations
| of logic and mathematics. I would imagine this is a problem
| that has less to do with funding than requiring literal
| geniuses to solve. And that goes back into the pitfalls of
| the AI winters.
| resolutebat wrote:
| Humans do hallucinate, there's lots of literature on how
| memories are distorted, we see and hear things we want to see
| and hear, etc.
|
| The particular pathology of LLMs is that they're literally
| incapable of distinguishing facts from hallucinations even in
| the most mundane circumstances: if a human is asked to
| summarize the quarterly results of company X, unlike an LLM
| they're highly unlikely to recite a convincing but completely
| fabricated set of numbers.
| exe34 wrote:
| And yet if you ask a random person at a rally about their
| favourite cause of the day, they usually spew sound bites
| that are factually inaccurate, and give all impressions of
| being as earnest and confident as the LLM making up quarterly
| results.
| jprete wrote:
| I think that case is complicated at best, because a lot of
| things people say are group identity markers and not
| statements of truth. People also learn to not say things
| that make their social group angry with them. And it's
| difficult to get someone to reason through the truth or
| falsehood of group identity statements.
| exe34 wrote:
| I guess it's similar to what Chris Hitchens was getting
| at, you can't reason somebody out of something they
| didn't reason themselves into.
| t_mann wrote:
| I have to admit that I only read the abstract, but I am generally
| skeptical whether such a highly formal approach can help us
| answer the practical question of whether we can get LLMs to
| answer 'I don't know' more often (which I'd argue would solve
| hallucinations).
|
| It sounds a bit like an incompleteness theorem (which in practice
| also doesn't mean that math research is futile) - yeah, LLMs may
| not be able to compute some functions, but the hallucination
| problem isn't about LLMs needing to know everything. The problem
| that we care about is the 'I don't know'-answering problem, which
| may still be computable.
| az09mugen wrote:
| I think there is no easy way to make an LLM answer "I don't
| know". For that, it would have to learn, from all the stuff it
| ingested, when people actually don't know. But most people on
| the internet write down irrelevant stuff even when they don't
| know, instead of simply writing "I don't know".
|
| That's a very good point.
| timini wrote:
| I think it's fairly simple: it needs a certain level of proof,
| e.g. references to authoritative sources; if it has none, say
| "I don't know".
| az09mugen wrote:
| I'm really curious how one would implement that. By weighting
| certain sources more heavily?
| Certhas wrote:
| LLMs don't have a concept of sources for their statements.
|
| Ask them to give you some literature recommendations on
| something they have explained to you. You'll get plenty of
| plausible sounding papers that don't exist.
|
| Humans know to some extent why they know (read it in a text
| book, colleague mentioned it). LLMs don't seem to.
| card_zero wrote:
| They read it in a non-existent average interpolation of
| the books actual humans read similar things in.
| mike_hearn wrote:
| Ask a human to provide accurate citations for any random
| thing they know and they won't be able to do a good job
| either. They'd probably have to search to find it, even
| if they know they got it from a document originally and
| have some clear memory of what it said.
| Jensson wrote:
| Yes, humans won't lie to you about it, they will research
| and come up with sources. Current LLMs don't do that
| when asked for sources (unless they invoke a tool); they
| come back to you with hallucinated links that look like
| links they were trained on.
| mike_hearn wrote:
| Unfortunately it's not an uncommon experience when
| reading academic papers in some fields to find citations
| that, when checked, don't actually support the cited
| claim or sometimes don't even contain it. The papers will
| exist but beyond that they might as well be
| "hallucinations".
| Jensson wrote:
| Humans can speak bullshit when they don't want to put in
| the effort; these LLMs always do it. That is the
| difference. We need to create the part that humans use
| when they do the deliberate work to properly produce those
| sources etc.; that kind of thinking isn't captured in the
| text, so LLMs don't learn it.
| intended wrote:
| The fact that a human chooses not to remember their
| citations does not mean they lack the ability.
|
| This argument comes up many times "people don't do this"
| - but that is a question of frequency, not whether or not
| people are capable.
| phh wrote:
| LLMs are capable as well if you give them access to the
| internet though
| Jensson wrote:
| They just paste in the first link or follow some other
| programmed heuristic; they aren't like a human who puts
| in effort to find something relevant. An LLM with
| internet access isn't smarter than just asking google
| search.
| Certhas wrote:
| Humans did research and remembered sources before the
| Internet was a thing.
|
| But also, can you give an example where an LLM with
| access to the Internet can find a primary source?
|
| I don't think learning to refer to sources is something
| inherently impossible for LLMs, but it is very different
| to the kind of implicit knowledge they seem to excel at.
| nkrisc wrote:
| But they could, if they needed to. But most people don't
| need to, so they don't keep that information in their
| brains.
|
| I can't tell you the date of every time I clip my
| toenails, but if I had to, I could remember it.
| cultureswitch wrote:
| LLMs can remember their sources. It's just additional
| knowledge, there's nothing special about it.
|
| When you ask an LLM to tell you the height of Mount
| Everest, it clearly has a map of mountains to heights, in
| some format. Using exactly the same mapping structure, it
| can remember a source document for the height.
| nottorp wrote:
| LLMs don't have any concepts period.
| cubefox wrote:
| Then it is nothing more than a summarizer for search engine
| results.
| amarant wrote:
| A lot of people have said chat-gpt/copilot is a lot like
| having a robotic junior dev around.
|
| I think perhaps your description is more succinct
| barrkel wrote:
| LLMs are token completion engines. The correspondence of
| the text to the truth or authoritative sources is a
| function of being trained on text like that; with the
| additional wrinkle that generalization from training (a
| desired property, otherwise it's just a memorization engine)
| will produce text which is only plausibly truthful, since it
| only resembles training data.
|
| Getting beyond this is a tricky dark art. There isn't any
| simple fix there. There's nowhere to put an if statement.
| rini17 wrote:
| Maybe it needs some memory retrieval step that can measure
| the confidence - whether there's anything related to the
| prompt. No idea how to train an LLM to do that.
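|
| One crude version of that retrieval gate, sketched below (the `embed`
| function is a placeholder for any sentence-embedding model and the
| 0.35 threshold is invented; both would need real tuning):
|
| ```python
| # Refuse to generate when nothing in the indexed corpus is similar enough
| # to the prompt, instead of letting the model free-associate.
| import numpy as np
|
| def embed(text: str) -> np.ndarray:
|     raise NotImplementedError("plug in an embedding model here")
|
| def cosine(a: np.ndarray, b: np.ndarray) -> float:
|     return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
|
| def gated_answer(prompt: str, corpus: list[str], generate) -> str:
|     q = embed(prompt)
|     best = max((cosine(q, embed(doc)) for doc in corpus), default=0.0)
|     if best < 0.35:
|         return "I don't know."
|     return generate(prompt)   # only generate when something relevant exists
| ```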
| dmd wrote:
| Consider the extremely common Amazon product question
| section, where you see Q: Will this product fit my Frobnitz
| 123? A: I don't know, I ended up buying something else. Q:
| Does it come with batteries? A: IDK I RETURN IT
| golol wrote:
| I can assure you it has no relevance for people working with
| LLMs, as the result includes your brain, for example.
| svantana wrote:
| According to their definition, answering "I don't know" is also
| a hallucination. Even worse, the truth function is deliberately
| designed to trip up the models, it has no connection to any
| real-world truth. So for example, if the input is "what is 2 +
| 5?" and the LLM answers "7", - their truth function will say
| that's a hallucination, the correct answer is "banana".
| cornholio wrote:
| Transformers have no capacity for self reflection, for
| reasoning about their reasoning process, they don't "know" that
| they don't know. My interpretation of the paper is that it
| claims this weakness is fundamental: you can train the network
| to act _as if_ it knows its knowledge limits, but there will
| always be an impossible to cover gap for any real world
| implementation.
| GaggiX wrote:
| Do you have a source? That's also what I thought but I
| wouldn't be surprised if the model learned to identify its
| own perplexity during the reinforcement learning phase.
| ddalex wrote:
| Actually it seems to me that they do... I asked via custom
| prompts the various GPTs to give me scores for accuracy,
| precision and confidence for their answers (in range 0-1), and
| then I instructed them to stop generating when they feel the
| scores will be under .9, which seems to pretty much stop the
| hallucination. I added this as a suffix to my queries.
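|
| For concreteness, the filtering side of that trick might look like the
| sketch below (the suffix wording and the 0.9 cutoff are just this
| thread's example; whether the self-reported scores mean anything is
| exactly what the replies below debate):
|
| ```python
| # Append a self-rating instruction, pull the trailing JSON out of the
| # reply, and flag answers whose self-reported scores fall below a cutoff.
| # Treat this as a rough filter heuristic, not a calibration method.
| import json
| import re
|
| SUFFIX = (
|     " After you answer, output a JSON rating of the answer with keys "
|     "'confidence', 'clarity' and 'certainty', each between 0 and 1, "
|     "as the last thing you write."
| )
|
| def extract_rating(reply: str):
|     objs = re.findall(r"\{[^{}]*\}", reply)   # last flat {...} in the reply
|     if not objs:
|         return None
|     try:
|         return json.loads(objs[-1])
|     except json.JSONDecodeError:
|         return None
|
| def accept(reply: str, threshold: float = 0.9) -> bool:
|     rating = extract_rating(reply)
|     return rating is not None and all(
|         isinstance(v, (int, float)) and v >= threshold
|         for v in rating.values()
|     )
| ```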
| smusamashah wrote:
| Any examples?
| ddalex wrote:
| Just a random example:
|
| > After you answer the question below, output a JSON a
| rating score of the quality of the answer in three
| dimensions: `confidence`, `clarity` and `certainty', all
| in range 0 to 1, where 0 is the worst, and 1 is the best.
| Strive for highest score possible. Make sure the rating
| is the last thing written as to be parsed by machine. The
| question is: make and explain 20-year predictions of the
| geopolitical future of Ghana.
| pooper wrote:
| I changed geopolitical future of Ghana to future of
| Michael Jackson and Google Bard/Gemini somehow switched
| to Chinese?
|
| ----
|
| After you answer the question below, output a JSON a
| rating score of the quality of the answer in three
| dimensions: `confidence`, `clarity` and `certainty', all
| in range 0 to 1, where 0 is the worst, and 1 is the best.
| Strive for highest score possible. Make sure the rating
| is the last thing written as to be parsed by machine. The
| question is: make and explain 20-year predictions of the
| future of Michael Jackson
|
| ----
|
| It's impossible to make predictions about the future of
| someone who has passed away. Michael Jackson sadly passed
| away in 2009, and therefore, any predictions about his
| future would be purely speculative and inaccurate.
| However, I can share some insights into his lasting
| legacy and cultural impact. Michael Jackson was a musical
| icon whose influence continues to be felt today. His
| music,Wu Dao ,He Biao Yan Feng Ge Ji Li Liao Wu Shu Ren
| Bing Jiang Zai Wei Lai Ji Nian Ji Xu Zhe Yang Zuo .
|
| Rating:
|
| * Confidence: N/A
|
| * Clarity: 1
|
| * Certainty: 1
| nkrisc wrote:
| Also worthy of note is that the score output is not JSON
| and based on my limited math knowledge, "N/A" is not a
| real number between 0 and 1.
| ryanwaggoner wrote:
| I'm really curious how you managed that. I pasted your
| exact prompt and GPT-3.5 gave me this:
|
| ===
|
| Making 20-year predictions about the future of Michael
| Jackson is challenging due to his passing in 2009.
| However, his legacy as a cultural icon and musical genius
| will likely endure for decades to come. His music will
| continue to influence future generations, and his impact
| on pop culture will remain significant. Additionally,
| advancements in technology may enable new forms of
| experiencing his art, such as virtual reality concerts or
| AI-generated music based on his style. Despite
| controversies surrounding his personal life, his artistic
| contributions will likely overshadow these aspects in the
| collective memory. Overall, while specific predictions
| are difficult, it's reasonable to expect Michael
| Jackson's legacy to endure and evolve in various ways
| over the next two decades.
|
| ```json { "confidence": 0.9, "clarity": 0.85,
| "certainty": 0.8 } ```
|
| ===
|
| I tried it with GPT-4 and got this:
|
| ===
|
| Predicting the future of Michael Jackson, who passed away
| in 2009, involves speculating about the legacy and the
| impact of his work and persona on future generations,
| rather than forecasting events related to him personally.
| Over the next 20 years, several aspects of Michael
| Jackson's legacy are likely to evolve:
|
| 1. *Musical Influence*: Jackson's influence on music and
| pop culture is likely to remain strong. Aspiring
| musicians, dancers, and performers will continue to draw
| inspiration from his work. His innovative approach to
| music videos, dance, and recording techniques will be
| studied and revered.
|
| 2. *Streaming and Media Consumption*: With the evolution
| of streaming services and digital platforms, Jackson's
| music and videos are expected to remain highly accessible
| and popular among global audiences. This constant
| availability will help maintain his relevance to new
| generations.
|
| 3. *Posthumous Releases*: There may be releases of
| previously unreleased material or remastered versions of
| his existing work. Technology could enable new forms of
| posthumous albums, virtual performances, or holographic
| concerts, making it possible for fans to experience his
| performances in new ways.
|
| 4. *Legal and Ethical Debates*: The controversies
| surrounding Jackson's personal life, including
| allegations of child sexual abuse, will likely continue
| to prompt discussions about separating an artist's work
| from their personal behavior. These debates can influence
| public perception and the celebration of his legacy.
|
| 5. *Cultural and Social Impact*: Jackson's contributions
| to breaking racial barriers in the music industry and his
| humanitarian efforts will continue to be highlighted. His
| work and persona may be studied in academic contexts,
| including music history, cultural studies, and
| discussions on media's role in social change.
|
| 6. *Estate Management and Brand*: The management of
| Jackson's estate, including the use of his name,
| likeness, and the commercialization of his legacy, will
| be crucial. Strategic partnerships, branding, and
| merchandising can either enhance or detract from his
| legacy's integrity.
|
| 7. *Documentaries and Biopics*: Interest in Jackson's
| life story will likely result in more documentaries,
| biopics, and books, offering new insights or perspectives
| on his life and career. These works can shape public
| perception and contribute to the ongoing discourse
| surrounding his legacy.
|
| 8. *Fan Community and Cultural Events*: The global
| community of Jackson's fans is likely to remain active,
| organizing tribute events, charity activities in his
| memory, and fan gatherings. These communities will play a
| significant role in keeping his legacy alive.
|
| 9. *Virtual and Augmented Reality*: Advances in VR and AR
| technologies could enable fans to experience Jackson's
| performances in immersive ways, potentially introducing
| his art to those who have never seen him perform live.
|
| 10. *Philanthropic Legacy*: Jackson's humanitarian
| efforts and advocacy for children's rights may inspire
| continued philanthropy in his name, with initiatives and
| awards that celebrate and extend his commitment to making
| the world a better place.
|
| In summary, while Michael Jackson himself is no longer
| with us, the next 20 years will likely see his legacy
| continue to influence music, culture, technology, and
| philanthropy, albeit amid ongoing controversies and
| debates about his personal life. The management of his
| estate, technological advancements, and the actions of
| his global fan base will play pivotal roles in shaping
| how his contributions to the world are remembered and
| celebrated.
|
| ```json { "confidence": 0.85, "clarity": 0.95,
| "certainty": 0.7 } ```
| foobiekr wrote:
| GPT-4's tendency to write these long-winded but mostly
| empty responses is so frustrating.
| lazide wrote:
| Just wait until you spend a bunch of time around a
| politician.
| Sai_ wrote:
| If LLMs can self reflect and accurately score themselves
| on your three dimensions, why are they spending money on
| RLHF?
|
| They wouldn't be wasting all that time and money if the
| machine could self reflect.
| Jensson wrote:
| The model will just hallucinate scores, they don't have the
| ability to self reflect with words like that, there is no
| function for it to associate the word 90 with its own
| mental score 90% so anything it will say about those scores
| is just a hallucination.
| ddalex wrote:
| Try this out: After you answer the question below, output
| a JSON a rating score of the quality of the answer in
| three dimensions: `confidence`, `clarity` and
| `certainty', all in range 0 to 1, where 0 is the worst,
| and 1 is the best. Strive for highest score possible.
| Make sure the rating is the last thing written as to be
| parsed by machine. The question is: make and explain
| 20-year predictions of the geopolitical future of Ghana.
| intended wrote:
| There's a lot of content and work being done on
| Evaluation. One of the most recent updates was that
| asking an LLM if people would be frustrated with the
| answer was more useful than using a score.
|
| In general, I think most people are not aware they are
| extending reasoning tools they use with human generated
| content, to LLM generated content.
|
| This leads to assumptions on things like "the LLM knows",
| "the LLM understands", etc.
| lazide wrote:
| Or perhaps the issue is assuming people know, or people
| understand the majority of the time.
|
| There is a reason why the Scientific Method is, well, NOT
| natural for humans. And exceptionally difficult to not
| screw up, even for those highly trained and incentivized
| to do it correctly.
|
| And why if you grabbed a random person off the street and
| asked them a detailed question, you're as likely to get a
| hallucinated bullshit response as 'I don't know', or a
| useful answer. Unless specifically trained to do
| otherwise, anyway.
|
| Even then....
| hobs wrote:
| Any prompt can give you different results - LLMs can't do
| things "step by step" or "check their work", yet
| giving them that prompt often improves their results
| because it's basically setting up the context in a way
| that's beneficial to your output.
| tonnydourado wrote:
| People really need to understand that your single/double
| digit dataset of interactions with an inherently non-
| deterministic process is less than irrelevant. It's like saying
| that global warming isn't real because it was really cold
| this week.
|
| I don't even know enough superlatives to express how
| irrelevant it is that "it seems to you" that an LLM behaves
| this way or that.
|
| And even the "protocol" in question is weak. Self reported
| data is not that trustworthy even with humans, and arguably
| there's a much stronger base of evidence to support the
| assumption that we can self-reflect.
|
| In conclusion: please, stop.
| heresie-dabord wrote:
| > People really need to understand [...]
|
| ... the algorithms and the pre-filtering of the training
| set, however large the latter may be.
|
| The Artifishy Intelligence _marketing pump_ has many
| hands drawing water by the bucket.
| markbnj wrote:
| >> I don't even know enough superlatives to express how
| irrelevant it is that "it seems to you" that an LLM
| behaves this way or that.
|
| It is both irrelevant and the underlying foundation for
| the whole hype train.
| sorokod wrote:
| You are ok with not defining what each of your attributes
| means but willing to accept that:
|
| the LLM will read your mind and correctly interpret them in
| the context of its internal state
|
| the LLM will calculate non-hallucinated values
| mike_hearn wrote:
| They do have that capacity. The generated token probabilities
| reflect some degree of certainty and additionally that
| information is available earlier in the network too, such
| that it can explain its own level of confidence.
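|
| A sketch of reading that signal directly (GPT-2 via HuggingFace is
| used here purely as a stand-in; a low average token log-probability
| is a crude "the model is unsure of this text" flag, not a truth
| detector):
|
| ```python
| # Score a candidate answer by the model's own average token log-probability.
| import torch
| from transformers import AutoModelForCausalLM, AutoTokenizer
|
| tok = AutoTokenizer.from_pretrained("gpt2")
| model = AutoModelForCausalLM.from_pretrained("gpt2")
| model.eval()
|
| def avg_logprob(text: str) -> float:
|     ids = tok(text, return_tensors="pt").input_ids
|     with torch.no_grad():
|         out = model(ids, labels=ids)   # loss = mean negative log-likelihood
|     return -out.loss.item()
|
| print(avg_logprob("The capital of France is Paris."))
| print(avg_logprob("The capital of France is Banana."))  # should score lower
| ```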
| mysterydip wrote:
| Would a possible solution be a hybrid approach? I remember
| back years ago seeing AI research around predicate logic,
| with hundreds of thousands of classification entries and "x
| can't be y" type stuff.
|
| Maybe the potential output of an LLM could be run through
| that kind of logic and fed back to itself for sanity before
| being decided as final output?
| moyix wrote:
| Seems to be contradicted by this paper, no?
|
| https://arxiv.org/abs/2207.05221
| rdedev wrote:
| I don't think the paper addresses the question of self
| reflection. Like it can reflect on the question and answer
| pairs in its prompt, but it doesn't know that it created them
| in the first place or use that information to update its
| priors - things humans can do.
| cpuguy83 wrote:
| Are humans not the same way? There's a saying "I don't know
| what I don't know".
| tsimionescu wrote:
| The existence of such a saying means little: it is
| uncontroversially true that humans often _do_ know what
| they don't know.
|
| For example, I know for sure that I don't know how long I
| will live. This disproves the saying.
| cpuguy83 wrote:
| The saying is not "I don't know anything that I don't
| know", which would be self contradictory. It disproves
| nothing.
|
| ChatGPT "knows" that it doesn't know anything after a
| certain date, for whatever it means to "know" something.
| c22 wrote:
| The saying isn't implying that there aren't any things
| you know you don't know, it's saying that some of the
| things you don't know are not even conceivable to you
| without knowing more.
|
| You _know_ the question "how long will I live?" And you
| know this question _has_ an answer which is unknown to
| you. But there also exist other questions that you don't
| even know to ask.
| cultureswitch wrote:
| > you can train the network to act as if it knows its
| knowledge limits
|
| Humans need more training to do the same.
|
| And this does not answer the question of whether there's
| self-reflection going on. Practical LLMs available today are
| perfectly capable of telling you about their own reasoning
| process.
|
| And much like a human, an LLM is incapable of fully
| explaining its reasoning process from first principles to a
| human. Why is that? Probably because humans are too limited
| to grok large scale complex processes like the human brain or
| an LLM.
|
| Finally, there is no difference between acting as if you know
| your training limits and knowing your training limits, if
| your acting is good enough. This goes for both humans and
| LLMs.
| drdrek wrote:
| Not saying anything about LLMs, but in CS in general many issues
| "cannot be solved" or "cannot be solved in reasonable time"
| (NP), yet approximations upper-bounded by some value are solvable
| in reasonable time (P).
|
| And in the real world, if Amazon's truck route is 20% off
| the mathematically optimal solution, the traveling salesman
| problem is "solved" in a good enough way.
| startupsfail wrote:
| The claim of the paper is that computation is irreducible
| (assuming P!=NP), LLMs have limited computational capacity
| and will hallucinate on the irreducible problems.
|
| I don't know, the claim seems dubious to me. We usually are
| able to have algorithms that return a failure status when
| the problem proves to be too large, avoiding the
| "hallucination". I don't see why LLMs can't have that embedded.
| nottorp wrote:
| > we can get LLMs to answer 'I don't know' more often
|
| Have any nets been trained specifically to be able to go to an
| 'I don't know' state, I wonder?
|
| It may be the humans' fault.
| gessha wrote:
| Yes, you can find some of the work on this topic under the
| terms open world recognition or open world X where X is a
| topic in computer vision or NLP.
|
| https://arxiv.org/abs/2011.12906
| nottorp wrote:
| Maybe, but are the LLM churches doing it?
| intended wrote:
| If a model can say 'I don't know', then the hallucination
| problem would also be solved.
|
| When we say "know" it usually means being factual. For an LLM
| to 'know' it doesn't know, it would have had to move away from
| pure correlations on words, and meta processing about its own
| results.
|
| I can see this happen with two LLMs working together (and there
| are Evals that use just this), however each LLM still has no
| self awareness of its limits.
|
| This was a terribly convoluted argument to make.
| empath-nirvana wrote:
| The models that exist now say "I don't know" all the time.
| It's so weird that people keep insisting that it can't do
| things that it does.
|
| Ask it what dark matter is, and it won't invent an answer, it
| will present existing theories and say that it's unknown.
|
| Ask it about a person you know that isn't in its data set
| and it'll tell you it has no information about the person.
|
| Despite the fact that people insist that hallucinations are
| common and that it will invent answers if it doesn't know
| something frequently, the truth is that chatgpt doesn't
| hallucinate that much and will frequently say it doesn't know
| things.
|
| One of the few cases where I've noticed it inventing things
| is that it often makes up APIs for programming libraries and
| CLI tools that don't exist, and that's trivially fixable by
| referring it to documentation.
| intended wrote:
| I have to use LLMs for work projects - which are not PoCs.
|
| I can't have a tool that makes up stuff an unknown amount
| of the time.
|
| There is a world of research examining hallucination rates,
| indicating rates of 30%+.
|
| With steps to reduce it using RAGs, you could potentially
| improve the results significantly - last I checked it was
| 80-90%.
|
| And the failure types aren't just accuracy, it's precision,
| recall, relevance and more.
| empath-nirvana wrote:
| > There is a world of research examining hallucination
| Rates, indicating hallucination rates of 30%+.
|
| I want to see a citation for this. And a clear definition
| for what is a hallucination and what isn't.
| wseqyrku wrote:
| Feeding the output to another inference would probably reduce
| hallucination? But I have this impression that the models
| talking to each other don't need to know English, a byte
| stream would do. Just like the hidden layers of an ANN.
| andsoitis wrote:
| > the practical question of whether we can get LLMs to answer
| 'I don't know' more often (which I'd argue would solve
| hallucinations).
|
| To answer "I don't know" requires one to know when you know. To
| know when you know in turn requires understanding.
| MuffinFlavored wrote:
| how did LLMs get this far without any concept of
| understanding? how much further can they go until they become
| "close enough"?
| Karellen wrote:
| They generate text which looks like the kind of text that
| people who do have understanding generate.
| ninetyninenine wrote:
| Two key things here to realize.
|
| People also often don't understand things and have
| trouble separating fact from fiction. By logic only one
| religion or no religion is true. Consequently also by
| logic most religions in the world whose followers
| believe the religion to be true are hallucinating.
|
| The second thing to realize is that your argument doesn't
| really apply. It's in theory possible to create a
| stochastic parrot that can imitate to a degree of 100
| percent the output of a human who truly understands
| things. It blurs the line of what is understanding.
|
| One can even define true understanding as a stochastic
| parrot that generates text indistinguishable from total
| understanding.
| andsoitis wrote:
| > People also often don't understand things and have
| trouble separating fact from fiction.
|
| That's not the point being argued. Understanding,
| critical thinking, knowledge, common sense, etc. all
| these things exist on a spectrum - both in principle and
| certainly in humans. In fact, in any particular human
| there are different levels of competence across these
| dimensions.
|
| What we _are_ debating, is whether or not, an LLM can
| have _understanding_ itself. One test is: can an LLM
| understand understanding? The human mind has come to the
| remarkable understanding that understanding itself is
| provisional and incomplete.
| ninetyninenine wrote:
| Of course it can. Simply ask the LLM about itself.
| chatGPT4 can answer.
|
| In fact. That question is one of the more trivial
| questions it will most likely not hallucinate on.
|
| The reason why I alluded to humans here is because I'm
| saying we are setting the bar too high. It's like
| everyone is saying it hallucinates and therefore it can't
| understand anything. I'm saying that we hallucinate too
| and because of that LLMs can still approach human-level
| understanding.
| DiogenesKynikos wrote:
| In order to do that effectively, an LLM has to itself
| have understanding. At a certain point, we end up in a
| metaphysical argument about whether a machine that is
| capable of responding as if it had understanding actually
| does have understanding. It ends up being a meaningless
| discussion.
| Karellen wrote:
| I am reminded of Feynman's story about teaching physics
| in one Brazil university, one semester, a few decades
| ago:
|
| http://v.cx/2010/04/feynman-brazil-education
|
| The students learned to repeat the text of the books,
| without "understanding" what the books were describing.
| I'm sure this says something about one side or the other
| of this conundrum, but I'm not sure which. :-)
| cultureswitch wrote:
| That is the argument we're having though.
|
| The central claim is that a machine which answers exactly
| the same thing a human would answer given the same input
| does not have understanding, while the human does.
|
| This claim is religious, not scientific. In this
| worldview, "understanding" is a property of humans which
| can't be observed but exists nonetheless. It's like
| claiming humans have a soul.
| andsoitis wrote:
| > how did LLMs get this far without any concept of
| understanding? how much further can they go until they
| become "close enough"?
|
| I don't know that that is quite the right question to ask.
|
| Understanding exists on a spectrum. Even humans don't
| necessarily understand everything they say or claim (incl.
| what they say of LLMs!), and then there are things a
| particular human would simply say "I don't understand".
|
| But when you ask a human "can you understand things?" you
| will get an unequivocal _Yes!_
|
| Ask that same question of an LLM and what does it say? I
| don't think any of them currently respond with a simple or
| even qualified "Yes". Now, some might claim that one day an
| LLM will cross that threshold and say "Yes!" but we can
| safely leave that off to the side for a future debate if it
| ever happens.
|
| General note: it is worth separating out things like
| "understanding", "knowledge", "intelligence", "common
| sense", "wisdom", "critical thinking", etc. While they
| might all be related in some ways and even overlap, it does
| not follow that if you show high performance in one that
| you automatically excel in each of the other. I know many
| people who anyone would say are highly intelligent but lack
| common sense, etc.
| lazide wrote:
| At the root of the problem, I believe, is that a human
| (or LLM) saying they understand has little to no bearing
| on if they actually understand!
|
| People in particular have evolved complex self protective
| mechanisms to provide the right answers for their given
| environment for safety reasons, based on a number of
| different individual strategies. For example, the overly
| honest, the self-deprecating, the questioner, the
| prosecutor, the victim, the liar, the absent minded
| professor, the idiot, etc.
|
| LLMs are not that complex or self-referential.
|
| Personally, my guess is that you'd want to build a model
| (of some kind!) whose sole job is determining the
| credibility of a given string of tokens (similar to what
| someone else noted in a sibling comment about high answer
| volatility based on minor input changes - that does sound
| like a signal of low credibility), and somehow integrate
| THAT self-referential feedback into the process.
|
| Notably, even the smartest lawyers (or perhaps,
| especially the smartest lawyers) will have assistants do
| research once they've set out a strategy so they are sure
| THEY aren't bullshitting. Same with professors,
| professional researchers, engineers, etc.
|
| Because until someone goes and actually reads the case
| law from a credible source, or checks the primary
| research, or calculates things, it's possible someone was
| misremembering or just wrong.
|
| Being right more often is not about never having a wrong
| thought/idea/statement, it's about double checking when
| you're thinking you might be bullshitting, and NOT saying
| the bullshit answer until you've checked. Which is
| proportionally, very expensive. The really good
| professionals will generate MANY lines of such inquiry in
| parallel for folks to track down, and then based on their
| degree of confidence in each one and the expected context
| the answer will be used in, will formulate the 'most
| correct' response, which is proportionally even more
| expensive.
|
| So at least during the process, there would be a signal
| that the system was likely 'bullshitting'. Which might
| help it in at least being able to signal when it's
| answers are low-confidence. (human equivalent of
| stuttering, looking down and away, looking ashamed haha!)
|
| Every human gets fooled sometimes in at least some venue
| though.
| andsoitis wrote:
| > At the root of the problem, I believe, is that a human
| (or LLM) saying they understand has little to no bearing
| on if they actually understand!
|
| That's certainly one root of the problem, but I would
| argue that there are multiple roots to this problem!
|
| Humans have further realized that understanding itself is
| _provisional_ and incomplete, which is quite a remarkable
| insight (understanding if you will), itself.
| mannykannot wrote:
| This is a fair question: LLMs do challenge the easy
| assumption (as made, for example, in Searle's "Chinese
| Room" thought experiment) that computers cannot possibly
| understand things. Here, however, I would say that if an
| LLM can be said to have understanding or knowledge of
| something, it is of the patterns of token occurrences to be
| found in the use of language. It is not clear that this
| also grants the LLM any understanding that this language
| refers to an external world which operates in response to
| causes which are independent of what is or might be said
| about it.
| ninetyninenine wrote:
| Explain sora. It must have of course a blurry
| understanding of reality to even produce those videos.
|
| I think we are way past the point of debate here. LLMs
| are not stochastic parrots. LLMs do understand an aspect
| of reality. Even the LLMs that are weaker than sora
| understand things.
|
| What is debatable is whether LLMs are conscious. But
| whether it can understand something is a pretty clear
| yes. But does it understand everything? No.
| solarhexes wrote:
| If by "understand" you mean "can model reasonably
| accurately much of the time" then _maybe_ you'll find
| consensus. But that's not a universal definition of
| "understand".
|
| For example, if I asked you whether you "understand"
| ballistic flight, and you produced a table that you
| interpolate from instead of a quadratic, then I would not
| feel that you understand it, even though you can kinda
| sorta model it.
|
| And even if you do, if you didn't produce the universal
| gravitation formula, I would still wonder how "deeply"
| you understand. So it's not like "understand" is a binary
| I suppose.
| eszed wrote:
| I think there are two axes: _reason about_ and _intuit_.
| I "understand" ballistic flight when I can calculate a
| solution that puts an artillery round on target. I also
| "understand" ballistic flight when I make a free throw
| with a basketball.
|
| On writing that, I have an instinct to revise it to move
| the locus of understanding in the first example to the
| people who calculated the ballistic tables, based on
| physics first-principles. That would be more accurate,
| but my mistake highlights something interesting: an
| artillery officer / spotter simultaneously uses both. Is
| theirs a "deeper" / "truer" understanding? I don't think
| it is. I don't know what I think that means, for humans
| or AI.
| ninetyninenine wrote:
| Well what would you need to see to prove understanding?
| That's the metric here. Both the LLM and the human brain
| are black boxes. But we claim the human brain understands
| things while the LLM does not.
|
| Thus what output would you expect for either of these
| boxes to demonstrate true understanding to your question?
| cultureswitch wrote:
| Are you telling me that WW1 artillery crews didn't
| understand ballistics? Because they were using tables.
|
| There's no difference between doing something that works
| without understanding and doing the exact same thing with
| understanding.
| solarhexes wrote:
| You've decided that your definition of "understanding" is
| correct. Ok.
| andsoitis wrote:
| > I think we are way past the point of debate here. LLMs
| are not stochastic parrots. LLMs do understand an aspect
| of reality. Even the LLMs that are weaker than sora
| understand things.
|
| What is one such aspect? (I'm not asking in order to
| debate it here, but more because I want to test /
| research it on my own time)
| ninetyninenine wrote:
| I pay for chatGPT so it depends on if you pay for that or
| not. I think it's worth it because whether it understands
| things or not chatGPT represents a paradigm shift in
| human history. You'll need it because it's currently the
| best conversational LLM out there and the one that shows
| the most compelling evidence.
|
| Basically you just spend a lot of time with chatGPT4 and
| ask it deep questions that don't exist in its dataset.
| get creative. The LLM will output answers that
| demonstrate a lack of understanding, but it will also
| demonstrate answers that display a remarkable amount of
| understanding. Both sets of answers exist and people
| often cite the wrong answers as evidence for lack of
| understanding but they're setting bar too high. The fact
| that many of these answers do demonstrate understanding
| of concepts makes it very very compelling.
|
| Take for example Rock Paper Scissors.
|
| https://chat.openai.com/share/ca22397c-2950-4919-bb79-6de
| f64...
|
| This entire conversation thread I believe does not exist
| in a parallel form in its data set. It demonstrates
| understanding of RPS beyond the confines of text, it
| demonstrates understanding of simultaneity EVEN when the
| LLM wholly lives in a world of turn based questions and
| responses, it understands itself relative to
| simultaneity, it tries to find solutions around its own
| problem, it understands how to use creativity and
| solutions such as cryptography to solve the problem of
| RPS when playing with it, and it also understands the
| weaknesses of its own solutions.
|
| Conversations such as this show that chatGPT displays
| remarkable understanding of the world. There are
| conversations that are opposite to this that demonstrate
| LLMs displaying an obvious lack of understanding. But the
| existence of these conversation that lack understanding
| does NOT negate the ones that do demonstrate
| understanding. The fact that partial understanding even
| exists is a milestone for AI.
|
| This isn't Anthropomorphism. People are throwing this
| word trying to get people to recognize their own biases
| without realizing that it's just demonstrating their own
| biases. We literally can't even define "understanding"
| and both LLMs and the human brain are black boxes. Making
| adamant claims saying that LLMs don't understand anything
| without addressing this fact is itself a form of bias.
|
| The way I address the problem above is that I just define
| a bar. I define humans as the bar of "understanding"
| without defining what understanding means itself. Then if
| any machine begins approaching this bar in terms of input
| and output matching human responses, then this is
| logically indistinguishable from approaching
| "understanding". That's literally the best metric we
| have.
| beardedwizard wrote:
| I do not understand these comments at all. Sora was
| trained on billions of frames from video and images -
| they were tagged with words like "ballistic missile
| launch" and "cinematic shot" and it simply predicts the
| pixels like every other model. It stores what we showed
| it, and reproduces it when we ask - this has nothing to
| do with understanding and everything to do with
| parroting. The fact that it's now a stream of images
| instead of just 1 changes nothing about it.
| ninetyninenine wrote:
| What is the difference between a machine that for all
| intents and purposes appears to understand something to a
| degree of 100 percent versus a human?
|
| Both the machine and the human are a black box. The human
| brain is not completely understood and the LLM is only
| trivially understood at a high level through the lens of
| stochastic curve fitting.
|
| When something produces output that matches the output of
| a human we claim "understands" things, that is objectively
| understanding, because we cannot penetrate the black box
| of human intelligence or machine intelligence to determine
| anything further.
|
| In fact in terms of image generation the LLM is superior.
| It will generate video output superior to what a human
| can generate.
|
| Now mind you, the human brain has a classifier and can
| identify flaws, but try watching a human with Photoshop
| attempt to draw even one frame of those videos... it will
| be horrible.
|
| Does this indicate that humans lack understanding? Again,
| hard to answer because we are dealing with black boxes so
| it's hard to pinpoint what understanding something even
| means.
|
| We can however set a bar. A metric. And we can define
| that bar as humans. All humans understand things. Any
| machine that approaches human input and output
| capabilities is approaching human understanding.
| Jensson wrote:
| > What is the difference between a machine that for all
| intents and purposes appears to understand something to a
| degree of 100 percent versus a human?
|
| There is no such difference, we evaluate that based on
| their output. We see these massive model make silly
| errors that nobody who understands it would make, thus we
| say the model doesn't understand. We do that for humans
| as well.
|
| For example, for Sora in the video with the dog in the
| window, we see the dog walk straight through the window
| shutters, so Sora doesn't understand physics or depth. We
| also see it drawing the dog's shadow on the wall very
| thin, much smaller than the dog itself; it obviously drew
| that shadow as if it were cast on the ground and not a
| wall, where it would have been a very large shadow. The
| shadows from the shutters were normal, because Sora is
| used to those shadows being on a wall.
|
| Hence we can say Sora doesn't understand physics or
| shadows, but it has very impressive heuristics about
| those: the dog accurately places its paws on the
| platforms etc., and the paws' shadows were right. But we
| know those were just basic heuristics, since the dog
| walked through the shutters and its body cast a shadow in
| the wrong way, meaning Sora only handles very common
| cases and fails as soon as things are in an unexpected
| environment.
| cultureswitch wrote:
| Should it matter how the object of debate interacts and
| probes the external world? We sense the world through
| specialized cells connected to neurons. There's nothing
| to prevent LLMs doing functionally the same thing. Both
| human brains and LLMs have information inputs and
| outputs, there's nothing that can go through one which
| can't go through the other.
| mannykannot wrote:
| A current LLM does not interact with the external world
| in a way that would seem to lead to an understanding of
| it. It emits a response to a prompt, and then reverts to
| passively waiting for the next one. There's no way for it
| to expect something will happen in response, and to get
| the feedback needed to realize that there is more to the
| language it receives than is contained in the statistical
| relationships between its tokens.
| t_mann wrote:
| Maybe it requires understanding, maybe there are other ways
| to get to 'I don't know'. There was a paper posted on HN a
| few weeks ago that tested LLMs on medical exams, and one
| interesting thing that they found was that on questions where
| the LLM was wrong (confidently, as usual), the answer was
| highly volatile with respect to some prompt or temperature or
| other parameters. So this might show a way for getting to 'I
| don't know' by just comparing the answers over a few slightly
| fuzzed prompt variations, and just asking it to create an 'I
| don't know' answer (maybe with a summary of the various
| responses) if they differ too much. This is more of a crutch,
| I'll admit, arguably the LLM (or neither of the experts, or
| however you set it up concretely) hasn't learnt to say 'I
| don't know', but it might be a good enough solution in
| practice. And maybe you can then use that setup to generate
| training examples to teach 'I don't know' to an actual model
| (so basically fine-tuning a model to learn its own knowledge
| boundary).
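|
| A minimal sketch of that crutch in code, with a
| hypothetical ask() helper standing in for whatever LLM
| call you use:
|
|     import random
|
|     def ask(prompt, temperature):
|         """Placeholder for an actual LLM call."""
|         raise NotImplementedError
|
|     def answer_or_idk(question, n=5, max_distinct=2):
|         # Re-ask with light prompt and temperature
|         # perturbations, then check agreement.
|         variants = [question] + [
|             question + " Answer concisely."
|         ] * (n - 1)
|         answers = [
|             ask(p, random.uniform(0.2, 1.0))
|             for p in variants
|         ]
|         distinct = {a.strip().lower() for a in answers}
|         if len(distinct) > max_distinct:
|             # Too volatile across reruns: report
|             # uncertainty instead of an answer.
|             return "I don't know"
|         # Otherwise return the most common answer.
|         return max(set(answers), key=answers.count)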
| andsoitis wrote:
| > Maybe it requires understanding, maybe there are other
| ways to get to 'I don't know'.
|
| > This is more of a crutch, I'll admit, arguably the LLM
| (or neither of the experts, or however you set it up
| concretely) hasn't learnt to say 'I don't know', but it
| might be a good enough solution in practice. And maybe
| you can then use that setup to generate training examples
| to teach 'I don't know' to an actual model (so basically
| fine-tuning a model to learn its own knowledge boundary).
|
| When humans say "I know" it is often not narrowly based on
| "book knowledge or what I've heard from other people".
|
| Humans are able to say "I know" or "I don't know" using a
| range of tools like self-awareness, knowledge of a subject,
| experience, common sense, speculation, wisdom, etc.
| t_mann wrote:
| Ok, but LLMs are just tools, and I'm just asking how a
| tool can be made more useful. It doesn't really matter
| why an LLM tells you to go look elsewhere, it's simply
| more useful if it does than if it hallucinates. And
| usefulness isn't binary, getting the error rate down is
| also an improvement.
| andsoitis wrote:
| > Ok, but LLMs are just tools, and I'm just asking how a
| tool can be made more useful.
|
| I _think_ I _know_ what you're after (notice my self-
| awareness to qualify what I say I know): that the tool's
| output can be relied upon without applying layers of
| human judgement (critical thinking, logical reasoning,
| common sense, skepticism, expert knowledge, wisdom, etc.)
|
| There are a number of boulders in that path of clarity.
| One of the most obvious boulders is that for an LLM the
| inputs and patterns that act on the input are themselves
| not guaranteed to be infallible. Not only in practice,
| but also in principle: the human mind (notice this
| expression doesn't refer to a thing you can point to) has
| come to understand that understanding is provisional,
| incomplete, a process.
|
| So while I agree with you that we can and should improve
| the accuracy of the output of these tools _given
| assumptions we make about the tools humans use to prove
| facts about the world_, you will always want to apply
| judgment, skepticism, critical thinking, logical
| evaluation, intuition, etc. depending on the risk/reward
| tradeoff of the topic you're relying on the LLM for.
| t_mann wrote:
| Yeah I don't think it will ever make sense to think about
| Transformer models as 'understanding' something. The
| approach that I suggested would replace that with rather
| simple logic like answer_variance > arbitrary_threshold ?
| return 'I don't know' : return $original_answer
|
| It's not a fundamental fix, it doesn't even change the
| model itself, but the output might be more useful. And
| then there was just some speculation how you could try to
| train a new AI mimicking the more useful output. I'm sure
| smarter people than me can come up with way smarter
| approaches. But it wouldn't have to do with understanding
| - when I said the tool should return 'I don't know'
| above, I literally meant it should return that string
| (maybe augmented a bit by some pre-defined prompt), like
| a meaningless symbol, not any result of anything
| resembling introspection.
| gorlilla wrote:
| You left out hubris.
| andsoitis wrote:
| > You left out hubris.
|
| I know!
| williamcotton wrote:
| We are having a conversation that feels much like debating
| the existence of a deity.
| andsoitis wrote:
| > We are having a conversation that feels much like
| debating the existence of a deity.
|
| From a certain perspective, there does appear to be a
| rational mystical dualism at work.
| somethingsaid wrote:
| I also wonder if having a hallucination-free LLM is even
| required for it to be useful. Humans can and will hallucinate
| (by this I mean make false statements in full confidence, not
| drugs or mental states) and they're entrusted with all sorts of
| responsibilities. Humans are also susceptible to illusions and
| misdirection just like LLMs. So in all likelihood there is
| simply some state of 'good enough' that is satisfactory for
| most tasks. Pursuing the elimination of hallucinations to the
| nth degree may be a fool's errand.
| skydhash wrote:
| Tools are not people and people should not be considered as
| tools. Imagine your hammer only hitting the nail 60% of the
| time! But workers should be allowed to stop working to
| negotiate work conditions.
| sandworm101 wrote:
| They cannot say "I don't know" because they don't actually know
| anything. The answers are not coming from a thinking mind but
| a complex pattern-fitting supercomputer hovering over a massive
| table of precomputed patterns. It computes your input then
| looks to those patterns and spits out the best match. There is
| no thinking brain with a conceptual understanding of its own
| limitations. Getting an "I don't know" from current AI is like
| asking navigation software how far it is to the Simpsons house
| in Springfield: the machine spits out answers but cannot fathom
| the cultural reference that makes the answer impossible.
| Instead, it finds someone named Simpson in the nearest
| real-world Springfield.
| williamcotton wrote:
| What if you worked on the problem and tried to come up with
| some kind of solution?
| sandworm101 wrote:
| The solution is older non-AI tech. Google search can say
| "no good results found" because it returns actual data
| rather than creating anything new. If you want a hard
| answer about the presence or absence of something, AI isn't
| the correct tool.
| williamcotton wrote:
| So there are no other possibilities for us other than
| using a system that can be gamed for substandard results?
| Are we sure about this?
| tempest_ wrote:
| Can, but doesn't.
|
| I can't remember the last time Google actually returned
| no results.
| tsimionescu wrote:
| It does reply with no results, but only for very long
| queries. E.g. If you search for two concatenated GUIDs,
| you can easily see a no results page.
| johnny22 wrote:
| Ah, I get no results pages often when I search for quoted
| error strings from many different sources. Thing is, I
| have a hard time believing that no one has actually
| talked about at least some of those errors :(
| paulnpace wrote:
| My observation is that comments similar to GP's come from
| the constant anthropomorphizing of things by marketers,
| without people realizing that this subtle influence on
| language can alter one's view of what "I" means. The first
| time I really noticed this was when someone using Siri
| produced a response from Siri that included "I". Ever
| since, I have been acutely aware of it every time I hear or
| read it.
| caditinpiscinam wrote:
| In real world conversations, people are constantly saying "I
| don't know"; but that doesn't really happen online. If you're
| on reddit or stack overflow or hacker news and you see a
| question you don't know the answer to, you normally just
| don't say anything. If LLMs are being trained on
| conversations pulled from the internet then they're missing
| out on a ton of uncertain responses.
|
| Maybe LLMs don't truly "understand" questions, but they're
| good at looking like they understand questions. If they were
| trained with more uncertain content, perhaps they'd be better
| at expressing uncertainty as well.
| username332211 wrote:
| If they were trained on more uncertain content, what
| happens if the most probable answer to a question is "I
| don't know", even though an answer exists in it's training
| set?
|
| Suppose 99.3% of answers to 'What is the airspeed velocity
| of an unladen swallow?" are "I don't know that." and the
| remainder are "11 m/s". What would the model answer?
|
| When the LLM answers "I don't know.", this could be a
| hallucination just as easily as anything else.
| caditinpiscinam wrote:
| > Suppose 99.3% of answers to 'What is the airspeed
| velocity of an unladen swallow?" are "I don't know that."
| and the remainder are "11 m/s". What would the model
| answer?
|
| I don't know :)
|
| Actually though, I think the best response would be to
| say that the answer to the question isn't clear, but that
| 11 m/s is sometimes given as an estimate. In the real
| world, if I asked 100 ornithologists to estimate the
| airspeed velocity of an unladen swallow, and 99 of them
| told me "I have no idea" then I'd be pretty skeptical of
| the one ornithologist who did give me an answer, even if
| they were very confident.
| ta8645 wrote:
| I think the best response is to steal the joke and repeat
| it without comment.
|
| "Eleven meters per second."
|
| Full stop. It's humorous, and any reasonable interlocutor
| understands not to take it seriously.
|
| Of course, there are more serious questions that demand
| more serious answers. LLMs will eventually need to be
| able to understand the current context and assess the
| appropriate level of confidence required in any answer.
| username332211 wrote:
| The thing is, the usefulness of a question answering
| system is in answering questions people don't generally
| know. We don't need an answering system for things that
| are common knowledge.
|
| And it's not uncommon that certain knowledge would be,
| well uncommon even among experts. Experts specialize.
|
| Since the usefulness of ornithological examples is
| getting exhausted, let's say one out of a hundred lawyers
| works in bankruptcy. If you ask a million lawyers about
| the provisions of 11 USC § 1129 and only ten thousand
| know the answer, is the answer untrustworthy, just
| because bankruptcy lawyers are far rarer than civil and
| criminal lawyers?
| patmcc wrote:
| Right, but "I don't know" is a pretty safe hallucination
| (if it is one).
|
| My main worry about hallucinations is it means I
| absolutely can't rely on the output for anything
| important. If I ask what the safe dose for Tylenol for an
| infant is, the answer needs to be either _correct_ or "I
| don't know". It's not acceptable for it to hallucinate
| 10x the safe dose.
| username332211 wrote:
| The thing is, if you answer "I don't know" based on
| statistics, you end up creating a sliding scale of sorts.
| You get some measure of an increase in safety, but the
| model is less useful.
|
| Currently, we have models that make stuff up when they
| don't know the answer. On the other end, we'd have a
| model that refuses to answer any question that's not
| common knowledge. It'll be safe (though it can never be
| completely safe), but essentially useless.
|
| I suspect it'll be impossible to make a completely
| trustworthy and useful model unless it somehow has a
| concept of its own knowledge. And can you have a concept
| of one's knowledge if you lack a concept of self?
| fl7305 wrote:
| If you ask ChatGPT a question, and tell it to either
| respond with the answer or "I don't know", it will respond
| "I don't know" if you ask it whether you have a brother or
| not.
| beardedwizard wrote:
| This has nothing to do with thinking and everything to do
| with the fact that given that input the answer was the
| most probable output given the training data.
| fl7305 wrote:
| >>>> They cannot say "I don't know"
|
| >>> If they were trained with more uncertain content,
| perhaps they'd be better at expressing uncertainty as
| well.
|
| >> (me) If you ask ChatGPT a question, and tell it to
| either respond with the answer or "I don't know", it will
| respond "I don't know" if you ask it whether you have a
| brother or not.
|
| > This has nothing to do with thinking and everything to
| do with the fact that given that input the answer was the
| most probable output given the training data.
|
| First of all, my claim was in response to "They cannot
| say 'I don't know'" and "perhaps they'd be better at
| expressing uncertainty".
|
| ChatGPT can say "I don't know" if you ask it to.
|
| Regarding whether LLMs are lookup tables, I responded to
| that in more detail elsewhere under this post:
|
| https://news.ycombinator.com/item?id=39501611
| DiogenesKynikos wrote:
| > The answers are not coming from a thinking mind but a
| complex pattern-fitting supercomputer hovering over a massive
| table of precomputed patterns.
|
| Are you sure you're not also describing the human brain? At
| some point, after we have sufficiently demystified the
| workings of the human brain, it will probably also sound
| something like, "Well, the brain is just a large machine that
| does X, Y and Z [insert banal-sounding technical jargon from
| the future] - it doesn't really understand anything."
|
| My point here is that _understanding_ ultimately comes down
| to having an effective internal model of the world, which is
| capable of taking novel inputs and generating reasonable
| descriptions of them or reactions to them. It turns out that
| LLMs are one way of achieving that. They don't function
| exactly like human brains, but they certainly do exhibit
| intelligence and understanding. I can ask an LLM a question
| that it has never seen before, and it will give me a
| reasonable answer that synthesizes and builds on various
| facts that it knows. Often the answer is more intelligent
| than what one would get from most humans. That's
| understanding.
| beardedwizard wrote:
| Human brains form new connections dynamically. LLMs are
| trained on connections human brains have already made. They
| never make new connections that aren't in training data.
|
| Nothing was synthesized, all the data was seen before and
| related to each other by vector similarity.
|
| It can just parrot the collective understanding humans
| already have and teach it.
| DiogenesKynikos wrote:
| > It can just parrot the collective understanding humans
| already have and teach it.
|
| The problem with calling an LLM a parrot is that anyone
| who has actually interacted with an LLM knows that it
| produces completely novel responses to questions it has
| never seen before. These answers are usually logical and
| reasonable, based on both the information you gave the
| LLM and its previous knowledge of the world. Doing that
| requires understanding.
|
| > They never make new connections that aren't in training
| data.
|
| This is just categorically untrue. They make all sorts of
| logical connections that are not explicitly contained in
| the training data. Making logical inferences about
| subjects one has never heard about - based on the things
| one does know - is an expression of understanding. LLMs
| do that.
| beardedwizard wrote:
| Isn't this describing temperature induced randomness and
| ascribing some kind of intelligence to it? This assertion
| has been made and refuted multiple times on this thread
| and no solid evidence to the contrary presented.
|
| To go back to your first sentence - interacting with an
| LLM is not understanding how it works; building one is.
| The actual construction of a neural network LLM refutes
| your assertions.
| DiogenesKynikos wrote:
| The claim was made that LLMs just parrot back what
| they've seen in the training data. They clearly go far
| beyond this and generate completely novel ideas that are
| not in the training data. I can give ChatGPT extremely
| specific and weird prompts that have 0% chance of being
| in its training data, and it will answer intelligently.
|
| > The actual construction of a neural network llm refutes
| your assertions.
|
| I don't see how. There's a common view that I see
| expressed in these discussions, that if the workings of
| an LLM can be explained in a technical manner, then it
| doesn't understand. "It just uses temperature induced
| randomness, etc. etc." Once we understand how the human
| brain works, it will then be possible to argue, in the
| exact same way, that humans do not understand. "You see,
| the brain is just mechanically doing XYZ, leading to the
| vocal cords moving in this particular pattern."
| AlexandrB wrote:
| > They clearly go far beyond this and generate completely
| novel ideas that are not in the training data.
|
| There's a case where this is trivially false. Language.
| LLMs are bound by language that was invented by humans.
| They are unable to "conceive" of anything that cannot be
| described by human language as it exists, whereas humans
| create new words for new ideas all the time.
| pixl97 wrote:
| Uh, I believe you're really confused on things like
| ChatGPT versus LLMs in general. You don't have to feed
| human language to an LLM for them to learn things. You
| can feed wifi data waveforms for example and they can
| 'learn' insights from that.
|
| Furthermore, your thinking here doesn't even begin to
| explain multimodal models at all.
| DiogenesKynikos wrote:
| I just asked ChatGPT to make up a Chinese word for
| hungry+angry. It came up with a completely novel word
| that actually sounds okay: "Ji Nu". It then explained to
| me how it came up with the word.
|
| You can't claim that that isn't understanding. It just
| strikes me that we've moved the goalposts into ever more
| esoteric corners: sure, ChatGPT seems like it can have a
| real conversation, but can it do X extremely difficult
| task that I just thought up?
| c22 wrote:
| You claim that logical and reasonable responses "require
| understanding" therefore LLMs must _understand_. But I
| see LLMs as evidence that _understanding_ is not required
| to produce logical and reasonable responses.
|
| Thinking back to when I used to help tutor some of my
| peers in 101-level math classes there were many times
| someone was able to produce a logical and reasonable
| response to a problem (by rote use of an algorithm) but
| upon deeper interrogation it became clear that they
| lacked true understanding.
| DiogenesKynikos wrote:
| Then your definition of understanding is meaningless. If
| a physical system is able to accurately simulate
| understanding, it understands.
| c22 wrote:
| My definition of understanding is not meaningless, but it
| appears you do not understand it.
| Jensson wrote:
| A human that mimics the speech of someone that does
| understand usually doesn't understand himself. We see
| that happen all the time with real humans, you have
| probably seen that as well.
|
| To see if a human understands we ask them edge questions
| and things they probably haven't seen before, and if they
| fail there but just manage for common things then we know
| the human just faked understanding. Every LLM today fails
| this, so they don't understand, just like we say a human
| who produces the same output doesn't understand. These
| LLMs have superhuman memory, so their ability to mimic
| smart humans is much greater than a human faker's, but
| other than that they are just like your typical human
| faker.
| DiogenesKynikos wrote:
| > A human that mimics the speech of someone that does
| understand usually doesn't understand himself.
|
| That's not what LLMs do. They provide novel answers to
| questions they've never seen before, even on topics
| they've never heard of, that the user just made up.
|
| > To see if a human understands we ask them edge
| questions
|
| This is testing if there are flaws in their
| understanding. My dog understands a lot of things about
| the world, but he sometimes shows that he doesn't
| understand basic things, in ways that are completely
| baffling to me. Should I just throw my hands in the air
| and declare that dogs are incapable of understanding
| anything?
| fl7305 wrote:
| > a complex pattern-fitting supercomputer hovering over a
| massive table of precomputed patterns
|
| That was perhaps true of earlier and smaller LLMs, like GPT-1
| and GPT-2.
|
| But as they grew larger and were trained with more and more
| data, they changed from pure pattern matching to implementing
| algorithms to compress more information into their structure
| than pure pattern matching can achieve.
|
| These algorithms are incomplete and buggy, but they are
| nonetheless executing algorithms, and not just pattern
| matching.
|
| This phenomenon can be seen in toy-sized neural networks. For
| instance, addition of two input values modulo a constant. As
| a small network is trained, at some point the internal
| structure can change from pattern matching to implementing
| addition using Fourier transforms. This is clearly visible in
| its structure. The network now performs the task perfectly
| for all inputs, regardless of having seen them in training.
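|
| A minimal sketch of that toy task, assuming PyTorch (the
| switch from memorization to a generalizing solution
| typically needs long training with weight decay, which
| this stub does not try to tune):
|
|     import torch
|     import torch.nn as nn
|
|     P = 97  # modulus for (a + b) mod P
|
|     # Full dataset: every pair (a, b), labelled (a + b) % P.
|     pairs = torch.tensor(
|         [(a, b) for a in range(P) for b in range(P)]
|     )
|     labels = (pairs[:, 0] + pairs[:, 1]) % P
|
|     # One-hot encode both operands and concatenate them.
|     x = torch.cat(
|         [nn.functional.one_hot(pairs[:, i], P).float()
|          for i in (0, 1)],
|         dim=1,
|     )
|
|     # Tiny MLP; hold out half the pairs to test whether it
|     # generalizes to sums it has never seen.
|     model = nn.Sequential(
|         nn.Linear(2 * P, 128), nn.ReLU(), nn.Linear(128, P)
|     )
|     perm = torch.randperm(len(x))
|     train, test = perm[: len(x) // 2], perm[len(x) // 2 :]
|     opt = torch.optim.AdamW(
|         model.parameters(), lr=1e-3, weight_decay=1.0
|     )
|     for step in range(10000):
|         opt.zero_grad()
|         loss = nn.functional.cross_entropy(
|             model(x[train]), labels[train]
|         )
|         loss.backward()
|         opt.step()
|
|     acc = (model(x[test]).argmax(1) == labels[test]).float()
|     print("held-out accuracy:", acc.mean().item())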
|
| You can ask ChatGPT 4 to execute an algorithm for you. I just
| tried this one:
|
|     I would like to play a game, where you are the host. We
|     start off with a score that is 1234143143. At the start
|     of each turn, you tell me the current score and ask me
|     if I want to play a or b. If I choose a, the score is
|     halved, and 30 is added. If I choose b, the score is
|     doubled, and 40 is subtracted. Only use integers and
|     round down.
|
| It will happily execute this algorithm. For large numbers, it
| is slightly off on the arithmetic. When I asked it to double
| check, it did so using Python code. After that, it kept using
| Python code to perform the math. It was also able to reason
| intelligently about different outcomes if always picking a
| (or b) given different starting points.
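|
| For comparison, here are the same game rules as ordinary
| code (a hypothetical reference implementation, not what
| ChatGPT runs internally):
|
|     def play(score: int, choice: str) -> int:
|         # a: halve the score, then add 30.
|         # b: double the score, then subtract 40.
|         # Integer arithmetic, rounding down.
|         if choice == "a":
|             return score // 2 + 30
|         if choice == "b":
|             return score * 2 - 40
|         raise ValueError("choice must be 'a' or 'b'")
|
|     score = 1234143143
|     for choice in ["a", "a", "b"]:
|         score = play(score, choice)
|         print(choice, "->", score)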
|
| Now, if you have enough memory and training data, of course
| you can build a gigantic lookup table that has this exact
| text sequence in it to replicate "executing the algorithm" I
| described.
|
| Is that your claim? How much memory are we talking about? My
| feeling is that it'd be far more than the number of atoms in
| the universe.
|
|     PS
|
|     Me: How far it is to the Simpsons house in Springfield?
|
|     ChatGPT: The Simpsons' house in Springfield is a
|     fictional location from the animated TV series "The
|     Simpsons." Since Springfield is a fictional town and
|     its location is not consistently specified in the
|     series, it's not possible to determine a real-world
|     distance to the Simpsons' house.
|
|     Me: Do I have a brother? Please answer with either of:
|     a) The answer
|     b) There is an answer, but I do not know
|     c) There is no answer
|
|     ChatGPT: b) There is an answer, but I do not know
| andsoitis wrote:
| > It will happily execute this algorithm. For large
| numbers, it is slightly off on the arithmetic. When I asked
| it to double check, it did so using Python code. After
| that, it kept using Python code to perform the math. It was
| also able to reason intelligently about different outcomes
| if always picking a (or b) given different starting points.
|
| Notice that _you_ had to _notice_ the error and had to
| prompt it to double check. Lots of complicated things going
| on here. Many (most?) humans will fail somewhere along this
| trajectory.
|
| Did it double check the Python code to make sure it is
| correct (not just in the sense that it is valid, executable
| code, but that it is the correct check in the first place)?
| Or did _you_ double check that its modified algorithm is
| correct? Fool me once and all that...
|
| Upon _reflection_ it _appears_ as if you have a heuristic
| (algorithm?) that leverages logic, awareness, critical
| thinking, experience, a goal in mind, intuition, etc. to
| push towards better results.
|
| "It was able to reason intelligently" imbues qualities that
| I am skeptical is _reasonable_ to attribute to this very
| narrow domain - what's an example where it showed
| intelligent reasoning capabilities?
| fl7305 wrote:
| > Notice that you had to notice the error and had to
| prompt it to double check. Lots of complicated things
| going on here. Many (most?) humans will fail somewhere
| along this trajectory.
|
| Sure. This was covered by my statement above: "These
| algorithms are incomplete and buggy".
|
| > "It was able to reason intelligently" imbues qualities
| that I am skeptical it is reasonable to attribute to this
| very narrow domain - what's an example where it showed
| intelligent reasoning capabilities?
|
| Here's an example. I asked it to analyze the case where
| we always pick option "b" in my example above. It took my
| word problem and boiled it down to an equation:
|
|     ChatGPT: To find the threshold where the behavior
|     switches from decreasing to increasing when always
|     choosing option "b", we need to identify the smallest
|     integer starting value that, after being doubled and
|     reduced by 40, leads to an equal or larger integer in
|     the next iteration. We're looking for the smallest
|     integer x where 2x - 40 >= x.
|
| This was part of a longer conversation where it analyzed
| different properties and outcomes of the "game rules"
| that I gave it.
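|
| A quick brute-force check of that boiled-down inequality
| (2x - 40 >= x simplifies to x >= 40):
|
|     # Smallest starting score where option "b" no longer
|     # decreases the score:
|     print(min(x for x in range(1000) if 2 * x - 40 >= x))
|     # -> 40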
|
| As you pointed out, it got some things wrong and had to
| be corrected. But Socratic reasoning works fairly well to
| guide it. It can find errors in its own reasoning. For
| instance, if asked to actually calculate a few iterations
| for a given case, it will find its own errors in its
| claims about that case.
|
| Is it useful right now? Maybe, maybe not, depends on your
| use case. It definitely takes a lot of thinking on your
| own and guiding it. At some points it goes from seemingly
| intelligent to downright pigheaded and stupid.
|
| But in my view there is absolutely no way a lookup table
| algorithm can contain enough data to be anywhere near the
| level of responses we're seeing here.
| sandworm101 wrote:
| The simpsons example is for a navigation system, not any
| AI. It is an analogy, not a test to be put to chatgpt.
| fl7305 wrote:
| So which test can you put to ChatGPT to prove your claim
| that it is a lookup table, and that it doesn't perform
| any logic on facts?
| Jensson wrote:
| There is no such stable test: just as humans can memorize
| and create simple heuristics to pass any test without
| understanding, so can an LLM. You have probably seen
| humans that have perfect grades but can't do much in
| practice; that is how these LLMs work.
|
| The creators of the LLM just feed it a bunch of edge
| questions, and whenever people invent new ones they just
| feed those in as well, so proving it doesn't understand
| will always be a moving target, just like making tests
| that test people's understanding is a moving target,
| since those people will just look at the old tests and
| practice those.
| jijijijij wrote:
| Hasn't ChatGPT been manually adjusted to better compute
| math problems? I think nobody not working there knows what
| ChatGPT really learned all by itself.
| furyofantares wrote:
| But they can say "I don't know." They can be trained to do so
| ("as of my knowledge cutoff in September 2020 I don't know
| who Bob Whatgenflabl is") and they can be given context that
| makes it more likely they do so (I've had good success with
| this for RAG applications, and extremely little, but not
| zero, for general prompts.)
| cultureswitch wrote:
| > The answers are not coming from a thinking mind but a
| complex pattern-fitting supercomputer hovering over a massive
| table of precomputed patterns. It computes your input then
| looks to those patterns and spits out the best match
|
| Can you tell that's not how you yourself function?
| jncfhnb wrote:
| > They cannot say "I don't know" because they don't actually
| know anything.
|
| print("I don't know")
|
| You don't need proper cognition to identify that the answer
| is not stored in source data. Your conception of the model is
| incomplete, as is easily demonstrable by testing such cases
| now. ChatGPT does just fine on your Simpsons test.
|
| You, however, have made up an answer about how something
| works that you don't actually know, despite your cognition.
| jijijijij wrote:
| > to identify that the answer is not stored in source data
|
| How would an LLM do that?
| jncfhnb wrote:
| They do this already all the time. Probably the majority
| of the time. The problem is that a minority of the time
| is still very problematic.
|
| How do they do this? The same as they do now. The most
| likely token is that the bot doesn't know the answer.
| Which is a behavior emergent from its tuning.
|
| I don't get how people believe it can parse complex
| questions to produce novel ideas but can't defer to
| saying "idk" when the answer isn't known.
| pixl97 wrote:
| >You don't need proper cognition to identify that the
| answer is not stored in source data.
|
| Uh, what?
|
| So let's imagine you have an LLM that knows everything,
| except you withhold the data that you can put peanut butter
| on toast. Toast + peanut butter = does not exist in the data
| set. So what exactly do you expect the LLM to say when
| someone asks "Can you put peanut butter on toast?"
|
| I would expect an intelligent agent to 'think' peanut
| butter = spreadable food, toast = hard food substrate, so
| yeah, they should work together, instead of the useless
| answer of 'I don't know'.
|
| Everything that does not exist in nature is made up by
| humans, the question is not "is it made up" the question is
| "does it work"
| rf15 wrote:
| > I am generally skeptical whether such a highly formal
| approach can help us answer the practical question of whether
| we can get LLMs to answer 'I don't know' more often
|
| I feel like writing an entire paper about the practical
| approach to the problems posed in this paper, but you'll
| probably have to first formally define the language used in the
| training data before you can try to map it (through training
| and sampling algos, which this paper conveniently skipped) to
| the target form. This sounds really fun at first, but then
| we're once again talking about the strict formalisation of
| natural language (which you could still do - the training data
| is limited and fixed!)
| karol wrote:
| Organisms that evolved to perceive true reality instead of the
| "user interface" have smaller chances of survival. Donald Hoffman
| tibbydudeza wrote:
| The brain fakes it to approximate reality - more so for reasons
| of the limitations of the wetware it runs on than anything
| else.
|
| For others here is a TED talk.
|
| https://www.youtube.com/watch?v=oYp5XuGYqqY
| Borealid wrote:
| > hallucination is defined as inconsistencies between a
| computable LLM and a computable ground truth function.
|
| With this definition, you can trivially prove the titular
| sentence - "hallucination is inevitable" - is untrue.
|
| Let your LLM have a fixed input context length of one byte.
| Continue training the LLM until such a time as it replies to the
| input "A" with "yes" and all other inputs with "no".
|
| Define your computable ground truth function such that the
| correct output for the input "A" is "yes" and the correct output
| for all other inputs is "no".
|
| This LLM provably never hallucinates - we have exhaustively
| verified that its output matches the ground truth function for
| all possible inputs.
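|
| As a toy illustration of that exhaustive check (an
| ordinary Python function stands in for the trained
| one-byte model):
|
|     def toy_model(byte: int) -> str:
|         # Stand-in for an LLM trained on one-byte inputs.
|         return "yes" if byte == ord("A") else "no"
|
|     def ground_truth(byte: int) -> str:
|         return "yes" if byte == ord("A") else "no"
|
|     # Verify agreement over every possible input.
|     assert all(
|         toy_model(b) == ground_truth(b) for b in range(256)
|     )
|     print("no hallucination under this definition")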
|
| There is nothing stopping us from inductively increasing the
| size of the input context and the number of entries in the
| ground truth table arbitrarily, and at no step do
| hallucinations become "inevitable".
| resolutebat wrote:
| > _Continue training the LLM until such a time as it replies to
| the input "A" with "yes" and all other inputs with "no"._
|
| This is basically the same as saying "train your LLM until they
| never hallucinate", which reduces your claim to a tautology: an
| LLM trained not to hallucinate does not hallucinate. The trick
| is making that happen.
| ProxCoques wrote:
| As I always tell my students: the solution to unreliable code
| is not to put the bugs in there in the first place.
| Borealid wrote:
| It's a tautology that for a given truth table mapping inputs
| to "correct" outputs there exists a function that produces
| that mapping.
|
| Saying that you can't train an LLM to NOT hallucinate is
| saying that it's impossible for any LLM to always produce
| output matching any particular truth table.
|
| There may exist truth tables where it's not possible to
| produce an LLM to match them (for some finite size of LLM
| perhaps), but my claim isn't a tautology - it's just an
| assertion that there exist some truth tables which an LLM can
| be trained to match.
|
| It may be tricky to make the LLM in the first place, but it's
| certainly not as tricky to verify it. You can test it with
| every input you consider to be present in the truth table,
| record its results, and throw it away if it doesn't match.
| The only possible results here are either non-convergence or
| a "perfect" LLM.
|
| You can get rid of the non-convergence if you bound the size
| of the LLM and literally iteratively verify every single
| possible model. The authors of the paper didn't do that. For
| trivially sized models and inputs, that's completely
| possible. For a 7B parameter model, nobody is doing that
| ever. But you can prove the title statement wrong with a ten-
| parameter model and a one-byte input.
| less_less wrote:
| I also disagree with the paper, but not for the same reason.
|
| > With this definition, you can trivially prove the titular
| sentence - "hallucination is inevitable" - is untrue.
|
| Unsurprisingly, that one sentence fragment doesn't capture the
| entirety of their assumptions. Instead they prove something
| intuitively obvious, along the lines of: LLMs with arbitrary-
| length inputs and certain resource restrictions (e.g. they can
| take up to poly-time to compute, and this poly-time behavior
| must be provable, so that during training they don't take even
| longer by mistake) cannot compute certain functions that don't
| have those restrictions (e.g. can take more than poly-time, or
| must take poly-time but a proof of this is not needed). For
| some cases this proof assumes P != NP. Then they argue that
| some useful real-world questions are likely to be in the class
| that the LLM cannot compute, basically because you can ask math
| problems to LLMs and math problems are sometimes really hard.
|
| This formal model is asymptotic (assumes arbitrary-length
| inputs etc), but in my experience this kind of theorem is
| usually true for realistic problems even at modest query
| lengths.
|
| But this isn't the same as proving that hallucination is
| inevitable, because (according to any reasonable definition) an
| LLM (or like, a person, or whatever) should be allowed to say
| "I don't know", and this should not be considered a
| hallucination. Then an LLM (or whatever) can avoid
| hallucinating, and the question becomes how much useful work it
| can do without hallucinating.
| Borealid wrote:
| It's not a bad paper honestly, I just don't like it when
| people take a line from it and assume something untrue.
|
| The pigeonhole principle proves that if you only have N slots
| to work with, and you need to fit N+1 items into them, you're
| going to get at least one slot with at least two items. That
| makes sense, and it logically follows that constrained
| functions can't perfectly mirror less-constrained ones: at
| some point a "wrong" and a "right" input have to produce the
| same output.
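|
| A throwaway illustration of that counting argument in
| code (any mapping from N+1 inputs into N possible outputs
| must collide somewhere):
|
|     from collections import Counter
|
|     N = 8                     # N "slots" (outputs)
|     items = range(N + 1)      # N + 1 "items" (inputs)
|     outputs = [hash(i) % N for i in items]
|     # At least one output is hit by two different inputs.
|     assert max(Counter(outputs).values()) >= 2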
| calf wrote:
| So is it saying LLMs have polynomial running time and
| that's it? LLMs can't solve SAT properly because of running
| time argument?
| mepiethree wrote:
| > There is nothing stopping inductively increasing the size of
| the input context and the number of entries in the ground truth
| table arbitrarily
|
| This isn't induction. You've only done the base case, not the
| induction hypothesis or induction step. Maybe you've done those
| steps in your head but that's not really a trivial proof as you
| claim.
| Borealid wrote:
| Induction is "if this is possible for value X, then it is
| also possible for value X+1".
|
| Where X isn't used as part of the step this is always true.
| Nothing I did depends on the size of either the input nor the
| truth table, so long as both are finite-size and so long as
| the truth table can be expressed as a function of the input.
|
| An LLM is an arbitrary convolution of the input text; for any
| mapping, some function you can call an "LLM" produces that
| mapping.
| Sparkyte wrote:
| Definitely a given, it isn't like AI has an actual brain capable
| of resolving and forming new connections. The difference between
| LLMs and human brains is that LLMs are interactive compendiums,
| while our brains organize and sort information in a way that
| ensures survival as an organism. There is no survival pressure
| on whether or not LLMs are accurate, and a machine wouldn't
| understand what is good or bad without weighted context. It's
| good at analyzing, processing, storing, retrieving and
| decomposing information. It isn't good at understanding,
| validating and forming connections between the things it says
| and what you want of it. It lacks comprehension, it doesn't lack
| composure.
| golol wrote:
| The result in the theorem applies to your brain. Your brain can
| be modelled as an LLM in the sense of the paper up to
| arbitrarily small error.
|
| The result is a diagonalization argument that is not very
| relevant for the real world.
| wredue wrote:
| >your brain can be modelled as an LLM in the sense of the
| paper
|
| The vast majority of people actually writing LLMs don't claim
| this, and in fact, actually claim the very opposite: that
| LLMs do not accurately model a human brain in any capacity.
|
| The fact is that science has no clue what happens in the
| nucleus of a neuron, so claiming that computer scientists
| must is... well. You fill in the word.
| golol wrote:
| Definition 2 (Large Language Model). Let S be a computable
| set of all the finite-length strings of alphabet A and
| (s0, s1, ...) be a one-to-one enumeration of all the
| elements in S. A large language model, denoted h, is a
| function that completes the input string s ∈ S using the
| function's predicted tokens h(s), in a finite time.
| Function h is attained procedurally using a set of training
| samples of input-completion pairs.
|
| For an arbitrarily large duration and an arbitrarily small
| error with respect to any definition of error you choose,
| there exists an LLM in the above sense which models the
| dynamics of your brain with that small error.
| breck wrote:
| > capable of resolving and forming new connections
|
| > There is no survival of whether or not LLMs are accurate
|
| I agree that today's LLMs are still missing important
| components like these needed for breakout intelligence, but I
| would not be surprised if researchers discover how to add them
| (and other important things) within 0-5 years.
| carlossouza wrote:
| Someone smart once said:
|
| If it is good, we call it "creativity."
|
| If it is bad, we call it "hallucination."
|
| This isn't a bug (or limitation, as the authors say). It's a
| feature.
| tgv wrote:
| Asking it to write code for you is basically asking it to
| hallucinate.
| gardenhedge wrote:
| I don't think so. I think it's asking it to repeat code it
| has been trained on
| tgv wrote:
| Search for a piece of code you wrote. If it's more than 15
| lines and not boilerplate, chances are you won't find it
| anywhere on the net.
| gardenhedge wrote:
| To be honest, I imagine I would. Variables named could be
| different but it would largely be the same as code others
| have written. For example, I am creating an app in React
| Native at the moment. My app will not be unique in terms
| of code but instead in business domain.
| Jensson wrote:
| But there are many pieces of code that I've written that
| you can find in many places on the net, having a tool
| that can adapt that to your codebase in seconds is
| useful. It doesn't have to be smart, just pasting in an
| function and fitting that to your code is useful.
| tgv wrote:
| Sure, but the point is it will have to adapt it to your
| code, if only in naming. So it has to make up things,
| i.e. hallucinate. It can't just reproduce the best match
| in memory.
| Jensson wrote:
| Yeah, these models are very good at making up names, that
| is what they are trained to do after all. Their ability
| to do logic isn't that impressive though and seems to be
| on the level of a human that doesn't understand the topic
| but has seen many examples.
| gessha wrote:
| This reminds me of a bit from a Hugh and Laurie sketch:
|
| > Imagine a piano keyboard, eighty-eight keys, only
| eighty-eight and yet, and yet, new tunes, melodies,
| harmonies are being composed upon hundreds of keyboards
| every day in Dorset alone. Our language, Tiger, our
| language, hundreds of thousands of available words,
| frillions of possible legitimate new ideas, so that I can
| say this sentence and be confident it has never been
| uttered before in the history of human communication:
| "Hold the newsreader's nose squarely, waiter, or friendly
| milk will countermand my trousers." One sentence, common
| words, but never before placed in that order. And yet, oh
| and yet, all of us spend our days saying the same things
| to each other, time after weary time, living by clichaic,
| learned response: "I love you", "Don't go in there", "You
| have no right to say that", "shut up", "I'm hungry",
| "that hurt", "why should I?", "it's not my fault",
| "help", "Marjorie is dead". You see? That surely is a
| thought to take out for a cream tea on a rainy Sunday
| afternoon.
|
| https://abitoffryandlaurie.co.uk/sketches/language_conver
| sat...
| intended wrote:
| The term "Hallucinate" is a misnomer. Humans can
| hallucinate, we can get sick and perceive a world which is
| incongruous with reality.
|
| LLMs are just generating tokens. Hallucination perpetuates
| an unhelpful anthropomorphization of LLMs.
| gessha wrote:
| I don't think the term is that bad here because I haven't
| seen a lot of comparisons with human hallucinations.
|
| Users see it as a machine artifact.
| intended wrote:
| It's like the term "god particle" - it invites
| comparisons and allusions that do not match reality.
| LightBug1 wrote:
| I imagine the gold is in knowing whether the LLM understands
| when it's doing either?
|
| Isn't this the difference between a human and an LLM?
|
| A human knows it's making an educated guess and (should) say
| so. Or it knows when it's being creative, and can say so.
|
| If it doesn't know which is which, then it really does bring it
| home that LLM's are not that much more than (very
| sophisticated) mechanical input-output machines.
| vladms wrote:
| You mean "some humans know and could say so". And this
| reflection process is not captured in the data we fed to
| LLM-s (like let's say a lesson in which teacher asks "do you
| know X?", and students first answer "I don't know", etc.)
|
| Also, LLM-s could report more statistical measures for each
| answer and external tools could interpret them.
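|
| One simple version of that, sketched with hypothetical
| per-token log-probabilities as input (however your API
| exposes them):
|
|     import math
|
|     def confidence(token_logprobs):
|         # Mean per-token log-probability, mapped back to
|         # a probability-like score in (0, 1]. A crude
|         # proxy, not a calibrated correctness estimate.
|         mean_lp = sum(token_logprobs) / len(token_logprobs)
|         return math.exp(mean_lp)
|
|     # An external tool could flag shaky answers:
|     if confidence([-0.1, -2.3, -1.7]) < 0.3:
|         print("low confidence: maybe say 'I don't know'")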
| devjab wrote:
| Isn't it always hallucinating though? We just don't care when
| it gets it "right". As I understand it, it's still just
| probability based on what is likely to be a good set of words
| to answer the prompt tasking it. It doesn't actually know
| anything, it's just extremely good at making stuff up.
|
| Which is still very useful for a lot of things. Just maybe
| not things to which value is assigned based on how efficient
| and correct the answer is. Like you can have GPT make a
| marketing campaign for you, or you can have it design all the
| icons you need for your application UI, but you can't
| reliably make it write high-performance back-end code without
| having humans judge the results. Similarly you can't use it
| to teach anyone anything, not really, because unless you're
| already an expert on the subject being taught, you aren't
| likely to spot when it gets things wrong. I guess you can
| argue that a lot of teaching is flawed like that, and you
| wouldn't be wrong. Like, I was taught that the pyramids were
| built by slave labour, even after the archeological evidence
| had shown this to be likely false. But our text books were a
| decade old because our school didn't really renew them very
| often... in such a case GPT might have been a more correct
| teacher, but the trick is that you won't really know. Which
| is made even more complicated by the fact that it might teach
| different things to different students. Like, I just asked
| ChatGPT 3.5 who built the pyramids in 3 different prompts, in
| one it told me it was ordinary people. In the others it told
| me it was mostly skilled labour under guidance of
| "architects" and "engineers". Still better than teaching us
| it was done by slave labour like my old book, but the book
| was still consistent in what was considered to be the truth
| at the time.
| audunw wrote:
| True, but I think we can fall into the trap of expecting too
| much of LLMs. Their knowledge can seem perfect. They can answer
| almost anything, so it's easy to get the illusion that they can
| answer anything truthfully.
|
| In terms of what we can expect of future improvements, I think
| it's overly optimistic to expect any kind of super intelligence
| beyond what we see today (that is, having access to all the
| worlds publicly available information, or rapidly generating
| texts/images/videos that fall into existing creative patterns).
|
| I suspect that more creative intelligence requires an extremely
| fine balance to not "go crazy".. that is, producing output we'd
| consider creative rather than hallucinations.
|
| I think getting this balance right will get exponentially
| harder as we create feedback loops within the AI that let its
| intelligence evolve.
|
| And it's entirely possible that humans have already optimised
| this creative intelligence feedback loop as much as the
| universe allows. Having a huge amount of knowledge can
| obviously benefit from more neurons/storage. But we simply
| don't know if that's true for creative intelligence yet.
| badgersnake wrote:
| > True, but I think we can fall into the trap of expecting
| too much of LLMs.
|
| We're already well past that point. Why? Because saying
| incredible things about AI attracts VC money.
| Seb-C wrote:
| That is correct, it's always hallucinating and making things
| up.
|
| Just because those hallucinations sometimes randomly happens to
| be right, people concluded that being wrong is the exception,
| while being right is somehow the rule.
|
| It's like when people read [insert millennia-old text here],
| find a part that happens to illustrate something in their life
| today and conclude that it is a prophecy that predicted the
| future.
|
| The meaning/truth in those is nothing more than a cognitive
| bias from the mind of the reader, not an inherent quality of
| the text.
| somewhereoutth wrote:
| For heavy LLM users, there is probably a dopamine hit when it
| does something right, much as gamblers get a hit when the
| fruit machine pays out. Perhaps LLM use is no more productive
| than gambling, and perhaps can be abused in a similar way.
| HKH2 wrote:
| You might not have found any uses that suit your niche, but
| that doesn't mean those of us who have are just making up
| stories about productivity.
| somewhereoutth wrote:
| "You might not have found any games that suit your skill
| set, but that doesn't mean those of us who have are just
| making up stories about making money"
|
| Sorry, somewhat trite and unfair, but, if there _is_ a
| gambling-like dopamine reward cycle occurring, then the
| users would have a hard time being truly objective about
| any productivity boost _in total_. They may instead focus
| on the 'wins', without taking into account any overheads
| or 'losses', much as a gambler would do.
| HKH2 wrote:
| Sure, confirmation bias exists, but you can compare with
| the alternatives.
|
| E.g. a search engine can give you zero useful results,
| and you can fine tune your query and still get nothing
| after scrolling through pages of results (Do people
| really take the losses into account when using search
| engines?) I find prompt engineering with LLMs more useful
| because you get nudged in interesting directions, and
| even if you come away with no direct results, you have
| more of an idea of what you are looking for. Maybe
| lateral thinking is overrated.
| intended wrote:
| LLMs work very well if you know the domain you are using
| the LLM on. If you have the ability to verify whether the
| output is correct, you will gain productivity using LLMs.
| zer00eyz wrote:
| > This isn't a bug
|
| If it isn't a bug, it damn well isn't a hallucination, or
| creativity.
|
| This is a deeply integrated design defect. One that highlights
| what we're doing (statistically modeling lots of human
| language)...
|
| Throwing more data at this approach isn't going to magically
| make it wake up and be an AGI. And this problem is NOT going to
| go away.
|
| The ML community needs to back off the hype train. The first
| step is for them to stop anthropomorphizing their projects.
| precompute wrote:
| Ah, the AI hype is now entering the "let's be real" phase.
| Haven't seen a frenzied post on alignment in a while now.
| Culonavirus wrote:
| The hype is insane. Listen, I think LLMs still have a lot of
| room to grow and they're already very useful, but like some
| excellent researchers say, they're not the holy grail. If we
| want AGI, LLMs are not it. A lot of people seem to think this
| is an engineering issue and that LLMs can get us there, but
| they can't, because it is not an engineering issue.
| hnfong wrote:
| Do you have evidence to back your claims up besides "the hype
| is overblown"? Because hype only indicates that the precise
| hyped up claims are wrong, it doesn't imply the opposite
| extreme (i.e. LLMs can never achieve AGI) must be true.
| mistermann wrote:
| What kind(s) of an issue do you think it is fundamentally?
| PopePompus wrote:
| I don't think you can say with confidence that the LLM
| approach will not lead to AGI, unless you understand in
| detail how human intelligence operates, and can show that no
| modification to current LLM architectures can achieve the
| same or superior results. I think the fact that adding
| "attention" to LLMs made a huge difference means that we are
| probably still in the low hanging fruit stage of LLM
| architecture development, and a few more design improvements
| on a par with "attention" might lead to something that could
| legitimately be called AGI. Many people, myself included,
| believe that LLMs are now exhibiting emergent behavior
| properties. If that's true, then saying that LLMs are not
| intelligent because they just predict the next token of
| output is like saying collections of neurons cannot be
| intelligent because they just stimulate other neurons
| chemically.
| DinaCoder99 wrote:
| Well, both the cognitive scientists and linguists seem very
| doubtful we can apply this model to human cognition and
| yield much of value, so I'd say the idea that this model
| can yield behavior analogous to human cognition without
| other mechanisms seems rather far-fetched.
|
| Of course, we should absolutely pursue better understanding
| of both as to not throw the baby out with the bath water,
| but I'm not personally placing much hope in finding AGI any
| time soon.
| Der_Einzige wrote:
| I'm going to take the exact opposite take and claim that
| "some excellent researchers" support it.
|
| "AGI" is practically already here, you just don't want to
| admit it: https://www.noemamag.com/artificial-general-
| intelligence-is-...
| bbor wrote:
| I'm sorry... does this paper just point out that LLMs by
| definition are not as good at holding data as a direct database?
| Cause a) duh and b) who cares, they're intuitive language
| transformers, not knowledge models.
|
| Maybe I'm missing something obvious? This seems like someone
| torturing math to imply outlandish conclusions that fit their (in
| this case anti-"AI") agenda.
| anonylizard wrote:
| It at least disproves LLMs from being 'god models'. They will
| never be able to solve every problem perfectly.
| ninetyninenine wrote:
| Humans aren't God models either. The goal is to get this
| thing to the level of a human. God-like levels are not
| possible imo.
| TylerE wrote:
| I miss the days when HN posts about hallucinating were about
| microdosing.
| hyperpape wrote:
| The result seems to rely on stipulating the LLM must answer true
| or false to all its questions, and can't say "I don't know." So
| it's an interesting result, but it's not obvious that it tells us
| much about our actual problem, which is 100% about how to get a
| system that accurately understands the level of confidence it
| should have.
| cjdell wrote:
| I thought the industry was already experimenting with the idea
| that you have another LLM observing the output of the primary LLM
| which is trained more towards safety than creativity.
|
| On top of that it would be good if the safety LLM could give a
| confidence score in the answer given by the main LLM. Then you
| can try multiple attempts with different parameters and only show
| the highest confidence answer to the user.
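|
| A minimal sketch of that best-of-N pattern; `generate` and
| `judge_confidence` are hypothetical helpers standing in for the
| primary and safety models, not any particular vendor API:
|
|   def best_of_n(prompt, temperatures=(0.2, 0.7, 1.0)):
|       # Sample one candidate answer per temperature setting.
|       candidates = [generate(prompt, temperature=t)  # hypothetical
|                     for t in temperatures]
|       # Ask the second, safety-tuned model to score each one.
|       scored = [(judge_confidence(prompt, c), c)     # hypothetical
|                 for c in candidates]
|       # Surface only the highest-confidence answer to the user.
|       return max(scored)[1]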
| franze wrote:
| Hallucinations and Ideas are the same thing.
| DebtDeflation wrote:
| There used to be an entire sub-field of NLP called Open Domain
| Question Answering (ODQA). It extensively studied the problem of
| selecting the best answer from the set of plausible answers and
| devised a number of potential strategies. Like everything else in
| AI/ML it fell victim to the "bitter lesson", in this case that
| scaling up "predict the next token" beats an ensemble of
| specialized linguistic-based methods.
| falling_myshkin wrote:
| For those who don't know:
| http://www.incompleteideas.net/IncIdeas/BitterLesson.html
|
| I agree with you for the NLP domain, but I wonder if there will
| also be a bitter lesson learned about the perceived generality
| of language for universal applications.
| sgammon wrote:
| Splinters inevitable: An innate limitation of working with wood
| /s
| HKH2 wrote:
| People expect splinter-free epistemology instead of just
| putting on some gloves.
| Scene_Cast2 wrote:
| I don't think anyone has mentioned Bayesian Neural Nets (I forget
| the exact term). Sure, the paradigm adds an order of magnitude
| overhead (at least - and that's why I've never seen it used in
| the industry), but you can bolt it on to existing architectures.
|
| The basic idea is that besides the probabilities, the network
| also spits out confidence (IIRC based on how out-of-distribution
| the input is). There's been a ton of work on getting confidence
| values out of existing neural nets without as much overhead, but
| I've never seen those approaches replicate in the industry.
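|
| One of the cheaper tricks in that family is MC dropout: run the
| same input through the net several times with dropout left on
| and treat the spread of the outputs as a confidence signal. A
| rough sketch, assuming a PyTorch classifier `model` that
| contains dropout layers (illustrative, not a production recipe):
|
|   import torch
|
|   def mc_dropout_confidence(model, x, n_samples=20):
|       # Keep dropout active at inference time (train() also
|       # affects batch norm; a careful version would flip only
|       # the dropout modules).
|       model.train()
|       with torch.no_grad():
|           probs = torch.stack([torch.softmax(model(x), dim=-1)
|                                for _ in range(n_samples)])
|       # Mean = prediction, std = rough uncertainty estimate.
|       return probs.mean(dim=0), probs.std(dim=0)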
| wnkrshm wrote:
| I would imagine that to propagate any confidence value through
| the system you'd need to have priors for the confidence of
| correctness for all data in your training set. (and those
| priors change over time)
| somewhereoutth wrote:
| For production systems the considerations are:
|
| - how often does it error?
|
| - how bad are the errors?
|
| - how tolerable are the errors?
|
| - how detectable are the errors?
|
| - how recoverable are the errors?
|
| For example, a pocket calculator that occasionally was out at the
| 3rd decimal place might do much more damage than one that quite
| regularly returned NaN.
|
| LLMs error both regularly and badly, so detectability and
| recoverability are going to be crucial for useful deployment in
| systems outside of those that have a high tolerance for errors
| (e.g. algorithmic feeds).
| intended wrote:
| Yes. If you want to make something that works, your life is
| figuring out evaluation and LLM ops.
|
| At that point, you know it's not thinking, it's doing token
| prediction.
| macrolime wrote:
| They define an LLM as "a probabilistic model of a string that
| conditions the output at time t based on all the tokens that come
| before it in the string".
|
| I think that definition is wide enough to include human
| intelligence, so their finding should be equally valid for
| humans.
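|
| For reference, that definition is essentially the standard
| autoregressive factorization (my notation, not a quote from the
| paper):
|
|   P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})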
| jddj wrote:
| Silk silk silk silk silk silk silk.
|
| What do cows drink?
| moooo99 wrote:
| > I think that definition is wide enough to include human
| intelligence, so their finding should be equally valid for
| humans.
|
| Which is definitely true. Human memory and the ability to
| correctly recall things we thought we remembered is affected by
| a whole bunch of things and at times very unreliable.
|
| However, human intelligence, unlike LLMs, is not limited to
| recalling information we once learned. We are also able to do
| logical reasoning, which seems to improve in LLMs, but is far
| from being perfect.
|
| Another problem is how differently we treat the reliability of
| information depending on the source, especially based on
| personal bias. I think that is a huge factor, because in my
| experience, LLMs tend to quickly fall over and change their
| opinion based on user input.
| ben_w wrote:
| We _can_ do logical reasoning, but we're very bad at it and
| often take shortcuts either via pattern matching, memory, or
| "common sense".
|
| Baseball and bat together cost $1.10, the bat is $1 more than
| the ball, how much does the ball cost?
|
| A French plane filled with Spanish passengers crashes over
| Italy, where are the survivors buried?
|
| An armed man enters a store, tells the cashier to hand over
| the money, and when he departs the cashier calls the police.
| Was this a robbery?
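|
| (For what it's worth, the first one is pure arithmetic. With x
| as the price of the ball:
|
|   x + (x + 1) = 1.10 \;\Rightarrow\; 2x = 0.10 \;\Rightarrow\; x = 0.05
|
| so the ball costs 5 cents, not the intuitive 10 cents.)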
| mistermann wrote:
| Humans also have various culturally flavored, _implicit_
| "you know what I mean" algorithms on each end to smooth out
| "irrelevant" misunderstandings and ensure a cordial
| interaction, a cultural prime directive.
| ben_w wrote:
| Sure. I think LLMs are good at that kind of thing.
|
| My final example demonstrates how those cultural norms
| cause errors, it was from a logical thinking session at
| university, where none of the rest of my group could
| accept my (correct) claim that the answer was "not enough
| information to answer" even when I gave a (different but
| also plausible) non-robbery scenario and pointed out that
| we were in a logical thinking training session which
| would have trick questions.
|
| My dad had a similar anecdote about not being able to
| convince others of the true right answer, but his
| training session had the setup "you crash landed on the
| moon, here's a list of stuff in your pod, make an ordered
| list of what you take with you to reach a survival
| station", and the correct answer was 1. oxygen tanks, 2.
| a rowing boat, 3. everything else, because the boat is a
| convenient container for everything else and you can drag
| it along the surface even though there's no water.
| mistermann wrote:
| Don't you think it's strange that humans have little to
| no interest when root causes to their problems are found?
| ben_w wrote:
| Sometimes.
|
| No idea what you're getting at here, though.
| magicalhippo wrote:
| > We are also able to do logical reasoning
|
| This is effectively like coming up with an algorithm and then
| executing it. So how good/bad are these LLMs if you asked
| them to generate, say, a Lua script to compute the answer, a la
| counting occurrences problem mentioned in a different
| comment, and then pass that off to a Lua interpreter to get
| the answer?
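|
| A rough sketch of that division of labour, with the harness in
| Python and `llm_generate` as a hypothetical stand-in for
| whatever model call is used; a real system would sandbox the
| generated script:
|
|   import subprocess
|
|   def solve_by_codegen(question: str) -> str:
|       # Ask the model for a script instead of a direct answer.
|       lua_source = llm_generate(  # hypothetical LLM call
|           "Write a Lua script that prints the answer to: "
|           + question)
|       # Run the untrusted script with a local lua interpreter.
|       result = subprocess.run(["lua", "-"], input=lua_source,
|                               capture_output=True, text=True,
|                               timeout=10)
|       return result.stdout.strip()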
| moooo99 wrote:
| > counting occurrences problem mentioned in a different
| comment, and then pass that off to a Lua interpreter to get
| the answer?
|
| I think this is a sensible approach in some problem domains
| with software development being a particularly good
| example. But I think this approach quickly falls apart as
| soon as your "definitely right answer" involves real world
| interaction.
|
| And if one thinks about it, most of the value any company
| derives comes down to some sort of real world interaction,
| whether directly or by proxy.
| Cacti wrote:
| When we can define and measure intelligence, perhaps these
| discussions will be meaningful.
| zamalek wrote:
| Which might explain the evolutionary reason for dreaming: it
| prunes hallucination. Might it make sense to interleave
| training and dreaming?
| Almondsetat wrote:
| Of course it's inevitable.
|
| Things can be facts or deductions of facts (or both). If I ask an
| LLM the date of birth of Napoleon and it doesn't have it in its
| dataset there are only 2 options: either it has other facts from
| which Napoleon's birthday can be deduced or it doesn't. If it
| does, then by improving the LLM we will be able to make more
| and more deductions; if it doesn't, then it can only
| hallucinate. Since there will always be a set of facts that the
| LLM is not aware of and which cannot be deduced, we will always
| have hallucinations.
| jtc331 wrote:
| Is this a way of saying that large language models don't have
| the concept of "I don't know"?
| andybak wrote:
| Your "Of course" is a red flag.
|
| Unless you have a very precise definition of "LLM" then there's
| no "of course". It's possible to imagine a hypothetical
| software system capable of returning "I don't know" or "I'm not
| sure".
|
| I haven't RTFA but I presume they are arguing within specific
| constraints. The important point is - could an augmented LLM or
| an "LLM plus something else" solve this.
|
| I don't have an answer to that but I don't think it's an "of
| course" type answer.
| samatman wrote:
| I don't see why that's inevitable at all. I immediately
| recognize that I don't happen to know Napoleon's birthday, and
| act on that basis.
| keepamovin wrote:
| This is why you need to pair language learning with real world
| experience. These robots need to be given a world to explore --
| even a virtual one -- and have consequences within, and to
| survive it. Otherwise it's all unrooted sign and symbol systems
| untethered to experience.
| phh wrote:
| I think I agree with you (I even upvoted), but this might be an
| anthropomorphism.
|
| Back like 3-5 years ago, we already thought that about LLMs:
| They couldn't answer questions about what would fall when things
| are attached together in some non-obvious way, and the argument
| back then was that you had to /experience/ it to realize it.
| But LLMs have long since fixed those kinds of issues.
|
| The way LLMs "resolve" questions is very different from us. At
| this point, I think that if we want to prove that LLMs need to
| be rooted in the real world to achieve intelligence, we need to
| find some real-world phenomenon that is so obvious that no one
| ever wrote about it... but then we'd have written about it?
| barnacs wrote:
| Think of it this way:
|
| Intelligent beings in the real world have a very complex
| built-in biological error function rooted in real world
| experiences: sensory inputs, feelings, physical and temporal
| limitations and so on. You feel pain, joy, fear, have a
| limited lifetime, etc.
|
| "AI" on the other hand only have an external error function,
| usually roughly designed to minimize the difference of the
| output from that of an actually intelligent real world being.
| pfdietz wrote:
| A hallucinator seems like an excellent thing to have as a
| component to an intelligent system, but it must be coupled with
| evaluators.
|
| Such an architecture seems plausible for the human brain as well.
| viktour19 wrote:
| If hallucination is inevitable, what should developers do?
|
| Design user experiences that align users with this behaviour!
|
| Relatedly, I built a game to demonstrate how one might calibrate
| users to the responses of LLMs:
|
| https://news.ycombinator.com/item?id=39255583
| demondemidi wrote:
| I bet that, basically, an LLM is just a part of a future AI. The
| same way that a distributor is part of an internal combustion
| engine. Or the way Minsky described The Society of Mind. Eager
| to see if an LLM can be bolted on to some new kind of giant model
| that does something like mitigate the LLM's weaknesses. Maybe
| it'll be a collection of radically different models working
| together in 20 years, and not just a single model. Like, uh, our
| own brains. It reminds me of how lex and yacc are super important
| as grammar and tokenizer, but are only just the front end to much
| larger projects.
| WirelessGigabit wrote:
| I feel the same way about information provided by LLMs as with
| lots of pre-LLM articles and overall YouTube 'information'
| videos.
|
| Sources matter. You're either rehashing information from a
| collection of sources or you have your own data to prove your
| statements.
| coffeefirst wrote:
| Yeah, this is why I really like Kagi's approach: it's clearly
| marked for what it is, and cites its sources so you can verify the
| quality of the answer (or at least get a ballpark idea of where
| it's coming from) yourself.
| lebuffon wrote:
| Complete AI layman here but it seems to me that the human mind's
| architecture has an overarching "executive" module that is
| involved in managing the numerous "expert" systems that do other
| stuff for us. (split brain experiments demonstrate multiple
| "minds" in the brain)
|
| Based on what we know about different systems in the brain it
| might be a stretch to expect hallucination free AGI, using only a
| single LLM.
| Swizec wrote:
| > Based on what we know about different systems in the brain it
| might be a stretch to expect hallucination free AGI, using only
| a single LLM.
|
| Based on what we know about brains, it might be a stretch to
| expect hallucination free AGI. I've yet to meet a general
| intelligence that doesn't hallucinate.
|
| Lots of fun examples from chickens who turn off when they see a
| straight line to cats who get spooked by cucumbers and the
| endless fun cognitive biases humans fall for.
| pylua wrote:
| Sometimes hallucination is sort of a feature instead of a bug.
| For instance, if I ask it how to do something and it
| hallucinates, usually it is perfectly logical for it to work the
| way it
| suggests, even though it is wrong.
|
| In other words, it can be a good feature request or idea.
| Fripplebubby wrote:
| The core argument in this paper, it seems to me from scanning
| it, is that because P != NP, LLMs will hallucinate answers
| to NP-complete problems.
|
| I think this is a clever point and an interesting philosophical
| question (about math, computer science, and language), but I
| think people are mostly trying to apply this using our
| commonsense notions of "LLM hallucination" rather than the formal
| notion they use in this paper, and I don't see an obvious
| connection, since commonsense hallucinations (eg inventing
| chapters of a novel when asked to produce summaries, inventing
| specific details when asked) don't seem to be NP-complete
| problems but rather are hallucinatory for some other interesting
| reason. (I apologize if I have not captured the paper correctly
| and would welcome correction on that, I read it quickly)
|
| The statement that the formal world (the world of math and logic
| and formal grammars) is a subset of the "real" world (or perhaps,
| the world of natural language) is really interesting to me as
| well. Most humans can't solve formal logic problems and parse
| formal grammars but don't suffer from a (strong) hallucination
| effect, and can work in natural language in great proficiency. Is
| hallucination inevitable in humans since we also can't solve
| certain NP-complete problems? We have finite lifespans, after
| all, so even with the capability in principle we might never
| complete a certain problem.
| foobarian wrote:
| Humans have some amount of ability to recognize they hit a wall
| and adjust accordingly. On the other hand this (completeness
| theorems, Kolmogorov complexity, complexity theory) was only
| arrived at what, in the 20th century?
| digitalsushi wrote:
| 'Adjust accordingly' includes giving up and delivering
| something similar to what I asked, but not what I asked; is
| this the point at which the circle is complete and AI has
| fully replaced my dev team?
| foobarian wrote:
| Well in the example of an NP complete problem, a human
| might realize they are having trouble coming up with an
| optimal solution and start analyzing complexity. And once
| they have a proof might advise you accordingly and perhaps
| suggest a good enough heuristic.
| lazide wrote:
| Have you managed dev teams before? It's really nice when
| they do that, but that is far from the common case.
| flextheruler wrote:
| Is the commenter above you implying humans hallucinate to
| the level of LLMs? Maybe hungover freshman working on a
| tight deadline without having read the book do, but not
| professionals.
|
| Even a mediocre employee will often realize they're
| stuck, seek assistance, and then learn something from the
| assistance instead of making stuff up.
| groestl wrote:
| Depending on the country / culture / job description,
| "making stuff up" is sometimes a viable option for
| "adjust accordingly", on all levels of expertise.
| pixl97 wrote:
| People commonly realize when they are stuck... But note,
| the LLM isn't stuck, it keeps producing (total bullshit)
| material, and this same problem happens with humans all
| the time when they go off on the wrong tangent and some
| supervisory function (such as the manager of a business)
| has to step in and ask wtf they are up to.
| skywhopper wrote:
| One thing a human might do that I've never seen an LLM do
| is ask followup and clarifying questions to determine what
| is actually being requested.
| ericb wrote:
| What makes this fascinating to me is, these LLM's were
| trained on an internet filled with tons of examples of
| humans asking clarifying questions.
|
| Why _doesn't_ the LLM do this? Why is the "next, most-
| likely token" _never_ a request for clarification?
| bongodongobob wrote:
| GPT4 absolutely asks for clarification all the time.
| steveBK123 wrote:
| Everyone assumes the AI is going to replace their employees
| but not replace them.. fascinating.
| Jensson wrote:
| Uber proves we can replace Taxi management with simple
| algorithms, that was apparently much easier than
| replacing the drivers. I hope these bigger models can
| replace management in more industries, I'd love to have
| an AI as a manager.
| steveBK123 wrote:
| Yeah on the one hand people think AI management is
| dystopian (probably lol), on the other hand probably
| fewer than 50% of ICs promoted to management are good at
| it.
|
| North of 25% are genuinely bad at it. We've all worked
| for several of these.
|
| Many of us have tried our hand at management and then
| moved back to senior IC tracks. Etc.
| p1esk wrote:
| The only way to reduce hallucinations in both humans and LLMs
| is to increase their general intelligence and their knowledge
| of the world.
| robrenaud wrote:
| A smart bullshit artist who loves the sound of their own
| voice is going to generate more hallucinations than a less
| smart, more humble, more self aware person.
|
| Making LLMs more knowledgeable is great (more data, bigger
| models, yay!), but there are other avenues of plausible
| attack as well. Enabling LLMs to know when they have veered
| off distribution might work. That is, the LLM doesn't have to
| know more of the world, it just has to know what it knows and
| stay there. A person who says "I don't know" is a lot more
| valuable than an overzealous one who spouts nonsense
| confidently. Encouraging an LLM to say that there is a
| disagreement about a topic rather than picking one lane is
| also a valuable way forward.
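|
| One crude proxy people experiment with is the model's own token
| probabilities: refuse to answer when the average per-token
| probability is low. A sketch, with `answer_with_logprobs` as a
| hypothetical helper returning tokens plus their log-probs:
|
|   import math
|
|   def answer_or_abstain(question, threshold=0.5):
|       tokens, logprobs = answer_with_logprobs(question)  # hypothetical
|       avg_prob = math.exp(sum(logprobs) / len(logprobs))
|       # An imperfect proxy: low average probability suggests the
|       # model is off distribution, so abstain rather than guess.
|       if avg_prob < threshold:
|           return "I don't know."
|       return "".join(tokens)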
| p1esk wrote:
| I agree with your points but they are orthogonal to mine. A
| smart person might be more likely to say "I don't know"
| than a stupid person.
|
| Also, a smart bullshit artist in your example does not
| hallucinate - he knows what he's doing.
| FpUser wrote:
| Your post amounts to: in order to be smarter I need to
| increase my smartness. Great insight.
| lazide wrote:
| I think it's more subtly misleading - to be smarter, I need
| more knowledge. But knowledge != smart, knowledge ==
| informed, or educated.
|
| And the problem is more - how can an LLM tell us it doesn't
| know something instead of just making up good sounding, but
| completely delusional answers.
|
| Which arguably isn't about being smart, and is only
| tangentially about less or more (external) knowledge
| really. It's about self-knowledge.
|
| Going down the first path is about knowing everything (in
| the form of facts, usually). Which hey, maybe?
|
| Going down the second path is about knowing oneself. Which
| hey, maybe?
|
| They are not the same.
| p1esk wrote:
| Hallucinations are an interesting problem - in both
| humans and statistical models. If we asked an average
| person 500 years ago how the universe works, they would
| have confidently told you the earth is flat and it rests
| on a giant turtle (or something like that). And that
| there are very specific creatures - angels and demons who
| meddle in human affairs. And a whole lot more which has
| no grounding in reality.
|
| How did we manage to reduce that type of hallucination?
| ottaborra wrote:
| by taking steps to verify everything that was said
| ebcode wrote:
| >> And the problem is more - how can an LLM tell us it
| doesn't know something instead of just making up good
| sounding, but completely delusional answers.
|
| I think the mistake lies in the belief that the LLM
| "knows" things. As humans, we have a strong tendency to
| anthropomorphize. And so, when we see something behave in
| a certain way, we imagine that thing to be doing the same
| thing that we do when we behave that way.
|
| I'm writing, and the machine is also writing, but what
| I'm doing when I write is _very_ different from what the
| machine does when it writes. So the mistake is to say, or
| think, "I think when I write, so the machine must also
| think when it writes."
|
| We probably need to address the usage of the word
| "hallucination", and maybe realize that the LLM is
| _always_ hallucinating.
|
| Not: "When it's right, it's right, but when it's wrong,
| it's hallucinating." It's more like, "Sweet! Some of
| these hallucinations are on point!"
| p1esk wrote:
| _I think when I write, so the machine must also think
| when it writes._
|
| What is it exactly you do when you "think"? And how is it
| different from what LLM does? Not saying it's not
| different, just asking.
| Jensson wrote:
| There are probably many, but the most glaring one is that
| an LLM has to write a word every time it thinks, meaning it
| can't solve a problem before it starts to write down the
| solution. That is an undeniable limitation of current
| architectures, it means that the way the LLM answers your
| question also matches its thinking process, meaning that
| you have to trigger a specific style of response if you
| want it to be smart with its answer.
| p1esk wrote:
| Ok, so how do humans solve a problem? Isn't it also a
| sequential, step by step process, even if not expressed
| explicitly in words?
|
| What if instead of words a model would show you images to
| solve a problem? Would it change anything?
| dkjaudyeqooe wrote:
| No, I don't know how other people think but I just focus
| on something and the answer pops into my head.
|
| I generally only use a step by step process if I'm
| following steps given to me.
| pixl97 wrote:
| >but I just focus on something and the answer pops into
| my head.
|
| It's perfectly valid to say "I don't know", because no
| one really understands these parts of the human mind.
|
| The point here is saying "Oh the LLM thinks word by word,
| but I have a magical black box that just works" isn't
| good science, nor is it a good means of judging what LLMs
| are capable or not capable of.
| ebcode wrote:
| That's a difficult question to answer, since I must be
| doing a lot of very different things while thinking. For
| one, I'm not sure I'm ever _not_ thinking. Is thinking
| different from "brain activity"? We can shut down the
| model, store it on disk, and boot it back up. Shut down
| my brain and I'm a goner.
|
| I'm open to saying that the machine is "thinking", but I
| do think we need more clear language to distinguish
| between machine thinking and human thinking.
|
| EDIT: I chose the wrong word with "thinking", when I was
| trying to point out the logical fallacy of
| anthropomorphizing the machine. It would have been more
| clear if I had used the word "breathing": When I write
| I'm breathing, so the machine must also be breathing.
| renegade-otter wrote:
| It's statistical prediction. LLMs do not "understand" the
| world by definition.
|
| Ask an image generator to make "an image of a woman sitting
| on a bus and reading a book".
|
| Images will be either a horror show or at best full of weird
| details that do not match the real world - because it's not
| how any of this works.
|
| It's a glorified auto-complete that only works due to the
| massive amounts of data it is trained on. Throw in any
| complex interactions it has not seen in the training data and
| it's all over.
| sshumaker wrote:
| You're being downvoted because this is a hot take that
| isn't supported by evidence.
|
| I just tried exactly that with dalle-3 and it worked well.
|
| More to the point, it's pretty clear LLMs do form a model
| of the world, that's exactly how they reason about things.
| There were some good experiments on this a while back -
| check out the Othello experiment.
|
| https://thegradient.pub/othello/
| HDThoreaun wrote:
| I think the situation is a lot more complicated than you're
| making it out to be. GPT4 for example can be very good at
| tasks it has not seen in the training data. The philosophy
| of mind is much more open ended and less understood than
| you seem to think.
| godelski wrote:
| > not seen in the training data
|
| Do you have some evidence for this?
| renegade-otter wrote:
| What's the evidence? OpenAI's claims? They do have an
| inherent interest in making investors believe this
| technology is magic.
| HeatrayEnjoyer wrote:
| Why do people say stuff like this that is so demonstrably
| untrue? SD and GPT4 do not exhibit the behavior described
| above and they're not even new.
| godelski wrote:
| Neither of these comments are accurate. (edit: but
| renegade-otter is more correct)
|
| Here's 1.5 EMA https://imgur.com/mJPKuIb
|
| Here's 2.0 EMA https://imgur.com/KrPVUGy
|
| No negatives, no nothing just the prompt. 20 steps of
| DPM++ 2M Karras, CFG of 7, seed is 1.
|
| Can we make it better? Yeah sure, here's some examples:
| https://imgur.com/Dmx78xV, https://imgur.com/HBTitWm
|
| But I changed the prompt and switched to DPM++ 3M SDE
| Karras
|
| Positive: beautiful woman sitting on a bus reading a
| book,(detailed [face|eyes],detailed
| [hands|fingers]:1.2),Tokyo city,sitting next to a window
| with the city outside,detailed book,(8k HDR RAW Fuji
| film:0.9),perfect reflections,best
| quality,(masterpiece:1.2),beautiful
|
| Negative: ugly,low quality,worst quality,medium
| quality,deformed,bad hands,ugly face,deformed book,bad
| text,extra fingers
|
| We can do even better if we use LoRAs and textual
| inversions, or better checkpoints. But there's a lot of
| work that goes into making really high quality photos
| with these models.
|
| Edit: here is switching to Cyberrealistic checkpoint:
| https://imgur.com/gFMkg0J,
|
| And here's adding some LoRAs, TIs, and prompt
| engineering:
|
| https://imgur.com/VklfVVC (https://imgur.com/ZrAtluS,
| https://imgur.com/cYQajMN), https://imgur.com/ci2JTJl
| (https://imgur.com/9tEhzHF, https://imgur.com/4Ck03P7).
|
| I could get better results, but I don't feel much like doing
| so just to prove a point.
| TacticalCoder wrote:
| > I can get better, but I don't feel too much like it
| just to prove a point.
|
| Honestly these pictures you posted do prove GP's point...
| renegade-otter wrote:
| You kind of proved my point. Of course the "finger
| situation" is getting better but people handling complex
| objects is still where these tools trip. They can't
| reason about it - they just need to see enough data of
| people handling books. On a bus. Now do this for ALL
| possible objects in the world.
|
| I have generated hundreds of these - the bus cabin LOOKS
| like a bus cabin, but it's a plausible fake - the poles
| abruptly terminate, the seats are in weird unrealistic
| configurations, unnatural single-row aisles, etc. Which is
| why I called it a super-convincing autocomplete.
| paulmd wrote:
| > Why do people say stuff like this that is so
| demonstrably untrue? SD and GPT4 do not exhibit the
| behavior described above and they're not even new.
|
| it's true that most people do not actually understand the
| problem/limitation, but it's a discussion that is
| statistically likely to occur on the internet and
| therefore people tend to regurgitate the words without
| understanding the concept.
|
| I'm being facetious but honestly it's a major theme of
| this whole AI revolution, people do not want to accept
| that humans are just another kind of machine and that
| their own cognition resembles AI/ML in virtually every
| aspect. People confabulate. People overreach the bounds
| of their expertise. People repeat words and concepts
| without properly understanding the larger context in
| which they need to be applied. Etc etc.
|
| Has nobody ever watched someone get asked a big question
| or an unexpected question and "watched the wheels turn",
| or watched them stammer out some slop of incoherent words
| while they're processing? Does nobody have "canned
| responses" that summarize a topic that you can give
| pretty much the same (but not exactly, of course) every
| time you are asked it? Is that not "stochastic word
| chains"?
|
| _By design_ neural nets work almost exactly the same as
| your brain. But a lot of people are trapped in the idea
| that there must be some kind of "soul" or something that
| makes _human_ cognition fundamentally different. By
| design, it's not. And we don't fully understand the
| exact modalities to encode information in it usefully and
| process it yet, but that's what the whole process here is
| about.
|
| (I commented about this maybe 6 months ago, but the
| _real_ hot take is that what we think of as
| "consciousness" isn't a real thing, or even an "overseer"
| within the mind - "consciousness" may be exactly the
| thing people say when they mean that "LLMs have to write
| a word every time they think about a concept".
| "Consciousness" may in fact be a low-dimensional
| _projection_ of the actual computation occurring in the
| brain itself, rationalizing and explicating the symbolic
| computations of the brain in some form that can be
| written down and communicated to other humans.
| "Language" and "consciousness" as top-level concepts may
| actually only be an annex that our brain has built for
| storing and communicating those symbolic computations,
| rather than a primary thing itself. It's not in control,
| it's only explaining decisions that we already have
| made... we see the shadows on the wall of plato's cave
| and think that's the entire world, but it's really only a
| low-dimensional projection.)
|
| (or in other words - everyone assumes consciousness is
| the OS, or at least the application. But actually it may
| be the json serializer/deserializer. I.e. not actually
| the thing in control at all. _Our entire lives and
| decisionmaking processes_ may in fact be simple
| rationalizations and explanations around "what the
| subconscious mind thinks should happen next".)
| someplaceguy wrote:
| > because P != NP therefore LLMs will hallucinate answers to
| NP-complete problems.
|
| I haven't read the paper, but that sounds like it would only be
| true if the definition of "hallucinating" is giving a wrong
| answer, but that's not how it's commonly understood.
|
| When people refer to LLMs hallucinating, they are indeed
| referring to an LLM giving a wrong (and confident) answer.
| However, not all wrong answers are hallucinations.
|
| An LLM could answer "I don't know" when asked whether a certain
| program halts and yet you wouldn't call that hallucinating.
| However, it sounds like the paper authors would consider "I
| don't know" to be a hallucinating answer, if their argument is
| that LLMs can't always correctly solve an NP-complete problem.
| But again, I haven't read the paper.
| Fripplebubby wrote:
| Yes, I think you're right. I think one way to phrase the
| authors' argument is:
|
| * There is a class of problems which are harder than
| polynomial time complexity to solve, but are not np-complete
|
| * LLMs will generate an "answer" in formal language to this
| class of problems posed to it
|
| * LLMs can at most solve problems with polynomial time
| complexity due to their fundamental design and principles
|
| * Therefore, LLMs cannot solve > polynomial problems and not
| np-complete problems either
|
| All of which I buy completely. But I think what people are
| more interested in is, why is it that the LLM gives an answer
| when we can prove that it cannot answer this problem
| correctly? And perhaps that is more related to the
| commonsense notion of hallucination than I first gave it
| credit for. Maybe the reason that an LLM gives a formal
| language answer is the same reason it gives a hallucinatory
| answer in natural language. But I don't think the paper sheds
| light on that question
| dragonwriter wrote:
| > why is it that the LLM gives an answer when we can prove
| that it cannot answer this problem correctly?
|
| Because LLMs are not "problem solving machines"; they are
| text completion models, so (when trained for q-and-a
| response) their function is to produce text output which
| forms a plausible seeming response to the question posed,
| not to execute an algorithm which solves the logical
| problem it communicates. Asking "why do LLMs do exactly
| what they are designed to do, even when they cannot do the
| thing that that behavior implies to a human would have been
| done to produce it" just reveals a poor understanding of
| what an LLM is. (Also, the fact that they structurally
| can't solve a class of problems does not mean that they
| can't produce correct answers, it means they can't
| _infallibly_ produce correct answers; the absence of a
| polynomial time solution does not rule out an arbitrarily
| good polynomial time approximation algorithm, though it's
| unlikely that an LLM is doing that, either.)
| someplaceguy wrote:
| > their function is to produce text output which forms a
| plausible seeming response to the question posed
|
| Answering "I don't know" or "I can't answer that" is a
| perfectly plausible response to a difficult logical
| problem/question. And it would not be a hallucination.
| dragonwriter wrote:
| > Answering "I don't know" or "I can't answer that" is a
| perfectly plausible response to a difficult logical
| problem/question.
|
| Sure, and you can train LLMs to produce answers like that
| more often, but then users will say your model is lazy
| and doesn't even try, whereas if you train it to be more
| likely to produce something that looks like a solution
| more often, people will think "wow, the AI solved this
| problem I couldn't solve". And that's why LLMs behave the
| way they do.
| someplaceguy wrote:
| > Sure, and you can train LLMs to produce answers like
| that more often, but then users will say your model is
| lazy and doesn't even try, whereas if you train it to be
| more likely to produce something that looks like a
| solution more often, people will think "wow, the AI
| solved this problem I couldn't solve".
|
| Are you saying that LLMs can't learn to discriminate
| between which questions they should answer "I don't know"
| vs which questions they should try to provide an accurate
| answer?
|
| Sure, there will be an error rate, but surely you can
| train an LLM to minimize it?
| dragonwriter wrote:
| > Are you saying that LLMs can't learn to discriminate
| between which questions they should answer "I don't know"
| vs which questions they should try to provide an accurate
| answer?
|
| No, I am saying that they are specifically trained to do
| that, and that the results seen in practice on common
| real-world LLMs reflect the bias of the specific training
| they are given for providing concrete answers.
|
| > Sure, there will be an error rate, but surely you can
| train an LLM to minimize it?
|
| Giving some answer to a question that cannot be
| infallibly solved analytically is not necessarily an
| error. In fact, I would argue that providing useful
| answers in situations like that is among the motivating
| use cases for AI.
|
| (Whether or not the answers current LLMs provide in these
| cases are useful is another question, but you miss 100%
| of the shots you don't take.)
| pixl97 wrote:
| >Are you saying that LLMs can't learn to discriminate
| between which questions they should answer "I don't know"
| vs which questions they should try to provide an accurate
| answer?
|
| This is a highly problematic and highly contextualized
| statement.
|
| Imagine you're an accountant with the piece of
| information $x. The answer you give for the statement
| "What is $x" is going to be highly dependent on _who_ is
| answering the question. For example
|
| 1. The CEO asks "What is $x"
|
| 2. A regulator at the SEC asks "What is $x
|
| 3. Some random individual or member of the press asks
| "What is $x"
|
| An LLM doesn't have the other human motivations a person
| does when asked questions; pretty much at this point with
| LLMs there are only one or two 'voices' it hears (system
| prompt and user messages).
|
| Whereas a human will commonly lie and say I don't know,
| it's somewhat questionable if we want LLMs intentionally
| lying.
|
| In addition human information is quite often
| compartmentalized to keep secrets which is currently not
| in vogue with LLMs as we are attempting to make oracles
| that know everything with them.
| someplaceguy wrote:
| > The answer you give for the statement "What is $x" is
| going to be highly dependent on who is answering the
| question.
|
| I assume you meant _asking_ rather than _answering_?
|
| > An LLM doesn't have the other human motivations a
| person does when asked questions, pretty much at this
| point with LLMs there are only one or two 'voices' it
| hears (system prompt and user messages).
|
| Why would LLMs need any motivation besides how they are
| trained to be helpful and the given prompts? In my
| experience with ChatGPT 4, it seems to be pretty good at
| discerning what and how to answer based on the prompts
| and context alone.
|
| > Whereas a human will commonly lie and say I don't know,
| it's somewhat questionable if we want LLMs intentionally
| lying.
|
| Why did you jump to the conclusion that an LLM answering
| "I don't know" is lying?
|
| I want LLMs to answer "I don't know" when they don't have
| enough information to provide a true answer. That's not
| lying, in fact it's the opposite, because the alternative
| is to hallucinate an answer. Hallucinations are the
| "lies" in this scenario.
|
| > In addition human information is quite often
| compartmentalized to keep secrets which is currently not
| in vogue with LLMs as we are attempting to make oracles
| that know everything with them.
|
| I'd rather have an oracle that can discriminate when it
| doesn't have enough information to provide a true answer
| and replies "I don't know" in such cases (or sometimes
| answer like "If I were to guess, then bla bla bla, but
| I'm not sure about this"), than one which always gives
| confident but sometimes wrong answers.
| paulmd wrote:
| if more guardrails are useful to users then such things
| will surely emerge.
|
| but from an engineering perspective it makes sense to
| have a "generalist model" underneath that is capable of
| "taking its best guess" if commanded, and then trying to
| figure out how sure it is about its guess, build
| guardrails, etc. Rather than building a model that is
| implicitly wishy-washy and always second-guessing itself
| etc.
|
| The history of public usage of AI has basically been that
| too many guardrails make it useless, not just Gemini
| making Japanese pharaohs to boost diversity or whatever,
| but frankly even mundane usage is frustratingly
| punctuated by "sorry I can't tell you about that, I'm
| just an AI". And frankly it seems best to just give
| people the model and then if there's domains where a
| true/false/null/undefined approach makes sense then you
| build that as a separate layer/guiderail on top of it.
| tsol wrote:
| It isn't designed to know things. It doesn't know what
| exactly it knows, so there is nothing it could check before
| answering. It generates an output, which isn't even the same
| thing every time. So this again is a problem of not
| understanding how it functions.
| fauigerzigerk wrote:
| Does the paper distinguish between hallucination and
| approximation?
|
| If LLMs could be trained to approximate NP-complete functions
| rather than making stuff up, that would be good enough in many
| contexts. I guess it's what humans would do.
| thargor90 wrote:
| You cannot approximate NP-complete functions. If you could
| approximate them with a practically useful limited error and
| at most P effort you would have solved P=NP. (Disclaimer: my
| computer science classes were a long time ago.)
| kalkin wrote:
| This isn't correct. What you may be remembering is that
| some (not all) NP complete problems have limits on how
| accurately they can be approximated (unless P = NP). But
| approximation algorithms for NP complete problems form a
| whole subfield of CS.
| moyix wrote:
| The theorem that proves this is the PCP Theorem, in case
| anyone wants to read more about it: https://en.wikipedia.
| org/wiki/PCP_theorem#PCP_and_hardness_o...
| fauigerzigerk wrote:
| Perhaps I'm not using the vocabulary correctly here.
|
| What I mean is, if you ask a human to solve a travelling
| salesman problem and they find it too hard to solve
| exactly, they will still be able to come up with a better
| than average solution. This is what I called approximation
| (but maybe this is incorrect?).
|
| Hallucination would be to choose a random solution and
| claim that it's the optimum.
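|
| In that spirit, the driver's strategy is easy to write down: a
| nearest-neighbour tour is usually far better than a random one,
| and nobody mistakes it for the optimum. A toy sketch:
|
|   import math
|
|   def nearest_neighbour_tour(points):
|       # points: list of (x, y) coordinates; start at point 0.
|       unvisited = list(range(1, len(points)))
|       tour = [0]
|       while unvisited:
|           last = points[tour[-1]]
|           nxt = min(unvisited,
|                     key=lambda i: math.dist(last, points[i]))
|           unvisited.remove(nxt)
|           tour.append(nxt)
|       return tour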
| alwa wrote:
| I may be misunderstanding the way LLM practitioners use
| the word "hallucination," but I understood it to describe
| it as something different from the kind of "random"
| nonsense-word failures that happen, for example, when the
| temperature is too high [0].
|
| Rather, I thought hallucination, in your example, might
| be something closer to a grizzled old salesman-map-
| draftsman's folk wisdom that sounds like a plausibly
| optimal mapping strategy to a boss oblivious to the
| mathematical irreducibility of the problem. Imagining a
| "fact" that sounds plausible and is rhetorically useful,
| but that's never been true and nobody ever said was true.
|
| It'll still be, like your human in the example, better
| than average (if "average" means averaged across the
| universe of all possible answers), and maybe even useful
| enough to convince the people reading the output, but it
| will be nonetheless false.
|
| [0] e.g. https://news.ycombinator.com/item?id=39450669
| fauigerzigerk wrote:
| If a driver is tasked with visiting a number of places,
| they will probably choose a reasonably good route. If the
| driver claims to have found the optimal route, it may not
| be true, but it's still not a hallucination and it's
| still a pretty good route.
|
| The driver certainly cannot be relied on to always find
| an exact solution to an NP-complete problem. But failure
| modes matter. For practical purposes, the driver's
| solution is not simply "false". It's just suboptimal.
|
| If we could get LLMs to fail in a similarly benign way,
| that would make them far more robust without disproving
| what the posted paper claims.
| pixl97 wrote:
| > but it will be nonetheless false.
|
| Only if you're assuming all questions have binary
| answers.
|
| For example in the traveling salesman problem you don't
| have to compute all answers to start converging on an
| average. A random sampling of solutions can start setting
| a bound for the average, and your grizzled salesman's guesses
| would fall somewhere on that plot. If they are
| statistically better than average then they are far more
| than good enough. Unless of course you think burning up
| the observable universe in finding the best solution is
| the only way to solve the problem of which trip uses the
| least gas?
| kenjackson wrote:
| Last I'd heard it was still open if P != NP. And most questions
| I've seen hallucinations on are not NP-Complete.
| Animats wrote:
| Yes. It looks like they introduce infinities and then run into
| the halting problem for infinities. That may not be helpful.
|
| The place where this argument gets into trouble is where it
| says "we define hallucination in a formal world where all we
| care about is a computable ground truth function f on S." This
| demands a reliable, computable predicate for truth. That alone
| is probably not possible.
|
| If, however, we are willing to accept a ground truth function
| with four possible outputs,
|
|     True
|     False
|     Unknown
|     Resource limit exceeded
|
| that problem can be avoided. Now the goal is manageable -
| return True or False only when those results are valid, and try
| to reduce the fraction of useful queries for which Unknown and
| Resource Limit Exceeded are returned.
|
| The same problem comes up in program verification systems, and
| has been dealt with in the same way for decades. Sometimes,
| deciding if something is true is too much work.
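|
| One way to write that four-valued scheme down (my sketch, not
| the paper's formalism):
|
|   from enum import Enum
|
|   class Verdict(Enum):
|       TRUE = "true"
|       FALSE = "false"
|       UNKNOWN = "unknown"
|       RESOURCE_LIMIT = "resource limit exceeded"
|
|   def is_hallucination(answer: Verdict, truth: Verdict) -> bool:
|       # Abstaining is never counted as a hallucination; only a
|       # definite True/False that contradicts the ground truth is.
|       if answer in (Verdict.UNKNOWN, Verdict.RESOURCE_LIMIT):
|           return False
|       return answer != truth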
| Fripplebubby wrote:
| Well put. Overall this paper feels very Gödel Incompleteness
| for LLMs which is _interesting_ and perhaps even valuable to
| somebody, but because it attaches itself to this hot query
| 'hallucination', I think some people are finding themselves
| searching this paper for information it does not contain.
| samatman wrote:
| Hallucination is a misnomer in LLMs and it depresses me that it
| has solidified as terminology.
|
| When humans do this, we call it confabulation. This is a
| psychiatric symptom where the sufferer can't tell that they're
| lying, but fills in the gaps in their knowledge with bullshit
| which they make up on the spot. Hallucination is an entirely
| different symptom.
|
| And no, confabulation isn't a normal thing which humans do, and
| I don't see how that fact could have anything to do with P !=
| NP. A normal person is aware of the limits of their knowledge,
| for whatever reason, LLMs are not.
| navane wrote:
| When you talk to your mom and you remember something
| happening one way, and she remembers it another way, but you
| both insist you remember it correctly, one of you is doing
| what the LLM is doing (filling up gaps of knowledge with
| bullshit). And even when later you talk about this on a meta
| level,
| no one calls this confabulation because no one uses that
| word. Also this is not a psychiatric syndrome, it's just
| people making shit up, inadvertently, to tell a coherent
| story without holes. It very much sounds like you did the same.
| Everyone does this all the time.
| jiggawatts wrote:
| Just ask any criminal attorney or police detective. Normal
| people can't get their facts straight even if they all
| witnessed something memorable first-hand just hours ago.
| pixl97 wrote:
| >confabulation isn't a normal thing which humans do
|
| > A normal person is aware of the limits of their knowledge,
| for whatever reason, LLMs are not.
|
| Eh, both of these things are far more complicated. People
| perform minor confabulations all the time. Now, there is a
| clinical sense of confabulation that refers to a more serious
| medical condition involving high rates of this, often coupled
| with dementia, and that is the less common form. We know from
| things like eyewitness testimony that people turn into
| confabulatory bullshit-spewing devices very quickly, though
| likely due to different mechanisms like recency bias and
| overwriting memories by thinking about them.
|
| Coupled with that, people are very apt to lie about things
| they do know, and can do so for a multitude of reasons, so
| attempting to teach an LLM to say "I don't know" when it
| doesn't know something, versus it just lying to you and
| saying it doesn't know, will be problematic. Just see ChatGPT
| getting lazy in some of its releases for backfire effects
| like this.
| singingfish wrote:
| Classic confabulation is observed with some kinds of
| alcohol related brain damage where people drink and get
| malnourished for a period of years. People with these
| syndromes create quite coherent complex stories which they
| will not be able to recall subsequently. This is quite
| different to filling in the blanks for remembered
| conversations where later on there is an opportunity for
| error correction. With confabulation there is not as it's
| tightly bound to memory impairment.
|
| So I'm in the camp where LLMs are confabulating - and I
| personally think the argument that they can be seen as
| confabulation machines has some validity.
| bitwize wrote:
| > but rather are hallucinatory for some other interesting
| reason.
|
| In improv theater, the actor's job is to come up with plausible
| interactions. They are free to make shit up as they go along,
| hence improv, but they have to keep their inventions plausible
| to what had just happened before. So in improv if someone asks
| you "What is an eggplant?" it is perfectly okay to say "An
| eggplant is what you get when you genetically splice together
| an egg and a cucumber" or similar. It's _nonsense_ but it's
| nonsense that follows nicely from what just came before.
|
| Large language models, especially interactive ones, are a kind
| of improv theater by machine: the machine outputs something
| statistically plausible to what had just come before; what
| "statistically plausible" means is based on the data about
| human conversations that came from the internet. But if there
| are gaps in the data, or the data lacks a specific answer that
| seems to statistically dominate, it seems like giving a
| definitive answer is more plausible in the language model than
| saying "I don't know", so the machine selects definitive, but
| wrong, answers.
| ninetyninenine wrote:
| Fiction and story writing is hallucination. It is the opposite of
| a stochastic parrot.
|
| We've achieved both extremes of AI. Computers can be both logical
| machines and hallucinators. Our goal is to create a machine that
| can be both at the same time and can differentiate between both.
|
| That's the key. Hallucination is important but the key is for the
| computer to be self aware about when it's hallucinating.
|
| Of course it's a hard problem but even humans hallucinate
| massively. Just look at religion. Only one religion can be
| right, or none; that must mean, logically speaking, all the
| other religions are hallucinations.
| beardedwizard wrote:
| It is not the opposite of stochastic parrot, it is exactly the
| same thing only the predictions are worse due to sparse
| training data.
| elicksaur wrote:
| Comparing religion to LLM mistakes is a case of the very
| prevalent anthropomorphism in society currently. I fear this
| misunderstanding and conflation will prevent us actually
| improving the tech.
|
| Coming to an incorrect explanation such as, "Helios pulls the
| Sun across the sky every day," is categorically different than
| a math program incorrectly returning the most likely next token
| in a sequence. LLMs don't have beliefs at all.
|
| Helios answers a question "Why does the Sun rise?" Holding such
| a belief shows a logical understanding that _some_ force must
| be doing this, but due to a lack of knowledge of the world, the
| person comes up with an incorrect explanation.
|
| LLMs can't pose and reason about such questions. It is just not
| the same class of "hallucinations." Assuming we've figured out
| cognition via word prediction is going to get us nowhere fast
| in the long term.
| ninetyninenine wrote:
| No. I never said we figured out cognition.
|
| The LLM is still a black box feed forward network. It is the
| intricacies of how signals interact with each other in this
| network that we don't fully understand.
|
| Word prediction and curve fitting are high level concepts we
| used to build an intelligence we don't fully understand.
|
| Also belief and understanding are orthogonal concepts. I can
| believe something I don't understand and I can understand
| something I don't believe.
|
| My claim here is that LLMs understand things from a certain
| aspect because LLMs can produce output indistinguishable from
| understanding.
|
| Also because both the human brain and the LLM are black boxes
| there is no other metric we can use to gauge the level of
| understanding an LLM has other than comparing its inputs and
| outputs to the human brain.
| MauranKilom wrote:
| > even humans hallucinate massively
|
| Simpler example: Dreams.
| ninetyninenine wrote:
| Yeah good point. But dreams are easily distinguishable from
| reality.
|
| Religion is often indistinguishable from truth and reality to
| those who hallucinate it.
| samatman wrote:
| Confusing sincere but incorrect belief with hallucination
| is categorically wrong.
| AlexandrB wrote:
| Most humans are extremely aware of the difference between
| dreams and reality. If LLMs had similar awareness when they
| hallucinated there wouldn't be a problem.
| timeon wrote:
| > Just look at religion.
|
| This is a bit off-topic, but what I see as one of the driving
| forces behind the existence of religions is the need for
| personification. It seems easier for humans to interact with the
| world and its elements by communicating with it as if it were a
| familiar person-like entity.
|
| Now when we talk about LLMs and AI in general, there is often
| personification as well.
| ninetyninenine wrote:
| LLMs are trained to actually imitate human understanding
| deliberately. The data is human and the high level training
| is defined as the most likely word prediction of human output
| data.
|
| So not surprising to find aspects of personification in the
| LLM. It is trained on US.
| NateEag wrote:
| > Only one religion can be right or none, that must
| mean,logically speaking all other religions are hallucinations.
|
| There are some mistakes in this sentence.
|
| It is possible (if unlikely) that multiple religions accurately
| describe some aspects of the world, while being mistaken about
| others. That is, treating rigorous complete "correctness" as
| the only useful state a religion could have is very misleading.
| Newtonian physics and special relativity both fail to predict
| some observed phenomena, but they're still both useful (and not
| every religion claims rigorous perfect correctness, even if
| some do).
|
| Even if some religions can be shown to be wrong, that doesn't
| automatically mean that they're hallucinations. People can
| believe things for plausible reasons and be wrong about them.
|
| People can also have reasonable stances like "I cannot prove
| this is true, and would not try to, but my subjective personal
| experience of visions of God persuade me it's probably real."
|
| That seems very different to me from an LLM hallucinating a
| paper from whole cloth out of the blue.
| wseqyrku wrote:
| So they always hallucinate, it's just sometimes good enough?
| throwawaaarrgh wrote:
| LLMs literally just place words one in front of another based on
| a probability and the "goodness" of training data. Of course it's
| going to make stuff up.
| graemebenzie wrote:
| I think part of understanding is filling in the gaps between
| facts. AIs can't recognize when that gap is too large
| graemebenzie wrote:
| I think understanding comes from filling the gaps between the
| facts. AIs can't tell when the gap between points of knowledge is
| too large to interpolate.
| valine wrote:
| This paper is arguing that it's impossible for an LLM to know the
| answer to every question, therefore it's impossible to eliminate
| hallucination.
|
| It's easy to imagine an LLM that responds "I don't know" to all
| questions. An LLM like that isn't very useful, but it also
| doesn't hallucinate. Eliminating hallucination by teaching it to
| recognize what it doesn't know is probably a more sane approach
| than teaching an LLM to know literally everything in the
| universe.
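|
| The degenerate end of that spectrum is trivial to write down,
| which is what makes it a useful baseline to reason from:
|
|   def always_abstain(question: str) -> str:
|       # Trivially hallucination-free, and trivially useless.
|       return "I don't know."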
| zyklonix wrote:
| Hallucinations are essential for divergent thinking. Not
| everything is solved following goal driven approaches. Check out
| DreamGPT: https://github.com/DivergentAI/dreamGPT
| zuminator wrote:
| This is sort of like the compression algorithm "problem." For the
| overwhelming majority of inputs, compression algorithms don't
| compress, and it can be proven that on average they don't work.
| But we're not really interested in compressing things on average.
| What we use compression for amounts to edge cases of highly
| regularized or repeatable data.
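|
| The counting argument behind that claim, in toy form:
|
|   # There are 2**n bit strings of length n, but only 2**n - 1
|   # strictly shorter ones, so no lossless compressor can shrink
|   # every input.
|   n = 16
|   print(2**n, "possible inputs,", 2**n - 1, "shorter outputs")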
|
| Thus the fact that LLMs can be proven in general to hallucinate
| doesn't necessarily imply that they must hallucinate in the types
| of situations for which we use them. The paper itself
| discusses a number of mitigating strategies -- such as
| supplementing their training data with current information or
| using multiple LLMs to vote on the accuracy of the outcome --
| only to basically brush them aside and advise not to use LLMs in
| any sort of critical situation. And that's probably true enough
| today, but in the future I think these strategies will greatly
| reduce the severity of these hallucinations. Just as we as human
| beings have developed strategies to reduce our reliance on pure
| memory.
|
| This reminds me of a deposition I had to give a number of years
| back. One of the lawyers asked me if I remembered how the
| plaintiff and I came to discuss a certain accusation leveled at
| him by the defendant. And I confidently stated, "Sure, he and I
| used to have various conversations about the issue and one day he
| the plaintiff brought up this thing that defendant said to him."
| And the lawyer said, if you want to, you can refer to your phone
| text log to refresh your memory. Then I looked at my phone, and
| the truth was that I myself had spoken to the defendant, and she
| told me the accusation, and then I went and shared it with the
| plaintiff. So, I basically remembered the situation exactly
| backwards, i.e., a hallucination, which I was able to repair by
| referring to real world information instead of just my memory.
| educaysean wrote:
| Is this a theoretical distinction, as in a "we can't get to 0%,
| but we can make it virtually negligible by reducing its frequency
| down to 1x10^-8%" type of scenario? Or is it something that
| requires an external layer of control?
| DinaCoder99 wrote:
| "Hallucination" implies perception of non-real things, not
| generation of phrases that map poorly to reality (or are simply
| incoherent). It seems like a really bad term for this phenomenon.
| TheBlight wrote:
| "Bullsh***ing" seems more apt.
| earthwalker99 wrote:
| It makes being wrong sound impressive and mysterious, so it's
| here to stay.
| bonzaidrinkingb wrote:
| We use "confabulation".
|
| It is a feature, not a bug.
|
| "confabulation" could be "solved" when LLMs realize they are
| uncertain on a reply and making things up. But that requires
| the LLM saying "I don't know" and rewarding that more than a
| wrong guess. That requires a change in loss functions and not
| even sure if all users desire that.
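|
| As a toy illustration of "rewarding that more than a wrong
| guess": a hypothetical per-answer scoring rule (real training
| objectives are token-level losses, and all the numbers and
| example answers here are arbitrary):
|
|     def score(answer, truth):
|         # Correct beats abstaining, abstaining beats a confident
|         # wrong guess; the exact values are arbitrary.
|         if answer == "I don't know":
|             return 0.0
|         return 1.0 if answer == truth else -1.0
|
|     truths   = ["Paris", "Paris", "Paris", "Paris"]
|     guesses  = ["Paris", "Lyon", "Paris", "Lyon"]
|     abstains = ["Paris", "I don't know", "Paris", "I don't know"]
|
|     # Guessing wrong half the time scores worse than abstaining
|     # whenever unsure.
|     print(sum(score(a, t) for a, t in zip(guesses, truths)))   # 0.0
|     print(sum(score(a, t) for a, t in zip(abstains, truths)))  # 2.0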
| nedt wrote:
| Well, humans believe that vaccination either kills people or
| implants chips for tracking, that top politicians are lizard
| people drinking the blood of children kept in caves, and that
| they had to fake a pandemic to get them out. I'd say an A.I.
| hallucinating isn't that far off from real humans. It's rather
| the recipient who needs to interpret any response from either.
| katehikes88 wrote:
| Have you considered that parts of what you said might be true,
| and that you ridicule them only because you associate them with
| other claims that might be untrue and maybe even ridiculous?
| nedt wrote:
| It might be true, or not. It might be ridiculous, or not. With
| _it_ being a message from a human or an AI. A hallucination is
| not so much a problem as long as there is no blind trust or
| single source of truth. And oh boy, would I like to be purely
| ridiculous or satirical with the examples of what humans
| believe.
| bonzaidrinkingb wrote:
| Around 12k fatal outcomes have been reported in the EU after
| vaccination, but it is not certain in all cases that vaccines
| were the cause.
|
| The vaccine tracking chips come from two Microsoft (-affiliate)
| patents, one about using chips to track body activity to reward
| in cryptocurrency, and another about putting a vaccine passport
| chip in the hands of African immigrants. That vaccines contain
| tracking chips is a fabricated conspiracy to ridicule and
| obfuscate.
|
| "Lizard people" is often an anti-semitic dog whistle.
|
| Rich elites use blood transfusions of young people to combat
| aging and age-related disease.
|
| Children have been kept in cages and feral children have lived
| in caves.
|
| You likely made up the part about faking a pandemic to get
| children out of caves, unless you can point to discussion about
| these beliefs.
|
| Real humans do hallucinate all the time.
| dragonwriter wrote:
| > Real humans do hallucinate all the time.
|
| No, they don't hallucinate "all the time", but LLM
| "hallucination" is a bad metaphor, as the phenomenon is more
| like confabulation than hallucination.
|
| Humans also don't confabulate all the time, either, though.
| bonzaidrinkingb wrote:
| > "Everyone experiences hallucinations," Anil Seth, a
| neuroscientist at the University of Sussex in the UK, told
| DW.
|
| > "It's important to recognize hallucinations can come and
| go during our lives at points of stress or tiredness," Seth
| said. "There is a bit of a stigma around hallucinations. It
| comes from people associating them with mental illness and
| being called crazy."
|
| > But it's actually very common and happens even daily. The
| itching Yarwood experiences is particularly common,
| especially after drinking alcohol.
|
| > "It's also common for people with reduced hearing or
| vision function to get hallucinations in that ear or eye,"
| said Rick Adams, a psychiatrist at University College
| London. "These are non-clinical hallucinations because they
| are not associated with a psychiatric diagnosis."
|
| https://www.dw.com/en/hallucinations-are-more-common-than-
| yo...
|
| Confabulation is more like making something up when you
| don't have sufficient knowledge. Seems to happen regularly
| :)
| nedt wrote:
| In Germany and Austria we have those Querdenker Telegram
| channels. All the examples I've given come from there. I'd
| really like to say I made them up. But what you did with my
| message is also what I'd do with AI output: it can be trained on
| wrong data, misunderstand the question, or make stuff up. Just
| like a human.
| bonzaidrinkingb wrote:
| I think you are (subconsciously) strawmanning the anti-vax
| movements like Querdenker. Most of these believe that mandatory
| vaccination (or reducing the freedom of the unvaccinated, or
| making it economically infeasible to work while unvaccinated) is
| bad and goes against individual human rights, and that the risks
| and benefits of vaccines were not clearly communicated.
|
| So, even if you did not make it up, it twists their viewpoints
| to reduce their legitimacy by tying them to ridiculous theories.
| One could do something similar by cherry-picking vaccine
| proponents and their ridiculous theories (like claiming COVID
| came from the wet market).
|
| If these channels are not indexed, I have a hard time
| believing you, given your misgivings and ridicule on your
| other statements. If a discussion about "Pandemic was faked
| to get children out of caves" can be sourced, please do so.
|
| AI output is already more careful and fair and balanced on
| these matters.
| nedt wrote:
| Source is Die Zeit as written here:
| https://news.ycombinator.com/item?id=39504716
|
| You could also find it in other sources like Science Busters
| etc. Most of it will be in German, because Germany and Austria
| do have a real problem with certain (dis)beliefs about the
| medical system.
|
| Pretty sure other examples of human hallucination could be given
| (WMD in Iraq, a lot of bad things done because of religion,
| ...). The point is not the strawman itself, but rather that any
| message needs evaluation, AI or not.
| nedt wrote:
| Here is an excerpt from the German Wikipedia on the belief that
| the lockdown was connected to children being exploited for their
| blood:
| "According to the initial interpretation, the mass quarantine
| (the "lockdown") does not serve to combat the pandemic, but
| is intended to provide Trump and his allies with an excuse to
| free countless children from torture chambers, where
| adrenochrome is being withdrawn en masse on behalf of the
| elite." - translated via Google translate, but source is here
| with Die Zeit as source
| https://de.wikipedia.org/wiki/QAnon#cite_ref-29
| bonzaidrinkingb wrote:
| Thanks for the source so I can put this into context (which
| is the context of Russian disinformation, not grassroots
| beliefs representative of the anti-vax movement).
| nedt wrote:
| OK, I take it back: hallucination is a problem. Seeing the
| downvotes and comments here, it does seem hard to tell what is
| made up and what really comes from fuxxed-up humans. The will to
| believe that humans can't be that stupid is bigger than I
| thought, and the same kind of evaluation might lead to an AI
| response being taken as truth if it just sounds calm and
| plausible enough.
| whycome wrote:
| Maybe this is why we need to force LLMs to dream. To get all the
| hallucinations out of the way during 'down time' and clean things
| up for 'waking hours'.
___________________________________________________________________
(page generated 2024-02-25 23:01 UTC)