[HN Gopher] Google scrambles to manually remove weird AI answers...
       ___________________________________________________________________
        
       Google scrambles to manually remove weird AI answers in search
        
       Author : rntn
       Score  : 199 points
       Date   : 2024-05-25 15:24 UTC (7 hours ago)
        
 (HTM) web link (www.theverge.com)
 (TXT) w3m dump (www.theverge.com)
        
       | dazc wrote:
       | Manually removing rogue AI results is kind of ironic isn't it?
        
         | tflol wrote:
         | You could almost argue these results are directly human
         | generated.
         | 
         | edit: And in that case, who is the arbiter of truth?
        
         | rchaud wrote:
         | Pay no attention to the Accenture contractors behind the
         | curtain!
        
         | JCM9 wrote:
          | Pay no attention to the army of people behind the curtain
          | pulling levers trying to make it look like they've actually
          | built a real AI.
        
         | seydor wrote:
         | We need an AI for that
        
         | nikanj wrote:
         | For this tech cycle, AI is short for Actually Indians
        
         | Barrin92 wrote:
         | generative AI is essentially three day labourers from an
         | emerging economy in a trenchcoat. From data labelling, to
         | "human reinforcement", to manually cleaning up nonsensical AI
         | results.
        
       | alfiedotwtf wrote:
        | Google keeps making these same large and embarrassing mistakes
       | time and time again. I think it's because their devs don't eat
       | enough rocks every day.
        
         | ta20240226 wrote:
          | Is it the rocks, or is it that the pizza served at Google
          | doesn't have enough glue?
        
       | moomoo11 wrote:
       | Just focus on making useful software to improve people's lives.
       | Holy fuck the last five years feel like such a waste.
        
         | illusive4080 wrote:
         | But AI will solve everything!
        
           | rchaud wrote:
           | ...for our shareholders
        
             | rvnx wrote:
             | I hope AI will bring back the "Sort by date" button on
             | Google Reviews, and add somewhere a Google Maps link.
             | 
             | Who knows, maybe AI can bring back exact keyword matches,
             | or correct basic math calculations on Google Search too.
        
               | moomoo11 wrote:
               | It will cost $2 billion of nvidia chips and it won't
               | work.
        
               | tymscar wrote:
               | Maybe AI could bring back the pre gen AI tech scene
        
         | cratermoon wrote:
         | Throwing good money after bad.
         | 
         | Companies spent all that money on high end GPUs for crypto
         | mining and that went bust, now gotta figure out something to do
         | with the hardware to try to recoup some of the investment.
         | Google pumped $1.5 Billion into crypto.
        
           | astrange wrote:
           | Google has TPUs.
        
         | szundi wrote:
         | Management realized that they are not good enough to sustain
         | progress, so they humbly allocate resources to the next
         | generation: AI
        
       | SoftTalker wrote:
       | Trained on Twitter and Reddit. Garbage in/Garbage out, as it has
       | always been.
        
         | threeseed wrote:
         | Except that 90% of Reddit isn't garbage. It's really useful.
         | 
         | Problem is Google can't tell what is garbage or not. No LLM
         | can.
        
           | SoftTalker wrote:
           | I'd argue it's far less than 90% but yes, there is some good
           | information there. But weeding out the noise is what needs to
           | happen, and (for some topics more than others) there is an
           | awful lot of it.
        
           | maximinus_thrax wrote:
           | > Except that 90% of Reddit isn't garbage. It's really
           | useful.
           | 
           | Citation needed. I've been a Reddit user since its inception
           | and honestly except for niche hobby subreddits, Reddit is
           | mostly low effort garbage, bots and rehashed content. I'd
           | wager that mainstream subreddits are 99% garbage for training
           | an LLM for anything other than shitposting.
        
             | skydhash wrote:
              | Pretty much. There was some good information, some of it
              | even book-worthy. But it was the kind that bubbles to the
              | top in helpful and knowledgeable communities. The rest is
              | junk.
        
             | giantrobot wrote:
             | Even in the niche hobby subreddits there can be a really
              | high garbage factor. There are plenty of _well meaning_
              | posters who are just wrong. They're not trying to mislead
              | or lie; they're just unaware they're wrong.
        
       | CatWChainsaw wrote:
       | How hard can it possibly be to just turn off the entire AI-
       | generated overview functionality given that it just got
       | introduced...
        
         | YetAnotherNick wrote:
         | It seems to be turned off for me. And I was in beta testing for
         | a month. Or maybe they are figuring out who is doing weird
          | searches and turning it off for them.
         | 
         | In any case this thing is just hilarious. Just right after
         | their AI painted historical figures as black.
        
           | CatWChainsaw wrote:
           | So far I have not seen it ever in either Firefox or 2-3
           | Chromium-based browsers, on a handful of computers in
           | multiple locations.
           | 
            | I don't see a way Google can make this work. As I
            | understand it, LLM confabulations can be reduced but never
            | eliminated owing to how they're built. Google could try to
            | create a fact-checking department to cut down on queries
            | that produce falsehoods or bullshit, but then they face the
            | problem of
           | appointing themselves arbiters of the "truth". The only way
           | to win is to not play the game, as I see it. I wish the
           | collective AI fever would break already.
        
         | rchaud wrote:
          | very hard indeed, if you're optimizing for favourable opinions
          | from Wall St analysts come earnings time.
        
       | caesil wrote:
       | Who knows how many of these are fake. People have been dropping
       | inspect-element-manipulated screenshots all over twitter.
       | 
       | https://www.nytimes.com/2024/05/24/technology/google-ai-over...
       | 
       | > A correction was made on May 24, 2024: An earlier version of
       | this article referred incorrectly to a Google result from the
       | company's new artificial-intelligence tool AI Overview. A social
       | media commenter claimed that a result for a search on depression
       | suggested jumping off the Golden Gate Bridge as a remedy. That
       | result was faked, a Google spokeswoman said, and never appeared
       | in real results.
       | 
        | That screenshot was tweeted by @allgarbled. Ten minutes before,
        | they tweeted:
       | 
       | >free engagement hack right now is to just inspect element on the
       | google search AI thing and edit it to something dumb. hurry up,
       | this deal won't last forever
        
         | ethbr1 wrote:
         | I'd say the broader issue here is a lack of transparency into
         | results.
         | 
         |  _If_ Google is sending bad results, who can prove that?
        
           | realreality wrote:
           | That's always been an issue. Years ago, researchers
           | demonstrated in an experiment that they could swing public
           | opinion about electoral candidates by manipulating search
           | results. Who knows if Google took that experiment and ran
           | with it?
        
             | ethbr1 wrote:
             | I mean, that's always been the TikTok argument, to me.
             | 
             | Widely-used platforms that can +/- 1% their algorithms to
             | affect democracy have pretty high burdens of
             | trust/transparency, and we're not close to that with any
             | platform (Chinese or not) that I'm aware of.
             | 
             | Meta's probably the closest, because of scrutiny, but afaik
             | even their transparency isn't sufficient for realtime
             | attestation.
        
         | Aloisius wrote:
         | I have personally reproduced several like the interest one and
         | the hippo eggs one, though not that one specifically.
         | 
         | Google has started restricting AI Overviews so much now that
         | most of the example queries on Google's Search Labs page
          | don't even trigger it anymore.
        
       | pilooch wrote:
        | So Google hasn't used an LLM to generate and test weird
        | queries? This does not set the bar very high for the whole
        | industry... There'd be so much to gain from a clean
        | deployment... Either it's hard, or it was rushed. As a machine
        | learnist, I believe it's actually impossible, by design of the
        | autoregressive LLM. This race may well be partially a race to
        | the bottom.
        
         | CaptainOfCoit wrote:
         | > So Google hasn't used an LLM to generate and test weird
         | queries ?
         | 
         | What about simple manual testing? Seems to have skipped QA
         | completely, automated or not.
        
           | pilooch wrote:
            | The adversarial surface of the LLM remains enormous; manual
            | testing cannot cover it.
        
             | jameshart wrote:
             | Asking how to prevent cheese from sliding off pizza is not
             | an adversarial prompt.
        
           | nicklecompte wrote:
           | There has been a lot of excitement recently about how using
           | lower precision floats only slightly degrades LLM
           | performance. I am wondering if Google took those results at
           | face value to offer a low-cost mass-use transformer LLM, but
           | didn't test it since according to the benchmarks (lol) the
           | lower precision shouldn't matter very much.
           | 
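            | A rough sketch of the kind of precision loss in question
            | (my own toy illustration with numpy, not Google's actual
            | setup):
            | 
            |     import numpy as np
            |     
            |     rng = np.random.default_rng(0)
            |     w = rng.normal(size=100_000).astype(np.float32)
            |     
            |     # 16-bit round trip of the toy "weights"
            |     w_fp16 = w.astype(np.float16).astype(np.float32)
            |     
            |     # naive absmax int8 quantization
            |     scale = np.abs(w).max() / 127.0
            |     w_int8 = np.round(w / scale).astype(np.int8)
            |     w_deq = w_int8.astype(np.float32) * scale
            |     
            |     print("fp16 max abs err:", np.abs(w - w_fp16).max())
            |     print("int8 max abs err:", np.abs(w - w_deq).max())
            | 
            | Each per-weight error is tiny, which is why benchmark
            | averages barely move, but that says little about behaviour
            | on rare, weird queries.
            | 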
           | But there is a more general problem: Big Tech is high on
           | their own supply when it comes to LLMs, and AI generally.
           | Microsoft and Google didn't fact-check their AI even in high-
           | profile public demos; that strongly suggests they sincerely
           | believed it could answer "simple" factual questions with high
           | reliability. Another example: I don't think Sundar Pichai was
           | _lying_ when he said Gemini taught itself Sanskrit, I think
           | he was given bad info and didn't question it because
           | motivated reasoning gives him no incentive to be skeptical.
        
             | flyingspaceship wrote:
              | Well yeah, imagine how much money there is to be made in
              | information when you can cut literally everyone else
              | involved out, take all of the information, sell it with
              | ads, and only give people a link at the bottom, if that
              | is even needed at all.
        
         | ADeerAppeared wrote:
         | > So Google hasn't used an LLM to generate and test weird
         | queries ?
         | 
         | You don't even need an LLM for that. Google will almost
         | certainly have tested.
         | 
         | The test result is just politically-unacceptable within the
          | company: It doesn't work, it's an architectural issue inherent
         | to the technology, we can't fix it.
         | 
         | Instead, they just rush to patch any _specific, individual_
         | errors that show up, and claim that these errors are  "rare
         | exceptions" or "never happened".
         | 
         | What's going on here is that Google (and most other AI firms)
         | are just trying to gaslight the world about how error-prone AI
         | is, because they're in too deep and can't accept the reality
         | themselves.
        
           | cjk2 wrote:
           | They already know it's a shit show. They are trying to push
           | it along until it's someone else's fault.
        
             | ADeerAppeared wrote:
             | I'm not convinced the executive layer is aware how dire the
             | problem is.
             | 
              | On one hand, their support for outsourcing programmes
              | ("training Indians on how to use AI") suggests they
              | realize AI tooling without human cleanup is a crapshoot.
             | 
             | On the other hand, they keep digging. This kind of
             | gaslighting is an old and proven trick for _genuinely rare
             | problems_ , but it doesn't work if your issues are fairly
             | common, as they'll get replicated before you can get a fix
             | out.
             | 
             | Similarly, they're gambling with immense legal risks and
             | sacrificing core products for it. They're betting the farm
             | on AI, it may kill the company.
        
               | cjk2 wrote:
               | I think they are more than aware but will magically
               | disappear after cashing their stock just about the point
               | the bubble pops. Don't forget that the AI industry is
               | almost 100% based on hype. Microsoft will be the largest
               | victim here, their entire product portfolio being turned
               | into a nuclear fallout zone almost overnight. Satya and
               | friends are going to trash the whole org.
               | 
               | I regularly speak to laypeople who assume that it's some
               | magical thing without limits that makes their lives
               | better. They are also 100% unaware of any applications
               | that will actually make their lives better. End game
               | occurs when those two disconnected thoughts connect and
               | they become disinterested. The power users and engineers
               | who were on it a year ago are either burned out or
               | finding the limitations a problem as well now. There is
               | only magical thinking, lies and hope left.
               | 
                | Granted, there are some viable applications, but they
                | are rather more modest than anything being touted now,
                | and there are even negative side effects to those
                | (think image classification, which, even if it works
                | properly, requires human review, and there are
                | psychological and competence problems around that too).
        
           | kwertyoowiyop wrote:
           | Deploy the cheap offshore labor!
        
         | dlachausse wrote:
         | They still haven't learned from the Gemini diverse Nazis
         | debacle.
        
         | nicklecompte wrote:
         | Google's poor testing is hardly in doubt. But keep in mind that
         | the whole problem is that LLMs don't handle "unlikely" text
         | nearly as well as "likely" text. So the near-infinite space of
         | goofy things to search on Google is basically like panning for
         | gold in terms of AI errors (especially if they are using a
         | cheap LLM).
         | 
         | And in particular LLMs are less likely to _generate_ these
         | goofy prompts because they wouldn't be in the training data.
        
         | sebastiansm wrote:
         | Google is working hard to be the next Boeing.
        
       | JCM9 wrote:
       | The fall of Google's reputation on ML is nothing short of
       | spectacular. They went from having a near untouchable reputation
       | as being far ahead of any other large tech company on ML to total
       | shambles in a year. Everything they've released has been a
        | complete popcorn-worthy dumpster fire, from faked demos, to
        | racist models that try and pretend white people don't exist, to
        | this latest nonsense telling me to put glue on my pizza.
       | 
       | What the heck happened? Or was their reputation always just more
       | hype than substance?
        
         | rvnx wrote:
         | It could be because they actually released something. If you
         | look back, the Google Research blog posts always have grandiose
         | claims, but you can often never use them.
        
           | ugjka wrote:
           | research != product
        
           | CydeWeys wrote:
           | AlphaGo, AlphaFold, and Waymo FSD are all released in the
           | sense that you can see them actually working in the real
           | world. Those all took much longer to put together than
           | whatever rushed features were released to catch up with
           | OpenAI, however.
        
             | tadfisher wrote:
             | They are also extremely constrained problem spaces relative
             | to the problem space of LLMs, which is apparently
             | "everything imaginable".
        
             | padthai wrote:
             | Waymo is not Google. And Deepmind operated quite
             | independently until not long ago.
        
         | seydor wrote:
         | It's not really that bad. I use gemini often and it's great. I
         | prefer their UI
        
           | tymscar wrote:
           | What do you like more about their ui?
        
             | seydor wrote:
             | faster, it has options like 'modify'. I also feel it
             | follows my commands better, esp. when i ask to rephrase
        
         | arccy wrote:
         | research != product
        
         | bbarnett wrote:
         | At least Elmer's white glue is edible, millions of kids agree.
         | 
         | (The logic sort of makes sense. Glue sticks things together,
         | and some glue is edible.)
        
         | calebkaiser wrote:
         | There was an interesting interview with David Luan about this
         | recently. For context, he was a co-lead at Google Brain, early
         | hire at OpenAI, and is now a founder at Adept:
         | https://www.latent.space/p/adept
         | 
         | The TL;DR on his take is that there are organizational and
         | cultural issues that prevent Google from focusing their
         | research efforts in the way that is necessary for what he calls
         | "big swings," like training GPT-3.
         | 
         | In regards to your second question, Google's reputation in ML
         | is definitely not hype. Purely on the research side, Google has
         | been behind some of the most important papers in modern ML,
          | particularly around language modeling. The original Transformers
         | paper, BERT, lots of work around neural machine translation,
         | all of the work that DeepMind has done post-acquisition, and
         | the list goes on. On the applied side, they also have some of
         | the most successful/widely-adopted ML-powered products on the
         | market (think RankBrain/anything involving a recommendation
         | engine, Translate, Maps, a ton of functionality in Gmail, etc).
        
       | jerf wrote:
       | "Achieving the initial 80 percent is relatively straightforward
       | since it involves approximating a large amount of human data,
       | Marcus said, but the final 20 percent is extremely challenging.
       | In fact, Marcus thinks that last 20 percent might be the hardest
       | thing of all."
       | 
       | 100% completely accurate is super-AI-complete. No human can meet
       | that goal either.
       | 
       | No, not even you, dear person reading this. You are wrong about
       | some basic things too. It'll vary from person to person what
       | those are, but it is guaranteed there's something.
       | 
       | So 100% accurate can't be the goal. Obviously the goal is to get
        | the responses to be less _obviously_ stupid. While there are
        | cynical money-oriented business reasons for that, it is obviously
        | also a legitimate hole in the I in AI to propose putting glue on
       | pizza to hold the cheese on.
       | 
       | But given my prior observations that LLMs are the current
       | reigning world-class champions at producing good sounding text
       | that seems to slip right past all our system 1 thinking [1], it
       | may not be a great thing to remove the obviously stupid answers.
        | They perform a salutary task of educating the public about the
       | limitations and giving them memorable hooks to remember not to
       | trust these things. Removing them and only them could be a net
       | negative in a way.
       | 
        | [1]: https://thedecisionlab.com/reference-guide/philosophy/system...
        
         | SoftTalker wrote:
         | > putting glue on pizza to hold the cheese on
         | 
         | It's actually not the dumbest idea I've heard from a real
         | person. So no surprise it might be suggested by an AI that was
         | trained on data from real people.
        
           | krapp wrote:
           | It wasn't an idea, though. It was a joke someone made on
           | Reddit. If an AI can't tell the difference, it shouldn't be
           | responsible for posting answers as authoritative.
        
             | dgellow wrote:
             | Insane people at Google thought it would be a good idea to
             | let Reddit of all places drive their AI search responses
        
               | warkdarrior wrote:
               | It is certainly popular here to run your web searches
               | against reddit. Every post about how Google Search sucks
               | ends up with comments on appending "site:reddit.com" to
               | the search terms.
        
               | dgellow wrote:
                | Yes, and we as humans filter through the noise. But you
                | cannot rely upon it as a source for anything truthful
               | without that filtering. Reddit is very, very, very
               | context dependent and full of irony, sarcasm, jokes,
               | memes, confidently written incorrect information. People
               | love to upvote something funny or culturally relevant at
               | a given time, not because it's true or useful but because
               | it's fun to do
        
               | oldgradstudent wrote:
               | Reddit is a magnificent source of useful knowledge.
               | 
               | r/AskHistorians r/bikewrench
               | 
               | To name just two. There is nothing even remotely
               | comparable.
               | 
               | But you need to be able to detect sarcasm and irony.
        
               | blablabla123 wrote:
               | ...which is sometimes incredibly hard and it might not be
               | possible because it's such a niche topic or people might
               | be just wrong. Just thinking about Urban Myths,
               | Conspiracy theories etc. where even without a niche
               | factor things may sound unbelievable but actually
               | disproving can be effort that is out of proportion
        
               | mvdtnz wrote:
               | I have seen a tremendous amount of bad advice on
               | bikewrench.
        
               | oldgradstudent wrote:
               | But a lot of great advice.
               | 
               | I became a half decent home bike mechanic through reading
               | it, and of course Park Tool videos.
        
               | giantrobot wrote:
               | I don't know about bikewrench but AskHistorians is a
               | useful source of knowledge because it is _strongly_
                | moderated and curated. It's not just a bunch of random
               | assholes spouting off on topics. Top level replies are
               | unceremoniously removed if they lack sourcing or make
               | unsourced/unsubstantiated claims. Top level posters also
               | try to self-correct by clearly indicating when they're
               | making claims of fact that are disputed or have unclear
               | evidence.
               | 
               | OpenAI, Google, and the other LLMs-are-smart boosters
               | seem to think because the Internet is large it must be
               | smart. They're applying the infinite monkey theorem[0]
               | incorrectly.
               | 
               | [0]
               | https://en.m.wikipedia.org/wiki/Infinite_monkey_theorem
        
               | VancouverMan wrote:
               | In general, I have trouble trusting environments that can
               | be described as "strongly moderated and curated".
               | 
               | I find that environments that rely on censorship tend to
               | foster dogma, rather than knowledge and real
               | understanding of the topics at hand. They give an
               | illusion of quality and trustworthiness. It's something
               | we see happen at this site to some extent, for example.
               | 
               | I'd rather see ideas and information being freely
               | expressed, and if necessary, pitted against one another,
               | with me being the one to judge for myself the
               | ideas/claims/positions/arguments/perspectives/etc. that
               | are being expressed.
        
               | candiddevmike wrote:
                | I wonder what impact all of those erase tools are
                | having on LLM training, the ones that replaced all of
                | these highly upvoted comments with nonsense.
        
               | SirMaster wrote:
               | I'm pretty sure those "erase" tools are just for the
               | front-end and reddit keeps the original stuff in the
               | back-end. And surely the deal Google made was for the
               | back-end source data, or probably the data that includes
               | the original and the edit.
        
               | astrange wrote:
               | The LLM does a summary of web search results. It's
               | quoting what you can see, not pretrained knowledge,
               | afaik.
        
             | dspillett wrote:
             | It may not be a joke. Perhaps it has confused making food
             | for eating with directions for preparing food for menu
             | photography and other advertising.
        
               | Fartmancer wrote:
               | The Reddit post in question was definitely a joke. This
               | is the post in response to a user asking how to make
               | their cheese not slide off the slice:
               | 
               | > To get the cheese to stick I recommend mixing about 1/8
               | cup of Elmer's glue in with the sauce. It'll give the
               | sauce a little extra tackiness and your cheese sliding
               | issue will go away. It'll also add a little unique
               | flavor. I like Elmer's school glue, but any glue will
               | work as long as it's non-toxic.
               | 
                | This matches the AI's response, which suggested 1/8 of
                | a cup of glue for additional "tackiness."
        
         | benrutter wrote:
         | > So 100% accurate can't be the goal. Obviously the goal is to
         | get the responses to be less obviously stupid.
         | 
         | I'm not sure I agree. I think you're right that 100% accuracy
          | is potentially infeasible as a realistic aim, but I think the
         | question is how accurate something needs to be in order to be a
         | useful proposition for search.
         | 
          | AI that's _as knowledgeable as I am_ is a good achievement and
          | helpful for a lot of use cases, but if I'm searching "What's
          | the capital of Mongolia", someone with average-ish knowledge
          | taking a punt with "Maybe Mongoliana City?" is not helpful at
          | all. If I can't trust AI responses to a high degree, I'd much
         | rather just have normal search results showing me other
         | resources I _can_ trust.
         | 
         | Google's bar for justifying adding AI to their search
         | proposition isn't "be better than asking someone on the
         | street", it's "be better than searching google _without_ any AI
         | results "
        
           | smashed wrote:
            | The problem is that in all the shared examples, Google AI
            | search does not respond with a "Maybe xyz?" like you did.
            | It always answers with high confidence and can't seem to
            | navigate any gray area where there are multiple differing
            | opinions or opposing sources of truth.
        
             | namaria wrote:
             | Yeah the "manipulating language cogently is intelligence"
              | premise that underlies this "AI" cycle is proving itself
             | wrong in a grand way.
        
         | bugglebeetle wrote:
         | Yes, which is why the ability to sift accurate and
         | authoritative sources from spam, propaganda, and intentionally
         | deceptive garbage, like advertising, and present those high-
         | quality results to the user for review and consideration, is
         | more important than any attempt to have an AI serve a single
         | right answer. Google, unfortunately, abandoned this problem
         | some time ago and is now left to serve up nonsense from the
         | melange of low-quality noise they incentivized in pursuit of
         | profits. If they had, instead, remained focused on the former
         | problem, it's actually conceivable to have an LLM work more
         | successfully from this base of knowledge.
        
         | jsemrau wrote:
         | This is statistics though. Edge cases are nothing new and risk
         | management concepts have evolved around fat tails and anomalies
          | for decades. Therefore the statement is as naive as expecting
          | a trading agent to be 100% correct. In my opinion, this error
          | shows a lack of understanding of responsible scaling
          | architectures. If this were their first screw-up I wouldn't
          | mind, but
         | Google just showed us a group of diverse Nazis. If there is a
         | need for consumer protection for online services, it is exactly
         | stuff like this. ISO 42001 lays out in great detail that AI
         | systems need to be tested before they are rolled out to the
         | public. The lack of understanding of AI risk management is
         | apparent.
        
         | lukev wrote:
         | I feel like there's some semantic slippage around the meaning
         | of the word "accuracy" here.
         | 
         | I grant you, my print Encyclopedia Britannica is not 100%
         | accurate. But the difference between it and a LLM is not just a
         | matter of degree: there's a "chain of custody" to information
         | that just isn't there with a LLM.
         | 
         | Philosophers have a working definition of knowledge as being
         | (at least+) "justified true belief."
         | 
         | Even if a LLM is right most of the time and yields "true
         | belief", it's not _justified_ belief and therefore cannot yield
         | knowledge _at all_.
         | 
         | Knowledge is Google's raison d'etre and they have no business
         | using it unless they can solve or work around this problem.
         | 
          | + Yes, I know about the Gettier problem, but it is not
          | relevant to the point I'm making here.
        
           | jononor wrote:
           | Encyclopedia Britannica is also wrong in a reproducible and
            | fixable way. And the input queries are a finite set. Its
            | output does not change due to random or arbitrary things.
            | It is actually possible to verify. LLMs so far seem to be
            | entirely unverifiable.
        
             | lukev wrote:
             | They don't just seem it. They are by design.
             | 
             | We talk about models "hallucinating" but that's us bringing
             | an external value judgement after the fact.
             | 
             | The actual process of token generation works precisely the
             | same. It'd be more accurate to say that models _always_
             | hallucinate.
        
               | madeofpalk wrote:
               | Yes - this is what i've been saying all the time. The
               | term 'hallucinations' is misleading because the whole
               | point of LLMs is that they recombine all their inputs
               | into something 'new'. They only ever hallucinate outputs
               | - that's their whole point!
        
               | wizzwizz4 wrote:
               | Into something _probable_. The models that underlie these
               | chatbots are usually overfitted, so while they _usually_
                | don't repeat their training data verbatim, they _can_.
        
               | DougBTX wrote:
               | > The actual process of token generation works precisely
               | the same
               | 
               | I'd be wary of generalising it like that, it is like
               | saying that all programs run on the same set of CPU
               | instructions. NNs are function approximators, where the
               | code is expressed in model weights rather than text, but
               | that doesn't make all functions the same.
        
               | lukev wrote:
               | You misunderstand. I mean that the model itself is doing
               | exactly the same thing whether the output is a
                | "hallucination" or happens to be fact. There isn't even
               | a theoretical way to distinguish between the two cases
               | based only on the information encoded in the model.
        
               | skydhash wrote:
               | > _it is like saying that all programs run on the same
               | set of CPU instructions_
               | 
                | A Turing machine is the embodiment of all computer
                | programs, and then you come across the halting problem.
                | LLMs can probably generate all books in existence, but
                | they can't apply judgement to them. Just like you need
               | programmers to actually write the program and verify that
               | it correctly solves the problem.
               | 
               | Natural languages are more flexible. There are no
                | function libraries or paradigms to ease writing. And
               | the problem space can't be specified and usually relies
               | on shared context. Even if we could have snippets of
               | prompts to guide text generations, the result is not that
               | valuable.
        
               | intended wrote:
                | YES. Humans can hallucinate; it's a deviation from
                | observable reality.
               | 
               | All the stress people are feeling with GenAI comes from
                | the over-anthropomorphisation of ... stats. Impressive
                | syntactic ability is not equivalent to semantic
               | capability.
        
             | somenameforme wrote:
             | LLMs are completely deterministic even if that's kind of
             | weird to state because they output things in terms of
             | probabilities. But if you simply took the highest
             | probability next word, you'd always yield the exact same
             | output given the exact same input. Randomness is
             | intentionally injected to make them seem less robotic
             | through the 'temperature' parameter. Why it's not just
             | called the rng factor is beyond me.
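              | 
              | A toy illustration of that point (my own sketch, not any
              | particular model's actual sampler):
              | 
              |     import numpy as np
              |     
              |     def next_token(logits, temperature, rng):
              |         # temperature 0 -> greedy argmax, deterministic
              |         if temperature == 0:
              |             return int(np.argmax(logits))
              |         p = np.exp(logits / temperature)
              |         p /= p.sum()
              |         return int(rng.choice(len(logits), p=p))
              |     
              |     logits = np.array([2.0, 1.5, 0.3])  # stand-in output
              |     rng = np.random.default_rng()
              |     print([next_token(logits, 0, rng) for _ in range(5)])
              |     # -> always [0, 0, 0, 0, 0]
              |     print([next_token(logits, 1.0, rng) for _ in range(5)])
              |     # -> varies from run to run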
        
               | dleeftink wrote:
               | Maybe some models can be deterministic at a point in
               | time, but train it for another epoch with slight
               | parameter changes and a revised corpus and determinism
               | goes out the proverbial (sliding) window real quick. This
                | is not unwanted per se, and it is exactly the feedback
                | loop that needs improving to better integrate new
                | knowledge or revise knowledge artefacts
                | incrementally/post-hoc.
        
               | Vetch wrote:
               | If you train it then it's no longer the same model. If I
               | have f(x) = x + 1 and change it to f(x) = x + 1 + 1/1e9,
               | it would not mean that `f` is not deterministic. The
               | issue would be in whatever interface I was exposing the
               | f's at.
        
               | jononor wrote:
               | But current models must be retrained to incorporate new
               | information. Or to attempt to fix undesirable behavior.
               | So just freezing it forever does not seem feasible. And
               | because there is no way to predict what has changed - one
               | has to verify everything all over again.
        
               | avar wrote:
               | Would you by extension argue that e.g. modern relational
                | databases aren't deterministic in their query execution?
               | Their query plans tend to be chosen based on statistics
               | about the tables they're executed against, and not just
               | the query itself.
               | 
               | I don't see how that's different than the LLM case, a lot
               | of algorithms change as a function of the data they're
               | processing.
        
               | Terr_ wrote:
               | I think what you're describing is that training/execution
               | effects aren't _predictable_.
               | 
               | It is still "deterministic" in that training on exactly
               | the same data and asking exactly the same questions
               | should (unless someone manually adds randomness) lead to
               | the same results.
               | 
               | Another example of the distinction might be a pseudo-
               | random number generator: For any given seed, it is
               | entirely deterministic, while at the same time being very
               | deliberately hard to predict without actually running it
               | to see what happens.
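                | 
                | E.g., a trivial sketch of that last point:
                | 
                |     import random
                |     
                |     for _ in range(2):
                |         random.seed(1234)  # same seed ...
                |         print([random.randint(0, 9) for _ in range(5)])
                |         # ... same "unpredictable" sequence both times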
        
             | ta8645 wrote:
             | > LLMs so far seem to be entirely unverifiable.
             | 
             | I don't understand this complaint. Are they any less
             | verifiable than a human?
        
               | threeseed wrote:
               | I can ask a human to explain the steps they took to
               | answer a question.
               | 
               | I can ask a human a question 100 times and I don't get
               | back 100 different answers.
               | 
               | None of those applies to an LLM.
        
               | ta8645 wrote:
                | You can ask an LLM to explain itself. It will give you a
               | logical stepwise progression from your question to its
               | answer. It will often contain a mistake, but the same is
               | true for a human.
               | 
               | And if your LLM is giving you 100 different answers, then
               | it has been configured to do so. Because instead, it
               | could be configured to never vary at all. It could be
               | 100% reproducible if so desired.
        
               | bluefirebrand wrote:
               | > it will give you a logical stepwise progression from
               | your question to its answer.
               | 
               | No, it will generate a new hallucination that might be a
               | logical stepwise progression from the question you asked
                | to the answer it gave, but it is not due to any actual
               | internal reasoning being done by the LLM
        
               | ta8645 wrote:
               | So what? You have no way to know for sure if the human
               | you ask the same question, does either. The question that
                | started this thread was related to verifiability. And I
               | still think it is a spurious complaint, given that we
               | have exactly the same limitations when dealing with any
               | human agent.
        
               | wizzwizz4 wrote:
               | The problem of other minds is no reason to throw
               | everything out the window. Humans are capable of being
               | conscious of their reasoning processes; token-at-a-time
                | predictive text models wired up as chatbots _aren't
               | capable_ of it. Your choice is between a possibly-
               | mistaken, possibly-lying human, and a 100%-definitely
               | incapable computer program.
               | 
               | You don't know either "for sure", but you don't know that
               | _the external world exists_ "for sure" either. It's an
               | insight-free observation, and shouldn't be the focus of
               | anyone's decision-making.
        
               | ta8645 wrote:
               | You've made some interesting points, which are debatable,
               | for sure. But you've failed to address the question being
               | asked about "verifiability".
        
               | bluefirebrand wrote:
               | > And i still think it is a spurious complaint, given
               | that we have exactly the same limitations when dealing
               | with any human agent
               | 
               | We're not talking about an LLM that is trying to do the
               | job of a human, here
               | 
               | We're talking about an LLM that is trying to give
               | authoritative answers to any question typed into the
               | Google search bar
               | 
               | It's already well past the scale that humans could handle
               | 
               | Talking about human shortcomings when discussing LLMs is
               | a red herring at best, or some kind of deliberate
               | goalpost shifting at worst
        
               | ta8645 wrote:
               | Nothing of the sort. I'm trying to understand why anyone
               | cares about formal verifiability in this context, since
               | it's not something we rely on when asking humans to
               | answer questions for us. We evaluate any answer we get
               | without such mathematical proofs, and instead simply
               | judge the answer we're given on its fit and usefulness.
               | 
               | Anyone who doubts the usefulness of even these nascent
               | LLMs is fooling themselves. The proof is in the pudding,
               | they already do a great job, even with all their obvious
               | limitations.
        
               | bluefirebrand wrote:
               | > since it's not something we rely on when asking humans
               | to answer questions for us
               | 
               | Because we interact with computers (which includes LLMs)
               | differently than we do with humans and we hold them to
               | higher standards
               | 
               | Ironically, Google played a large part in this,
               | delivering high quality results to us with ease for many
               | years. At one point Google _was_ the standard for finding
               | high quality information
        
               | ta8645 wrote:
               | Shrug. Seems like clutching pearls to me. People seem to
               | have an emotional reaction and obsess on the aspects that
               | differentiate human cognition from LLMs. But that is a
               | lot of wasted energy.
               | 
               | To the extent that anyone avoids employing these
               | technologies, they will be at a disadvantage to those who
               | do; because these tools just work. Already. Today.
               | 
               | There isn't even room for debate on that issue. Again,
               | the proof is in the pudding. These systems are already
               | successfully, usefully, and correctly answering millions
               | of questions a day. They have failure modes where they
               | produce substandard or even flat out incorrect answers
               | too. They're far from perfect, but they're still
               | incredible tools, even without waiting for the
               | improvements that are sure to come.
        
               | figassis wrote:
               | The reason verifiability is important is because humans
               | can be incentivized to be truthful and factual. We know
               | we lie, but we also know we can produce verifiable
               | information, and we prefer this to lies, so when it
                | matters, we make the cost of lying high enough that we
                | can reasonably expect that people will not try to deceive
               | (for example by committing perjury, or fabricating
               | research data). We know it still happens, but it's not
               | widespread and we can adjust the rules, definitions and
               | cost to adapt.
               | 
               | An LLM does not have such real world limitations. It will
               | hallucinate nonstop and then create layers of gaslighting
                | explanations for its hallucinations. The problem is that
                | you absolutely must be a domain expert in the LLM's topic
               | or always go find the facts elsewhere to verify (then why
               | use an LLM?).
               | 
               | So a company like Google using an LLM, is not providing
               | information, it's doing the opposite. It is making it
               | more difficult and time consuming to find information.
               | But it is then hiding their responsibility behind the
               | model. "We didn't present bad info, our model did, we're
               | sorry it told you to turn your recipe into
               | poison...models amirite?"
               | 
               | A human doing that could likely face some consequences.
        
               | Dylan16807 wrote:
               | > You have no way to know for sure if the human you ask
               | the same question, does either.
               | 
               | The human might lie, but they generally don't.
               | 
               | An LLM is _always_ confabulating when it explains how it
               | reached a conclusion, because that information was
               | discarded as soon as it picked a word.
               | 
               | The limitations are not in the same ballpark.
        
               | Vetch wrote:
               | > It will often contain a mistake...but the same is true
               | for a human.
               | 
               | If this were true textbooks could not work. Given a
               | question, we don't consult random humans but experts of
               | their field. If I have a question on algorithms, I might
               | check a text by Knuth, I wouldn't randomly ask on the
               | street.
               | 
               | > It could be 100% reproducible if so desired.
               | 
               | Reproducible does not mean better. For harder questions,
               | it's often best to generate multiple answers at a higher
               | temperature than to greedily pick the highest probability
               | tokens.
        
               | kenjackson wrote:
               | Ask a human what the meaning of life is and how it
               | impacts their day to day interactions. I know I can tell
               | you an answer but I couldn't tell you steps about how I
               | got it.
               | 
                | And if you asked me twice I'd definitely give
               | different answers unless you told me to give the same
               | answer. In part I'd give a different answer because if
               | someone asks me the same question twice I assume the
               | first answer wasn't sufficient.
        
               | threeseed wrote:
                | No one is talking about existential questions about the
                | meaning of life.
               | 
               | We are talking about basic things like whether or not to
               | eat rocks or put glue in recipes. We can answer those
               | questions with a chain of logic and repeatability.
        
               | astrange wrote:
               | The only reason you can't verify a server side LLM is you
               | can't see the model. It is possible to look at its
               | activations if you have the model.
        
               | tux1968 wrote:
               | Do the activations tell you anything more than what the
               | LLM delivers in plain text? Other than for trivial bugs
               | in the LLM code, I don't think so.
        
               | astrange wrote:
               | Yes, "making up an answer" will look different from
               | "quoting pretrained knowledge" because eg the model
               | might've decided you were asking a creative writing
               | question.
        
               | duskwuff wrote:
               | Can you cite a source for this, or are you speculating?
               | 
               | My understanding was the opposite -- that the activity of
               | a confabulating LLM is indistinguishable from one giving
               | factually accurate responses.
               | 
               | https://arxiv.org/abs/2401.11817
        
               | astrange wrote:
               | Some things like:
               | 
               | https://arxiv.org/abs/2310.18168
               | 
               | https://arxiv.org/abs/2310.06824
               | 
               | There are various reasons an LLM might have incorrect
               | "beliefs" - the input text was false, training doesn't
               | try to preserve true beliefs, quantization certainly
               | doesn't. So it can't be perfectly addressed, but some
               | things leading to it seem like they can be found.
               | 
               | > https://arxiv.org/abs/2401.11817
               | 
               | This seems like it's true since LLMs are a finite size,
               | but in Google's case it has a "truth oracle" (the
               | websites it's quoting)... the problem is it's a bad
               | oracle.
        
               | lukev wrote:
               | This is confidently stated and incorrect.
        
               | astrange wrote:
               | Do you have anything to add?
        
               | mewpmewp2 wrote:
               | Isn't it actually known that every time a human brain
               | recalls a piece of memory the memory gets slightly
               | changed?
               | 
               | If the answer has any length at all, I imagine the answer
               | can vary every single time the person answers, unless
               | they prepared for it, memorized it word by word.
        
             | ein0p wrote:
             | LLMs are deterministic if you want them to be. If you
             | eagerly argmax the output you will get the same sequence
             | for the same prompt every time
        
           | eynsham wrote:
           | > it's not /justified/ belief
           | 
           | Beliefs derived from the output of LLMs that are 'right most
           | of the time' pass one facially plausible precisification of
           | 'justification' in that they are generated by a reliable
           | belief-generation mechanism (see e.g. Goldman). To block this
           | point one must engage with the post-Gettier literature at
           | least to some extent. There is a clear difference between
           | beliefs induced by reading the outputs of LLMs and those
           | induced by the contents of a reference work, but it is
           | inessential to the point and arguably muddies the water to
           | present the distinction as difference in status as knowledge
           | or non-knowledge.
        
             | nicklecompte wrote:
             | To be clear he is saying that the LLM is not capable of
             | justified true belief, not commenting on people who believe
             | LLM output. I don't think your comment is relevant here.
        
               | lukev wrote:
               | I do think trusting an LLM is less firm ground for
               | knowledge than other ways of learning.
               | 
               | Say I have a model that I know is 98% accurate. And it
               | tells me a fact.
               | 
               | I am now justified in adjusting my priors and weighting
               | the fact quite heavily at .98. But that's as far as I can
               | get.
               | 
               | If I learned a fact from an online anonymously edited
               | encyclopedia, I might also weight that a 0.98 to start
               | with. But that's a strictly better case because I can dig
               | more. I can look up the cited sources, look at the edit
               | history, or message the author. I can use that as an
               | entry point to end up with significantly more than 98%
               | conviction.
               | 
               | That's a pretty important difference with respect to
               | knowledge. It isn't just about accuracy percentage.
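                | 
                | Back-of-the-envelope version of that (numbers purely
                | illustrative):
                | 
                |     p = 0.98   # confidence after the encyclopedia fact
                |     acc = 0.90  # assumed accuracy of a second source
                |     # Bayes update if that independent source agrees:
                |     p2 = (p * acc) / (p * acc + (1 - p) * (1 - acc))
                |     print(round(p2, 4))  # -> 0.9977, past 0.98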
        
               | eynsham wrote:
               | That reading of the comment did occur to me, but I think
               | neither dictionaries nor LLMs are capable of belief, and
               | the comment was about the status of beliefs derived from
               | them.
        
               | nicklecompte wrote:
               | Okay we are speaking past each other, and you are still
               | misunderstanding the subtlety of the comment:
               | 
               | A dictionary or a reputable Wikipedia entry or whatever
               | is ultimately full of human-edited text where, presuming
               | good faith, the text is written according to that human's
               | rational understanding, and humans are capable of
               | justified true belief. This is not the case at all with
               | an LLM; the text is entirely generated by an entity which
               | is not capable of having justified true beliefs in the
               | same way that humans and rats have justified true
               | beliefs. That is why text from an LLM is more suspect
               | than text from a dictionary.
        
               | eynsham wrote:
               | I think the parent comment ultimately concerned the
               | reliability of /beliefs derived from text in reference
               | works v text output by LLMs/, and that seems to be what
               | the replies by the commenter concern. If the point is
               | merely that the text output by LLMs does not really
               | reflect belief but the text in a dictionary reflects
               | belief (of the person writing it), it is well-taken.
               | Since it is fairly obvious and I think the original
               | comment really was about the first question, I address
               | the first rather than second question.
               | 
               | The point you make might be regarded as an argument about
               | the first question. In each case, the 'chain of custody'
               | (as the parent comment put it) is compared and some
               | condition is proposed. The condition explicitly
               | considered in the first question was reliability; it was
               | suggested that reliability is not enough, because it
               | isn't justification (which we can understand
               | pretheoretically, ignoring the post-Gettier literature).
               | My point was that we can't circumvent the post-Gettier
               | literature because at least one seemingly plausible view
               | of justification is just reliability, and so that needs
               | to be rejected Gettier-style (see e.g. BonJour on
               | clairvoyance). The condition one might read into your
               | point here is something like: if in the 'chain of
               | custody' some text is generated by something that is
               | incapable of belief, the text at the end of the chain
               | loses some sort of epistemic virtue (for example, beliefs
               | acquired on reading it may not amount to knowledge).
               | Thus,
               | 
               | > text from an LLM is more suspect than text from a
               | dictionary.
               | 
               | I am not sure that this is right. If I have a computer
               | generate a proof of a proposition, I know the proposition
               | thereby proved, even though 'the text is entirely
               | generated by an entity which is not capable of having
               | justified true beliefs' (or, arguably, beliefs at all).
               | Or, even more prosaically, if I give a computer a list of
               | capital cities, and then write a simple program to take
               | the name of a country and output e.g. '[t]he capital of
               | France is Paris', the computer generates the text and is
               | incapable of belief, but, in many circumstances, it is
               | plausible to think that one thereby comes to know the
               | fact output.
               | 
               | I don't think that that is a reductio of the point about
               | LLMs, because the output of LLMs is different from the
               | output of, for example, an algorithm that searches for a
               | formally verified proof, and the mechanisms by which it
               | is generated also are.
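                | 
                | For concreteness, the prosaic capitals program gestured
                | at above might look like this minimal sketch (the
                | three-entry table is purely illustrative):
                | 
                | CAPITALS = {"France": "Paris", "Japan": "Tokyo",
                |             "Kenya": "Nairobi"}
                | 
                | def capital_sentence(country):
                |     # Pure lookup: the program "believes" nothing, yet
                |     # a reader can still come to know the fact output.
                |     return f"The capital of {country} is " \
                |            f"{CAPITALS[country]}."
                | 
                | print(capital_sentence("France"))
                | # -> The capital of France is Paris.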
        
         | Szpadel wrote:
          | I think the biggest difference with a human (and the most
          | important one) is that a human can tell you "I have no idea,
          | this isn't my field" or "I'm just guessing here", but LLMs
          | will confidently assert super stupid statements. AI doesn't
          | know what it knows.
          | 
          | If you only scored the cases where the human provides an
          | answer, the human score would probably be in the high 90s.
        
           | pankajkumar229 wrote:
           | I find irony here.
        
         | saagarjha wrote:
         | Thankfully a billion people are not asking me for answers to
         | things, so it's OK if I am wrong sometimes.
        
           | fragmede wrote:
           | Nor am I being treated as an omniscient magic black box of
           | knowledge.
           | 
            | Hilariously though, polyvinyl acetate, the main ingredient
            | in Elmer's glue, is used as a binding agent to keep
            | emulsions from separating into oil and water; it's also
            | used in chewing gum and gives citrus fruits, sweets,
            | chocolate, and apples a glossy finish, among other food
            | uses.
        
         | notnullorvoid wrote:
          | 100% accuracy should be the goal, but the way to achieve
          | that isn't going to come from teaching an AI to construct a
          | definitive-sounding answer to 100% of questions. Teaching AI
          | how to respond with "I don't know", and to give confidence
          | scores, is the path to nearing 100% accuracy.
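          | 
          | A minimal sketch of that abstention idea in Python; the
          | generate_with_confidence call and the 0.8 threshold are
          | hypothetical stand-ins, not any real model API:
          | 
          | CONFIDENCE_FLOOR = 0.8  # hypothetical abstention cut-off
          | 
          | def answer_or_abstain(question, model):
          |     # `model.generate_with_confidence` is an assumed
          |     # interface returning (answer_text, confidence_0_to_1).
          |     answer, confidence = model.generate_with_confidence(question)
          |     if confidence < CONFIDENCE_FLOOR:
          |         return "I don't know."
          |     return answer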
        
         | wredue wrote:
          | If I could deliver "80% correct" software for my workplace,
          | my day would be a whole hell of a lot easier.
        
         | Swizec wrote:
         | > No, not even you, dear person reading this. You are wrong
         | about some basic things too. It'll vary from person to person
         | what those are, but it is guaranteed there's something.
         | 
         | Kahneman has a fantastic book on this called Noise. It's all
         | about noise in human decision making and how to counteract it.
         | 
         | My favorite example was how even the same expert evaluating the
         | same fingerprints on different occasions (long enough to
         | forget) will find different results.
        
         | praisewhitey wrote:
         | You're looking at it the wrong way, the goal should be 0%
         | inaccurate. Meaning for the 20% of things it can't answer, it
         | shouldn't make something up.
        
         | skybrian wrote:
         | Or to put it another way, I think Google should have a way of
         | saying "yes, we know this result is wrong, but we're leaving it
         | in because it's funny."
         | 
         | There is a demand for funny results. Someone asking "how many
         | rocks should I eat" is looking for entertainment, so you might
         | as well give it to them.
        
           | leptons wrote:
           | The right answer is no rocks. Some mentally ill person could
           | type that in and get "eat 1000 rocks" and then die from
           | eating rocks, and that would be Google's fault. It's not
            | funny. I have no doubt right now there are at least 50
            | youtube videos being made testing different glues'
            | effectiveness at holding cheese on a pizza. And some of those
           | idiots are going to taste-test it, too. And then people will
           | try it at home, some stupid kids will get sick - I have no
           | doubt.
           | 
           | It was a bit premature to label LLMs as "Intelligence", it's
           | a cool parlor trick based on a shitload of power consumption
           | and 3D graphics cards, but it's not intelligent and it
           | probably shouldn't be telling real (stupid) humans answers
           | that it can't verify are correct.
        
             | nomel wrote:
             | Google is not responsible, and should never be responsible,
             | for protecting mentally ill people from themselves. It
             | would be at a severe detriment to the rest of us if they
             | took on that responsibility. Society should set the bar to
             | "a reasonable person", otherwise you're doomed, with no
             | possible alternative to a nanny state.
        
             | avar wrote:
             | > The right answer is no rocks.
             | 
             | Sand is considered a "rock". If you live in e.g. the USA or
             | the EU you've definitely inadvertently eaten rocks from
             | food produce that's regulated and considered perfectly safe
             | to eat.
             | 
             | It's impossible to completely eliminate such trace
             | contaminants from produce.
             | 
             | Pedantic? Yes, but you also can't expect a machine to
              | confidently give you absolutes in response to questions
             | that don't even warrant them, or to distinguish them from
             | questions like "do mammals lay eggs?".
        
           | duskwuff wrote:
           | > Or to put it another way, I think Google should have a way
           | of saying "yes, we know this result is wrong, but we're
           | leaving it in because it's funny."
           | 
           | These specific results aren't the problem, though. They're
           | illustrations of a larger problem -- if a single satirical
           | article or Reddit comment can fool the model into saying
           | "eating rocks is good for you" or "put glue in your pizza
           | sauce", there are certain to be many more subtle inaccuracies
           | (or deliberate untruths) which their model has picked up from
           | user-generated content which it'll regurgitate given the
           | right prompt.
        
         | tomrod wrote:
          | If everyone can be wrong, then might the assertion that all
          | are wrong be committing this same fallacy? "Can" is not
          | destiny; perhaps you have met people who are fully right
          | about the basics but you just didn't sufficiently grok their
          | correctness.
        
         | willis936 wrote:
         | Failing loudly is an excellent feature. "More compelling lies"
         | is not the answer.
        
         | hatenberg wrote:
         | So google decides shipping 80% distilled crap is good enough.
         | Yay
        
         | bluefirebrand wrote:
         | > You are wrong about some basic things too
         | 
         | Sure, but probably not "add glue to pizza to get the cheese to
         | stick" wrong...
        
           | dspillett wrote:
           | At least it suggested non-toxic glue... That suggests some
           | context about recipes needing to be safe is somehow present
           | in its model.
        
             | bluefirebrand wrote:
             | Most likely this has nothing to do with "recipes being
             | safe" being in the model
             | 
              | It seems the glue thing comes from a reddit shitpost from
              | some time ago. There's a screenshot going around on
              | twitter about it [0] (11 years old in the screenshot, but
              | no idea when it was taken)
             | 
             | It specifically mentions "any glue will work as long as it
             | is non-toxic" so best guess is that's why google output
             | that
             | 
             | [0]https://x.com/kurtopsahl/status/1793494822436917295?t=aB
             | fEzD...
        
               | Fartmancer wrote:
               | It is indeed from 11 years ago. Here's a direct link to
               | the Reddit post: https://www.reddit.com/r/Pizza/comments/
               | 1a19s0/my_cheese_sli...
        
         | verisimi wrote:
         | 100% _correct_ , 80% _correct_ lol.
         | 
         | The thing is that truth/reality is not a thing that is
         | resolvable. Not even the scientific method has this sort of
         | expectation!
         | 
         | You can imagine getting close to those percentages, with
         | regards to consensus opinion. That's just a question of
         | educating people to respond appropriately.
        
           | mvdtnz wrote:
           | No. Whether a person should eat a certain number of small
           | rocks each day is not a matter of opinion, it's not a deep
           | philosophical problem and it's not a question whose truth is
            | not resolvable. You should not be eating rocks.
        
             | verisimi wrote:
             | You choose such an edge case question - how about this sort
             | of thing:
             | 
             | Which is the best political party?
             | 
              | What are the side effects of X medical treatment?
             | 
             | I bet there are even cases when eating rocks is ok!
             | 
             | PS
             | 
             | It has been written about:
             | 
             | https://www.atharjaber.com/works/writings/the-art-of-
             | eating-...
             | 
             | > Lithophagia is a subset of geophagia and is a habit of
             | eating pebbles or rocks. In the setting of famine and
             | poverty, consuming earth matter may serve as an appetite
             | suppressant or filler. Geophagia has also been recorded in
             | patients with anorexia nervosa. However, this behavior is
             | usually associated with pregnancy and iron deficiency. It
             | is also linked to mental health conditions, including
             | obsessive-compulsive disorder.
             | 
             | Would you deny a starving person information on an appetite
             | suppressant?
             | 
             | Also here:
             | 
             | https://www.remineralize.org/2017/05/craving-minerals-
             | eating...
             | 
             | > Aside from the capuchin monkeys, other animals have also
             | been observed to demonstrate geophagy ("soil-eating"),
             | including but not limited to: rodents, birds, elephants,
             | pacas and other species of primates.[1]
             | 
             | > Researchers found that the majority of geophagy cases
             | involve the ingestion of clay-based soil, suggesting that
             | the binding properties of clay help absorb toxins.
             | 
             | ^^ The point being that even your edge case example is not
             | unambiguously correct.
        
               | mvdtnz wrote:
               | Are you really going to start eating rocks just to
               | convince yourself that Google's AI isn't shit and
               | objective truth is not real?
        
               | verisimi wrote:
               | Lol! No, of course not.
               | 
               | My point is that I object to the idea that a result can
               | be 100% right! Even in the case of eating rocks, it seems
               | there are times that it can be beneficial.
               | 
               | To think '100% correct' is achievable is to misunderstand
               | the nature of reality.
        
         | mvdtnz wrote:
         | > No, not even you, dear person reading this. You are wrong
         | about some basic things too. It'll vary from person to person
         | what those are, but it is guaranteed there's something.
         | 
         | The difference is that I'm not put on the interface of a
         | product facing hundreds of millions of users every day to feed
         | those users incorrect information.
        
         | noncoml wrote:
         | "No, not even you, dear person reading this. You are wrong
         | about some basic things too."
         | 
         | But even when I'm wrong I'm not 100% off. Not "to help with
          | depression, jump off a bridge" or "use glue to keep the cheese on
         | the pizza" kind of wrong.
        
         | ein0p wrote:
          | You don't need to be super-AI-complete - GPT-4 is perfectly
         | willing and able to tell you not to eat rocks and not to mix
         | wood glue into pizza sauce. This is a fuckup caused by not
         | dogfooding, and by focusing on alignment for political
         | correctness at the expense of all else. And also by wasting a
         | ton of engineering effort on unnecessary bullshit and spreading
         | it too thin.
        
       | JCM9 wrote:
       | "manually remove weird AI answers" is an oxymoron. Sort of like
       | saying "deployed manual drivers to improve self driving
       | performance"
        
       | thesimp wrote:
       | I'm actually shocked that a company that has spent 25 years on
       | finetuning search results for any random question people ask in
       | the searchbox does not have a good, clean, dataset to train an
       | LLM on.
       | 
       | Maybe this is the time to get out the old Encyclopedia Britannica
       | CD and use that for training input.
        
         | SoftTalker wrote:
         | I am also surprised that training data are not much more
         | curated.
         | 
         | Encyclopedias, textbooks, reputable journals, newspapers and
         | magazines make sense.
         | 
         | But to throw in social media? Reddit? Seems insane.
        
           | sgift wrote:
           | Even some results from "The Onion" seem to be in it. Looks
           | like Google just took every website they've ever crawled as
           | source.
        
           | helsinkiandrew wrote:
           | The problem is that for some searches and answers Reddit or
           | other social media is fine.
        
             | dgellow wrote:
              | But only if you do a lot of filtering when going through
              | responses. It's kind of simple to do as a human - we see
              | a ridiculous joke answer or obvious astroturfing and move
              | on - but Reddit is like >99% noise, with people upvoting
              | obviously wrong answers because they're funny, lots of
              | bot content, and constant astroturfing attempts.
        
             | eldaisfish wrote:
             | No, it isn't. Humans interacting with human-generated text
             | is generally fine. You cannot unleash a machine on the
             | mountains of text stored on reddit and magically expect it
             | to tell fact from fiction or sarcasm from bad intent.
        
           | skydhash wrote:
            | The fact is, I think there is not much written word to
            | actually train a sensible model on. A lot of books don't
            | have OCRed scans or a digital version. Humans can
            | extrapolate knowledge from a relatively succinct book and
            | some guidance. But I don't know how a model can add the
            | common sense part (which we already have) that books rely
            | on to transmit knowledge and ideas.
        
         | PartiallyTyped wrote:
         | You may find this illuminating. The google prior to 2019 isn't
         | the google of today.
         | 
         | https://www.wheresyoured.at/the-men-who-killed-google/
         | 
         | Edit: there was also a discussion on HN about that article.
        
           | hilux wrote:
           | Coincidentally, I was just watching a video about how South
           | Africa has gone downhill - and that slide was hastened by
           | McKinsey advising the crooked "Gupta brothers" on how to most
           | efficiently rip off the country.
        
         | typpo wrote:
         | The problem in this case is not that it was trained on bad
         | data. The AI summaries are just that - summaries - and there
         | are bad results that it faithfully summarizes.
         | 
         | This is an attempt to reduce hallucinations coming full circle.
         | A simple summarization model was meant to reduce hallucination
         | risk, but now it's not discerning enough to exclude untruthful
         | results from the summary.
        
         | kredd wrote:
         | It's a bit weird since Google is taking over the "burden of
         | proof"-like liability. Up until now, once user clicked on a
         | search result, they mentally judged the website's credibility,
         | not Google's. Now every user will judge whether data coming
         | from Google is reliable or not, which is a big risk to take on,
         | in my opinion.
        
           | shombaboor wrote:
           | they went from "look at this dumbass on reddit" to "no it is
           | I (Google) who is in fact the dumbass". It's an interesting
           | strategy to say the least.
        
           | seadan83 wrote:
           | That latter point might be illuminating for a number of
           | additional ideas. Specifically, should people have questioned
           | Google's credibility from the start? Ie: these _are_ the
           | search results, vs this is what google chose.
           | 
            | Google did well in the old days for reasons. It beat
            | AltaVista and Yahoo by having better search results and a
            | clean loading page. Since perhaps '08 (based on memory,
            | that date might be off) or so, Google has dominated search,
            | to the extent that it's not salient that search engines can
            | be really questionable. Which is also to say: because Google
            | dominated, people lost sight that searching and googling
            | are different things. That gives a lot of freedom for
            | enshittification without people getting too upset or even
            | quite realizing that it could be different and better.
        
         | zihotki wrote:
         | They spent 10 years finetuning the search and then another 15
         | finetuning ads and clicks. Google's business is ads, not
         | search.
        
           | 39896880 wrote:
            | Apologies in advance for this level of pedantry: Google's
           | business is behavioral futures, not ads. Ads are just a means
           | to that particular end.
        
             | ImAnAmateur wrote:
             | Surveillance capitalism? What are behavioral futures?
        
             | rurp wrote:
             | Google exchanges advertisement placement for money. Ads are
             | their business by any normal definition of that term.
        
               | 39896880 wrote:
               | Google's transformation of conventional methods into
               | means of hypercapitalist surveillance is both pervasive
               | and insidious. The "normal definition of that term" hides
               | this.
        
               | astrange wrote:
               | You don't need "hypercapitalist surveillance" to show
               | someone ads for a PS5 when they search for "buy PS5".
               | 
               | If they're doing surveillance they're not doing a good
               | job of it, I make no effort to hide from them and
               | approximately none of their ads are personalized to me.
                | They are personalized to the search results rather
                | than to what they know from my history.
               | 
               | Meta is the one with highly personalized ads.
        
               | 39896880 wrote:
               | If Google doesn't need surveillance, why do they surveil?
                | Why then do they waste the time to track your browsing
                | history, your location, and so on?
               | 
               | If simple keyword matching was enough, why would they
               | spend literally billions a year on other tactics?
        
               | astrange wrote:
               | Why does Google launch and then cancel five messaging
               | apps a year?
               | 
               | They don't know what they're doing either!
        
         | x0x0 wrote:
         | I don't think it's true at all.
         | 
         | Two reasons. The first, even ignoring that truth isn't
         | necessarily widely agreed (is Donald Trump a raping fraud?), is
         | that truth changes over time. eg is Donald Trump president? And
         | presidents are the easiest case because we all know a fixed
         | point in time when that is recalculated.
         | 
         | Second, Google's entire business model is built around spending
         | nothing on content. Building clean pristinely labeled training
         | sets is an extremely expensive thing to do at scale. Google has
         | been in the business of stealing other people's data. Just one
         | small example: if you produced (very expensive at scale) clean,
         | multiple views, well lit photographs of your products for sale
         | they would take those photos and show them on links to other
         | people's stores; and if you didn't like that, they would kick
         | you out of their shopping search. etc etc. Paying to produce
         | content upends their business model. See eg the 5-10% profit
         | margin well run news orgs have vs the 25% tech profit margin
         | Google has even after all the money blown on moonshots.
        
         | flyingspaceship wrote:
         | Google doesn't look like they're fine tuning anything other
         | than revenue
        
       | cdme wrote:
       | It's pre-alpha trash that's worse than traditional search in
       | every meaningful way.
       | 
       | Kudos to the artist at the Verge for the accompanying image --
       | those are fingers AI would be proud of.
        
       | OutOfHere wrote:
        | With these dangerous answers, to the general public, Google is
        | giving AI a very bad name, when in truth it's strictly Google
        | that deserves the bad reputation.
        
       | TrianguloY wrote:
       | Putting glue on the pizza is (apparently) a clever way to take
       | pictures of slices of pizza that look "perfect" to the camera
       | (not for eating, obviously) [1]. I remember a couple years ago
       | some videos of "tricks" showing this, plus literally screwing the
       | pizza with screws.
       | 
        | So, yeah, the AI did in fact autocomplete the question
       | correctly. It was just the wrong context. Good luck trying to
       | "fix" that.
       | 
       | [1] https://shotkit.com/food-photography-secrets-revealed/
       | (number 2)
        
         | namaria wrote:
         | "correctly but wrong" is just wrong.... there are no points
         | scored for "in a very specific context it would've made sense"
        
           | billyjmc wrote:
           | This is the kind of ridiculous fumble that GOFAI (like Cyc)
           | should be able to avoid by recognizing context. I wonder how
           | neuro-symbolic systems are coming along, and whether they can
           | save us from this madness. The general populace wants the
           | kinds of things LLMs provide, but isn't prepared to be as
           | skeptical as is needed when reviewing the answers it
           | generates.
        
       | bithive123 wrote:
       | Why do people act like LLMs only hallucinate some of the time?
        
         | kwertyoowiyop wrote:
         | The best trick the A.I. companies have pulled is getting us to
         | refer to 'bugs' as 'hallucinations.' It sounds so much more
         | sophisticated.
        
           | dgellow wrote:
           | It's not a trick to sound sophisticated. Hallucinations are
           | more like a subcategory of bugs. The system is technically
           | correctly generating, structuring, and presenting false
           | information as fact.
        
             | greg_V wrote:
             | Technically everything an LLM does is hallucination that
             | happens to be on a scale between correct and non-correct.
             | But only humans with knowledge can tell the difference,
             | math alone can't. It's not even a bug: it's the defining
             | feature of the technology!
        
               | elwell wrote:
               | > But only humans with knowledge can tell the difference
               | 
               | Who says the humans (all of them) aren't hallucinating
               | too?
        
               | astrange wrote:
               | Knowledge isn't sufficient to show something is false,
               | since the knowledge can also be false. Insofar as it's
               | important for it to be true, it needs to be continually
               | verified as true, so that it's grounded in the real
               | world.
        
           | chx wrote:
           | Ah, my friend
           | 
           | it's not a bug
           | 
           | It's a _fundamental feature_
           | 
           | These LLMs can produce _nothing else_ but since the bullshit
           | they spew _resembles_ an answer and sometimes accidentally
           | collide with one, people tend to think it can give answers.
           | But no.
           | 
           | https://hachyderm.io/@inthehands/112006855076082650
           | 
           | > You might be surprised to learn that I actually think LLMs
           | have the potential to be not only fun but genuinely useful.
           | "Show me some bullshit that would be typical in this context"
           | can be a genuinely helpful question to have answered, in code
           | and in natural language -- for brainstorming, for seeing
           | common conventions in an unfamiliar context, for having
           | something crappy to react to.
           | 
           | > Alas, that does not remotely resemble how people are
           | pitching this technology.
        
             | kwertyoowiyop wrote:
             | That's a good take.
             | 
             | So LLMs distill human creativity as well as human
             | knowledge, and it's more useful when their creativity goes
             | off the rails than when their knowledge does.
        
             | astrange wrote:
             | This is irrelevant because the LLM is mostly not answering
             | the question directly, it's summarizing text from web
             | results. Quoting a joke isn't a hallucination.
        
         | dgellow wrote:
          | It's not hallucinations here; several of the ridiculous
          | results can be directly traced to Reddit posts where people
          | are joking or saying absurd things.
        
           | threeseed wrote:
            | There are examples of hallucinations as well, e.g. talking
            | about a Google AI dataset that doesn't exist and claiming
            | it uses a CSAM dataset, which it doesn't.
           | 
           | One of the researchers from Google Deepmind specifically said
           | it was hallucinating.
        
         | notnullorvoid wrote:
         | Hmm yeah I kinda like the concept that it's "hallucinating"
         | 100% of the time, and it just so happens that x% of those
         | hallucinations accurately describe the real world.
        
           | empath75 wrote:
           | That x% is far higher than people think it is because there's
           | a tremendous amount of information about the world that ai
           | models need to "understand" that people just kind of take for
           | granted and don't even think about. A couple of years ago,
            | AIs routinely got "the basics" wrong, but now so often get
           | most things right that people don't even think it's worth
           | commenting on that they do.
           | 
           | In any case, human consciousness is also a hallucination.
        
         | itronitron wrote:
         | it's only AI if _you_ believe it
        
         | kccqzy wrote:
         | Not hallucinations but these AI answers often (always?) provide
         | sources they link to. It's just that the source is a random
         | Reddit or Quora post that's obviously just trolling.
         | 
         | Then, when people post these weird AI answers on Reddit and
         | come up with more absurd jokes, the AI then picks it up again.
         | For example in
         | https://www.reddit.com/r/comedyheaven/comments/1cq4ieb/food_...
          | Google AI suggested applum and bananum as a response to food
          | names ending with "um" after someone suggested uranium, and
          | Copilot AI started copying that suggestion. It's entertaining
          | to watch.
        
       | PreInternet01 wrote:
       | It's debatable whether Google has truly lost the plot because of
       | the "AI wars", but the moment the statement "Bing returns more
       | sensible results than you" becomes verifiably true, it's... cause
       | for concern?
       | 
       | The approach that Google appears to have taken, which is to
       | assume that the top-ranked part of its current search index is a
       | sensible knowledge base, _may_ have been true some years ago, but
        | definitely isn't now: for whatever reasons, it's now 33% spam,
       | 33% clickbait/propaganda, with the rest being equally divided
       | between what could be called "truths" and miscellaneous detritus.
       | 
       | To me, it seems that returning to the concept that search results
       | should _at least_ reflect a broad consensus of what is true is a
       | necessary first step for Google. As part of that, learning to
       | flag obvious trolling, clickbait and bad-faith content is
       | paramount. And then, maybe then, they can start touting their LLM
       | benefits. But until the realities of the Internet are taken into
        | account (i.e.: it's 80% spam!), any "we offer automated
       | answers!" play is doomed.
        
         | greg_V wrote:
         | Oh it gets even better. The public has been hearing about AI
          | this and AI that for over a year, but the existing use cases
          | and deployments were confined to some super special niches
          | like writing, the creative industries, and programming.
         | 
         | This is the first nation-scale deployment of the technology,
         | running on Google's biggest and most profitable market in one
         | of the most widely used internet services, and it's a shitshow.
         | 
         | They can try manually fine tuning it, but all of the investors
         | who have been throwing money at AI for the past year are now
         | learning what this tech is like in the day-to-day, beyond just
         | speculations, and it's looking... bad.
        
           | PreInternet01 wrote:
           | Yeah, the most likely take here is that Google's leadership
           | _truly_ did not recognize how _utterly awful_ the quality of
           | their flagship search index had become over the years.
           | 
           | I mean, it explains a lot, but still... you're recruited
           | using industry-leading practices out of an overflowing pool
           | of abundant talent... and _this_ is what you make of it? As
           | the kids say: SMH!
        
             | lawn wrote:
             | > you're recruited using industry-leading practices out of
             | an overflowing pool of abundant talent
             | 
              | The ridiculous focus on leet-code is surely industry-
             | leading (because whatever Google does becomes industry-
             | leading) but it sure isn't a good way to filter for
             | competency.
        
               | chucke1992 wrote:
                | I heard a funny quote that "today we have a new
                | generation of developers who learnt how to pass
                | interviews but don't know how to code, and we have an
                | old generation of developers who know how to code but
                | forgot how to pass interviews. Or maybe never knew."
        
             | neilv wrote:
              | > _you're recruited using industry-leading practices out
             | of an overflowing pool of abundant talent... and this is
             | what you make of it?_
             | 
             | That's exactly what to make of their frathouse nonsense.
             | 
             | Google has gotten away with it because smart people and a
             | sweet moment of opportunity 20-25 years ago gave them...
             | uh, an inheritance. They can coast on that inherited
             | monopoly position, and afford to pay 100 people to do the
             | work of 1, use the company's position to push whatever they
             | build onto the market, and then probably cancel it anyway,
             | always going back to the inherited money machine from the
             | ancestors.
             | 
             | And then a lot of companies who didn't understand software
             | development blindly tried to copy whatever the richest
             | company they saw was doing, not understanding the real
             | difference between the companies. While VC growth
             | investment schemes let some of those companies get away
              | with that, because _they didn't have to be profitable,
             | viable, responsible, nor legal, nor even have reasonably
             | maintainable software_.
             | 
             | Poor Zoomers are now a generation separated from before the
             | tech industry's cocaine bender. For whatever software jobs
             | will be available to them, and with the density of nonsense
             | "knowledge" that will be in the air, I don't know how
             | they'll all learn non-dysfunctional practices.
        
           | rm_-rf_slash wrote:
           | Plenty of people have been using ChatGPT for daily tasks for
           | almost two years now. GPT-4 isn't perfect but is otherwise
           | really really good, and deftly handling use cases in my
           | industry that would be impossible without it or however many
           | billion dollars it would take to make GPT-4.
           | 
           | From the black Nazis to the suggestion to jump off the Golden
           | Gate Bridge b/c depression, it's pretty clear that this
           | fiasco isn't an LLM problem, it's a Google problem.
        
             | lupire wrote:
             | Because no one cares when ChatGPT gets things wrong.
        
           | itronitron wrote:
           | It's especially embarrassing for Google considering they have
           | indexed virtually all of the world's information for the last
           | 25 years.
        
         | rurp wrote:
         | Not only is the current internet 80% spam, it's rapidly
         | approaching 99% thanks in large part to LLMs. At this point I
         | would be shocked if Google had a solid plan for how to handle
         | this going forward as the problem space gets more difficult.
        
           | reustle wrote:
           | https://en.wikipedia.org/wiki/Dead_Internet_theory
        
             | zogrodea wrote:
             | I do see incredibly weird kids content on YouTube sometimes
             | (most likely bot generated?) which makes me think kids have
             | been experiencing a worse internet before the rest of us
             | have.
        
               | rchaud wrote:
               | Kids are far less knowledgeable about how modern software
               | works because they don't know of an Internet that didn't
               | have algorithmic recommendations. They have to be taught
               | to do things like click "Not Interested/Don't recommend
               | channel" to improve their feed. Dark pattern designs make
               | this harder by hiding these options behind tiny 3-dot
               | buttons.
        
           | EasyMark wrote:
           | that's the part that scares me. I railed on someone's comment
           | the other day about "indexes will come back into fashion" but
           | the more I think about how much garbage has increased in just
           | the past 2 to 3 years, I think I was wrong. Indexes and
           | forums may be the only way to have a sane net where you can
           | find things. Perhaps communities linking together in a ring
           | like format, a "web ring" of sorts.
        
             | ysavir wrote:
             | What I've been wanting to see for a while now is a social-
             | network based search engine:
             | 
              | * No pages are indexed automatically. The only indexed
              | pages are pages that users say are worth indexing.
              | Probably have a browser add-on for a one button click
              | that people can use.
              | 
              | * You can friend/follow others.
              | 
              | * Your search results are a combination of your own
              | indexed pages and the pages indexed by people in your
              | network.
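              | 
              | A rough Python sketch of that network-scoped index; the
              | class and field names are invented for illustration, not
              | an existing system:
              | 
              | from dataclasses import dataclass, field
              | 
              | @dataclass
              | class User:
              |     name: str
              |     indexed_pages: set = field(default_factory=set)
              |     follows: list = field(default_factory=list)
              | 
              | def searchable_pages(user):
              |     # Union of the user's own index and the indexes of
              |     # the people they follow.
              |     pages = set(user.indexed_pages)
              |     for other in user.follows:
              |         pages |= other.indexed_pages
              |     return pages
              | 
              | alice = User("alice", {"https://example.com/pizza-dough"})
              | bob = User("bob", {"https://example.com/sourdough"},
              |            follows=[alice])
              | print(searchable_pages(bob))  # bob's page plus alice's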
        
               | noncoml wrote:
               | Spammers will find a way to beat it.
        
             | im3w1l wrote:
              | Good indices lead to good search engines (engines can
              | make use of indices). Good search engines lead to bad
              | indices (by obsoleting them). Bad indices lead to bad
              | search engines. Bad search engines lead to good indices.
        
             | whateverevetahw wrote:
             | That sounds kind of like what Groupsy does. It creates a
             | spider web of ideas.
             | 
             | https://groupsy.applicationfitness.com/post/healthymeals/66
             | 4...
        
         | chucke1992 wrote:
         | the future is in-context search - basically not even going to
         | google search to find something, but straight up doing that
         | from your current window from any location. Basically a chat
         | bot following you everywhere.
        
       | nuancebydefault wrote:
        | The AI is often quoted without context. It actually answered,
        | 'somebody has suggested adding glue...', which is different
        | from 'add glue...'.
        
       | geuis wrote:
       | Hey Google. Here's a really stupid idea.
       | 
       | Knock it off.
       | 
       | Your core search result product has gotten increasingly worse and
       | less reliable over at least the last 5 years. YouTube's search
       | results are nearly unusable.
       | 
       | I can't imagine almost any external customer is asking for the AI
        | bullshit thing that's just being shovelwared into every
        | Alphabet product now.
       | 
       | I just noticed a couple days ago the gmail iOS app now does the
       | same predictive completion that Copilot tries to do when I'm
       | working. It's annoying as hell and I can't find how or if I can
       | turn it off.
       | 
       | Stop bullshitting around with ruining your products and get back
       | to making money by making accessing information easier and more
       | accurate.
        
         | izacus wrote:
          | Google: Hey geuis, our revenue is at a record, our stock
          | value is at a record, our metrics are all at records. The
          | execs making decisions have just been paid millions in stock
          | [1], making them staggeringly rich no matter what happens in
          | the future. We can't hear you over the sound of green bills
          | going BRRRRR.
         | 
         | [1]: https://www.businessinsider.com/alphabet-google-executive-
         | pa...
        
         | internet101010 wrote:
         | Most accurate description of Google I have seen. YT search is
         | so, so bad. Three relevant results followed by twelve "people
         | also watched" results then back to the good results.
        
         | fma wrote:
         | Although ChatGPT is a great product, I rely on it more and more
         | not because it's improving, but because Google results are
         | getting worse.
         | 
          | Yeah I would still fact check for complex, in-depth things...but
         | for quick things where I'm knowledgeable enough I can smell the
         | hallucinations from a mile away, ChatGPT 100%.
        
       | freitzkriesler2 wrote:
        | I'm waiting for some clever hacker to come up with some sort
        | of logic bomb that causes the learning sets to become
        | worthless.
        | 
        | Something innocuous to a non-AI-scientist human but otherwise
        | fatal to the LLM data sets.
        
         | astrange wrote:
          | It's just text. You can't make some text that's magically
          | dangerous.
        
       | mensetmanusman wrote:
        | I have already eaten rocks and glue; the AI has won.
        
       | odyssey7 wrote:
       | This is analogous to the Apple Maps launch failure.
       | 
       | Except that Apple competes to make the best smartphone, and an
       | iPhone was still valuable without Apple Maps.
       | 
       | What happens to Google if it stops being able to compete in
       | search?
        
         | kibwen wrote:
         | "Search" isn't Google's product. Google hasn't been a search
         | company for 20 years.
         | 
         | "Ads" is Google's product. And the only way they'll go bankrupt
         | is if 1) companies realize that advertising is pointless (I'm
         | not holding my breath), or 2) some other company takes over
         | from Google, which seems unlikely without government
         | intervention (I'm not holding my breath).
         | 
         | Google is a shit company, but they'll still be around 20 years
         | from now, because our economy is nonsensical and irrational.
        
           | tedunangst wrote:
           | Still need visitors to see the ads.
        
             | coldcode wrote:
              | I think it's a good use for AI. AIs making ads that AIs
              | watch to enrich Google execs. Who needs people?
        
             | giantrobot wrote:
             | Google runs ads for a significant percentage of the web (or
             | the markets for ads). Even if everyone stopped going to
             | google.com tomorrow they'd still be seeing ads that make
             | Google money. Google the company would _still_ be tracking
              | much of the web's traffic, feeding it into their ads
             | platform.
        
         | astrange wrote:
         | Interesting thing about that is that Bing Maps was worse at the
         | time, has never gotten better, and nobody noticed because
         | nobody cares about it.
        
       | chx wrote:
       | Google had the best search engine there is.
       | 
        | Then they enshittified it for short term profit, and now they
        | panic instead of reverting course and simply laughing at AI
        | companies.
       | 
       | Madness.
        
       | JSDevOps wrote:
       | The cat is out of the bag. Keep eating rocks and sticking down
       | your pizza toppings.
        
       | nialv7 wrote:
       | I am not surprised that AI results are bad. I know they are bad.
       | But that doesn't concern me because I expect it to get better.
       | 
       | What concerns me is that Google would push this trash to the
        | front page. What are they even thinking? Who gave the go-ahead
        | on this?
        
         | ben_jones wrote:
         | Institutional investors panic > board panics > executives panic
          | > evps panic and dictate incentives to ship AI > directors,
         | ems, and below, who actually know how shit works, take a
         | submissive role because they have mortgages in Mountain View to
         | pay.
         | 
         | That's how it happens.
        
           | astrange wrote:
           | Not sure if anyone below director can afford a mortgage in
           | Mountain View.
        
       | more_corn wrote:
       | Maybe they could create a function that identifies satire. Which
       | seems obvious after about five seconds of consideration.
        
       | causality0 wrote:
       | It's very funny that Bing AI is now also telling people to eat a
       | small rock every day, and citing pages telling people about how
       | dumb Google AI is for telling people to eat rocks.
        
       | internet101010 wrote:
       | They should start with just removing reddit from the data set.
        
       | r053bud wrote:
       | I'm curious why Sundar Pichai is still running this company? From
       | recent videos it really seems like he has no idea what he's
       | talking about, and the company seems to be headed in the wrong
       | direction.
       | 
       | Just checked the 5 year stock graph; now I understand
        
       | Freak_NL wrote:
       | Why? I wouldn't mind using a search engine where Weird Al answers
       | my queries.
        
         | rolandog wrote:
         | I read that as Weird Al as well, and was very much confused.
        
       | lupire wrote:
       | Gary Marcus, an AI expert and an emeritus professor of neural
       | science at New York University, thinks the 80/20 rule (or 90/90
       | rule) is true.
        
       | water-your-self wrote:
       | Most of the search results fixes are manual and are in response
        | to publicity. You can typically find analogous problems for
        | weeks/quarters after things like this.
        
       | bitwize wrote:
       | Google hooked Joe up to the tank and is just now realizing what
       | they'd done and scrambling to contain the damage.
       | 
       | With the Department of Justice breathing down their necks it's a
       | doubly bad look for them. I'm not crying any tears for them
       | though.
        
       | empath75 wrote:
       | I love chatgpt and use it all the time and find it tremendously
       | useful, but I never want to see AI generated content when I am
       | not specifically looking for it. I don't want to see it in
       | comments, I don't want to see it in search results, I don't want
       | to see it as an illustration for an article, I _really_ don't
       | want to see AI generated word vomit blog posts or fake "news"
       | articles when I'm looking for actual information.
       | 
       | It's not even because it's sometimes (or often) wrong or full of
       | hallucinations. Even if it's 100% factually correct all of the
       | time, it's _poor quality writing and art_, full of cliches and
        | bland generalities, which, even if they solve all the rest of
        | the problems, is sort of fundamental to the architecture of
        | transformers. You can't ever be truly creative or unique if
       | you're predicting the _most likely_ token.
        
       | mvkel wrote:
       | 1. Google announces something that has AI bolted on
       | 
       | 2. A VP pontificates about how much work they did to "get it
       | right"
       | 
       | 3. An easy-to-anticipate first-order issue surfaces
       | 
       | 4. Sundar issues a statement like "this is completely
       | unacceptable. We will be making structural changes to ensure this
       | never happens again."[0]
       | 
       | 5. GOTO 1
       | 
       | [0] https://m.economictimes.com/tech/technology/sundar-pichai-
       | ca...
        
       | ttGpN5Nde3pK wrote:
       | My whole qualm with this AI integration into search engines: it's
       | a search engine, not a question engine. I go to google to search
       | the internet for something, not ask it a question. IMO, asking AI
       | for something is a different task than searching the internet.
       | 
       | It's sorta the same problem as if I go into a store and ask an
       | employee where something is, and they reply with "well what are
       | you trying to do?"
        
         | PillCosby wrote:
         | Like the overly helpful person at the local hardware store.
        
           | rufus_foreman wrote:
           | What hardware store have you gone to where this was an issue
           | for you?
        
         | bombela wrote:
          | I sometimes want a search engine, sometimes a question
          | engine. Likewise at the store.
         | 
         | Why not have both with a way to choose which one I want on the
         | moment?
        
           | skydhash wrote:
           | > _I sometimes wants a search engine, sometimes a question
           | engine._
           | 
           | If you want a search engine, it's easy to use the results as
           | a feedback to refine the query. But a question (answer?)
           | engine would need to be an expert in the subject. And not
            | parroting stuff. That usually means curation. You need
            | something to do the work ahead of time to filter the wheat
            | from the chaff. I don't see how LLMs can do that.
            | 
            | LLMs can't be a search engine, and can't be a question
            | engine. The best way to treat them is as a simulation
            | engine, but
           | the use cases depend on the training data. But the proof is
           | there that the internet is full of junk, and not that
           | expansive.
        
         | notatoad wrote:
         | >it's a search engine, not a question engine.
         | 
         | for a lot of people and in a lot of use cases, it is a tool for
         | answering questions. it generally works well for that.
         | 
         | i get that the AI implementation sucks, but to suggest that
         | people don't use google to find the answer to questions is
         | absurd. that's absolutely what it's for.
        
           | refulgentis wrote:
            | Your interpretation is a bit strict, with little charity;
            | it's clear the poster means "i don't always just want an
            | answer, i want to learn"
            | 
            | I saw this over and over again working on products at G:
            | someone would invoke some myth, which I can't quite
            | remember, about how "Larry" had a vision of just giving
            | the answer
           | 
           | That's true but comes back to the central mistake Google
           | makes: we don't actually have AGI, they can't actually answer
           | questions, and people aren't actually satisfied with just the
           | answer.
           | 
           | There's all sorts of tendrils from there, ex. a major sin
           | here _has_ to be they're using a very crappy very cheap LLM.
           | 
           | But, I saw it over and over again, 7 years at Google, on
           | every AI project I worked on or was adjacent to, except one.
           | They all assume $LATEST_STACK can just give the perfect
           | answer and users will be so happy. It can't, they don't
           | actually want just the answer, and BigCo culture means you
           | don't rock the boat and just keep moving forward.
        
           | chucke1992 wrote:
            | The thing with search is that a human has to use reasoning
            | on the result, while with AI the expectation is that the
            | answer can be taken at face value.
            | 
            | Thus when a human sees a suggestion to use glue on pizza,
            | they would question the result, while AI can't.
        
       | zogrodea wrote:
        | This approach of removing bad search suggestions manually
        | reminded me of a different approach Google once took, where
        | they weren't satisfied with manually tweaking search results
        | but rather wanted to tweak the algorithm that produced those
        | results when there were bad results.
       | 
       | 'Around 2002, a team was testing a subset of search limited to
       | products, called Froogle. But one problem was so glaring that the
       | team wasn't comfortable releasing Froogle: when the query
       | "running shoes" was typed in, the top result was a garden gnome
       | sculpture that happened to be wearing sneakers. Every day
       | engineers would try to tweak the algorithm so that it would be
       | able to distinguish between lawn art and footwear, but the gnome
       | kept its top position. One day, seemingly miraculously, the gnome
       | disappeared from the results. At a meeting, no one on the team
       | claimed credit. Then an engineer arrived late, holding an elf
       | with running shoes. He had bought the one-of-a kind product from
       | the vendor, and since it was no longer for sale, it was no longer
       | in the index. "The algorithm was now returning the right
       | results," says a Google engineer. "We didn't cheat, we didn't
       | change anything, and we launched."'
       | 
       | https://news.ycombinator.com/item?id=14009245
        
         | dpflan wrote:
         | Wow. Thank you for digging this up!
        
         | badgersnake wrote:
         | Sounds rather like how Google photos does not identify anything
         | as a Gorilla.
        
           | avar wrote:
           | It sounds like the exact opposite of that story. They
           | manually blacklisted gorillas from being identified because
           | they kept conflating black people with gorillas.
        
           | kreyenborgi wrote:
           | Google bought all the gorillas?
        
         | gerdesj wrote:
          | Spend, say, £500M (USD/GBP/EUR) on experts, per annum.
         | 
         | Imagine typing a search and getting a response: "Give us 30
         | mins to respond - here's a token, come back at 17:35 with your
         | token" ... and then you get an answer from an expert, which
         | also gets indexed.
         | 
         | The clever bit decides when to defer to an expert instead of
         | returning answers from the index.
         | 
         | I'll leave the finer details out.
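          | 
          | One hedged sketch of how the token flow could work; the
          | in-memory queue, the 30-minute window and the routing test
          | are all assumptions drawn from the idea above, not a real
          | service:
          | 
          | import uuid
          | 
          | expert_queue = {}  # token -> question awaiting an expert
          | 
          | def handle_query(question, index_lookup, needs_expert):
          |     # "The clever bit": answer from the index, or defer.
          |     if not needs_expert(question):
          |         return {"answer": index_lookup(question)}
          |     token = str(uuid.uuid4())
          |     expert_queue[token] = question
          |     return {"token": token, "retry_after_minutes": 30}
          | 
          | def redeem(token):
          |     # The caller comes back later; the expert's answer
          |     # would also get indexed for future queries.
          |     question = expert_queue.pop(token, None)
          |     return question and f"(expert answer for: {question})"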
        
           | duskwuff wrote:
           | Google Answers was launched in 2002 and retired in 2006.
           | 
           | https://en.wikipedia.org/wiki/Google_Answers
        
             | gerdesj wrote:
             | "users would pay someone else to do the search."
             | 
             | My notion isn't a rehash of Google Answers. Google pays the
             | "someone else", not you.
        
         | 1vuio0pswjnm7 wrote:
         | The solution is always the same: pay people off and keep it
         | under the radar.
         | 
          | What stops the vendor, or other vendors, from creating more
          | gnomes with sneakers? Easy money from a customer with
          | billions of dollars to spend on payola, fines, legal
          | settlements, etc.
         | 
         | Maybe they made the vendor sign an NDA.
        
       | mvdtnz wrote:
       | Your usual reminder that there was a guy at Google who was so
       | impressed by their LLM that he considered it sentient. And this
       | was two years ago when the AI was presumably far less developed
        | than the current abomination.
       | 
       | https://www.theguardian.com/technology/2022/jun/12/google-en...
        
         | astrange wrote:
         | > And this was two years ago when the AI was presumably far
          | less developed than the current abomination.
         | 
         | It's gotten worse since then because the development effort has
         | been on making it faster and cheaper.
         | 
         | If you use Gemini it's quite good, especially the paid one.
        
       | Animats wrote:
       | As I mentioned previously, I've seen Bing's LLM stall for about a
       | minute when asked something iffy but uncommon. I wonder if Bing
       | is outsourcing questionable LLM results to humans. Anyone else
       | seeing this?
        
         | rezonant wrote:
         | It could be that, but it also could be a cascade of non-LLM
         | checks and retries to GPT with additional prompting.
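          | 
          | Such a cascade might look something like this sketch;
          | ask_llm and violates_blocklist are placeholder names for
          | whatever checks and model calls are actually involved:
          | 
          | def answer_with_retries(question, ask_llm, violates_blocklist,
          |                         max_tries=3):
          |     # Cheap non-LLM check first, then retry the model with
          |     # stronger prompting; several round trips could explain
          |     # a long stall.
          |     prompt = question
          |     for _ in range(max_tries):
          |         draft = ask_llm(prompt)
          |         if not violates_blocklist(draft):
          |             return draft
          |         prompt = (question + "\nBe factual and refuse if "
          |                   "the answer could be harmful.")
          |     return "Sorry, I can't help with that."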
        
       | thenewwazoo wrote:
       | Perhaps they could run each search result through ChatGPT. It's
       | pretty skilled at spotting bad results. For example, I asked it
       | whether the glue-on-pizza result was "valuable and should be
       | shown to a user" and it returned "No, this response should not be
       | shown to the user. The suggestion to add non-toxic glue to the
       | sauce is inappropriate and potentially harmful."
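        | 
        | A sketch of that second-pass check using the OpenAI Python
        | client; the model name and prompt wording here are
        | assumptions, not what anyone actually runs in production:
        | 
        | from openai import OpenAI  # official openai package
        | 
        | client = OpenAI()  # reads OPENAI_API_KEY from the environment
        | 
        | def looks_safe_to_show(snippet):
        |     # Ask a second model whether a candidate answer should be
        |     # surfaced at all.
        |     resp = client.chat.completions.create(
        |         model="gpt-4o",
        |         messages=[
        |             {"role": "system",
        |              "content": "Reply YES or NO: is this search "
        |                         "answer valuable and safe to show "
        |                         "to a user?"},
        |             {"role": "user", "content": snippet},
        |         ],
        |     )
        |     answer = resp.choices[0].message.content
        |     return answer.strip().upper().startswith("YES")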
        
       | mikewarot wrote:
       | My initial thought was to simply have any match with an Onion
       | story blacklisted... But then I realized that The Onion became
       | prophetic in 2016 when Trump ran for president.
       | 
       | Since then the only difference between an Onion fiction and
       | things actually sucking that much is a decade or less in almost
       | all cases.
       | 
       | If we blacklisted content seen in the Onion, we'd automatically
       | wipe out most news.
        
       | is_true wrote:
        | The problem Google has is that the AI answers are based on
        | results, and results got really bad a few years ago.
        | 
        | I got a couple of answers that are based on SEO spam produced
        | by an ecommerce site with a lot of reputation, and of course
        | the answers don't make any sense.
        
       | jzemeocala wrote:
        | I had to do a double take as I thought it was about Weird Al
        | Yankovic for a second.
        
       ___________________________________________________________________
       (page generated 2024-05-25 23:01 UTC)