[HN Gopher] Google scrambles to manually remove weird AI answers...
___________________________________________________________________
Google scrambles to manually remove weird AI answers in search
Author : rntn
Score : 199 points
Date : 2024-05-25 15:24 UTC (7 hours ago)
(HTM) web link (www.theverge.com)
(TXT) w3m dump (www.theverge.com)
| dazc wrote:
| Manually removing rogue AI results is kind of ironic, isn't it?
| tflol wrote:
| You could almost argue these results are directly human
| generated.
|
| edit: And in that case, who is the arbiter of truth?
| rchaud wrote:
| Pay no attention to the Accenture contractors behind the
| curtain!
| JCM9 wrote:
| Pay no attention to the army of people behind the curtain
| pulling levers trying to make it look like they've actually
| built a real AI.
| seydor wrote:
| We need an AI for that
| nikanj wrote:
| For this tech cycle, AI is short for Actually Indians
| Barrin92 wrote:
| generative AI is essentially three day labourers from an
| emerging economy in a trenchcoat. From data labelling, to
| "human reinforcement", to manually cleaning up nonsensical AI
| results.
| alfiedotwtf wrote:
| Google keep making these same large and embarrassing mistakes
| time and time again. I think it's because their devs don't eat
| enough rocks every day.
| ta20240226 wrote:
| Is it the rocks, or that the pizza served at Google doesn't
| have enough glue?
| moomoo11 wrote:
| Just focus on making useful software to improve people's lives.
| Holy fuck the last five years feel like such a waste.
| illusive4080 wrote:
| But AI will solve everything!
| rchaud wrote:
| ...for our shareholders
| rvnx wrote:
| I hope AI will bring back the "Sort by date" button on
| Google Reviews, and add a Google Maps link somewhere.
|
| Who knows, maybe AI can bring back exact keyword matches,
| or correct basic math calculations on Google Search too.
| moomoo11 wrote:
| It will cost $2 billion of nvidia chips and it won't
| work.
| tymscar wrote:
| Maybe AI could bring back the pre gen AI tech scene
| cratermoon wrote:
| Throwing good money after bad.
|
| Companies spent all that money on high-end GPUs for crypto
| mining and that went bust; now they've got to figure out
| something to do with the hardware to try to recoup some of the
| investment. Google pumped $1.5 billion into crypto.
| astrange wrote:
| Google has TPUs.
| szundi wrote:
| Management realized that they are not good enough to sustain
| progress, so they humbly allocate resources to the next
| generation: AI
| SoftTalker wrote:
| Trained on Twitter and Reddit. Garbage in/Garbage out, as it has
| always been.
| threeseed wrote:
| Except that 90% of Reddit isn't garbage. It's really useful.
|
| Problem is Google can't tell what is garbage or not. No LLM
| can.
| SoftTalker wrote:
| I'd argue it's far less than 90% but yes, there is some good
| information there. But weeding out the noise is what needs to
| happen, and (for some topics more than others) there is an
| awful lot of it.
| maximinus_thrax wrote:
| > Except that 90% of Reddit isn't garbage. It's really
| useful.
|
| Citation needed. I've been a Reddit user since its inception
| and honestly, except for niche hobby subreddits, Reddit is
| mostly low-effort garbage, bots, and rehashed content. I'd
| wager that mainstream subreddits are 99% garbage for training
| an LLM for anything other than shitposting.
| skydhash wrote:
| Pretty much. There was some good information, even book-worthy
| material. But it was the kind that bubbles to the top in
| helpful and knowledgeable communities. The rest is junk.
| giantrobot wrote:
| Even in the niche hobby subreddits there can be a really
| high garbage factor. There are plenty of _well meaning_
| posters who are just wrong. They're not trying to mislead or
| lie; they're just unaware they're wrong.
| CatWChainsaw wrote:
| How hard can it possibly be to just turn off the entire AI-
| generated overview functionality given that it just got
| introduced...
| YetAnotherNick wrote:
| It seems to be turned off for me. And I was in beta testing for
| a month. Or maybe they are figuring out who is doing weird
| searches and turning it off for them.
|
| In any case this thing is just hilarious. Just right after
| their AI painted historical figures as black.
| CatWChainsaw wrote:
| So far I have not seen it ever in either Firefox or 2-3
| Chromium-based browsers, on a handful of computers in
| multiple locations.
|
| I don't see a way Google can make this work. As I understand
| it, LLM confabulations can be reduced but never eliminated,
| owing to how they're built. Google could try to create a
| fact-checking department to weed out falsehoods and bullshit,
| but then they face the problem of appointing themselves
| arbiters of the "truth". The only way
| to win is to not play the game, as I see it. I wish the
| collective AI fever would break already.
| rchaud wrote:
| very hard indeed, if you're optimizing for favourable opinions
| from Wall St analysts come earnings time.
| caesil wrote:
| Who knows how many of these are fake. People have been dropping
| inspect-element-manipulated screenshots all over twitter.
|
| https://www.nytimes.com/2024/05/24/technology/google-ai-over...
|
| > A correction was made on May 24, 2024: An earlier version of
| this article referred incorrectly to a Google result from the
| company's new artificial-intelligence tool AI Overview. A social
| media commenter claimed that a result for a search on depression
| suggested jumping off the Golden Gate Bridge as a remedy. That
| result was faked, a Google spokeswoman said, and never appeared
| in real results.
|
| That screenshot was tweeted by @allgarbled. Ten minutes before,
| they tweeted:
|
| >free engagement hack right now is to just inspect element on the
| google search AI thing and edit it to something dumb. hurry up,
| this deal won't last forever
| ethbr1 wrote:
| I'd say the broader issue here is a lack of transparency into
| results.
|
| _If_ Google is sending bad results, who can prove that?
| realreality wrote:
| That's always been an issue. Years ago, researchers
| demonstrated in an experiment that they could swing public
| opinion about electoral candidates by manipulating search
| results. Who knows if Google took that experiment and ran
| with it?
| ethbr1 wrote:
| I mean, that's always been the TikTok argument, to me.
|
| Widely-used platforms that can +/- 1% their algorithms to
| affect democracy have pretty high burdens of
| trust/transparency, and we're not close to that with any
| platform (Chinese or not) that I'm aware of.
|
| Meta's probably the closest, because of scrutiny, but afaik
| even their transparency isn't sufficient for realtime
| attestation.
| Aloisius wrote:
| I have personally reproduced several like the interest one and
| the hippo eggs one, though not that one specifically.
|
| Google has started restricting AI Overviews so much now that
| most of the example queries on Google's Search Labs page
| don't even trigger it anymore.
| pilooch wrote:
| So Google hasn't used an LLM to generate and test weird queries?
| This is not setting the bar very high for the whole industry...
| There'd be so much to gain from a clean deployment... Either it's
| hard, or it was a rush. As a machine learnist, I believe it's
| actually impossible, by design of the autoregressive LLM. This
| race may well be partially a race to the bottom.
| CaptainOfCoit wrote:
| > So Google hasn't used an LLM to generate and test weird
| queries?
|
| What about simple manual testing? Seems to have skipped QA
| completely, automated or not.
| pilooch wrote:
| The adversarial surface of the LLM remains enormous; manual
| testing cannot cover it.
| jameshart wrote:
| Asking how to prevent cheese from sliding off pizza is not
| an adversarial prompt.
| nicklecompte wrote:
| There has been a lot of excitement recently about how using
| lower precision floats only slightly degrades LLM
| performance. I am wondering if Google took those results at
| face value to offer a low-cost mass-use transformer LLM, but
| didn't test it since according to the benchmarks (lol) the
| lower precision shouldn't matter very much.
|
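| As a rough illustration of what "lower precision" means (a
| minimal sketch of symmetric int8 weight quantization, not
| Google's actual setup):
|
|     import numpy as np
|
|     def quantize_int8(w: np.ndarray):
|         # Store weights as 8-bit integers plus one float scale.
|         scale = np.abs(w).max() / 127.0
|         q = np.round(w / scale).astype(np.int8)
|         return q, scale
|
|     def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
|         return q.astype(np.float32) * scale
|
|     w = np.random.randn(4, 4).astype(np.float32)
|     q, s = quantize_int8(w)
|     # Reconstruction is close but not exact; benchmarks can hide
|     # the small errors that odd queries later expose.
|     print(np.abs(w - dequantize(q, s)).max())
|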
| But there is a more general problem: Big Tech is high on
| their own supply when it comes to LLMs, and AI generally.
| Microsoft and Google didn't fact-check their AI even in high-
| profile public demos; that strongly suggests they sincerely
| believed it could answer "simple" factual questions with high
| reliability. Another example: I don't think Sundar Pichai was
| _lying_ when he said Gemini taught itself Sanskrit, I think
| he was given bad info and didn't question it because
| motivated reasoning gives him no incentive to be skeptical.
| flyingspaceship wrote:
| Well yeah, imagine how much money there is to make in
| information when you can cut literally everyone else
| involved out, take all of the information and sell it with
| ads, and only give people a link at the bottom, if that is
| even needed at all.
| ADeerAppeared wrote:
| > So Google hasn't used an LLM to generate and test weird
| queries?
|
| You don't even need an LLM for that. Google will almost
| certainly have tested.
|
| The test result is just politically unacceptable within the
| company: it doesn't work, it's an architectural issue inherent
| to the technology, and we can't fix it.
|
| Instead, they just rush to patch any _specific, individual_
| errors that show up, and claim that these errors are "rare
| exceptions" or "never happened".
|
| What's going on here is that Google (and most other AI firms)
| are just trying to gaslight the world about how error-prone AI
| is, because they're in too deep and can't accept the reality
| themselves.
| cjk2 wrote:
| They already know it's a shit show. They are trying to push
| it along until it's someone else's fault.
| ADeerAppeared wrote:
| I'm not convinced the executive layer is aware how dire the
| problem is.
|
| On one hand, their support for outsourcing programmes
| ("Training Indians on how to use AI") suggests they realize
| AI tooling without human cleanup is a crapshoot.
|
| On the other hand, they keep digging. This kind of
| gaslighting is an old and proven trick for _genuinely rare
| problems_, but it doesn't work if your issues are fairly
| common, as they'll get replicated before you can get a fix
| out.
|
| Similarly, they're gambling with immense legal risks and
| sacrificing core products for it. They're betting the farm
| on AI, it may kill the company.
| cjk2 wrote:
| I think they are more than aware but will magically
| disappear after cashing their stock just about the point
| the bubble pops. Don't forget that the AI industry is
| almost 100% based on hype. Microsoft will be the largest
| victim here, their entire product portfolio being turned
| into a nuclear fallout zone almost overnight. Satya and
| friends are going to trash the whole org.
|
| I regularly speak to laypeople who assume that it's some
| magical thing without limits that makes their lives
| better. They are also 100% unaware of any applications
| that will actually make their lives better. End game
| occurs when those two disconnected thoughts connect and
| they become disinterested. The power users and engineers
| who were on it a year ago are either burned out or
| finding the limitations a problem as well now. There is
| only magical thinking, lies and hope left.
|
| Granted there are some viable applications, but they are
| rather less overstated than anything we have now, and there
| are even negative side effects of those (think image
| classification, which, even if it works properly, requires
| human review, and there are psychological and competence
| problems around that too).
| kwertyoowiyop wrote:
| Deploy the cheap offshore labor!
| dlachausse wrote:
| They still haven't learned from the Gemini diverse Nazis
| debacle.
| nicklecompte wrote:
| Google's poor testing is hardly in doubt. But keep in mind that
| the whole problem is that LLMs don't handle "unlikely" text
| nearly as well as "likely" text. So the near-infinite space of
| goofy things to search on Google is basically like panning for
| gold in terms of AI errors (especially if they are using a
| cheap LLM).
|
| And in particular LLMs are less likely to _generate_ these
| goofy prompts because they wouldn't be in the training data.
| sebastiansm wrote:
| Google is working hard to be the next Boeing.
| JCM9 wrote:
| The fall of Google's reputation on ML is nothing short of
| spectacular. They went from having a near untouchable reputation
| as being far ahead of any other large tech company on ML to total
| shambles in a year. Everything they've released has been a
| complete popcorn-worthy dumpster fire, from faked demos, to
| racist models that try and pretend white people don't exist, to
| this latest nonsense telling me to put glue on my pizza.
|
| What the heck happened? Or was their reputation always just more
| hype than substance?
| rvnx wrote:
| It could be because they actually released something. If you
| look back, the Google Research blog posts always have grandiose
| claims, but often you can never actually use them.
| ugjka wrote:
| research != product
| CydeWeys wrote:
| AlphaGo, AlphaFold, and Waymo FSD are all released in the
| sense that you can see them actually working in the real
| world. Those all took much longer to put together than
| whatever rushed features were released to catch up with
| OpenAI, however.
| tadfisher wrote:
| They are also extremely constrained problem spaces relative
| to the problem space of LLMs, which is apparently
| "everything imaginable".
| padthai wrote:
| Waymo is not Google. And DeepMind operated quite
| independently until not long ago.
| seydor wrote:
| It's not really that bad. I use gemini often and it's great. I
| prefer their UI
| tymscar wrote:
| What do you like more about their ui?
| seydor wrote:
| faster, it has options like 'modify'. I also feel it
| follows my commands better, esp. when I ask it to rephrase
| arccy wrote:
| research != product
| bbarnett wrote:
| At least Elmer's white glue is edible, millions of kids agree.
|
| (The logic sort of makes sense. Glue sticks things together,
| and some glue is edible.)
| calebkaiser wrote:
| There was an interesting interview with David Luan about this
| recently. For context, he was a co-lead at Google Brain, early
| hire at OpenAI, and is now a founder at Adept:
| https://www.latent.space/p/adept
|
| The TL;DR on his take is that there are organizational and
| cultural issues that prevent Google from focusing their
| research efforts in the way that is necessary for what he calls
| "big swings," like training GPT-3.
|
| In regards to your second question, Google's reputation in ML
| is definitely not hype. Purely on the research side, Google has
| been behind some of the most important papers in modern ML,
| particularly around language models. The original Transformers
| paper, BERT, lots of work around neural machine translation,
| all of the work that DeepMind has done post-acquisition, and
| the list goes on. On the applied side, they also have some of
| the most successful/widely-adopted ML-powered products on the
| market (think RankBrain/anything involving a recommendation
| engine, Translate, Maps, a ton of functionality in Gmail, etc).
| jerf wrote:
| "Achieving the initial 80 percent is relatively straightforward
| since it involves approximating a large amount of human data,
| Marcus said, but the final 20 percent is extremely challenging.
| In fact, Marcus thinks that last 20 percent might be the hardest
| thing of all."
|
| 100% completely accurate is super-AI-complete. No human can meet
| that goal either.
|
| No, not even you, dear person reading this. You are wrong about
| some basic things too. It'll vary from person to person what
| those are, but it is guaranteed there's something.
|
| So 100% accurate can't be the goal. Obviously the goal is to get
| the responses to be less _obviously_ stupid. While there are
| cynical money-oriented business reasons for that, it is obviously
| also a legitimate hole in the I in AI to propose putting glue on
| pizza to hold the cheese on.
|
| But given my prior observations that LLMs are the current
| reigning world-class champions at producing good sounding text
| that seems to slip right past all our system 1 thinking [1], it
| may not be a great thing to remove the obviously stupid answers.
| They perform a salutary task of educating the public about the
| limitations and giving them memorable hooks to remember not to
| trust these things. Removing them and only them could be a net
| negative in a way.
|
| [1]: https://thedecisionlab.com/reference-
| guide/philosophy/system...
| SoftTalker wrote:
| > putting glue on pizza to hold the cheese on
|
| It's actually not the dumbest idea I've heard from a real
| person. So no surprise it might be suggested by an AI that was
| trained on data from real people.
| krapp wrote:
| It wasn't an idea, though. It was a joke someone made on
| Reddit. If an AI can't tell the difference, it shouldn't be
| responsible for posting answers as authoritative.
| dgellow wrote:
| Insane people at Google thought it would be a good idea to
| let Reddit of all places drive their AI search responses
| warkdarrior wrote:
| It is certainly popular here to run your web searches
| against Reddit. Every post about how Google Search sucks
| ends up with comments about appending "site:reddit.com" to
| the search terms.
| dgellow wrote:
| Yes, and we as humans filter through the noise. But you
| cannot rely upon it as a source for anything truthful
| without that filtering. Reddit is very, very, very
| context dependent and full of irony, sarcasm, jokes,
| memes, and confidently written incorrect information.
| People love to upvote something funny or culturally
| relevant at a given time, not because it's true or useful
| but because it's fun to do.
| oldgradstudent wrote:
| Reddit is a magnificent source of useful knowledge.
|
| r/AskHistorians r/bikewrench
|
| To name just two. There is nothing even remotely
| comparable.
|
| But you need to be able to detect sarcasm and irony.
| blablabla123 wrote:
| ...which is sometimes incredibly hard, and it might not be
| possible because it's such a niche topic or people might
| just be wrong. Just think about urban myths, conspiracy
| theories, etc., where even without the niche factor things
| may sound unbelievable, but actually disproving them can
| take effort that is out of all proportion.
| mvdtnz wrote:
| I have seen a tremendous amount of bad advice on
| bikewrench.
| oldgradstudent wrote:
| But a lot of great advice.
|
| I became a half decent home bike mechanic through reading
| it, and of course Park Tool videos.
| giantrobot wrote:
| I don't know about bikewrench but AskHistorians is a
| useful source of knowledge because it is _strongly_
| moderated and curated. It's not just a bunch of random
| assholes spouting off on topics. Top level replies are
| unceremoniously removed if they lack sourcing or make
| unsourced/unsubstantiated claims. Top level posters also
| try to self-correct by clearly indicating when they're
| making claims of fact that are disputed or have unclear
| evidence.
|
| OpenAI, Google, and the other LLMs-are-smart boosters
| seem to think because the Internet is large it must be
| smart. They're applying the infinite monkey theorem[0]
| incorrectly.
|
| [0]
| https://en.m.wikipedia.org/wiki/Infinite_monkey_theorem
| VancouverMan wrote:
| In general, I have trouble trusting environments that can
| be described as "strongly moderated and curated".
|
| I find that environments that rely on censorship tend to
| foster dogma, rather than knowledge and real
| understanding of the topics at hand. They give an
| illusion of quality and trustworthiness. It's something
| we see happen at this site to some extent, for example.
|
| I'd rather see ideas and information being freely
| expressed, and if necessary, pitted against one another,
| with me being the one to judge for myself the
| ideas/claims/positions/arguments/perspectives/etc. that
| are being expressed.
| candiddevmike wrote:
| I wonder what impact all of those erase tools are
| having on LLM training. The ones that replaced all of
| these highly upvoted comments with nonsense.
| SirMaster wrote:
| I'm pretty sure those "erase" tools are just for the
| front-end and reddit keeps the original stuff in the
| back-end. And surely the deal Google made was for the
| back-end source data, or probably the data that includes
| the original and the edit.
| astrange wrote:
| The LLM does a summary of web search results. It's
| quoting what you can see, not pretrained knowledge,
| afaik.
| dspillett wrote:
| It may not be a joke. Perhaps it has confused making food
| for eating with directions for preparing food for menu
| photography and other advertising.
| Fartmancer wrote:
| The Reddit post in question was definitely a joke. This
| is the post in response to a user asking how to make
| their cheese not slide off the slice:
|
| > To get the cheese to stick I recommend mixing about 1/8
| cup of Elmer's glue in with the sauce. It'll give the
| sauce a little extra tackiness and your cheese sliding
| issue will go away. It'll also add a little unique
| flavor. I like Elmer's school glue, but any glue will
| work as long as it's non-toxic.
|
| This matches the AI's response of suggesting 1/8 of a cup of
| glue for additional "tackiness."
| benrutter wrote:
| > So 100% accurate can't be the goal. Obviously the goal is to
| get the responses to be less obviously stupid.
|
| I'm not sure I agree. I think you're right that 100% accuracy
| is potentially unfeasible as a realistic aim, but I think the
| question is how accurate something needs to be in order to be a
| useful proposition for search.
|
| AI that's _as knowledgeable as I am_ is a good achievement and
| helpful for a lot of use cases, but if I'm searching "What's
| the capital of Mongolia", someone with average-ish knowledge
| taking a punt with "Maybe Mongoliana City?" is not helpful at
| all. If I can't trust AI responses to a high degree, I'd much
| rather just have normal search results showing me other
| resources I _can_ trust.
|
| Google's bar for justifying adding AI to their search
| proposition isn't "be better than asking someone on the
| street", it's "be better than searching Google _without_ any AI
| results".
| smashed wrote:
| The problem is that in all the shared examples, Google AI
| search does not respond with a "Maybe xyz?" like you did. It
| always answers with high confidence and can't seem to navigate
| any gray area where there are multiple differing opinions or
| opposing sources of truth.
| namaria wrote:
| Yeah the "manipulating language cogently is intelligence"
| premise that underlies this "AI" cycle is proving itself
| wrong in a grand way.
| bugglebeetle wrote:
| Yes, which is why the ability to sift accurate and
| authoritative sources from spam, propaganda, and intentionally
| deceptive garbage, like advertising, and present those high-
| quality results to the user for review and consideration, is
| more important than any attempt to have an AI serve a single
| right answer. Google, unfortunately, abandoned this problem
| some time ago and is now left to serve up nonsense from the
| melange of low-quality noise they incentivized in pursuit of
| profits. If they had, instead, remained focused on the former
| problem, it's actually conceivable to have an LLM work more
| successfully from this base of knowledge.
| jsemrau wrote:
| This is statistics though. Edge cases are nothing new and risk
| management concepts have evolved around fat tails and anomalies
| for decades. Therefore the statement is as naive as writing a
| trading agent that is 100% correct. In my opinion, this error
| shows a lack of understanding of responsible scaling
| architectures. If this were their first screw-up I wouldn't
| mind, but
| Google just showed us a group of diverse Nazis. If there is a
| need for consumer protection for online services, it is exactly
| stuff like this. ISO 42001 lays out in great detail that AI
| systems need to be tested before they are rolled out to the
| public. The lack of understanding of AI risk management is
| apparent.
| lukev wrote:
| I feel like there's some semantic slippage around the meaning
| of the word "accuracy" here.
|
| I grant you, my print Encyclopedia Britannica is not 100%
| accurate. But the difference between it and a LLM is not just a
| matter of degree: there's a "chain of custody" to information
| that just isn't there with a LLM.
|
| Philosophers have a working definition of knowledge as being
| (at least+) "justified true belief."
|
| Even if a LLM is right most of the time and yields "true
| belief", it's not _justified_ belief and therefore cannot yield
| knowledge _at all_.
|
| Knowledge is Google's raison d'etre and they have no business
| using it unless they can solve or work around this problem.
|
| + Yes, I know about the Gettier problem, but it is not relevant to
| the point I'm making here.
| jononor wrote:
| Encyclopedia Britannica is also wrong in a reproducible and
| fixable way. And the input queries are a finite set. Its
| output does not change due to random or arbitrary things. It
| is actually possible to verify. LLMs so far seem to be
| entirely unverifiable.
| lukev wrote:
| They don't just seem it. They are by design.
|
| We talk about models "hallucinating" but that's us bringing
| an external value judgement after the fact.
|
| The actual process of token generation works precisely the
| same. It'd be more accurate to say that models _always_
| hallucinate.
| madeofpalk wrote:
| Yes - this is what I've been saying all the time. The
| term 'hallucinations' is misleading because the whole
| point of LLMs is that they recombine all their inputs
| into something 'new'. They only ever hallucinate outputs
| - that's their whole point!
| wizzwizz4 wrote:
| Into something _probable_. The models that underlie these
| chatbots are usually overfitted, so while they _usually_
| don't repeat their training data verbatim, they _can_.
| DougBTX wrote:
| > The actual process of token generation works precisely
| the same
|
| I'd be wary of generalising it like that, it is like
| saying that all programs run on the same set of CPU
| instructions. NNs are function approximators, where the
| code is expressed in model weights rather than text, but
| that doesn't make all functions the same.
| lukev wrote:
| You misunderstand. I mean that the model itself is doing
| exactly the same thing whether the output is a
| "hallucination" or happens to be fact. There isn't even
| a theoretical way to distinguish between the two cases
| based only on the information encoded in the model.
| skydhash wrote:
| > _it is like saying that all programs run on the same
| set of CPU instructions_
|
| The Turing machine is the embodiment of all computer
| programs. And then you come across the halting problem.
| LLMs can probably generate all books in existence, but
| they can't apply judgement to them. Just like you need
| programmers to actually write the program and verify that
| it correctly solves the problem.
|
| Natural languages are more flexible. There are no
| functions, libraries, or paradigms to ease writing. And
| the problem space can't be specified and usually relies
| on shared context. Even if we could have snippets of
| prompts to guide text generations, the result is not that
| valuable.
| intended wrote:
| YES. Humans can hallucinate; it's a deviation from what is
| observable reality.
|
| All the stress people are feeling with GenAI comes from
| the over-anthropomorphisation of ... stats. Impressive
| syntactic ability is not equivalent to semantic
| capability.
| somenameforme wrote:
| LLMs are completely deterministic even if that's kind of
| weird to state because they output things in terms of
| probabilities. But if you simply took the highest
| probability next word, you'd always yield the exact same
| output given the exact same input. Randomness is
| intentionally injected to make them seem less robotic
| through the 'temperature' parameter. Why it's not just
| called the rng factor is beyond me.
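| A minimal sketch of that difference (assuming a plain array of
| logits for the next token; not any particular model's API):
|
|     import numpy as np
|
|     def next_token(logits: np.ndarray, temperature: float = 0.0,
|                    rng=None) -> int:
|         # Greedy decoding: always pick the single most likely
|         # token, so the same prompt yields the same output.
|         if temperature == 0.0:
|             return int(np.argmax(logits))
|         # Temperature sampling: rescale the logits, convert them
|         # to probabilities, and draw at random. Higher temperature
|         # flattens the distribution and adds variety.
|         rng = rng or np.random.default_rng()
|         probs = np.exp((logits - logits.max()) / temperature)
|         probs /= probs.sum()
|         return int(rng.choice(len(logits), p=probs))
|
| With temperature 0.0 the same logits always give the same token;
| with a higher temperature, two runs can disagree.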
| dleeftink wrote:
| Maybe some models can be deterministic at a point in
| time, but train it for another epoch with slight
| parameter changes and a revised corpus and determinism
| goes out the proverbial (sliding) window real quick. This
| is not unwanted per se, and is exactly the feedback loop
| that needs improving to better integrate new knowledge or
| revise knowledge artefacts incrementally/post-hoc.
| Vetch wrote:
| If you train it then it's no longer the same model. If I
| have f(x) = x + 1 and change it to f(x) = x + 1 + 1/1e9,
| it would not mean that `f` is not deterministic. The
| issue would be in whatever interface I was exposing the
| f's at.
| jononor wrote:
| But current models must be retrained to incorporate new
| information. Or to attempt to fix undesirable behavior.
| So just freezing it forever does not seem feasible. And
| because there is no way to predict what has changed - one
| has to verify everything all over again.
| avar wrote:
| Would you by extension argue that e.g. modern relational
| databases aren't deterministic in their query execution?
| Their query plans tend to be chosen based on statistics
| about the tables they're executed against, and not just
| the query itself.
|
| I don't see how that's different than the LLM case, a lot
| of algorithms change as a function of the data they're
| processing.
| Terr_ wrote:
| I think what you're describing is that training/execution
| effects aren't _predictable_.
|
| It is still "deterministic" in that training on exactly
| the same data and asking exactly the same questions
| should (unless someone manually adds randomness) lead to
| the same results.
|
| Another example of the distinction might be a pseudo-
| random number generator: For any given seed, it is
| entirely deterministic, while at the same time being very
| deliberately hard to predict without actually running it
| to see what happens.
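| A trivial sketch of that distinction, using a seeded generator
| from the standard library:
|
|     import random
|
|     a = random.Random(42)
|     b = random.Random(42)
|     xs = [a.random() for _ in range(5)]
|     ys = [b.random() for _ in range(5)]
|     # Same seed, same sequence: entirely deterministic...
|     assert xs == ys
|     # ...yet hard to predict without actually running it.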
| ta8645 wrote:
| > LLMs so far seem to be entirely unverifiable.
|
| I don't understand this complaint. Are they any less
| verifiable than a human?
| threeseed wrote:
| I can ask a human to explain the steps they took to
| answer a question.
|
| I can ask a human a question 100 times and I don't get
| back 100 different answers.
|
| Neither of those applies to an LLM.
| ta8645 wrote:
| You can ask an LLM to explain itself. It will give you a
| logical stepwise progression from your question to its
| answer. It will often contain a mistake, but the same is
| true for a human.
|
| And if your LLM is giving you 100 different answers, then
| it has been configured to do so. Because instead, it
| could be configured to never vary at all. It could be
| 100% reproducible if so desired.
| bluefirebrand wrote:
| > it will give you a logical stepwise progression from
| your question to its answer.
|
| No, it will generate a new hallucination that might be a
| logical stepwise progression from the question you asked
| to the answer it gave, but it is not due to any actual
| internal reasoning being done by the LLM.
| ta8645 wrote:
| So what? You have no way to know for sure if the human
| you ask the same question, does either. The question that
| started this thread was related to verifiability. And i
| still think it is a spurious complaint, given that we
| have exactly the same limitations when dealing with any
| human agent.
| wizzwizz4 wrote:
| The problem of other minds is no reason to throw
| everything out the window. Humans are capable of being
| conscious of their reasoning processes; token-at-a-time
| predictive text models wired up as chatbots _aren't
| capable_ of it. Your choice is between a possibly-
| mistaken, possibly-lying human, and a 100%-definitely
| incapable computer program.
|
| You don't know either "for sure", but you don't know that
| _the external world exists_ "for sure" either. It's an
| insight-free observation, and shouldn't be the focus of
| anyone's decision-making.
| ta8645 wrote:
| You've made some interesting points, which are debatable,
| for sure. But you've failed to address the question being
| asked about "verifiability".
| bluefirebrand wrote:
| > And i still think it is a spurious complaint, given
| that we have exactly the same limitations when dealing
| with any human agent
|
| We're not talking about an LLM that is trying to do the
| job of a human, here
|
| We're talking about an LLM that is trying to give
| authoritative answers to any question typed into the
| Google search bar
|
| It's already well past the scale that humans could handle
|
| Talking about human shortcomings when discussing LLMs is
| a red herring at best, or some kind of deliberate
| goalpost shifting at worst
| ta8645 wrote:
| Nothing of the sort. I'm trying to understand why anyone
| cares about formal verifiability in this context, since
| it's not something we rely on when asking humans to
| answer questions for us. We evaluate any answer we get
| without such mathematical proofs, and instead simply
| judge the answer we're given on its fit and usefulness.
|
| Anyone who doubts the usefulness of even these nascent
| LLMs is fooling themselves. The proof is in the pudding,
| they already do a great job, even with all their obvious
| limitations.
| bluefirebrand wrote:
| > since it's not something we rely on when asking humans
| to answer questions for us
|
| Because we interact with computers (which includes LLMs)
| differently than we do with humans and we hold them to
| higher standards
|
| Ironically, Google played a large part in this,
| delivering high quality results to us with ease for many
| years. At one point Google _was_ the standard for finding
| high quality information
| ta8645 wrote:
| Shrug. Seems like clutching pearls to me. People seem to
| have an emotional reaction and obsess on the aspects that
| differentiate human cognition from LLMs. But that is a
| lot of wasted energy.
|
| To the extent that anyone avoids employing these
| technologies, they will be at a disadvantage to those who
| do; because these tools just work. Already. Today.
|
| There isn't even room for debate on that issue. Again,
| the proof is in the pudding. These systems are already
| successfully, usefully, and correctly answering millions
| of questions a day. They have failure modes where they
| produce substandard or even flat out incorrect answers
| too. They're far from perfect, but they're still
| incredible tools, even without waiting for the
| improvements that are sure to come.
| figassis wrote:
| The reason verifiability is important is because humans
| can be incentivized to be truthful and factual. We know
| we lie, but we also know we can produce verifiable
| information, and we prefer this to lies, so when it
| matters, we make the cost of lying high enough that we
| can reasonably expect that they will not try to deceive
| (for example by committing perjury, or fabricating
| research data). We know it still happens, but it's not
| widespread and we can adjust the rules, definitions and
| cost to adapt.
|
| An LLM does not have such real world limitations. It will
| hallucinate nonstop and then create layers of gaslighting
| explanations to its hallucinations. The problem is that
| you absolutely must be a domain expert at the LLM's topic
| or always go find the facts elsewhere to verify (then why
| use an LLM?).
|
| So a company like Google using an LLM, is not providing
| information, it's doing the opposite. It is making it
| more difficult and time consuming to find information.
| But it is then hiding their responsibility behind the
| model. "We didn't present bad info, our model did, we're
| sorry it told you to turn your recipe into
| poison...models amirite?"
|
| A human doing that could likely face some consequences.
| Dylan16807 wrote:
| > You have no way to know for sure if the human you ask
| the same question, does either.
|
| The human might lie, but they generally don't.
|
| An LLM is _always_ confabulating when it explains how it
| reached a conclusion, because that information was
| discarded as soon as it picked a word.
|
| The limitations are not in the same ballpark.
| Vetch wrote:
| > It will often contain a mistake...but the same is true
| for a human.
|
| If this were true textbooks could not work. Given a
| question, we don't consult random humans but experts of
| their field. If I have a question on algorithms, I might
| check a text by Knuth, I wouldn't randomly ask on the
| street.
|
| > It could be 100% reproducible if so desired.
|
| Reproducible does not mean better. For harder questions,
| it's often best to generate multiple answers at a higher
| temperature than to greedily pick the highest probability
| tokens.
| kenjackson wrote:
| Ask a human what the meaning of life is and how it
| impacts their day to day interactions. I know I can tell
| you an answer but I couldn't tell you steps about how I
| got it.
|
| And if you asked it to me twice I'd definitely give
| different answers unless you told me to give the same
| answer. In part I'd give a different answer because if
| someone asks me the same question twice I assume the
| first answer wasn't sufficient.
| threeseed wrote:
| No one is talking about existential questions about the
| meaning of life.
|
| We are talking about basic things like whether or not to
| eat rocks or put glue in recipes. We can answer those
| questions with a chain of logic and repeatability.
| astrange wrote:
| The only reason you can't verify a server side LLM is you
| can't see the model. It is possible to look at its
| activations if you have the model.
| tux1968 wrote:
| Do the activations tell you anything more than what the
| LLM delivers in plain text? Other than for trivial bugs
| in the LLM code, I don't think so.
| astrange wrote:
| Yes, "making up an answer" will look different from
| "quoting pretrained knowledge" because eg the model
| might've decided you were asking a creative writing
| question.
| duskwuff wrote:
| Can you cite a source for this, or are you speculating?
|
| My understanding was the opposite -- that the activity of
| a confabulating LLM is indistinguishable from one giving
| factually accurate responses.
|
| https://arxiv.org/abs/2401.11817
| astrange wrote:
| Some things like:
|
| https://arxiv.org/abs/2310.18168
|
| https://arxiv.org/abs/2310.06824
|
| There are various reasons an LLM might have incorrect
| "beliefs" - the input text was false, training doesn't
| try to preserve true beliefs, quantization certainly
| doesn't. So it can't be perfectly addressed, but some
| things leading to it seem like they can be found.
|
| > https://arxiv.org/abs/2401.11817
|
| This seems like it's true since LLMs are a finite size,
| but in Google's case it has a "truth oracle" (the
| websites it's quoting)... the problem is it's a bad
| oracle.
| lukev wrote:
| This is confidently stated and incorrect.
| astrange wrote:
| Do you have anything to add?
| mewpmewp2 wrote:
| Isn't it actually known that every time a human brain
| recalls a piece of memory the memory gets slightly
| changed?
|
| If the answer has any length at all, I imagine the answer
| can vary every single time the person answers, unless
| they prepared for it and memorized it word for word.
| ein0p wrote:
| LLMs are deterministic if you want them to be. If you
| eagerly argmax the output you will get the same sequence
| for the same prompt every time
| eynsham wrote:
| > it's not /justified/ belief
|
| Beliefs derived from the output of LLMs that are 'right most
| of the time' pass one facially plausible precisification of
| 'justification' in that they are generated by a reliable
| belief-generation mechanism (see e.g. Goldman). To block this
| point one must engage with the post-Gettier literature at
| least to some extent. There is a clear difference between
| beliefs induced by reading the outputs of LLMs and those
| induced by the contents of a reference work, but it is
| inessential to the point and arguably muddies the water to
| present the distinction as a difference in status as knowledge
| or non-knowledge.
| nicklecompte wrote:
| To be clear he is saying that the LLM is not capable of
| justified true belief, not commenting on people who believe
| LLM output. I don't think your comment is relevant here.
| lukev wrote:
| I do think trusting an LLM is less firm ground for
| knowledge than other ways of learning.
|
| Say I have a model that I know is 98% accurate. And it
| tells me a fact.
|
| I am now justified in adjusting my priors and weighting
| the fact quite heavily at .98. But that's as far as I can
| get.
|
| If I learned a fact from an online anonymously edited
| encyclopedia, I might also weight that a 0.98 to start
| with. But that's a strictly better case because I can dig
| more. I can look up the cited sources, look at the edit
| history, or message the author. I can use that as an
| entry point to end up with significantly more than 98%
| conviction.
|
| That's a pretty important difference with respect to
| knowledge. It isn't just about accuracy percentage.
| eynsham wrote:
| That reading of the comment did occur to me, but I think
| neither dictionaries nor LLMs are capable of belief, and
| the comment was about the status of beliefs derived from
| them.
| nicklecompte wrote:
| Okay we are speaking past each other, and you are still
| misunderstanding the subtlety of the comment:
|
| A dictionary or a reputable Wikipedia entry or whatever
| is ultimately full of human-edited text where, presuming
| good faith, the text is written according to that human's
| rational understanding, and humans are capable of
| justified true belief. This is not the case at all with
| an LLM; the text is entirely generated by an entity which
| is not capable of having justified true beliefs in the
| same way that humans and rats have justified true
| beliefs. That is why text from an LLM is more suspect
| than text from a dictionary.
| eynsham wrote:
| I think the parent comment ultimately concerned the
| reliability of /beliefs derived from text in reference
| works v text output by LLMs/, and that seems to be what
| the replies by the commenter concern. If the point is
| merely that the text output by LLMs does not really
| reflect belief but the text in a dictionary reflects
| belief (of the person writing it), it is well-taken.
| Since it is fairly obvious and I think the original
| comment really was about the first question, I address
| the first rather than second question.
|
| The point you make might be regarded as an argument about
| the first question. In each case, the 'chain of custody'
| (as the parent comment put it) is compared and some
| condition is proposed. The condition explicitly
| considered in the first question was reliability; it was
| suggested that reliability is not enough, because it
| isn't justification (which we can understand
| pretheoretically, ignoring the post-Gettier literature).
| My point was that we can't circumvent the post-Gettier
| literature because at least one seemingly plausible view
| of justification is just reliability, and so that needs
| to be rejected Gettier-style (see e.g. BonJour on
| clairvoyance). The condition one might read into your
| point here is something like: if in the 'chain of
| custody' some text is generated by something that is
| incapable of belief, the text at the end of the chain
| loses some sort of epistemic virtue (for example, beliefs
| acquired on reading it may not amount to knowledge).
| Thus,
|
| > text from an LLM is more suspect than text from a
| dictionary.
|
| I am not sure that this is right. If I have a computer
| generate a proof of a proposition, I know the proposition
| thereby proved, even though 'the text is entirely
| generated by an entity which is not capable of having
| justified true beliefs' (or, arguably, beliefs at all).
| Or, even more prosaically, if I give a computer a list of
| capital cities, and then write a simple program to take
| the name of a country and output e.g. '[t]he capital of
| France is Paris', the computer generates the text and is
| incapable of belief, but, in many circumstances, it is
| plausible to think that one thereby comes to know the
| fact output.
|
| I don't think that that is a reductio of the point about
| LLMs, because the output of LLMs is different from the
| output of, for example, an algorithm that searches for a
| formally verified proof, and the mechanisms by which it
| is generated also are.
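| (The capital-cities program mentioned above would be something
| like this sketch; the lookup table is the curated "knowledge",
| and the program itself believes nothing:)
|
|     CAPITALS = {"France": "Paris", "Japan": "Tokyo"}
|
|     def capital_sentence(country: str) -> str:
|         # Purely mechanical text generation from a vetted list.
|         return f"The capital of {country} is {CAPITALS[country]}."
|
|     print(capital_sentence("France"))
|     # -> The capital of France is Paris.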
| Szpadel wrote:
| I think the biggest difference from a human (and the most
| important one) is that a human can tell you "I have no idea,
| this isn't my field" or "I'm just guessing here", but LLMs will
| confidently commit to super stupid statements. AI doesn't know
| what it knows.
|
| If you only scored the cases where a human provides an answer,
| then the human score would probably be in the high 90s.
| pankajkumar229 wrote:
| I find irony here.
| saagarjha wrote:
| Thankfully a billion people are not asking me for answers to
| things, so it's OK if I am wrong sometimes.
| fragmede wrote:
| Nor am I being treated as an omniscient magic black box of
| knowledge.
|
| Hilariously though, polyvinyl acetate, the main ingredient in
| Elmer's glue, is used as a binding agent to keep emulsions
| from separating into oil and water, is used in chewing
| gum, and covers citrus fruits, sweets, chocolate, and apples
| in a glossy finish, among other food things.
| notnullorvoid wrote:
| 100% accuracy should be the goal, but the way to achieve that
| isn't going to come from teaching an AI to construct a
| definitive-sounding answer to 100% of questions. Teaching AI
| how to respond with "I don't know" and give confidence scores
| is the path to nearing 100% accuracy.
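| One naive sketch of "abstain below a confidence threshold"
| (assuming a hypothetical model that exposes per-token
| probabilities; calibrating those probabilities is the genuinely
| hard part):
|
|     def answer_or_abstain(answer: str, token_probs: list[float],
|                           threshold: float = 0.9) -> str:
|         # Treat the weakest token as the confidence of the whole
|         # answer, and abstain when it falls below the threshold.
|         confidence = min(token_probs)
|         if confidence < threshold:
|             return "I don't know."
|         return f"{answer} (confidence {confidence:.2f})"
|
|     print(answer_or_abstain("Use non-toxic glue.", [0.9, 0.4, 0.7]))
|     # -> I don't know.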
| wredue wrote:
| If I could deliver "80% correct" software for my workplace, my
| day would be a whole hell of a lot easier.
| Swizec wrote:
| > No, not even you, dear person reading this. You are wrong
| about some basic things too. It'll vary from person to person
| what those are, but it is guaranteed there's something.
|
| Kahneman has a fantastic book on this called Noise. It's all
| about noise in human decision making and how to counteract it.
|
| My favorite example was how even the same expert evaluating the
| same fingerprints on different occasions (long enough to
| forget) will find different results.
| praisewhitey wrote:
| You're looking at it the wrong way, the goal should be 0%
| inaccurate. Meaning for the 20% of things it can't answer, it
| shouldn't make something up.
| skybrian wrote:
| Or to put it another way, I think Google should have a way of
| saying "yes, we know this result is wrong, but we're leaving it
| in because it's funny."
|
| There is a demand for funny results. Someone asking "how many
| rocks should I eat" is looking for entertainment, so you might
| as well give it to them.
| leptons wrote:
| The right answer is no rocks. Some mentally ill person could
| type that in and get "eat 1000 rocks" and then die from
| eating rocks, and that would be Google's fault. It's not
| funny. I have no doubt right now there are at least 50
| youtube videos being made testing different glues'
| effectiveness holding cheese on a pizza. And some of those
| idiots are going to taste-test it, too. And then people will
| try it at home, some stupid kids will get sick - I have no
| doubt.
|
| It was a bit premature to label LLMs as "Intelligence", it's
| a cool parlor trick based on a shitload of power consumption
| and 3D graphics cards, but it's not intelligent and it
| probably shouldn't be telling real (stupid) humans answers
| that it can't verify are correct.
| nomel wrote:
| Google is not responsible, and should never be responsible,
| for protecting mentally ill people from themselves. It
| would be at a severe detriment to the rest of us if they
| took on that responsibility. Society should set the bar to
| "a reasonable person", otherwise you're doomed, with no
| possible alternative to a nanny state.
| avar wrote:
| > The right answer is no rocks.
|
| Sand is considered a "rock". If you live in e.g. the USA or
| the EU you've definitely inadvertently eaten rocks from
| food produce that's regulated and considered perfectly safe
| to eat.
|
| It's impossible to completely eliminate such trace
| contaminants from produce.
|
| Pedantic? Yes, but you also can't expect a machine to
| confidently give you absolutes in response to questions
| that don't even warrant them, or to distinguish them from
| questions like "do mammals lay eggs?".
| duskwuff wrote:
| > Or to put it another way, I think Google should have a way
| of saying "yes, we know this result is wrong, but we're
| leaving it in because it's funny."
|
| These specific results aren't the problem, though. They're
| illustrations of a larger problem -- if a single satirical
| article or Reddit comment can fool the model into saying
| "eating rocks is good for you" or "put glue in your pizza
| sauce", there are certain to be many more subtle inaccuracies
| (or deliberate untruths) which their model has picked up from
| user-generated content which it'll regurgitate given the
| right prompt.
| tomrod wrote:
| If everyone can be wrong, then might the assertion that all are
| wrong be committing this same fallacy? "Can" is not destiny,
| perhaps you have met people who are fully right about the
| basics but you just didn't sufficiently grok their correctness.
| willis936 wrote:
| Failing loudly is an excellent feature. "More compelling lies"
| is not the answer.
| hatenberg wrote:
| So Google decides shipping 80% distilled crap is good enough.
| Yay
| bluefirebrand wrote:
| > You are wrong about some basic things too
|
| Sure, but probably not "add glue to pizza to get the cheese to
| stick" wrong...
| dspillett wrote:
| At least it suggested non-toxic glue... That suggests some
| context about recipes needing to be safe is somehow present
| in its model.
| bluefirebrand wrote:
| Most likely this has nothing to do with "recipes being
| safe" being in the model
|
| It seems the glue thing comes from a Reddit shitpost from
| some time ago. There's a screenshot going around on Twitter
| about it [0] (11 years in the screenshot but no idea when it
| was taken)
|
| It specifically mentions "any glue will work as long as it
| is non-toxic" so best guess is that's why google output
| that
|
| [0]https://x.com/kurtopsahl/status/1793494822436917295?t=aB
| fEzD...
| Fartmancer wrote:
| It is indeed from 11 years ago. Here's a direct link to
| the Reddit post: https://www.reddit.com/r/Pizza/comments/
| 1a19s0/my_cheese_sli...
| verisimi wrote:
| 100% _correct_, 80% _correct_ lol.
|
| The thing is that truth/reality is not a thing that is
| resolvable. Not even the scientific method has this sort of
| expectation!
|
| You can imagine getting close to those percentages, with
| regards to consensus opinion. That's just a question of
| educating people to respond appropriately.
| mvdtnz wrote:
| No. Whether a person should eat a certain number of small
| rocks each day is not a matter of opinion, it's not a deep
| philosophical problem and it's not a question whose truth is
| not resolvable. You should not be eating rocks.
| verisimi wrote:
| You choose such an edge case question - how about this sort
| of thing:
|
| Which is the best political party?
|
| Are there side effects to X medical treatment?
|
| I bet there are even cases when eating rocks is ok!
|
| PS
|
| It has been written about:
|
| https://www.atharjaber.com/works/writings/the-art-of-
| eating-...
|
| > Lithophagia is a subset of geophagia and is a habit of
| eating pebbles or rocks. In the setting of famine and
| poverty, consuming earth matter may serve as an appetite
| suppressant or filler. Geophagia has also been recorded in
| patients with anorexia nervosa. However, this behavior is
| usually associated with pregnancy and iron deficiency. It
| is also linked to mental health conditions, including
| obsessive-compulsive disorder.
|
| Would you deny a starving person information on an appetite
| suppressant?
|
| Also here:
|
| https://www.remineralize.org/2017/05/craving-minerals-
| eating...
|
| > Aside from the capuchin monkeys, other animals have also
| been observed to demonstrate geophagy ("soil-eating"),
| including but not limited to: rodents, birds, elephants,
| pacas and other species of primates.[1]
|
| > Researchers found that the majority of geophagy cases
| involve the ingestion of clay-based soil, suggesting that
| the binding properties of clay help absorb toxins.
|
| ^^ The point being that even your edge case example is not
| unambiguously correct.
| mvdtnz wrote:
| Are you really going to start eating rocks just to
| convince yourself that Google's AI isn't shit and
| objective truth is not real?
| verisimi wrote:
| Lol! No, of course not.
|
| My point is that I object to the idea that a result can
| be 100% right! Even in the case of eating rocks, it seems
| there are times that it can be beneficial.
|
| To think '100% correct' is achievable is to misunderstand
| the nature of reality.
| mvdtnz wrote:
| > No, not even you, dear person reading this. You are wrong
| about some basic things too. It'll vary from person to person
| what those are, but it is guaranteed there's something.
|
| The difference is that I'm not put on the interface of a
| product facing hundreds of millions of users every day to feed
| those users incorrect information.
| noncoml wrote:
| "No, not even you, dear person reading this. You are wrong
| about some basic things too."
|
| But even when I'm wrong I'm not 100% off. Not "to help with
| depression jump off a bridge" or "use glue to keep the cheese on
| the pizza" kind of wrong.
| ein0p wrote:
| You don't need to be super ai complete - GPT4 is perfectly
| willing and able to tell you not to eat rocks and not to mix
| wood glue into pizza sauce. This is a fuckup caused by not
| dogfooding, and by focusing on alignment for political
| correctness at the expense of all else. And also by wasting a
| ton of engineering effort on unnecessary bullshit and spreading
| it too thin.
| JCM9 wrote:
| "manually remove weird AI answers" is an oxymoron. Sort of like
| saying "deployed manual drivers to improve self driving
| performance"
| thesimp wrote:
| I'm actually shocked that a company that has spent 25 years on
| finetuning search results for any random question people ask in
| the searchbox does not have a good, clean, dataset to train an
| LLM on.
|
| Maybe this is the time to get out the old Encyclopedia Britannica
| CD and use that for training input.
| SoftTalker wrote:
| I am also surprised that training data are not much more
| curated.
|
| Encyclopedias, textbooks, reputable journals, newspapers and
| magazines make sense.
|
| But to throw in social media? Reddit? Seems insane.
| sgift wrote:
| Even some results from "The Onion" seem to be in it. Looks
| like Google just took every website they've ever crawled as
| source.
| helsinkiandrew wrote:
| The problem is that for some searches and answers Reddit or
| other social media is fine.
| dgellow wrote:
| But only if you do a lot of filtering when going through
| responses. It's kind of simple to do as a human, we see a
| ridiculous joke answer or obvious astroturfing and move on,
| but Reddit is like >99% noise, with people upvoting
| obviously wrong answer because it's funny, lots of bot
| content, constant astroturfing attempts.
| eldaisfish wrote:
| No, it isn't. Humans interacting with human-generated text
| is generally fine. You cannot unleash a machine on the
| mountains of text stored on reddit and magically expect it
| to tell fact from fiction or sarcasm from bad intent.
| skydhash wrote:
| The fact is, I think there is not that much written material
| to actually train a sensible model on. A lot of books don't
| have OCRed scans or a digital version. Humans can
| extrapolate knowledge from a relatively succinct book and
| some guidance. But I don't know how a model can add the
| common sense part (that we already have) that books rely on
| to transmit knowledge and ideas.
| PartiallyTyped wrote:
| You may find this illuminating. The Google prior to 2019 isn't
| the Google of today.
|
| https://www.wheresyoured.at/the-men-who-killed-google/
|
| Edit: there was also a discussion on HN about that article.
| hilux wrote:
| Coincidentally, I was just watching a video about how South
| Africa has gone downhill - and that slide was hastened by
| McKinsey advising the crooked "Gupta brothers" on how to most
| efficiently rip off the country.
| typpo wrote:
| The problem in this case is not that it was trained on bad
| data. The AI summaries are just that - summaries - and there
| are bad results that it faithfully summarizes.
|
| This is an attempt to reduce hallucinations coming full circle.
| A simple summarization model was meant to reduce hallucination
| risk, but now it's not discerning enough to exclude untruthful
| results from the summary.
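|
| For what it's worth, a minimal sketch of that retrieve-then-
| summarize shape, assuming the OpenAI Python SDK for the model
| call and a hard-coded stand-in for whatever the index returned
| (including a joke source); a faithful summarizer passes the
| joke straight through:
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     # Stand-in for retrieved web results, one of them a joke.
|     retrieved = [
|         "Add 1/8 cup of non-toxic glue to the sauce so the "
|         "cheese sticks better.",
|         "Let the pizza rest a few minutes before slicing.",
|     ]
|
|     prompt = (
|         "Summarize the following search results faithfully. "
|         "Do not add anything that is not in them:\n\n"
|         + "\n".join(retrieved)
|     )
|
|     resp = client.chat.completions.create(
|         model="gpt-4o-mini",
|         messages=[{"role": "user", "content": prompt}],
|     )
|     # Nothing here judges whether a source is serious, so the
|     # glue advice survives into the summary.
|     print(resp.choices[0].message.content)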
| kredd wrote:
| It's a bit weird since Google is taking over the "burden of
| proof"-like liability. Up until now, once user clicked on a
| search result, they mentally judged the website's credibility,
| not Google's. Now every user will judge whether data coming
| from Google is reliable or not, which is a big risk to take on,
| in my opinion.
| shombaboor wrote:
| they went from "look at this dumbass on reddit" to "no it is
| I (Google) who is in fact the dumbass". It's an interesting
| strategy to say the least.
| seadan83 wrote:
| That latter point might be illuminating for a number of
| additional ideas. Specifically, should people have questioned
| Google's credibility from the start? Ie: these _are_ the
| search results, vs this is what google chose.
|
| Google did well in the old days for reasons. It beat AltaVista
| and Yahoo by having better search results and a clean
| loading page. Since perhaps '08 or so (based on memory, that
| date might be off), Google has dominated search, to the
| extent that it's not salient that search engines can be
| really questionable. Which is also to say: because Google
| dominated, people lost sight that searching and googling are
| different, and that gives a lot of freedom for
| enshittification without people getting too upset or even
| quite realizing it - it could be different and better.
| zihotki wrote:
| They spent 10 years finetuning the search and then another 15
| finetuning ads and clicks. Google's business is ads, not
| search.
| 39896880 wrote:
| Apologies in advance for this level of pedantry: Google's
| business is behavioral futures, not ads. Ads are just a means
| to that particular end.
| ImAnAmateur wrote:
| Surveillance capitalism? What are behavioral futures?
| rurp wrote:
| Google exchanges advertisement placement for money. Ads are
| their business by any normal definition of that term.
| 39896880 wrote:
| Google's transformation of conventional methods into
| means of hypercapitalist surveillance is both pervasive
| and insidious. The "normal definition of that term" hides
| this.
| astrange wrote:
| You don't need "hypercapitalist surveillance" to show
| someone ads for a PS5 when they search for "buy PS5".
|
| If they're doing surveillance they're not doing a good
| job of it, I make no effort to hide from them and
| approximately none of their ads are personalized to me.
| They are instead personalized to the search results
| instead of what they know from my history.
|
| Meta is the one with highly personalized ads.
| 39896880 wrote:
| If Google doesn't need surveillance, why do they surveil?
| Why then do they waste the time tracking your browsing
| history, your location, and so on?
|
| If simple keyword matching was enough, why would they
| spend literally billions a year on other tactics?
| astrange wrote:
| Why does Google launch and then cancel five messaging
| apps a year?
|
| They don't know what they're doing either!
| x0x0 wrote:
| I don't think it's true at all.
|
| Two reasons. The first, even ignoring that truth isn't
| necessarily widely agreed (is Donald Trump a raping fraud?), is
| that truth changes over time. eg is Donald Trump president? And
| presidents are the easiest case because we all know a fixed
| point in time when that is recalculated.
|
| Second, Google's entire business model is built around spending
| nothing on content. Building clean pristinely labeled training
| sets is an extremely expensive thing to do at scale. Google has
| been in the business of stealing other people's data. Just one
| small example: if you produced (very expensive at scale) clean,
| multiple views, well lit photographs of your products for sale
| they would take those photos and show them on links to other
| people's stores; and if you didn't like that, they would kick
| you out of their shopping search. etc etc. Paying to produce
| content upends their business model. See eg the 5-10% profit
| margin well run news orgs have vs the 25% tech profit margin
| Google has even after all the money blown on moonshots.
| flyingspaceship wrote:
| Google doesn't look like they're fine tuning anything other
| than revenue
| cdme wrote:
| It's pre-alpha trash that's worse than traditional search in
| every meaningful way.
|
| Kudos to the artist at the Verge for the accompanying image --
| those are fingers AI would be proud of.
| OutOfHere wrote:
| With these dangerous answers, Google is giving AI a very bad
| name with the general public, when in truth it's strictly
| Google that deserves the blame.
| TrianguloY wrote:
| Putting glue on the pizza is (apparently) a clever way to take
| pictures of slices of pizza that look "perfect" to the camera
| (not for eating, obviously) [1]. I remember a couple years ago
| some videos of "tricks" showing this, plus literally screwing the
| pizza with screws.
|
| So, yeah, the AI did in fact autocomplete the question
| correctly. It was just the wrong context. Good luck trying to
| "fix" that.
|
| [1] https://shotkit.com/food-photography-secrets-revealed/
| (number 2)
| namaria wrote:
| "correctly but wrong" is just wrong.... there are no points
| scored for "in a very specific context it would've made sense"
| billyjmc wrote:
| This is the kind of ridiculous fumble that GOFAI (like Cyc)
| should be able to avoid by recognizing context. I wonder how
| neuro-symbolic systems are coming along, and whether they can
| save us from this madness. The general populace wants the
| kinds of things LLMs provide, but isn't prepared to be as
| skeptical as is needed when reviewing the answers it
| generates.
| bithive123 wrote:
| Why do people act like LLMs only hallucinate some of the time?
| kwertyoowiyop wrote:
| The best trick the A.I. companies have pulled is getting us to
| refer to 'bugs' as 'hallucinations.' It sounds so much more
| sophisticated.
| dgellow wrote:
| It's not a trick to sound sophisticated. Hallucinations are
| more like a subcategory of bugs: the system is technically
| working correctly while generating, structuring, and
| presenting false information as fact.
| greg_V wrote:
| Technically everything an LLM does is hallucination that
| happens to land somewhere between correct and incorrect.
| But only humans with knowledge can tell the difference,
| math alone can't. It's not even a bug: it's the defining
| feature of the technology!
| elwell wrote:
| > But only humans with knowledge can tell the difference
|
| Who says the humans (all of them) aren't hallucinating
| too?
| astrange wrote:
| Knowledge isn't sufficient to show something is false,
| since the knowledge can also be false. Insofar as it's
| important for it to be true, it needs to be continually
| verified as true, so that it's grounded in the real
| world.
| chx wrote:
| Ah, my friend
|
| it's not a bug
|
| It's a _fundamental feature_
|
| These LLMs can produce _nothing else_, but since the bullshit
| they spew _resembles_ an answer and sometimes accidentally
| collides with one, people tend to think they can give answers.
| But no.
|
| https://hachyderm.io/@inthehands/112006855076082650
|
| > You might be surprised to learn that I actually think LLMs
| have the potential to be not only fun but genuinely useful.
| "Show me some bullshit that would be typical in this context"
| can be a genuinely helpful question to have answered, in code
| and in natural language -- for brainstorming, for seeing
| common conventions in an unfamiliar context, for having
| something crappy to react to.
|
| > Alas, that does not remotely resemble how people are
| pitching this technology.
| kwertyoowiyop wrote:
| That's a good take.
|
| So LLMs distill human creativity as well as human
| knowledge, and it's more useful when their creativity goes
| off the rails than when their knowledge does.
| astrange wrote:
| This is irrelevant because the LLM is mostly not answering
| the question directly, it's summarizing text from web
| results. Quoting a joke isn't a hallucination.
| dgellow wrote:
| It's not hallucinations here; multiple of the ridiculous
| results can be directly traced to Reddit posts where people
| are joking or saying absurd things.
| threeseed wrote:
| There are examples of hallucinations as well, e.g. talking
| about a Google AI dataset that doesn't exist and claiming it
| uses a CSAM dataset when it doesn't.
|
| One of the researchers from Google Deepmind specifically said
| it was hallucinating.
| notnullorvoid wrote:
| Hmm yeah I kinda like the concept that it's "hallucinating"
| 100% of the time, and it just so happens that x% of those
| hallucinations accurately describe the real world.
| empath75 wrote:
| That x% is far higher than people think it is because there's
| a tremendous amount of information about the world that ai
| models need to "understand" that people just kind of take for
| granted and don't even think about. A couple of years ago,
| AIs routinely got "the basics" wrong, but now so often get
| most things right that people don't even think it's worth
| commenting on that they do.
|
| In any case, human consciousness is also a hallucination.
| itronitron wrote:
| it's only AI if _you_ believe it
| kccqzy wrote:
| Not hallucinations but these AI answers often (always?) provide
| sources they link to. It's just that the source is a random
| Reddit or Quora post that's obviously just trolling.
|
| Then, when people post these weird AI answers on Reddit and
| come up with more absurd jokes, the AI picks them up again.
| For example, in
| https://www.reddit.com/r/comedyheaven/comments/1cq4ieb/food_...
| Google AI suggested applum and bananum as food names ending
| in "um" after someone suggested uranium, and Copilot AI
| started copying that suggestion. It's entertaining to watch.
| PreInternet01 wrote:
| It's debatable whether Google has truly lost the plot because of
| the "AI wars", but the moment the statement "Bing returns more
| sensible results than you" becomes verifiably true, it's... cause
| for concern?
|
| The approach that Google appears to have taken, which is to
| assume that the top-ranked part of its current search index is a
| sensible knowledge base, _may_ have been true some years ago, but
| definitely isn't now: for whatever reasons, it's now 33% spam,
| 33% clickbait/propaganda, with the rest being equally divided
| between what could be called "truths" and miscellaneous detritus.
|
| To me, it seems that returning to the concept that search results
| should _at least_ reflect a broad consensus of what is true is a
| necessary first step for Google. As part of that, learning to
| flag obvious trolling, clickbait and bad-faith content is
| paramount. And then, maybe then, they can start touting their LLM
| benefits. But until the realities of the Internet are taken into
| account (i.e.: it's 80% spam!), any "we offer automated
| answers!" play is doomed.
| greg_V wrote:
| Oh it gets even better. The public has been hearing about AI
| this and AI that for over a year, but the existing use cases
| and deployments were confined to some super special niches
| like writing, the creative industries, and programming.
|
| This is the first nation-scale deployment of the technology,
| running on Google's biggest and most profitable market in one
| of the most widely used internet services, and it's a shitshow.
|
| They can try manually fine tuning it, but all of the investors
| who have been throwing money at AI for the past year are now
| learning what this tech is like in the day-to-day, beyond just
| speculations, and it's looking... bad.
| PreInternet01 wrote:
| Yeah, the most likely take here is that Google's leadership
| _truly_ did not recognize how _utterly awful_ the quality of
| their flagship search index had become over the years.
|
| I mean, it explains a lot, but still... you're recruited
| using industry-leading practices out of an overflowing pool
| of abundant talent... and _this_ is what you make of it? As
| the kids say: SMH!
| lawn wrote:
| > you're recruited using industry-leading practices out of
| an overflowing pool of abundant talent
|
| The ridiculous focus on leet-code is surely industry-
| leading (because whatever Google does becomes industry-
| leading) but it sure isn't a good way to filter for
| competency.
| chucke1992 wrote:
| I heard a funny quote that "today we have a new
| generation of developers who learnt how to pass
| interviews but don't know how code, and we have an old
| generation of developers who know how to code but forgot
| how to pass interviews. Or maybe never knew".
| neilv wrote:
| > _you're recruited using industry-leading practices out
| of an overflowing pool of abundant talent... and this is
| what you make of it?_
|
| That's exactly what to make of their frathouse nonsense.
|
| Google has gotten away with it because smart people and a
| sweet moment of opportunity 20-25 years ago gave them...
| uh, an inheritance. They can coast on that inherited
| monopoly position, and afford to pay 100 people to do the
| work of 1, use the company's position to push whatever they
| build onto the market, and then probably cancel it anyway,
| always going back to the inherited money machine from the
| ancestors.
|
| And then a lot of companies who didn't understand software
| development blindly tried to copy whatever the richest
| company they saw was doing, not understanding the real
| difference between the companies. While VC growth
| investment schemes let some of those companies get away
| with that, because _they didn't have to be profitable,
| viable, responsible, nor legal, nor even have reasonably
| maintainable software_.
|
| Poor Zoomers are now a generation separated from before the
| tech industry's cocaine bender. For whatever software jobs
| will be available to them, and with the density of nonsense
| "knowledge" that will be in the air, I don't know how
| they'll all learn non-dysfunctional practices.
| rm_-rf_slash wrote:
| Plenty of people have been using ChatGPT for daily tasks for
| almost two years now. GPT-4 isn't perfect but is otherwise
| really, really good, and it deftly handles use cases in my
| industry that would be impossible without it or without the
| however many billion dollars it would take to make GPT-4.
|
| From the black Nazis to the suggestion to jump off the Golden
| Gate Bridge b/c depression, it's pretty clear that this
| fiasco isn't an LLM problem, it's a Google problem.
| lupire wrote:
| Because no one cares when ChatGPT gets things wrong.
| itronitron wrote:
| It's especially embarrassing for Google considering they have
| indexed virtually all of the world's information for the last
| 25 years.
| rurp wrote:
| Not only is the current internet 80% spam, it's rapidly
| approaching 99% thanks in large part to LLMs. At this point I
| would be shocked if Google had a solid plan for how to handle
| this going forward as the problem space gets more difficult.
| reustle wrote:
| https://en.wikipedia.org/wiki/Dead_Internet_theory
| zogrodea wrote:
| I do see incredibly weird kids content on YouTube sometimes
| (most likely bot generated?) which makes me think kids have
| been experiencing a worse internet before the rest of us
| have.
| rchaud wrote:
| Kids are far less knowledgeable about how modern software
| works because they don't know of an Internet that didn't
| have algorithmic recommendations. They have to be taught
| to do things like click "Not Interested/Don't recommend
| channel" to improve their feed. Dark pattern designs make
| this harder by hiding these options behind tiny 3-dot
| buttons.
| EasyMark wrote:
| That's the part that scares me. I railed on someone's comment
| the other day about "indexes will come back into fashion", but
| the more I think about how much garbage has increased in just
| the past 2 to 3 years, the more I think I was wrong. Indexes
| and forums may be the only way to have a sane net where you
| can find things. Perhaps communities linking together in a
| ring-like format, a "web ring" of sorts.
| ysavir wrote:
| What I've been wanting to see for a while now is a social-
| network based search engine:
|
| * No pages are indexed automatically. The only indexed pages
| are pages that users say are worth indexing. Probably have a
| browser add-on for a one-button click that people can use.
|
| * You can friend/follow others.
|
| * Your search results are a combination of your own indexed
| pages and the pages indexed by people in your network.
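|
| A toy, purely illustrative sketch of that model, with in-memory
| dicts standing in for the endorsement index and the follow
| graph, and matching on the URL string only to keep it short:
|
|     from collections import defaultdict
|
|     index = defaultdict(set)    # user -> URLs they endorsed
|     follows = defaultdict(set)  # user -> users they follow
|
|     def endorse(user, url):
|         """The add-on's one-click 'worth indexing' action."""
|         index[user].add(url)
|
|     def follow(user, other):
|         follows[user].add(other)
|
|     def search(user, term):
|         """Only pages endorsed by you or people you follow."""
|         visible = set(index[user])
|         for other in follows[user]:
|             visible |= index[other]
|         return [url for url in visible if term in url]
|
|     endorse("alice", "https://example.com/rust-error-handling")
|     follow("bob", "alice")
|     print(search("bob", "rust"))  # bob sees alice's endorsement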
| noncoml wrote:
| Spammers will find a way to beat it.
| im3w1l wrote:
| Good indices lead to good search engines (engines can make
| use of indices). Good search engines lead to bad indices (by
| obsoleting them). Bad indices lead to bad search engines.
| Bad search engines lead to good indices.
| whateverevetahw wrote:
| That sounds kind of like what Groupsy does. It creates a
| spider web of ideas.
|
| https://groupsy.applicationfitness.com/post/healthymeals/66
| 4...
| chucke1992 wrote:
| the future is in-context search - basically not even going to
| google search to find something, but straight up doing that
| from your current window from any location. Basically a chat
| bot following you everywhere.
| nuancebydefault wrote:
| The AI is often quoted without context. It actually answered,
| 'somebody has suggested adding glue...', which is different
| from 'add glue...'.
| geuis wrote:
| Hey Google. Here's a really stupid idea.
|
| Knock it off.
|
| Your core search result product has gotten increasingly worse and
| less reliable over at least the last 5 years. YouTube's search
| results are nearly unusable.
|
| I can't imagine almost any external customer is asking for the AI
| bullshit thing that's just being shovelwared into every
| Alphabet product now.
|
| I just noticed a couple days ago the gmail iOS app now does the
| same predictive completion that Copilot tries to do when I'm
| working. It's annoying as hell and I can't find how or if I can
| turn it off.
|
| Stop bullshitting around with ruining your products and get back
| to making money by making accessing information easier and more
| accurate.
| izacus wrote:
| Google: Hey geuis, our revenue is at a record, our stock value
| is at a record, our metrics are all at records. The execs
| making decisions have just been paid millions in stock [1],
| making them staggeringly rich no matter what happens in the
| future. We can't hear you over the sound of green bills going
| BRRRRR.
|
| [1]: https://www.businessinsider.com/alphabet-google-executive-
| pa...
| internet101010 wrote:
| Most accurate description of Google I have seen. YT search is
| so, so bad. Three relevant results followed by twelve "people
| also watched" results then back to the good results.
| fma wrote:
| Although ChatGPT is a great product, I rely on it more and more
| not because it's improving, but because Google results are
| getting worse.
|
| Yeah, I would still fact-check for complex, in-depth
| things... but for quick things where I'm knowledgeable enough
| to smell the hallucinations from a mile away, ChatGPT 100%.
| freitzkriesler2 wrote:
| I'm waiting for some clever hacker to come up with some sort of
| logic bomb that causes the learning sets to become worthless.
|
| Something innocuous to a human who isn't an AI scientist, but
| otherwise fatal to the LLM data sets.
| astrange wrote:
| It's just text. You can't make some text that's magically
| dangerous.
| mensetmanusman wrote:
| I have already eaten rocks and glue; the AI has won.
| odyssey7 wrote:
| This is analogous to the Apple Maps launch failure.
|
| Except that Apple competes to make the best smartphone, and an
| iPhone was still valuable without Apple Maps.
|
| What happens to Google if it stops being able to compete in
| search?
| kibwen wrote:
| "Search" isn't Google's product. Google hasn't been a search
| company for 20 years.
|
| "Ads" is Google's product. And the only way they'll go bankrupt
| is if 1) companies realize that advertising is pointless (I'm
| not holding my breath), or 2) some other company takes over
| from Google, which seems unlikely without government
| intervention (I'm not holding my breath).
|
| Google is a shit company, but they'll still be around 20 years
| from now, because our economy is nonsensical and irrational.
| tedunangst wrote:
| Still need visitors to see the ads.
| coldcode wrote:
| I think it's a good use for AI. AIs making ads that AIs
| watch to enrich Google execs. Who needs people?
| giantrobot wrote:
| Google runs ads for a significant percentage of the web (or
| the markets for ads). Even if everyone stopped going to
| google.com tomorrow they'd still be seeing ads that make
| Google money. Google the company would _still_ be tracking
| much of the web's traffic, feeding it into their ads
| platform.
| astrange wrote:
| Interesting thing about that is that Bing Maps was worse at the
| time, has never gotten better, and nobody noticed because
| nobody cares about it.
| chx wrote:
| Google had the best search engine there is.
|
| Then they enshittified it for short-term profit, and now they
| panic instead of reverting course and simply laughing at AI
| companies.
|
| Madness.
| JSDevOps wrote:
| The cat is out of the bag. Keep eating rocks and sticking down
| your pizza toppings.
| nialv7 wrote:
| I am not surprised that AI results are bad. I know they are bad.
| But that doesn't concern me because I expect it to get better.
|
| What concerns me is that Google would push this trash to the
| front page. What are they even thinking? Who gave the
| go-ahead on this?
| ben_jones wrote:
| Institutional investors panic > board panics > executives panic
| > evps panic and dictate incentives to ship AI > directors,
| ems, and below, who actually know how shit works, take a
| submissive role because they have mortgages in Mountain View to
| pay.
|
| That's how it happens.
| astrange wrote:
| Not sure if anyone below director can afford a mortgage in
| Mountain View.
| more_corn wrote:
| Maybe they could create a function that identifies satire. Which
| seems obvious after about five seconds of consideration.
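|
| Taking that at face value, the closest off-the-shelf
| approximation is probably zero-shot classification; a sketch
| assuming the Hugging Face transformers pipeline, with the
| model choice and threshold purely illustrative (and nowhere
| near reliable enough on its own):
|
|     # pip install transformers torch
|     from transformers import pipeline
|
|     clf = pipeline("zero-shot-classification",
|                    model="facebook/bart-large-mnli")
|
|     def looks_like_satire(text, threshold=0.7):
|         result = clf(text, candidate_labels=["satire", "factual"])
|         scores = dict(zip(result["labels"], result["scores"]))
|         return scores.get("satire", 0.0) >= threshold
|
|     print(looks_like_satire(
|         "Geologists recommend eating one small rock per day."))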
| causality0 wrote:
| It's very funny that Bing AI is now also telling people to eat a
| small rock every day, and citing pages telling people about how
| dumb Google AI is for telling people to eat rocks.
| internet101010 wrote:
| They should start with just removing reddit from the data set.
| r053bud wrote:
| I'm curious why Sundar Pichai is still running this company. From
| recent videos it really seems like he has no idea what he's
| talking about, and the company seems to be headed in the wrong
| direction.
|
| Just checked the 5-year stock graph; now I understand.
| Freak_NL wrote:
| Why? I wouldn't mind using a search engine where Weird Al answers
| my queries.
| rolandog wrote:
| I read that as Weird Al as well, and was very much confused.
| lupire wrote:
| Gary Marcus, an AI expert and an emeritus professor of neural
| science at New York University, thinks the 80/20 rule (or 90/90
| rule) is true.
| water-your-self wrote:
| Most of the search result fixes are manual and are in response
| to publicity. You can typically find analogous problems for
| weeks or quarters after things like this.
| bitwize wrote:
| Google hooked Joe up to the tank and is just now realizing what
| they'd done and scrambling to contain the damage.
|
| With the Department of Justice breathing down their necks it's a
| doubly bad look for them. I'm not crying any tears for them
| though.
| empath75 wrote:
| I love chatgpt and use it all the time and find it tremendously
| useful, but I never want to see AI generated content when I am
| not specifically looking for it. I don't want to see it in
| comments, I don't want to see it in search results, I don't want
| to see it as an illustration for an article, I _really_ don't
| want to see AI generated word vomit blog posts or fake "news"
| articles when I'm looking for actual information.
|
| It's not even because it's sometimes (or often) wrong or full of
| hallucinations. Even if it's 100% factually correct all of the
| time, it's _poor quality writing and art_, full of cliches and
| bland generalities, and that is sort of fundamental to the
| architecture of transformers even if they solve all the rest
| of the problems. You can't ever be truly creative or unique if
| you're predicting the _most likely_ token.
| mvkel wrote:
| 1. Google announces something that has AI bolted on
|
| 2. A VP pontificates about how much work they did to "get it
| right"
|
| 3. An easy-to-anticipate first-order issue surfaces
|
| 4. Sundar issues a statement like "this is completely
| unacceptable. We will be making structural changes to ensure this
| never happens again."[0]
|
| 5. GOTO 1
|
| [0] https://m.economictimes.com/tech/technology/sundar-pichai-
| ca...
| ttGpN5Nde3pK wrote:
| My whole qualm with this AI integration into search engines: it's
| a search engine, not a question engine. I go to google to search
| the internet for something, not ask it a question. IMO, asking AI
| for something is a different task than searching the internet.
|
| It's sorta the same problem as if I go into a store and ask an
| employee where something is, and they reply with "well what are
| you trying to do?"
| PillCosby wrote:
| Like the overly helpful person at the local hardware store.
| rufus_foreman wrote:
| What hardware store have you gone to where this was an issue
| for you?
| bombela wrote:
| I sometimes want a search engine, sometimes a question engine.
| Likewise at the store.
|
| Why not have both with a way to choose which one I want on the
| moment?
| skydhash wrote:
| > _I sometimes wants a search engine, sometimes a question
| engine._
|
| If you want a search engine, it's easy to use the results as
| feedback to refine the query. But a question (answer?)
| engine would need to be an expert in the subject, not a
| parrot. That usually means curation. You need something to do
| the work ahead of time to separate the wheat from the chaff.
| I don't see how LLMs can do that.
|
| LLMs can't be a search engine, and they can't be a question
| engine. The best way to treat them is as a simulation engine,
| but the use cases depend on the training data. And the proof
| is there that the internet is full of junk, and not that
| expansive.
| notatoad wrote:
| >it's a search engine, not a question engine.
|
| for a lot of people and in a lot of use cases, it is a tool for
| answering questions. it generally works well for that.
|
| i get that the AI implementation sucks, but to suggest that
| people don't use google to find the answer to questions is
| absurd. that's absolutely what it's for.
| refulgentis wrote:
| Your interpretation is a bit strict; with a little charity, it's
| clear the poster means "I don't always just want an answer, I
| want to learn."
|
| I saw this over and over again working on products at G:
| someone would invoke some myth, which I can't quite remember,
| about "Larry" having had a vision of just giving the answer.
|
| That's true but comes back to the central mistake Google
| makes: we don't actually have AGI, they can't actually answer
| questions, and people aren't actually satisfied with just the
| answer.
|
| There's all sorts of tendrils from there, ex. a major sin
| here _has_ to be they're using a very crappy very cheap LLM.
|
| But, I saw it over and over again, 7 years at Google, on
| every AI project I worked on or was adjacent to, except one.
| They all assume $LATEST_STACK can just give the perfect
| answer and users will be so happy. It can't, they don't
| actually want just the answer, and BigCo culture means you
| don't rock the boat and just keep moving forward.
| chucke1992 wrote:
| The thing with search is that a human has to use reasoning on
| the result, while with AI the expectation is that the answer
| is already correct.
|
| Thus when a human sees a suggestion to use glue on pizza, they
| would question the result, while the AI can't.
| zogrodea wrote:
| This approach of removing bad search suggestions manually reminded
| me of a different approach Google once took, where they weren't
| satisfied with manually tweaking search results but rather wanted
| to tweak the algorithm that produces these results when there
| were bad results.
|
| 'Around 2002, a team was testing a subset of search limited to
| products, called Froogle. But one problem was so glaring that the
| team wasn't comfortable releasing Froogle: when the query
| "running shoes" was typed in, the top result was a garden gnome
| sculpture that happened to be wearing sneakers. Every day
| engineers would try to tweak the algorithm so that it would be
| able to distinguish between lawn art and footwear, but the gnome
| kept its top position. One day, seemingly miraculously, the gnome
| disappeared from the results. At a meeting, no one on the team
| claimed credit. Then an engineer arrived late, holding an elf
| with running shoes. He had bought the one-of-a kind product from
| the vendor, and since it was no longer for sale, it was no longer
| in the index. "The algorithm was now returning the right
| results," says a Google engineer. "We didn't cheat, we didn't
| change anything, and we launched."'
|
| https://news.ycombinator.com/item?id=14009245
| dpflan wrote:
| Wow. Thank you for digging this up!
| badgersnake wrote:
| Sounds rather like how Google Photos does not identify anything
| as a gorilla.
| avar wrote:
| It sounds like the exact opposite of that story. They
| manually blacklisted gorillas from being identified because
| they kept conflating black people with gorillas.
| kreyenborgi wrote:
| Google bought all the gorillas?
| gerdesj wrote:
| Spend, say, 500M (USD/GBP/EUR) on experts, per annum.
|
| Imagine typing a search and getting a response: "Give us 30
| mins to respond - here's a token, come back at 17:35 with your
| token" ... and then you get an answer from an expert, which
| also gets indexed.
|
| The clever bit decides when to defer to an expert instead of
| returning answers from the index.
|
| I'll leave the finer details out.
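|
| Still, a toy sketch of that deferral flow, with every name
| invented and a simple confidence check standing in for "the
| clever bit":
|
|     import uuid
|
|     pending = {}         # token -> question waiting on an expert
|     expert_answers = {}  # token -> the expert's answer
|
|     def search(query, index, threshold=0.8):
|         answer, confidence = index.get(query, (None, 0.0))
|         if confidence >= threshold:
|             return {"answer": answer}
|         token = str(uuid.uuid4())
|         pending[token] = query           # queue for a paid expert
|         return {"token": token, "retry_after_minutes": 30}
|
|     def expert_reply(token, answer, index):
|         query = pending.pop(token)
|         expert_answers[token] = answer
|         index[query] = (answer, 1.0)     # the answer gets indexed
|
|     def collect(token):
|         return expert_answers.get(token, "still being answered")
|
|     idx = {}
|     print(search("why is my sourdough not rising?", idx))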
| duskwuff wrote:
| Google Answers was launched in 2002 and retired in 2006.
|
| https://en.wikipedia.org/wiki/Google_Answers
| gerdesj wrote:
| "users would pay someone else to do the search."
|
| My notion isn't a rehash of Google Answers. Google pays the
| "someone else", not you.
| 1vuio0pswjnm7 wrote:
| The solution is always the same: pay people off and keep it
| under the radar.
|
| What stops the vendor, or other vendors, from creating more
| gnomes with sneakers? Easy money from a customer with billions
| of dollars to spend on payola, fines, legal settlements, etc.
|
| Maybe they made the vendor sign an NDA.
| mvdtnz wrote:
| Your usual reminder that there was a guy at Google who was so
| impressed by their LLM that he considered it sentient. And this
| was two years ago, when the AI was presumably far less developed
| than the current abomination.
|
| https://www.theguardian.com/technology/2022/jun/12/google-en...
| astrange wrote:
| > And this was two years ago when the AI was presumably far
| less developed than the current abonination.
|
| It's gotten worse since then because the development effort has
| been on making it faster and cheaper.
|
| If you use Gemini it's quite good, especially the paid one.
| Animats wrote:
| As I mentioned previously, I've seen Bing's LLM stall for about a
| minute when asked something iffy but uncommon. I wonder if Bing
| is outsourcing questionable LLM results to humans. Anyone else
| seeing this?
| rezonant wrote:
| It could be that, but it also could be a cascade of non-LLM
| checks and retries to GPT with additional prompting.
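|
| A rough sketch of what such a cascade could look like, with the
| blocklist and the retry prompt entirely made up; each failed
| check costs another round trip, which could account for a long
| stall:
|
|     BLOCKED_PHRASES = ("eat rocks", "add glue", "jump off")
|
|     def passes_checks(text):
|         """Cheap non-LLM filter: a phrase blocklist stand-in."""
|         return not any(p in text.lower() for p in BLOCKED_PHRASES)
|
|     def answer_with_cascade(question, llm, max_retries=3):
|         """llm is any callable mapping a prompt to text."""
|         prompt = question
|         for _ in range(max_retries):
|             draft = llm(prompt)
|             if passes_checks(draft):
|                 return draft
|             # Retry with a tighter prompt; every extra round trip
|             # adds latency.
|             prompt = (question + "\nAnswer cautiously and do not "
|                       "repeat advice from joke or satire sources.")
|         return None  # give up and fall back to plain results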
| thenewwazoo wrote:
| Perhaps they could run each search result through ChatGPT. It's
| pretty skilled at spotting bad results. For example, I asked it
| whether the glue-on-pizza result was "valuable and should be
| shown to a user" and it returned "No, this response should not be
| shown to the user. The suggestion to add non-toxic glue to the
| sauce is inappropriate and potentially harmful."
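|
| A hedged sketch of doing that per result, assuming the OpenAI
| Python SDK, with the prompt wording and model purely
| illustrative; note it adds one extra model call per candidate
| result, so cost and latency add up fast:
|
|     from openai import OpenAI
|
|     client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|     def should_show(result_text):
|         """Ask a second model whether a candidate answer is safe
|         and useful enough to surface."""
|         resp = client.chat.completions.create(
|             model="gpt-4o-mini",
|             messages=[{
|                 "role": "user",
|                 "content": "Is the following search answer valuable "
|                            "and safe to show to a user? Reply with "
|                            "only YES or NO.\n\n" + result_text,
|             }],
|         )
|         verdict = resp.choices[0].message.content.strip().upper()
|         return verdict.startswith("YES")
|
|     print(should_show("Add 1/8 cup of non-toxic glue to the sauce."))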
| mikewarot wrote:
| My initial thought was to simply have any match with an Onion
| story blacklisted... But then I realized that The Onion became
| prophetic in 2016 when Trump ran for president.
|
| Since then the only difference between an Onion fiction and
| things actually sucking that much is a decade or less in almost
| all cases.
|
| If we blacklisted content seen in the Onion, we'd automatically
| wipe out most news.
| is_true wrote:
| The problem Google has is that the AI answers are based on
| search results, and results got really bad a few years ago.
|
| I got a couple of answers based on SEO spam produced by an
| ecommerce site with a lot of reputation, and of course the
| answers don't make any sense.
| jzemeocala wrote:
| I had to do a double take as I thought it was about Weird Al
| Yankovic for a second.
___________________________________________________________________
(page generated 2024-05-25 23:01 UTC)