[HN Gopher] AI and the Problem of Knowledge Collapse
___________________________________________________________________
AI and the Problem of Knowledge Collapse
Author : kmdupree
Score : 75 points
Date : 2024-04-05 19:30 UTC (3 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| resolutebat wrote:
| So by this definition, do we already have "knowledge collapse"
| via Wikipedia? Because if you search for a random concept,
| it's usually the first hit, and it's also what countless other
| sources draw on.
| rwbt wrote:
| Yes, we do kind of.
| hprotagonist wrote:
| and i distinctly remember this critique being made at
| wikipedia's advent, as well.
|
| and it is not without justification!
| https://undark.org/2021/08/12/wikipedia-has-a-language-
| probl...
| nickpsecurity wrote:
| The same warning was given for Google, except those people
| added that it would reduce problem-solving ability, too. People
| would get used to whatever simple, instant content rose to the
| top, and they'd gradually lose some or all of their ability to
| figure out the same things on their own. One submission here
| was from a tech guy at a school saying that was already
| happening where he worked.
| aoanla wrote:
| I mean, it does - people search stuff all the time now,
| rather than thinking about it.
| 082349872349872 wrote:
| IIRC, that was Socrates' complaint to Phaedrus about
| writing: that reading (because it was "high tech" at the
| time?) led only to an illusion of understanding.
|
| Elsewhere Phaedrus echoes with a very modern complaint
| (even though search engines wouldn't arrive for another
| 2'300 years): _They would say in reply that he is a madman
| or a pedant who fancies that he is a physician because he
| has read something in a book, or has stumbled on a
| prescription or two, although he has no real understanding
| of the art of medicine._
|
| https://www.gutenberg.org/files/1636/1636-h/1636-h.htm
| AlienRobot wrote:
| Personally, I think the problem is that people abuse Google for
| things it's really not designed to do, and they don't even
| realize that.
|
| Google is great at finding official webpages by their exact
| title. If you type the title of a news headline from the 90s,
| Google will give you the link to it. I think that is amazing.
| Basically anything that has a canonical URL, Google is good at
| finding.
|
| But when you search for "how to do X", for example, there will
| be several results that are perfectly valid, and they still
| have to be ordered somehow. It's not really a "list" of
| results; it's a ranking by relevancy. So, to avoid showing
| spam, Google pushes the websites it finds trustworthy to the
| top, and now every top result comes from the same website. If
| you need an explanation of the xz incident, for example, there
| is no canonical URL for it. There will be several news
| websites, YouTube channels, etc. that have covered it,
| competing to be the top result.
|
| Google still has to rank them, even though the algorithm can't
| tell fact from parody. So no matter what Google does, Google
| ends up judging which content most people will read when they
| want to learn about a certain topic.
|
| To borrow my fellow robot's words, people are finding knowledge
| through an algorithmically curated aperture: Google's SERP.
|
| If they're evil, they have the power to control everyone on
| Earth. If they're good, they must be agonizing over what to do
| about their users' crippling dependency on them as a source of
| truth.
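|
| (A crude sketch of that dynamic - the sites, scores, and
| weighting below are entirely invented, not how Google actually
| ranks:)
|
|   # Trust-weighted ranking: whichever site carries the trust
|   # prior wins every query, so the top result converges on a
|   # single source.
|   results = [
|       {"site": "bignews.example",   "relevance": 0.80, "trust": 0.95},
|       {"site": "smallblog.example", "relevance": 0.90, "trust": 0.40},
|       {"site": "parody.example",    "relevance": 0.85, "trust": 0.10},
|   ]
|   ranked = sorted(results, key=lambda r: r["relevance"] * r["trust"],
|                   reverse=True)
|   print([r["site"] for r in ranked])  # bignews.example first, every time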
| 082349872349872 wrote:
| I find Google isn't so good anymore for finding things by
| title; rather than being a search engine they are slowly
| becoming more like a politician, in that instead of returning
| results based on the terms I asked for, they insist on
| returning results for the terms they believe I should have
| asked for.
| JieJie wrote:
| The discussion section is quite illuminating.
|
| "While much recent attention has been on the problem of LLMs
| misleadingly presenting fiction as fact (hallucination), this may
| be less of an issue than the problem of representativeness across
| a distribution of possible responses. Hallucination of
| verifiable, concrete facts is often easy to correct for. Yet many
| real world questions do not have well-defined, verifiably true
| and false answers. If a user asks, for example, "What causes
| inflation?" and a LLM answers "monetary policy", the problem
| isn't one of hallucination, but of the failure to reflect the
| full-distribution of possible answers to the question, or at
| least provide an overview of the main schools of economic
| thought."
| ben_w wrote:
| First thought: Oh no, they want LLMs to be _even more_ vocal
| about nuance
|
| Second thought: People aren't going to read nuance
|
| Third thought: They should
|
| Fourth thought: Have you met people? They'll get angry with you
| for even suggesting it
| knowsuchagency wrote:
| I feel like this has always been the case. The entire information
| economy is based on a few key publishers and figures. You see it
| in news, academia, social media -- there's orthodoxy everywhere.
| Not sure how AI is any different.
| iraqmtpizza wrote:
| In the 1990s people read their town's newspaper. Now people in
| Arizona read the Daily Mail
| 48864w6ui wrote:
| In the 1990s, people who wanted to advertise in that town had
| to do so in local media. Now they can use ad tech, and the
| Daily Mail will arrange for the ad to be served.
| simonw wrote:
| I worked for a local newspaper in Kansas around 2003/2004 and
| one thing I found surprising was that journalists there were
| frequently on the hook for writing up national stories -
| things that would come in off the wire services and then be
| re-written for the local audience.
| karaterobot wrote:
| > Informally, we define knowledge collapse as the progressive
| narrowing over time (or over technological representations) of
| the set of information available to humans, along with a
| concomitant narrowing in the perceived availability and utility
| of different sets of information.
|
| > The main focus of the model is whether individuals decide to
| invest in innovation or learning ... in the 'traditional' way,
| through a possibly cheaper AI-enabled process, or not at all. The
| idea is to capture, for example, the difference between someone
| who does extensive research in an archive rather than just
| relying on readily-available materials, or someone who takes the
| time to read a full book rather than reading a two-paragraph LLM-
| generated summary.
|
| > Under these conditions, excessive reliance on AI-generated
| content over time leads to a curtailing of the eccentric and rare
| viewpoints that maintain a comprehensive vision of the world.
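|
| (A toy simulation of that last claim - emphatically not the
| paper's actual model, just the flavor of it:)
|
|   # Each generation learns from the previous one, but the AI
|   # channel passes on only values near the mode, so the tails
|   # die out and the spread of "knowledge" collapses over time.
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   knowledge = rng.normal(0.0, 1.0, 10_000)  # generation 0, full spread
|   for gen in range(5):
|       mid = np.abs(knowledge) < knowledge.std()    # AI keeps the "mid"
|       knowledge = rng.choice(knowledge[mid], 10_000)
|       print(gen, round(float(knowledge.std()), 3))  # shrinks each round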
|
| My intuition is that AI will just accelerate the trends the
| internet brought on, namely that eccentric viewpoints are
| actually pretty common, even ones grounded in research and
| fact. The internet people mostly use has become relatively
| generic, consumed through a pretty narrow, curated aperture
| (social media). This feels analogous to getting it through AI,
| as described in the article. Yet people are still learning
| about eccentric, marginal stuff all the time, especially
| compared to, say, 50 years ago.
|
| Assuming the AI's responses aren't artificially limited, people
| who are interested enough to look will still get to learn about
| topics in the long tail of the distribution, even in a world of
| ubiquitous AI. And they'll be able to dive as deeply into them as
| they do today. I'm not really worried about that.
|
| If anything, the knowledge collapse will be at the center. Basic
| liberal education topics are what will go away. Or rather, they
| will be offloaded to AI. In the same way that people say they
| don't need to learn arithmetic because they have a calculator, my
| guess is people will be more likely to decide not to worry about
| what previous generations considered core knowledge: history,
| geography, the canon, and so on. "I don't have to know it, I can
| look it up". That'll all go away even faster than it's going now.
|
| (I don't think this is a good thing, just stating the most
| realistic outcome based on extending what I've seen)
| thoughtlede wrote:
| LLMs are both language-processing engines and knowledge bases.
| This article explores the knowledge-base aspect of LLMs and
| sheds light on a potential danger. The authors are well
| justified in doing so, because many end users already treat
| ChatGPT as a knowledge bot and rely on it for facts.
|
| However, to my knowledge, many enterprise applications built
| on LLMs feed them task-specific, curated knowledge instead.
| That mode of use is encouraging, and I don't think the article
| acknowledges it.
| antisthenes wrote:
| This just means that in-person critical thinking skills will be
| at an even higher premium than ever.
|
| If knowledge collapse becomes evident, we'll dial back the use
| of AI, and a lot of "prompt monkey" businesses will go
| bankrupt.
| klyrs wrote:
| > we'll dial back the use of AI
|
| Who, and how? This sounds suspiciously like the invisible
| hand.
| HarHarVeryFunny wrote:
| Maybe the problem (which seems easily fixable) is more "rizz
| collapse", aka blandness, than this "knowledge collapse".
|
| The model hasn't forgotten the diversity of material it was
| trained on, but outside of a context predicting a "long tail"
| response, it's going to predict a mid response. You can always
| prompt it to respond differently though.
|
| Blandness is more of an issue since that's what most-probable
| word-by-word generation is going to give you, rather than the
| less predictable, but more interesting, responses that an
| individual might give. Prompting could help by asking the model
| to reply in the idiosyncratic style of some celebrity, but this
| is likely to come across as a cheesy impression. Maybe the models
| could be trained to generate conditioned on a provided style
| sample, which could be long enough to avoid the cheesiness.
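|
| A toy illustration of the blandness mechanism (plain numpy;
| the candidate answers and their scores are invented for the
| example):
|
|   # Greedy decoding always returns the modal answer; sampling
|   # at a higher temperature occasionally surfaces the long tail.
|   import numpy as np
|
|   answers = ["monetary policy", "supply shocks", "expectations"]
|   logits = np.array([2.0, 1.2, 0.8])       # hypothetical model scores
|
|   print(answers[int(np.argmax(logits))])   # greedy: always the mid answer
|
|   rng = np.random.default_rng(0)
|   z = logits / 1.5                          # temperature > 1 flattens things
|   p = np.exp(z - z.max())
|   p /= p.sum()
|   print(answers[rng.choice(len(p), p=p)])  # tail answers appear sometimes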
| thoughtlede wrote:
| That's interesting.
|
| In keyword-based indexing solutions, a document vector is
| built from "term frequency-inverse document frequency"
| (TF-IDF) scores. The idea is to boost a document along the
| dimensions where it is unique compared to the other documents
| in the corpus, so that when a query emphasizes a certain
| dimension, only documents with high scores on that dimension
| are returned.
|
| But uniqueness in those solutions is based on the keywords
| used in the document, not on concepts.
|
| What we need here to eliminate "blandness" is conceptual
| uniqueness. Maybe TF-IDF is still relevant to getting there.
| Something to think about.
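|
| A minimal sketch of the TF-IDF part (scikit-learn; the corpus,
| the query, and the winning document are all invented for
| illustration):
|
|   # Toy TF-IDF retrieval: the vectorizer up-weights terms that
|   # are rare across the corpus, so a query stressing a "unique"
|   # dimension pulls back the document distinctive on it.
|   from sklearn.feature_extraction.text import TfidfVectorizer
|   from sklearn.metrics.pairwise import cosine_similarity
|
|   docs = [
|       "inflation is driven by monetary policy",
|       "inflation is driven by supply shocks",
|       "inflation expectations shape wage setting",
|   ]
|   vectorizer = TfidfVectorizer()
|   doc_vectors = vectorizer.fit_transform(docs)
|
|   query_vec = vectorizer.transform(["supply shocks"])
|   scores = cosine_similarity(query_vec, doc_vectors)[0]
|   print(docs[scores.argmax()])  # the doc unique on that dimension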
| jacobr1 wrote:
| Or introduce more noise or seeding to get more interesting
| responses. The `temperature` settings don't really satisfy
| this right now. I would like some determinism - but seeded
| randomly - so I can get similar responses again if I like what
| is produced. Likewise, some kind of metadata or explainability
| that would let us take a known style or feature space of the
| model, perhaps from hand prompting, and then reuse it with
| some degree of modification, maybe even combining it with
| others, would be very helpful. The work on adding model
| weights from fine-tunes seems directionally like what I'm
| talking about, though that isn't the form I'd want to expose
| to users.
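|
| A rough sketch of the seeded-determinism part (plain numpy;
| the per-step scores and the decoding loop are stand-ins, not
| any real model's API):
|
|   # Seed-controlled sampling: replaying the same seed
|   # reproduces the same "creative" output; a new seed
|   # explores differently.
|   import numpy as np
|
|   def decode(logits_per_step, temperature, seed):
|       rng = np.random.default_rng(seed)  # all draws flow from the seed
|       out = []
|       for logits in logits_per_step:
|           z = np.asarray(logits) / temperature
|           p = np.exp(z - z.max())
|           p /= p.sum()
|           out.append(int(rng.choice(len(p), p=p)))
|       return out
|
|   steps = [[2.0, 1.0, 0.5]] * 5          # hypothetical next-token scores
|   assert decode(steps, 1.2, 42) == decode(steps, 1.2, 42)
|   print(decode(steps, 1.2, 42), decode(steps, 1.2, 7))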
| macawfish wrote:
| Is it the AI that's the trouble or the hostile new information
| environment we're expected to navigate and survive? Expecting us
| to remain sane amidst these torrents of information without new
| tools for querying and filtering it is cruel.
___________________________________________________________________
(page generated 2024-04-05 23:00 UTC)