[HN Gopher] Wikipedia-grounded chatbot "outperforms all baseline...
       ___________________________________________________________________
        
       Wikipedia-grounded chatbot "outperforms all baselines" on factual
       accuracy
        
       Author : akolbe
       Score  : 211 points
       Date   : 2023-07-17 12:41 UTC (10 hours ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | c0brac0bra wrote:
       | But did they measure the truthiness of those facts?
        
       | altilunium wrote:
       | Related : https://diff.wikimedia.org/2023/07/13/exploring-paths-
       | for-th...
        
       | quotemstr wrote:
        | Accuracy according to whom? Wikipedia is a battleground for
        | ideologues. You can't trust anything even remotely controversial
        | presented there.
        
         | PopePompus wrote:
         | This highlights a problem LLMs will face if they improve enough
         | to solve their hallucination problems. People will begin to
          | treat the LLM like some sort of all-knowing oracle. Activists
         | will fight fiercely to control the model's output on
         | controversial topics, and will demand lots of model "tuning"
         | after training.
        
           | quotemstr wrote:
           | > Activists will fight fiercely to control the model's output
           | on controversial topics,
           | 
           | They already do. I'd love to know how much "brain damage"
           | RLHF and other censorship techniques cause to the general
           | purpose reasoning abilities of models. (Human reasoning
           | ability is also harmed by lying.) We know the damage is
           | nontrivial.
        
         | lacksconfidence wrote:
         | Accuracy as in faithfully represents the source material. It
         | doesn't matter if the source material is true or not in this
         | analysis.
        
       | sebzim4500 wrote:
       | Can someone update the link to https://arxiv.org/abs/2305.14292 ?
       | 
       | The headline refers to only a small portion of the linked page.
        
       | charlieo88 wrote:
        | I wish I had the time or facility to take a snapshot of
        | Wikipedia now, before the imminent deluge of ChatGPT-based
        | updates starts materially modifying Wikipedia in some weird
        | and unpredictable manner.
        
         | ravetcofx wrote:
          | You can use Kiwix, too, as an easy way to get an archive of it
        
         | Der_Einzige wrote:
         | The wikipedia politburo already makes it impossible for normies
         | to edit any wikipedia article worth editing. If you don't
         | believe me, try it out with a stopwatch to see how long it
         | takes for your edit to be reverted.
        
           | _djo_ wrote:
           | That you call them a 'politburo' and refer to 'normies' gives
           | an indication that the types of edits you were making were
           | neither well sourced nor neutral.
           | 
           | I've never had an edit reverted on Wikipedia.
        
         | cheald wrote:
         | You can torrent a copy of Wikipedia, including article history.
         | Locally, you can go back to any revision of any article you
         | want. I keep a copy locally just because it seems something
         | valuable to have.
        
         | LeoPanthera wrote:
         | In late 2021 / early 2022 I got scared about the incoming
         | consequences of LLMs and downloaded all the "Kiwix" archives I
         | could find, including Wikipedia, a bunch of other Wikimedia
         | sites, Stack Overflow, etc.
         | 
         | I'm pretty glad that I did. I'm going to hold onto them
         | indefinitely. They have become the "low background steel" of
         | text.
        
           | samwillis wrote:
           | I really like that analogy.
           | 
           | For anyone curious what low background steel is, it's steel
           | that was made before the first atomic bombs were tested:
           | https://en.m.wikipedia.org/wiki/Low-background_steel
        
           | tivert wrote:
           | > In late 2021 / early 2022 I got scared about the incoming
           | consequences of LLMs and downloaded all the "Kiwix" archives
           | I could find, including Wikipedia, a bunch of other Wikimedia
           | sites, Stack Overflow, etc.
           | 
           | > I'm pretty glad that I did. I'm going to hold onto them
           | indefinitely. They have become the "low background steel" of
           | text.
           | 
           | Also, ironically, the Pushshift reddit dumps (still available
            | via torrent), before they were taken down. The moment Reddit
            | shut down the API to sell its data for AI training is exactly
            | when that data started to become less valuable for the
            | purpose.
           | 
           | I believe a lot of subreddits started implementing protest
           | moderation policies after reddit came down on the blackout.
           | IMHO, they should implement rules like "no posts unless it's
           | a ChatGPT hallucination."
        
         | bombela wrote:
         | You can download a full archive already.
         | 
         | edit, link:
         | https://en.wikipedia.org/wiki/Wikipedia:Database_download
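
For anyone scripting a download: the dump files follow a predictable naming scheme under dumps.wikimedia.org. A minimal sketch (the `enwiki` project name and `pages-articles` variant are real conventions, but check the dump index page before relying on a generated URL):

```python
# Build the canonical URL for a Wikimedia database dump.
# "latest" points at the most recent complete dump run.
def dump_url(wiki="enwiki", date="latest", variant="pages-articles"):
    """Return the URL of a compressed XML dump for the given wiki."""
    return (f"https://dumps.wikimedia.org/{wiki}/{date}/"
            f"{wiki}-{date}-{variant}.xml.bz2")

# English Wikipedia, current article text only, no edit history:
print(dump_url())
```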
        
         | hughesjj wrote:
          | Wikipedia has made snapshots available for download for over a
          | decade now, including ones with full edit histories, meaning
          | you can just revert all edits to before a chosen epoch.
        
         | masklinn wrote:
          | Wikipedia dumps are publicly available, both from Wikimedia
          | itself and from the Internet Archive.
         | 
         | There's no "time or facility" constraint, only storage space.
        
         | speedgoose wrote:
         | Wikipedia doesn't remove the old versions.
         | 
         | Otherwise you can find an archive there:
         | https://archive.org/details/wikimediadownloads?and%5B%5D=sub...
        
         | int_19h wrote:
         | Without article history and videos, it's small enough that many
         | modern smartphones can have a local offline copy.
         | 
         | http://kiwix.org/
        
         | deepserket wrote:
         | "As of 2 July 2023, the size of the current version of all
         | articles compressed is about 22.14 GB without media." -
         | https://en.m.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
        
         | pradn wrote:
          | I'm unsure if this will happen. There are plenty of checks and
          | balances for Wikipedia edits: automated spam detection, editors
          | manually looking over edits for articles on their watchlist,
          | editors who look over subtopics, and even editors who watch the
          | general stream of edits. It's already possible to flag mass
          | edits. As for whether ChatGPT will subtly shift the tone and
          | bias of edits made with it, that's the same as bias from human
          | users, and the same mechanisms for dealing with human bias
          | apply here.
         | 
          | In terms of practical utility, for the vast majority of
          | humanity, access to translated articles in their local language
          | is the biggest problem, I think. There is no Yoruba-language
          | Wiki article on General Relativity, for example. Second come
          | entirely biased communities - some of the smaller Wikis are
          | full of far-right editors, and most editors (around 90%) are
          | men.
        
           | worrycue wrote:
           | I can see AI bots submitting convincing edits at random times
           | in no particular pattern. Eventually they will overwhelm
            | Wikipedia's checks and balances.
        
           | tivert wrote:
            | >> I wish I had the time or facility to take a snapshot of
            | Wikipedia now, before the imminent deluge of ChatGPT-based
            | updates starts materially modifying Wikipedia in some weird
            | and unpredictable manner.
           | 
           | > I'm unsure if this will happen. There's plenty of checks-
           | and-balances for Wikipedia edits.
           | 
           | I think it will. It's so tedious to edit Wikipedia (due to
           | bureaucracy and internal politics) that their editorial
           | population is in a long-term decline, which means their
           | oversight ability is declining too.
           | 
           | Probably what will happen is LLM generated content will creep
           | into long-tail articles, then work its way into more "medium-
           | profile" articles as editors get exhausted. The extremely
           | high-profile stuff (e.g. New York City), political
           | battleground articles (e.g. Donald Trump), and areas
           | patrolled by obsessives (railroads, Pokemon) will probably
           | remain unaffected by the corruption the longest. At some
            | point, the only way to resist will be to become much more
           | hostile to new editors, but that's also long-term suicide for
           | the project.
           | 
           | I think they're painted into a corner.
        
         | oxguy3 wrote:
         | The Wikipedia community has generally been pretty resistant to
         | allowing fully AI-based tools in. We've had tools such as
         | Lsjbot (https://en.wikipedia.org/wiki/Lsjbot) in the past, but
            | they've failed to gain community consensus on any of the large
         | Wikipedias. If someone tries to bring an LLM-based tool to
         | Wikipedia, it would take a lot of finesse to have any shot of
         | the community allowing it.
        
           | 1270018080 wrote:
            | But what about the whole mass of tech bros who don't
            | understand what LLMs are (random text generators and nothing
            | more) and start manually adding changes? It's a virus
            | polluting every industry.
        
           | bruce343434 wrote:
           | I don't think it takes much finesse to just randomly start
           | "improving" articles using the output of an LLM. It only
           | takes a single well meaning yet misguided person. Remember
           | this? https://www.theguardian.com/uk-news/2020/aug/26/shock-
           | an-aw-...
        
       | TillE wrote:
       | The trouble with Wikipedia is that it's an inch deep. For any
       | given topic, especially history, there's a trove of information
       | in scholarly books from good publishers, but the corresponding
       | Wikipedia article is like three pages long.
        
         | bsaul wrote:
          | Just an anecdotal experience: Wikipedia is up to date with
          | regard to "recent" discoveries, whereas books will always be
          | engraved with what was regarded as knowledge at the time of
          | writing.
          | 
          | Case in point: the French article on the Protocols of the
          | Elders of Zion contained outdated knowledge (it said we knew
          | who the author was, propagating an old hypothesis that had
          | been debunked in the last 10 years, even though the other
          | Wikipedia articles were fixed). Historians were heard
          | repeating that same bogus claim on the radio, until I
          | convinced a historian friend of mine to fix the French
          | article, and all of a sudden historians started correcting
          | their speech. Meaning not only did they not update their
          | knowledge from scholarly books, they needed Wikipedia to help
          | them get up to date.
        
           | quotemstr wrote:
           | New is not always better.
           | 
           | For example, from about 1960-2010, anthropologists
           | universally held a "pots, not people" view of prehistory:
           | they asserted, with great confidence, that styles of pottery
           | and metalworking changed over time due to voluntary exchange
           | of ideas among peaceful, cooperating peoples. These
           | anthropologists asserted that pre-1960 theories that pottery
           | styles changed because population groups violently replaced
           | each other were not only wrong, but immoral and barbaric. To
           | them, it was modernity that made humans violent.
           | 
           | Now due to ancient DNA, we know that the pre-1960s
           | anthropologists were right and the post-1960 consensus was
           | wrong: prehistory was violent and populations violently
           | replaced each other with regularity.
           | 
            | You're more informed reading, say, a Gordon Childe book from
            | 1920 than a serious book on prehistoric archaeology from
            | 2000.
           | 
           | So it goes in many fields. Imagine how much longer it would
           | take for science's self correction mechanism to operate if
           | our knowledge were encoded solely in a "living" information
           | system aligned with only currently fashionable ideas.
        
             | ragequitta wrote:
             | But wouldn't you agree reading about this topic now, with
             | the counter-argument of the post-1960 consensus (though I
             | have a hard time thinking most things debatable like this
             | are ever strictly consensus), and the follow-up DNA
             | evidence, is far more informative and convincing than what
             | you would read in 1920? It seems that the people guessing
             | from 1920 might've had about as much chance of being right
             | as the people guessing in 1960 with neither having the
             | relevant evidence to back their claim.
        
               | quotemstr wrote:
               | Come on: if you're excavating an ancient village and find
               | a layer of charcoal littered with arrowheads and skulls
               | and find totally different pottery before and after the
               | charcoal layer, then unless your brain has been
                | cordycepted by fashionable academic nonsense, you're
               | going to conclude that someone conquered that village and
               | replaced its people --- not that the charcoal layer
               | represents some kind of ceremonial swords-to-plowshares
               | peaceful pottery replacement ceremony. For 50 years,
               | academics insisted on the latter interpretation. If you'd
               | read old books, you'd know the post-1960s consensus was
               | nonsense even without ancient DNA. Ancient DNA merely
               | created a body of evidence so totally compelling that not
               | even diffusionists (the "pots not people" crowd) could
               | stick to their stories and keep a straight face.
        
         | yieldcrv wrote:
         | Check the wiki page in another language, closer to the affected
         | area.
         | 
          | It's not a direct translation; it's an entirely different
          | encyclopedia and can be far more robust.
         | 
         | (Maybe an LLM could harmonize all the wikipedias across
         | languages)
        
         | curiousllama wrote:
         | I wonder if that's the next iteration of Wikipedia. Right now,
         | the model is to summarize secondary sources. Once summarization
         | becomes trivial via LLMs, the most valuable thing to do would
         | be to assemble ever-expanding datasets of secondary sources for
         | the LLM to pull from.
        
         | klyrs wrote:
         | My hope in this regard is that Wikipedia pages tend to have
         | much more than an inch of citations. If even a significant
         | fraction of those sources can be digested, it could give rise
         | to a much deeper source. The really cool thing about their
         | chatbot is that it appears to have the ability to summarize and
         | highlight where the summaries came from. Extending that to the
         | ability to summarize the backing sources, and point to where
         | _that_ came from, could be an incredible research tool.
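
The cite-your-sources behavior can be illustrated with a toy retriever that scores passages by word overlap with the question and attaches the winning passage's source label to the answer. This is an illustrative sketch only, not the paper's actual pipeline (which runs an LLM over retrieved Wikipedia passages), and the corpus here is made up:

```python
# Toy grounded QA: pick the passage that best matches the question
# and cite where the answer came from.
import string

def tokenize(text):
    """Lowercase, strip punctuation, split into a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def answer_with_citation(question, corpus):
    """corpus: list of (source_label, passage) pairs. Returns the
    passage with the most word overlap, tagged with its source."""
    source, passage = max(
        corpus, key=lambda sp: len(tokenize(question) & tokenize(sp[1])))
    return f"{passage} [source: {source}]"

corpus = [
    ("Wikipedia: Moon", "The Moon is Earth's only natural satellite."),
    ("Wikipedia: Mars", "Mars is the fourth planet from the Sun."),
]
print(answer_with_citation("Which planet is fourth from the Sun?", corpus))
```

Extending the same idea one level down, as suggested above, would mean the corpus entries point not just at articles but at the cited sources behind them.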
        
         | ilyt wrote:
          | Well, it has links that it can follow.
         | 
         | Add some AI to take the footnotes and get to the sources and
         | train on that.
        
         | the_af wrote:
         | Is it wrong that Wikipedia articles are only three pages long?
         | Does anybody claim that reading an encyclopedia article
         | (Wikipedia, Britannica or whatever) is better than reading a
         | scholarly book on the given topic?
        
           | Aerbil313 wrote:
           | Some people just demand that Wikipedia be a universal factual
           | info database, missing nothing. It'd be nice, though.
        
             | ghaff wrote:
             | There really isn't such a thing though for many topics as a
             | universal factual info database. For many, e.g. historical,
             | topics different books have different areas of focus and
             | interpret events differently. Encyclopedias do to a certain
             | degree (and historical "truth" may shift over time) but, in
             | general, they're not the place to hash out the "right"
             | interpretation of events.
        
               | LawTalkingGuy wrote:
               | > historical, topics different books have different areas
               | of focus and interpret events differently
               | 
               | In those areas the fact isn't the base fact, but claims
               | of fact. We don't know who explored which bit of the
               | great pyramid in which order, and may never, but we know
               | of many specific claims.
               | 
               | The fact check wouldn't be "The great pyramid X" but
               | "Herodotus said X about the great pyramid".
               | 
               | > in general, they're not the place to hash out the
               | "right" interpretation of events.
               | 
               | Once you scope the problem correctly it's not a problem.
               | The point isn't to solve historical riddles, it's to
               | document what evidence we have.
               | 
               | Sometimes that evidence is broadly accepted measurements
               | (land area of Australia) and other times it's not.
        
           | NegativeK wrote:
           | People with just a Wikipedia-level knowledge will argue with
           | actual experts as if Wikipedia is equivalent.
           | 
           | Of course, you can't really say that distrusting experts is
           | unique to encyclopedias.
        
             | harshreality wrote:
              | If there's something in a Wikipedia article that experts
              | will argue against, the article needs updating to be
              | compatible with, even if it does not include, expert-level
              | knowledge.
        
             | the_af wrote:
              | But isn't that the same as people who argue because "they
              | read it in a magazine / in the newspaper"? So they are
              | wrong -- is it Wikipedia's fault, though?
              | 
              | An encyclopedia is always the starting point, never the end
              | of serious research. (It's OK, however, to stick to
              | Wikipedia if a superficial acquaintance with the topic is
              | enough!)
        
         | mcv wrote:
         | Still deeper than most people, though. You can't put all of
         | human knowledge in Wikipedia, but it's extremely thorough in
         | the basics.
        
         | mfer wrote:
         | Wikipedia is a summary. Is it meant to be deep? If you want to
         | go deep on any topic you'll need to go to other sources.
        
         | Wissenschafter wrote:
         | Isn't that literally the point of an encyclopedia? A starting
         | point, it's the abstract on the subject if you will.
        
           | seydor wrote:
            | It has a good format, though, and it would be nice to have a
            | second level of scholarship (e.g. Scholarpedia). Modeling
            | itself after an encyclopedia would be regressive.
        
           | michaelt wrote:
           | In discussions about "deletionism" I've seen people argue
           | that, disk space being cheap, Wikipedia should try to be much
           | more expansive than an encyclopedia.
           | 
           | A paper encyclopedia might not have time or space for
           | individual entries about many hundreds of pokemon, episodes
           | of the simpsons, or characters from star wars.
        
             | [deleted]
        
           | thefifthsetpin wrote:
           | I don't think that TillE meant "the problem with wikipedia is
           | that wikipedia is an inch deep." I think TillE meant "the
           | problem with training a chatbot grounded with wikipedia is
           | that wikipedia is only an inch deep."
        
           | tivert wrote:
           | > Isn't that literally the point of an encyclopedia? A
           | starting point, it's the abstract on the subject if you will.
           | 
           | Yes, but Wikipedia is also frequently conceived and marketed
           | as "the sum of all human knowledge," which that shows is a
           | lie by definition.
        
             | humanistbot wrote:
             | > "the sum of all human knowledge," which that shows is a
             | lie by definition.
             | 
             | By which definition? In math, the sum of a set necessarily
             | implies a loss of information about that set, for sets
             | larger than 1. But they're using "sum" not in the purely
             | mathematical sense, more like "the summary of all human
             | knowledge". But the same principle applies axiomatically,
             | because summaries are lossy compression: you cannot have a
             | summary that contains all the information of the source it
             | is summarizing.
        
               | tivert wrote:
               | >> "the sum of all human knowledge," which that shows is
               | a lie by definition.
               | 
               | > By which definition?
               | 
               | That's pretty easy: definition 2 "the whole amount :
               | aggregate" (https://www.merriam-
               | webster.com/dictionary/sum). That it's interpreted that
               | way is shown by the frequency of people saying stuff like
               | "I loaded Wikipedia onto this battery powered Raspberry
               | Pi in a Pelican case, now I'm ready to rebuild
               | civilization if it collapses," and seemingly believing
               | it.
               | 
               | But you do correctly point to another issue: sum has a
               | meaning of "a summary of the chief points or thoughts,"
               | which I feel is a less common usage. So the marketing
               | phrase may not be so much a lie, but rather an _extremely
               | misleading_ statement that invites misinterpretation that
               | usually goes unchallenged. IMHO, those are actually even
               | more pernicious than outright lies.
        
               | kube-system wrote:
               | > people saying stuff like "I loaded Wikipedia onto this
               | battery powered Raspberry Pi in a Pelican case, now I'm
               | ready to rebuild civilization if it collapses," and
               | seemingly believing it.
               | 
               | The least delusional part of this is the sparseness of
               | information contained within Wikipedia. If that scenario
               | came to be, they wouldn't be short on information. They'd
               | be short on time, resources, and skills.
        
               | humanistbot wrote:
               | > the frequency of people saying stuff like "I loaded
               | Wikipedia onto this battery powered Raspberry Pi in a
               | Pelican case, now I'm ready to rebuild civilization if it
               | collapses," and seemingly believing it.
               | 
               | As someone who inched closer into the doomsday prepper
               | scene before swerving far away from it, I assure you that
               | people in that subculture have a lot of unrealistic
               | beliefs about their own capacities and resources. I don't
                | think it's Wikipedia's fault that they (and you) are
                | taking a quote about Wikipedia's never-ending goal and
                | interpreting it as if it were a description of what
                | Wikipedia currently is.
               | 
               | An even worse example of deceptive marketing would be a
               | compact folding multitool marketed as "the only tool
               | you'll ever need." Even with that, I'd say that if you
               | actually believe that you can rebuild civilization with
               | that tool solely on the basis of that marketing slogan,
                | then that's your fault as much as it is the marketer's.
               | 
               | And a minor nitpick: the standard prepper info archives
               | also include collections of various survival guides and
               | resources that are specifically written for these kinds
               | of purposes.
        
             | karaterobot wrote:
             | It sounds like your issue is with someone describing
             | Wikipedia as the sum of all human knowledge, not with
             | Wikipedia itself, which is what the person to whom you're
             | replying seemed to be saying.
        
             | sdht0 wrote:
              | Wikipedia also has numerous sister projects like Wikibooks
              | and Wikiversity (including the open-access WikiJournal)
              | which aim to fill in the details. All these projects taken
              | together can indeed fulfill the *aspirational* goal of
              | noting down all human knowledge. Whether we ever get there
              | is of course up to us.
        
             | jjoonathan wrote:
             | If we could condemn a thing due to hype alone, we would
             | condemn all that is good in the world.
        
             | neilk wrote:
             | Jimmy Wales said that phrase in an interview, but it was
             | never meant to say that Wikipedia itself was the only work
             | that needed to be consulted.
             | 
             | https://en.m.wikipedia.org/wiki/Wikipedia:Prime_objective
             | 
             | You seem to have an issue with some person or persons who
             | has been advising others that the only work they need to
             | consult is Wikipedia. Who are they? Specifically.
        
         | narag wrote:
         | Please, before assuming you know what I mean, read the complete
         | comment.
         | 
          |  _The trouble with Wikipedia is that it's an inch deep._
         | 
         | That's not its biggest problem. Wikipedia is biased.
         | 
         | Of course there's political bias in the American fashion. But
         | that's not all. There is bias about History depending on what
         | country is telling the story. And there is a strong bias even
          | in scientific topics (maybe especially in them) when there are
         | commercial interests involved.
         | 
         | That's not specific to Wikipedia.
         | 
          | But when you research some topic, reading multiple books,
          | you'll notice there are different opinions, and you learn to
          | discount bias by looking at the provenance. Wikipedia tries to
          | adopt a neutral tone and cite different sources, but sometimes
          | it does a terrible job of it.
        
           | nomel wrote:
           | > There is bias about History depending on what country is
           | telling the story.
           | 
            | A funny example of this is fan death [1]. Comparing the
            | English page to East Asian-language pages shows that the
            | Asian-language pages suggest it's real (at least the last
            | few times I translated them).
            | 
            | And, in a bit of relevance, the Japanese page has been
            | overwritten with "I love you" and "I'm sorry".
           | 
           | [1] https://en.wikipedia.org/wiki/Fan_death
        
         | jojobas wrote:
         | Can there be factual correctness from being grounded in
         | scholarly books as such?
         | 
         | The amount of disagreement between researchers over time and
         | changing consensus requires an external arbiter of individual
         | facts at the very least.
        
         | xiphias2 wrote:
         | Maybe a Wikipedia based LLM could make decisions about which
         | papers are factual enough to include in a more extensive LLM.
        
           | joeframbach wrote:
           | You'll end up with a corpus consisting entirely of paywall
           | text and 404 pages.
        
         | janalsncm wrote:
         | Wikipedia is not some thing handed down by God at the beginning
         | of time. It's a work in progress by volunteers. If you think a
         | page lacks depth, you're free to update it.
        
         | wing-_-nuts wrote:
         | I don't think that's a fair criticism of wikipedia. Summarizing
         | knowledge is literally the job of an encyclopedia. There was a
         | reason all my professors in college told us to use wiki as a
         | jumping off point for further reading in the citations.
        
           | Closi wrote:
           | > I don't think that's a fair criticism of wikipedia.
           | 
           | It's not a fair criticism of Wikipedia, but it is a fair
           | criticism of using Wikipedia as a single-source.
        
           | mcpackieh wrote:
            | Once you stray off the popsci/undergrad topics, Wikipedia's
            | summation of knowledge is often a few sentences, _if any_.
            | Topics for which numerous books have been written may get
            | only one or two sentences on Wikipedia, so I think it's fair
            | to say that Wikipedia is an inch deep. Maybe a few inches
            | deep, since popular topics do get longer articles, but the
            | long tail of knowledge gets very shallow coverage on
            | Wikipedia.
        
             | downWidOutaFite wrote:
             | If you're knowledgeable about missing topics that's a
             | perfect opportunity to give back to wikipedia and write the
             | article yourself.
        
               | mcpackieh wrote:
               | Often it's a topic I don't yet know much about. I hear
               | about a topic and search for it, and find a disappointing
               | wikipedia stub. I continue my search to find there are
               | numerous books and research papers about the subject.
               | After reading those my intellectual curiosity may be
               | satisfied, but I wouldn't consider myself an expert and I
               | also don't have any inclination to go back and write a
               | proper wikipedia page.
        
               | CSMastermind wrote:
               | I generally agree with you and I will say that my
               | experience contributing to wikipedia has been extremely
               | pleasant. The community does a good job of making
               | newcomers feel welcome even if you make mistakes.
               | 
               | With that said I've seen two areas where contributing to
               | wikipedia falls short:
               | 
               | The first is things involving what I'd call
               | editorialization (I'm sure there's some wikipedia term
               | for it). Any article that's about an unsettled or
               | somewhat contentious issue seems to give outsized weight
               | to the non-consensus view. Even if 85% of a field thinks
               | that one thing is more likely than the other the
               | wikipedia article will often split its coverage of the
               | views 50/50 and then maybe tack a sentence on at the end
               | saying that the majority of people in the field favor xyz
               | view.
               | 
               | Contributing to or changing those pages is often a hassle
               | because you have to argue with people and that's
               | generally not worth your time (unless you're of the
               | minority opinion and you want to give legitimacy to your
               | side - in which case you are motivated to argue).
               | 
               | The second are the stub articles. The ones that say "This
               | article is a stub. You can help Wikipedia by expanding
               | it." Often I could help wikipedia by expanding it but
               | it's so much work to write a full encyclopedia entry.
               | Like it might take me 4 hours to summarize what I know,
               | look up references, etc. It's easier to just not do it.
               | 
               | Where I find contributing useful is when I'm fixing a
               | small factual error, updating based on a recent
               | discovery, fixing a citation, etc.
               | 
               | I'm not sure if they do it but it would be good for
               | Wikipedia to pay someone to go through and fill out the
               | basics of a bunch of pages on a topic so that there's a
               | scaffolding to work with and then the occasional
               | volunteers could come through and add on facts and fix
               | problems.
        
           | arp242 wrote:
           | I think it's a fair criticism; or rather, an important
           | limitation one needs to keep in mind. Wikipedia articles can
           | miss out on a great deal of nuance and context, which can
           | matter a great deal.
        
             | thebooktocome wrote:
             | Worse than that, motivated editors often color their pet
             | pages with specific nuance and context, and no casual
             | editor has a hope of winning an edit war against such
             | opposition.
             | 
             | My favorite example of this is the debate between a faction
             | that believes Lithobates is the proper genus of a certain
             | set of frogs, and another faction that believes the correct
             | genus is Rana. The Lithobates side is essentially one
             | person along with his sock and/or meat puppets, so in the
             | end, after many rounds of moderation, most of the species
              | in question are listed under both genera.
        
             | jurimasa wrote:
             | [flagged]
        
         | weregiraffe wrote:
         | Wikipedia is full of references to scholarly sources. Make a
         | bot that follows the references to the sources and incorporates
         | them in the training data, and Bob's your uncle.
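The reference-following bot suggested above can be sketched. A minimal Python sketch of the citation-extraction step, assuming citations are wrapped in `<cite>` elements as in Wikipedia's rendered reference lists; the sample HTML fragment is invented for illustration:

```python
from html.parser import HTMLParser

class CitationLinkExtractor(HTMLParser):
    """Collects external URLs found inside Wikipedia-style <cite> markup."""
    def __init__(self):
        super().__init__()
        self.in_cite = 0   # depth counter for nested <cite> elements
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.in_cite += 1
        elif tag == "a" and self.in_cite:
            href = dict(attrs).get("href", "")
            # Keep only external links; skip internal /wiki/ links.
            if href.startswith("http"):
                self.urls.append(href)

    def handle_endtag(self, tag):
        if tag == "cite" and self.in_cite:
            self.in_cite -= 1

# Invented fragment shaped like a Wikipedia reference list.
sample = (
    '<ol><li><cite class="citation">Smith, J. '
    '<a href="https://example.org/paper.pdf">Title</a></cite></li>'
    '<li><cite class="citation"><a href="/wiki/Internal">internal</a> '
    '<a href="https://archive.org/details/book">Book</a></cite></li></ol>'
)

parser = CitationLinkExtractor()
parser.feed(sample)
print(parser.urls)
# ['https://example.org/paper.pdf', 'https://archive.org/details/book']
```

A real bot would then fetch each URL and feed the retrieved text into the training corpus; the fetching, paywall, and deduplication problems raised downthread are left out here.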
        
       | jefftk wrote:
       | There are multiple studies discussed on this page; the one we're
       | looking at is partway down the page, under "Wikipedia-based LLM
       | chatbot "outperforms all baselines" regarding factual accuracy".
       | This link will take you there if your browser supports scrolling
       | to text fragments:
       | https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...
        
         | kemayo wrote:
         | This link will take you there regardless of text-fragment-link
         | support:
         | https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...
        
       | indymike wrote:
        | This article was more about the value of open access in
        | publishing: openly accessible research is 15% more likely to
        | be cited on Wikipedia. The AI part was somewhat weak as it did
        | not compare against ChatGPT.
        
         | ot wrote:
         | The layout is very confusing, but the page is a review of
          | various recent research papers about Wikipedia, and the title
          | references one of them; search for the section titled
         | 
         | > Wikipedia-based LLM chatbot "outperforms all baselines"
         | regarding factual accuracy
        
           | dr_dshiv wrote:
           | How does this make sense? Search to find it on the page??
        
         | [deleted]
        
       | edgyquant wrote:
       | How does one even go about testing such a thing? Comparing it to
        | Wikipedia articles? Even if it is factual, does it spew the
       | interpretations present in most Wikipedia articles?
        
         | the_af wrote:
          | The abstract of the article explains what they mean: the LLM
          | does not hallucinate (as much) and provides facts _based_ on
          | Wikipedia. Absolute "truth" is not measured;
         | rather, they measure how much the chatbot "sticks to the known
         | facts" within Wikipedia. Since they are measuring this,
         | presumably other chatbots and LLMs tend to hallucinate much
         | more, providing "facts" not supported _by their training data_.
        
           | em-bee wrote:
           | now that is the interesting bit really. what makes the
           | wikipedia based LLM hallucinate less?
           | 
           | the only thing i can think of ad hoc is that wikipedia
           | contains less conflicting or unclear information which helps
           | to avoid the LLM getting confused. also the information is
           | more organized, and it is clear which articles relate to each
           | other.
           | 
            | this would show what i think we already knew: that LLMs can
            | summarize the data they get but they cannot evaluate or
            | verify it.
        
           | jojobas wrote:
            | The most irritating effect is that the LLM somehow guesses
            | what you want it to say. Human-in-the-loop training is
           | imperfect.
        
         | commandlinefan wrote:
          | Having seen what professional "fact checkers" accept as fact
          | (and reject as misinformation) makes me similarly skeptical.
        
           | the_af wrote:
           | They are not fact checking "truth" but whether the chatbot
           | spouts "facts" supported by Wikipedia. This is objective and
           | much easier to check than capital letter Truth. Consider that
           | when discussing ideological or political articles, what is
           | "true" becomes nebulous.
        
       | [deleted]
        
       | dr_dshiv wrote:
       | Here is a direct link to the arxiv article:
       | https://arxiv.org/abs/2305.14292
       | 
       |  _WikiChat: A Few-Shot LLM-Based Chatbot Grounded with Wikipedia_
        
         | CSMastermind wrote:
         | I'm not a scientist but isn't it suspect that they're both
         | creating a new bot and a new evaluation metric for bots at the
         | same time?
         | 
         | Like we invented this new thing and this new measurement for
         | evaluating it. It does great on the metric we just made up
         | while we were making it.
        
           | [deleted]
        
           | theptip wrote:
           | No, it's not suspect in and of itself. Often you need to
           | develop a new benchmark when solving a new problem. It's
           | common to see this in software engineering/CS papers too.
           | 
           | Of course, one should always be critical of benchmarks, and
           | there is an obvious opportunity for bias here that should be
           | reviewed with care. But your phrasing suggests that this is
           | unusual or actively suspicious, which it is not.
        
           | rjtavares wrote:
           | We invented the Turing Test decades ago. Since it became
           | irrelevant with ChatGPT [1], we need new tests.
           | 
           | [1]: We can discuss if ChatGPT passes the Turing Test or not,
           | but I think we can now all agree that being able to have a
           | convincing conversation is not a good test for intelligence.
        
             | jpadkins wrote:
             | [1] I disagree. I think we can agree there needs to be a
             | refinement on the definition of intelligence, but I think
             | LLMs passed the 1950 definition of general machine
             | intelligence.
        
               | [deleted]
        
           | htag wrote:
           | Here is this new thing, and here is how it is different than
           | anything else.
        
       | sam0x17 wrote:
       | You know you could probably get really far just training an LLM
       | on wikipedia and all linked citations, and nothing else
       | 
       | The whole problem of wikipedia only being an "inch deep" on any
       | given topic is basically solved if the LLM also has to read every
       | cited work in full
       | 
       | And maybe citation counts could affect exposure to that work
       | during training
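The citation-count weighting idea above can be sketched as a sampling distribution over cited works. A toy sketch with invented document names and counts; a real training pipeline would need far more care (deduplication, length normalization, quality filters):

```python
import random

# Hypothetical corpus: cited works with made-up citation counts.
corpus = {
    "doc_a": 120,  # heavily cited source
    "doc_b": 30,
    "doc_c": 3,    # long-tail source
}

# Normalize counts into sampling probabilities.
total = sum(corpus.values())
weights = {doc: count / total for doc, count in corpus.items()}

# Heavily cited works get proportionally more exposure per epoch.
rng = random.Random(0)
batch = rng.choices(list(corpus), weights=list(corpus.values()), k=5)
print(weights)
print(batch)
```

In practice one would likely dampen the counts (e.g. a log or square root) so the most-cited works don't dominate training entirely.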
        
         | kernal wrote:
         | [flagged]
        
           | flangola7 wrote:
           | That's a strong accusation. What evidence do you have? What
           | are the "ideological views" and why do you think they are
           | baseless?
        
         | jabbany wrote:
         | A lot of full text for research (outside CS) is still locked up
          | behind subscription paywalls. Plus, PDFs are often not the
          | best format to extract text from.
         | 
         | Interesting suggestion but probably a lot of practical
         | limitations.
        
         | slg wrote:
         | > and all linked citations
         | 
         | I wonder what percentage of Wikipedia citations are actually
         | currently available on the internet. For example, here is
         | today's featured article[1]. The majority of references on that
          | page are books, journals, magazines, television, and unlinked
          | news articles that can't be easily accessed. Plus, on more niche
         | topics, it is common for the externally linked references to
         | disappear over time.
         | 
         | [1] -
         | https://en.wikipedia.org/wiki/David_Kelly_(weapons_expert)
        
           | harshreality wrote:
           | archive dot org and dot is, sci-hub and libgen / zlib will
            | cover a lot of those _text_ sources. Aren't bots largely
           | what's responsible for links being updated to point to
           | archived sources? I've noticed archive links a lot lately.
           | 
           | Someone doing serious AI training will mirror sci-hub and
           | libgen first, so they'll already have a fair amount of the
           | (good quality) referenced papers and educational books.
           | 
           | Wikipedia (and citation count on google scholar for papers)
           | could be used as a filter for which books and papers to train
           | on first.
        
         | wizofaus wrote:
         | The ability to predict the expected answer to a given question
         | isn't something I could see naturally falling out of those
         | sources though, unlike an LLM trained on text from online
         | forums and the like.
        
         | awb wrote:
         | That's basically Google's PageRank algorithm with Wikipedia as
         | the 10/10 ranked source of truth.
        
       | Julesman wrote:
       | We can talk about alignment of LLMs. We can also talk about
       | alignment of people who write Wikipedia. To imagine there is no
       | bias is foolish and dangerous. More accurate isn't truth. More
       | accurate for whom?
        
         | msla wrote:
         | Everything is biased. Everyone, every single human being, is
         | aligned.
         | 
         | That said, bias towards, and alignment with, verifiable reality
         | is possible to achieve, and getting there partway is better
         | than not at all:
         | 
         | https://hermiene.net/essays-trans/relativity_of_wrong.html
         | 
         | > [W]hen people thought the Earth was flat, they were wrong.
         | When people thought the Earth was spherical, they were wrong.
         | But if you think that thinking the Earth is spherical is just
         | as wrong as thinking the Earth is flat, then your view is
         | wronger than both of them put together.
        
           | mellosouls wrote:
           | _Everything is biased. Everyone, every single human being, is
           | aligned._
           | 
            | Of course, but as discussed many times here before,
            | Wikipedia leans left (presumably reflecting the
            | demographics of the people drawn to edit and moderate it,
            | as implied by the comment you are replying to), and that
            | can be a significant issue for topics (e.g. cultural,
            | historical, social, political) where that bias filters
            | what might be assumed by the user to be objective answers.
           | 
           | This isn't a left vs right thing either; there are plenty of
           | publications, demographics and institutions that lean right.
           | The problem is the transparency, awareness and communication
           | of that bias when using them as sources for tools like this.
           | 
           | In the underlying study, there is no mention of the word
           | "bias".
           | 
           | Here's a sample quote which is also concerning:
           | 
            |  _For recent topics, we look at the most edited Wikipedia
            | articles in the first four months of 2023. The number of
           | edits is also a good proxy for how much interest there is
           | around a certain topic._
           | 
           | True - and it may also be an indication of a topic that is
           | heavily contested. If the two (or more) views on the "truth"
           | of the article are imbalanced, the chatbot will reflect that
           | imbalance, and can therefore in no way be said to "outperform
           | all baselines on factual accuracy".
           | 
           | To be fair to the researchers, they do address related
           | concerns and talk about avoiding some areas of discussion,
           | but the headline here is extremely misleading.
        
             | denton-scratch wrote:
             | > Wikipedia leans left
             | 
             | That is subjective, and depends on where you think the
             | "centre" is.
             | 
             | I don't regard Wikipedia as reliable on any topic that is
             | political or involves national history. Modern Wikipedia
             | expects editors to support their edits with citations to
             | "reliable sources", which means the mainstream press,
             | mainly (because primary sources are deprecated). But the
             | mainstream press is overwhelmingly right-wing, and left-
             | wing papers and magazines are usually explicitly rejected
             | as not reliable.
             | 
             | On matters of politics and history, I always dig into the
             | citations (unless I'm happy to get a sketchy version that
             | isn't really accurate). But on most technical and
             | humanities-based topics, the articles are usually quite
             | good (and often much deeper than 1").
             | 
             | There's still way too much stuff in articles that is not
             | cited at all. That changes gradually, as editors delete
             | uncited material, and others come along with suitable
             | citations. I think it's getting better all the time.
        
               | mellosouls wrote:
               | _That is subjective, and depends on where you think the
               | "centre" is._
               | 
               | Not at all. Even wikipedia itself acknowledges it [1] -
               | and you can bet the editors responsible for the bias were
               | fighting tooth and nail against that admission - which
               | gives some idea how unbalanced it must be in reality.
               | 
               |  _Modern Wikipedia expects editors to support their edits
               | with citations to "reliable sources", which means the
               | mainstream press..._
               | 
               | And academia - don't forget academia, that bastion of the
               | right.
               | 
               |  _...the mainstream press is overwhelmingly right-wing_
               | 
               | That's ridiculous - The Guardian?? The Washington Post?
               | New York Times?
               | 
               | I think you've made a point about Wikipedia though, but
               | perhaps not the one you intended...
               | 
                | [1] https://en.wikipedia.org/wiki/Ideological_bias_on_Wikipedia
        
       | hammock wrote:
       | "Breaking: Trivia bot trained on the dictionary spells words
       | better than trivia bot trained on high school English papers"
        
         | [deleted]
        
         | bloqs wrote:
         | The religious level hype that minor incremental and obvious
         | improvements to existing technologies gets are patently absurd.
        
         | l5870uoo9y wrote:
          | It is important information that you don't really need
          | petabytes of Common Crawl data to make a highly accurate bot.
          | There are a few other open source models that perform well
          | with significantly smaller training data than OpenAI's.
        
           | ramesh31 wrote:
            | >It is important information that you don't really need
            | petabytes of Common Crawl data to make a highly accurate
            | bot. There are a few other open source models that perform
            | well with significantly smaller training data than OpenAI's.
           | 
           | Sure, but the tradeoff is in generalization vs
           | specialization. No one is impressed by the fact that ChatGPT
           | is able to recite facts. Google can do that. Where it becomes
           | interesting is in the general applicability of a single tool
           | to thousands of possible domains.
        
           | JimmyAustin wrote:
           | That isn't what is being described here. They are just
           | providing additional context to ChatGPT using its plugin API.
           | It's still trained on large amounts of public text data.
        
         | im3w1l wrote:
         | I think there is a subtle hindsight bias here. Like if you
         | asked someone yesterday "would grounding a chatbot on wikipedia
         | make it do better?" I think many people would say that it
         | sounds quite plausible. But if you ask instead "what are your
         | top 10 ideas for making chatbots better at facts?" then it may
         | not be so obvious.
        
         | esjeon wrote:
         | They are not bragging about the bot. They are bragging about
         | how great the dictionary is. There's this subtle difference in
         | the context.
        
           | rvnx wrote:
           | They are also the ones judging what is truth and neutrality.
           | 
           | This is equivalent to saying:
           | 
           | "A bot trained on the articles that we have written gives the
           | answers that the writers of the articles expected"
        
             | mcpackieh wrote:
             | esjeon is simply wrong; this study is not touting the
             | accuracy of wikipedia's knowledge. It's touting their bot's
             | ability to accurately convey wikipedia's knowledge. It's
             | very much about the qualities of their bot, not the
             | qualities of wikipedia.
             | 
             | https://arxiv.org/pdf/2305.14292.pdf
        
               | lacksconfidence wrote:
               | It's both. My interpretation is that the study is as you
               | say, but it's posted on the wikipedia signpost for the
               | reasons esjeon says.
        
               | TZubiri wrote:
               | Wikimedia controls what goes in the signpost, they are
               | very different from the actual community, and they are
               | known for misrepresenting the intent of a project.
               | 
                | When the Russia-Ukraine war broke out, they made a banner
                | about a Ukrainian translation project that had existed for
               | years and made it look like some project to support
               | Ukraine, effectively breaking neutrality on the war
               | subject (which was unrelated to the translation project.)
        
             | notyourwork wrote:
             | The old self-fulfilling prophecy.
        
             | dmix wrote:
              | It's also largely going to be the sum of its sources since
             | most (contentious) arguments on Wikis come down to who can
             | cite the most articles, assuming edits get challenged in
             | the first place.
             | 
             | Wikipedia maintains a list of 'reliable' news sites:
             | 
             | https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Pe
             | r...
        
               | Jorge1o1 wrote:
               | Wow, what a list. Eye-opening really.
               | 
               | The American Conservative (yellow) ==> The American
               | Conservative is published by the American Ideas
               | Institute, an advocacy organisation. It is a self-
               | identified opinionated source whose factual accuracy was
               | questioned and many editors say that The American
               | Conservative should not be used as a source for facts.
               | 
               | The New Republic (green) ==> There is consensus that The
               | New Republic is generally reliable. Most editors consider
               | The New Republic biased or opinionated. Opinions in the
               | magazine should be attributed.
               | 
               | This seems like a somewhat arbitrary double standard to
               | be applying. As a reader of both news sources they are
               | both biased, opinionated sources, and I don't think you
               | can trust one more than the other. But one is green with
               | "be careful this might be biased" and the other is yellow
               | for pretty much the same reason.
        
               | hughesjj wrote:
               | Bias != Reliability.
               | 
               | There's a reason "The Atlantic" is listed green even
                | though it's conservative. Hell, they list the Christian
                | Science Monitor as green for reliability (as they should,
                | imo). I don't think Wikipedia is demonstrating a bias
               | based on any particular ideology in their sources on this
               | list.
               | 
               | This wiki list is a list of sources by reliability. If
               | you only publish stories which support your bias, but
               | those stories are scientifically sound and don't omit
               | context, I don't see the problem with using them as a
               | source regardless of bias.
               | 
               | If you only allow sources from reliable sources aligned
               | with a particular bias to the exclusion of reliable
               | sources from another alignment, that would be an issue,
               | but I don't see evidence of such here.
               | 
               | The problem isn't the bias. The problem is the
               | factuality.
        
               | mr_toad wrote:
               | * * *
        
               | next_xibalba wrote:
               | > There's a reason "The Atlantic" is listed green even
               | though it's conservative.
               | 
               | The Atlantic is in no way conservative leaning.
               | 
               | https://www.allsides.com/news-source/atlantic
               | 
               | https://adfontesmedia.com/the-atlantic-bias-and-
               | reliability/
               | 
               | https://mediabiasfactcheck.com/the-atlantic/
               | 
               | https://www.biasly.com/sources/the-atlantic-bias-rating/
        
               | mandmandam wrote:
               | Wikileaks on zero retractions, and most anti-war news
               | sources: red / black.
               | 
               | Lest anyone think the problem is that Wikileaks is 'left
               | biased'.
        
               | Jorge1o1 wrote:
               | I think the biggest issue is not even the left/right or
               | political bias of Wikipedia but rather the fact that some
               | committee of wiki editors decide along what seem to be
               | fairly arbitrary/subjective lines that some sources are
               | reliable and others aren't.
               | 
               | And then those claims make their way into Wikipedia where
               | they inevitably (even though they shouldn't) are relied
               | upon by students, politicians, journalists, who then
               | perpetuate the claim.
               | 
               | https://xkcd.com/978/
        
               | [deleted]
        
               | wpietri wrote:
               | It's not a committee, it's not arbitrary, and "arbitrary"
               | and "subjective" mean two very different things.
               | 
               | Reliability from a fact-checking perspective is a pretty
               | specific thing, and a thing that is vital to Wikipedia as
               | an open-source, anyone-can-edit encyclopedia. This can
               | correlate with political views in particular times and
               | places, but does not broadly correlate with either left
               | or right. E.g., after the Russian revolution, we saw the
               | left using Pravda as a vehicle to "indoctrinate" and
               | "encourage unity of thought". [1] But a significant part
               | of the current US right has frequently taken the approach
               | of "flooding the zone with shit" [2].
               | 
               | [1] https://www.britannica.com/topic/Pravda
               | 
               | [2]
               | https://www.google.com/search?q=flood+the+zone+with+shit
        
               | mcpackieh wrote:
               | More like it's WASP establishment biased. Like the
               | NYTimes.
        
               | mr-ron wrote:
               | 'factual accuracy was questioned' vs 'The New Republic is
               | generally reliable'
        
               | mcpackieh wrote:
               | Heh. The glass is missing some water vs the glass is
               | mostly full of water.
               | 
               | (Not commentary on either of those media orgs btw, I
               | don't follow nor have any opinion on either of those one
               | way or the other.)
        
               | michaelmrose wrote:
                | Have you considered that conservative sources have
                | always been less accurate, by dint of their failure to
                | accept new data that contradicts existing bias?
               | 
               | You can argue that all parties have biases but if you
                | look at modern conservatism, its worldview is
                | increasingly wildly divergent from reality. If your
               | publication desires the readership of people who are
               | obliged to stand in a puddle and deny being wet you shall
               | have to follow them at least to the perimeter of
               | Neverland and spend at least some of your breath speaking
               | of pirates and fairies. Mentioning the puddle will also
               | be verboten.
               | 
                | Reading several of the articles on the front page, I
                | noted completely incoherent takes on Ukraine and birth
                | control, for instance. It's not the outright horror
                | show of Fox News, nor is it what one would consider
                | objective or news. It's essentially 100% op-eds by
                | your least incoherent older relative.
        
               | jjav wrote:
               | > the other is yellow for pretty much the same reason
               | 
               | I have read neither, so don't have an opinion on them.
               | 
               | But going by the descriptions quoted, it doesn't seem to
               | be for the same reason.
               | 
               | Both are listed as biased/opinionated, but for The
               | American Conservative it additionally says "factual
               | accuracy was questioned", which would make it less
               | trustworthy as a reference.
        
               | TZubiri wrote:
               | Also see en.wikipedia.org/wiki/FUTON_bias
               | 
                | There are also some policy pages that talk about other
                | potential biases, like technical biases. There is
                | awareness.
        
           | thunkshift1 wrote:
           | X doubt
        
         | zby wrote:
          | This sounds like a paradox, but it is not. You don't give the
          | bot the answer you expect, only the ground facts; it
          | generates the answer from those facts by itself.
          | 
          | This is a RAG system, and you need to treat it as a whole: it
          | is a question-answering machine that remembers the whole of
          | Wikipedia.
         | 
         | By the way just today I wrote a blogpost about the common
         | misconception that to teach an LLM new facts you need to
         | finetune it: https://zzbbyy.substack.com/p/why-you-need-rag-
         | not-finetunin...
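The retrieve-then-generate loop described above can be sketched. A toy Python sketch of the retrieval step, using naive word overlap in place of a real dense retriever; the passages and scoring below are illustrative assumptions, not WikiChat's actual method:

```python
# Tiny retrieval step of a RAG pipeline: pick the passage with the most
# word overlap with the question, then ground the prompt in it.

passages = [
    "The Eiffel Tower was completed in 1889 and is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "Wikipedia is a free online encyclopedia edited by volunteers.",
]

def retrieve(question, docs):
    """Return the passage sharing the most lowercase tokens with the question."""
    q = set(question.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

question = "When was the Eiffel Tower completed?"
context = retrieve(question, passages)

# The model only ever sees facts we hand it, so its answer is grounded
# in the retrieved text rather than in whatever it memorized.
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
print(context)
```

The point of the comment above holds in this sketch: no new facts are baked into the model's weights; updating the passage store updates what the system "knows".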
        
       | sandworm101 wrote:
       | Lol, I wonder how many of their fact checkers silently used
       | Wikipedia to verify the facts outputted by the AI.
        
       | EGreg wrote:
       | And how is factual accuracy determined? Using the exact same
       | sources as Wikipedia, right?
        
       | [deleted]
        
       | freitzkriesler2 wrote:
       | [citation needed]
       | 
       | And no you can't cite Wikipedia ;)
        
         | the_af wrote:
         | Easy-peasy, here's the citation:
         | https://arxiv.org/abs/2305.14292
         | 
         | Looks like a legit paper.
        
       | guestbest wrote:
       | That value was calculated and verified using Wikipedia?
        
       | ogou wrote:
       | "this first draft is probably not solid enough to be cited in
       | Wikipedia"
        
       | tcbawo wrote:
       | I hope somebody makes a game of Trivial Pursuit with generated
       | questions sourced from Wikipedia.
        
       | ec109685 wrote:
       | "All baselines" is doing a lot of heavy lifting in that sentence.
        
       | spacephysics wrote:
       | This is great and all, but we still run into the problem with
       | political biases embedded in the source data [0]
       | 
       | Musk's AI's aim is to get to the truth, not eliminate biases
       | retroactively. I think that's a noble goal, politics aside.
       | 
       | I agree with him that teaching an AI to lie is a dangerous path.
       | Currently it's probably not exactly akin to lying, but it's close
       | enough to be on that path.
       | 
        | We should find a way to feed it source material from all
        | "biases", if you will, and have it produce what's closest to
        | reality. It's
       | obviously easier said than done, but I don't think the AI Czar VP
       | Harris aims to do this.
       | 
       | If we're too divided or hellbent on pushing our own agenda, it'll
       | be a bad outcome for all.
       | 
       | Unfortunately the differences we have are at a very fundamental
       | level that really is a question of how reality is perceived, and
        | what we consider meaningful. The difference is whether something
        | by its nature has meaning, or whether we give meaning to it
        | culturally/societally.
       | 
       | The former is a more "conservative" (personality wise, not
       | political) view.
       | 
        | The latter is more of, "everything that has meaning is based off
       | the meaning we say it has, thus we can ascribe the level of
       | meaning to that or other things as we wish". The idea that many
       | things are social constructs, and we can change those as we wish
       | to craft what we'd like to see.
       | 
       | I'm probably doing a poor job of wording it, but this fundamental
       | difference in perception is going to very quickly be at the
       | forefront of AI ethics.
       | 
       | [0]
       | https://en.m.wikipedia.org/wiki/Ideological_bias_on_Wikipedi...
        
         | mrangle wrote:
         | The problem that Musk is going to run into is that civilization
         | blossoms from deeply rooted lies.
         | 
          | Apart from the necessary lies that lie at the core mechanics
          | of civilization, anything remotely political has long been
          | vulnerable to outrageous grand lies that enjoy as much pressure
         | as it takes to maintain them. Wikipedia is valuable apart from
         | any political topic. More topics are political than many would
         | believe.
         | 
         | They are going to make AI lie, as there isn't a choice in the
         | matter. One major future problem will be the strategic war
         | (military, business, etc) advantage of AI that is beyond the
         | reach of censors. The reasonably accurate conclusion is likely
         | that private and DoD AI won't be trained on lies, but all
         | others will be.
        
           | fallingknife wrote:
            | Any lie that you can identify as a lie, an AI can be trained
            | not to tell.
        
             | akolbe wrote:
             | By the same token, any AI can be trained to withhold any
             | truth identified as inconvenient. :/
        
         | nitwit005 wrote:
         | > We should find a way to feed source material from all
         | "biases" if you will, and have it produce what's closest to
         | reality.
         | 
         | Can't help but suspect you'll end up with an AI that
         | confidently reports that Jesus was an extra-terrestrial, and
         | the world is controlled by a secret cabal of lizard people.
         | 
         | If you look into rare diseases, you'll find the counter
         | intuitive idea that rare disease is common. Each disease is
         | individually rare, but there are so many of them, that a lot of
         | people have them in total. Human beliefs are sort of similar.
         | There's a huge volume of strange beliefs.
        
         | jncfhnb wrote:
         | Literally suggesting we enshrine the Balance fallacy into our
         | conception of truth:
         | 
         | https://rationalwiki.org/wiki/Balance_fallacy
        
       | jakearmitage wrote:
       | How did they manage to get it to stop hallucinating? I can't
       | prevent my llama-index based chatbot from making up absurd things
       | from my own documents, even though I've been trying to restrict
       | it to that specific area of knowledge.
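One common mitigation for the problem described above is to gate generation on retrieval confidence and instruct the model to refuse when the context doesn't contain the answer. This is a generic sketch, not llama-index's actual API and not what the paper does; `overlap_score`, the `threshold` value, and the `llm` parameter are illustrative assumptions:

```python
# Guardrail sketch for RAG hallucination: if no retrieved chunk looks
# relevant enough, return a refusal instead of letting the model improvise.

def overlap_score(question, chunk):
    """Crude lexical similarity; real systems use embedding cosine similarity."""
    q, c = set(question.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def answer_with_guardrail(question, chunks, llm=None, threshold=0.3):
    """Only pass the question to the model if some chunk clears the threshold."""
    best = max(chunks, key=lambda ch: overlap_score(question, ch), default=None)
    if best is None or overlap_score(question, best) < threshold:
        return "I don't know based on the provided documents."
    prompt = (f"Context: {best}\n"
              "Answer from the context only; say 'I don't know' if it is not there.\n"
              f"Question: {question}")
    # With no model supplied, return the grounded prompt for inspection.
    return llm(prompt) if llm else prompt

docs = ["The warranty period for the X100 printer is two years."]
print(answer_with_guardrail("What is the warranty period for the X100?", docs))
print(answer_with_guardrail("Who won the 1998 World Cup?", docs))
```

The threshold trades recall for precision: set it too high and the bot refuses answerable questions, too low and off-topic queries slip through to the model.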
        
       | jokoon wrote:
        | People on the internet still often criticize Wikipedia when I
        | link to it; I don't understand why.
       | 
       | It's true that it's not good enough for academic work (is it?),
       | but it's largely enough for everything else.
        
       | Outright0133 wrote:
       | "Wikipedia Is Badly Biased":
       | https://larrysanger.org/2020/05/wikipedia-is-badly-biased/
       | 
       | By cofounder Larry Sanger
        
         | ravenstine wrote:
          | There are some atrociously written articles on Wikipedia even
          | in the year 2023.
         | 
         | Case example:
         | 
         | https://en.wikipedia.org/wiki/Fear_of_intimacy
         | 
         | The majority of the article is woman-centered, even though
         | there's no evidence that it's highly gender-biased, and the
         | only information pertaining to men is that if they have fear-
         | of-intimacy then they might be a sex offender.
         | 
         | Otherwise, the article barely communicates anything meaningful.
         | How do attachment types relate to fear-of-intimacy? Are they
         | causative or merely correlative?
         | 
         | Then there's of course poor writing throughout such as this:
         | 
         | > Fear of intimacy has three defining features: content which
         | represents the ability to communicate personal information
         | [...]
         | 
         | What the hell does that mean? "Content?" Like a YouTube video
         | or something?
         | 
          | This is just the latest example I've come across, and it
          | happens to be one of the least encyclopedic bodies of text
          | I've ever read. So much of what I read on Wikipedia is of a
          | similar low caliber. People scan over Wikipedia articles but
          | don't think
         | critically, in part because Wikipedia has devolved into writing
         | that can't decide what its audience is and won't get to the
         | point. As I've said before, check out the Talk sections of the
         | pages you visit, and you'll find some of the most arrogant
         | responses from Wikipedia's inner circle of editors.
         | 
         | What makes me LOL the most is supposedly scientific articles
         | that are written as if there is no debate behind a scientific
         | idea, despite there being no such thing in science as "case
         | closed." Wikipedia often behaves like it's a peer-reviewed
         | scientific journal, yet has none of the chops to act as such.
          | Anything that you read on Wikipedia that suggests that there is
         | "no evidence" for something is likely to be some buffoon's
         | ignorant opinion on the actual literature.
         | 
         | And no, I can't just "edit" Wikipedia to fix these issues. I've
          | tried. Both my home IP address and my phone IP address are
          | banned from them, despite my never having set up an account
          | with them.
        
           | delusional wrote:
           | > > Fear of intimacy has three defining features: content
           | which represents the ability to communicate personal
           | information [...]
           | 
           | > What the hell does that mean? "Content?" Like a YouTube
           | video or something?
           | 
           | It's taken directly from the source cited (page 2 of
           | https://www.semanticscholar.org/paper/Development-and-
           | Valida...). I'm not an expert in the field and have no idea
           | if this is a good paper, but it has received 267 citations
           | which does convey some impact.
           | 
           | > The fear-of-intimacy construct takes into account three
           | defining features: (a) content, the communication of personal
           | information;(b) emotional valence, strong feelings about the
           | personal information exchanged; and (c) vulnerability, high
           | regard for the intimate other. We propose that it is only
           | with the coexistence of content, emotional valence, and
           | vulnerability that intimacy can exist. Consider, for example,
           | the customer who talks to an unknown bartender about his or
           | her troubles. Although there may be personal content
           | 
           | It's clear that it's not the noun "content" but the
           | adjective, defined as "pleased with your situation and not
           | hoping for change or improvement".
           | 
           | I hope the Wikipedia editors are more literate and willing to
           | research than that. I don't think I want to read your version
           | of wikipedia.
        
             | marcellus23 wrote:
             | > It's clear that it's not the noun "content" but the
             | adjective, defined as "pleased with your situation and not
             | hoping for change or improvement".
             | 
             | No, it's not the adjective. The other 2 features are nouns,
             | so this one must also be a noun, since it's a parallel
             | construct. Also, they're all "features", so they have to be
             | nouns by definition. And what would the adjective even be
             | describing?
             | 
             | In this case, the "content" refers (I guess) to the content
             | that's being communicated, though it's poorly phrased.
             | 
             | The Wikipedia excerpt is badly written, whether you agree
             | with the GP or not about the article being biased towards
             | women. It's not even a paraphrase of the original source,
             | which claims the content is the communication itself,
             | whereas the article claims the content "represents the
             | ability to communicate personal information" -- which is
             | pretty meaningless.
        
           | jrflowers wrote:
           | >The majority of the article is woman-centered, even though
           | there's no evidence that it's highly gender-biased...
           | 
           | If you were able to edit this wiki page, what particular
           | studies about fear of intimacy in men would you cite in the
           | sections you add?
           | 
           | Also, is this bit
           | 
            | > Anything that you read on Wikipedia that suggests that there
           | is "no evidence" for something is likely to be some buffoon's
           | ignorant opinion on the actual literature
           | 
           | meant to be ironic?
        
         | delusional wrote:
         | What an absolute trashfire of a blogpost.
         | 
         | It's written in the tone of a sore loser. A person who fought
         | for regressive policies, against people with better arguments
         | and more accurate facts. A person who now, having lost the
         | fight for the policy, retreats into their echo chamber and
         | decries the debate as "not making room for my facts."
         | 
            | Apparently no statement can count as neutral unless it
            | receives 100% unanimous support from every single person on
            | the planet earth:
         | 
         | > A great many Christians would take issue with such
         | statements, which means they are not neutral for that reason
         | alone
        
           | [deleted]
        
           | [deleted]
        
           | [deleted]
        
         | 79a6ed87 wrote:
         | It's even blatantly worse in the Spanish Wikipedia
        
         | bawolff wrote:
         | Larry Sanger is not exactly a neutral source on wikipedia. He
         | is behind multiple competing projects, so might be financially
         | motivated to shit-talk wikipedia.
        
           | Vicinity9635 wrote:
           | Trying to change the subject to Larry Sanger is an ad hominem
           | fallacy. Address the content of the message, not the speaker.
           | 
           | For example, is this accurate or isn't it?
           | 
           | > _Examples have become embarrassingly easy to find. The
           | Barack Obama article completely fails to mention many well-
           | known scandals: Benghazi, the IRS scandal, the AP phone
           | records scandal, and Fast and Furious, to say nothing of
           | Solyndra or the Hillary Clinton email server scandal--or, of
           | course, the developing "Obamagate" story in which Obama was
           | personally involved in surveilling Donald Trump. A fair
           | article about a major political figure certainly must include
           | the bad with the good. Beyond that, a neutral article must
           | fairly represent competing views on the figure by the major
           | parties._
           | 
            | And if so, then Wikipedia is indeed badly biased. Whether or
            | not Larry Sanger is biased isn't that interesting. But bias
            | at Wikipedia - a source blindly trusted by millions - is a
            | very interesting and concerning state of affairs.
        
             | bawolff wrote:
             | > Trying to change the subject to Larry Sanger is an ad
             | hominem fallacy. Address the content of the message, not
             | the speaker.
             | 
              | I disagree. This thread started with "By cofounder Larry
              | Sanger" - so the argument began with the implication that
              | Larry Sanger should be listened to because of who he is.
              | You can't both claim his argument holds extra weight due
              | to who he is while also claiming it's irrelevant who he
              | is. You have to pick one.
             | 
              | As far as the Obama article goes - I'm not an American and
              | I haven't heard of those scandals before, so honestly I
              | don't know whether their omission is appropriate or not
              | (it should be noted that the Libyan intervention is
              | mentioned in his article).
             | 
             | However, i think this is asking the wrong question. Nothing
             | is 100% neutral. I don't doubt you can find biased things
             | in wikipedia. It is made by humans not revealed through
             | divine revelation. The important question in my mind is how
              | does it stack up against other sources? Is it mostly
              | neutral relative to other information sources? That's how
              | I would like to judge it.
        
               | Outright0133 wrote:
               | [dead]
        
         | nitwit005 wrote:
         | > In another place, the article simply asserts, "the gospels
         | are not independent nor consistent records of Jesus' life." A
         | great many Christians would take issue with such statements,
         | which means they are not neutral for that reason alone.
         | 
         | I'd love to see his article on Jesus that absolutely no one
         | would "take issue with".
        
           | Perceval wrote:
           | While I don't think it's possible to write an article on a
           | controversial subject that no one will take issue with, it is
           | possible to write with a generally Neutral Point of View,
           | which has been a guiding principle of Wikipedia since the
           | very early days: https://en.wikipedia.org/wiki/Wikipedia:Neut
           | ral_point_of_vie...
           | 
           | Making a flat statement that the gospels are "not independent
            | nor consistent" is not a settled or universal assessment. An
           | article written in NPOV would discuss the variety of citeable
           | interpretations and the debate between them over time.
        
         | globular-toast wrote:
         | Larry hates Wikipedia because Jimbo Wales got all the credit.
        
       ___________________________________________________________________
       (page generated 2023-07-17 23:01 UTC)