[HN Gopher] List of Citogenesis Incidents
___________________________________________________________________
List of Citogenesis Incidents
Author : Thevet
Score : 120 points
Date : 2023-04-12 05:33 UTC (17 hours ago)
(HTM) web link (en.wikipedia.org)
(TXT) w3m dump (en.wikipedia.org)
| dszoboszlay wrote:
| > arbitrary addition to Coati, "also known as....the Brazilian
| aardvark"
|
| But via citogenesis, the coati really became also known as the
| Brazilian aardvark. So the original claim is true, and this
| wasn't really citogenesis after all. More like a self-fulfilling
| prophecy.
| thaumasiotes wrote:
| In linguistics a sentence that causes itself to be true is
| called _performative_. (The standard example being "we are at
| war", spoken by someone with the authority to declare war.)
|
| That doesn't really fit here, but it's a similar idea.
| Berniek wrote:
| The references in the comments suggest ChatGPT as providing this
| effect. But that is (or should be) unlikely: the "training" or
| moderation (tweaking?) should actually solve this problem. It
| should be relatively easy to separate its own generation from
| sources. BUT where it will happen is when multiple instances of
| these language models compete with each other. ChatGPT quoting
| Bing or Bard output probably can't be reliably countered with
| internal training of ChatGPT, and the same goes for Bing & Bard
| and all the other myriad manifestations of these data mining
| techniques. (Unless they merge them together?)
| 1970-01-01 wrote:
| Deadspin saw one of these and made a step-by-step summary:
| https://deadspin.com/how-espn-manufactures-a-story-colin-kae...
| realworldperson wrote:
| [dead]
| petercooper wrote:
| I just asked ChatGPT who made the first cardboard box, and it too
| believes the first story on this list: "The first cardboard box
| was invented by Sir Malcolm Thornhill in England in 1817. He
| created a machine that could make sheets of paper and then fold
| them into boxes."
| TheRealPomax wrote:
| No, it doesn't. It doesn't believe anything, it's just
| generating a story for you that sounds credible enough for you
| to go "yes, this is what an answer would look like". That's its
| job. That's its _only_ job. Literally everything it says is
| fabricated, and if it happens to be the truth, _that's a
| coincidence_.
| petercooper wrote:
| _Literally everything it says is fabricated_
|
| In what sense do you use the word "fabricated"? In the sense
| that it invented a falsehood with an intent to deceive, or in
| that it says things based upon prior exposure?
| irrational wrote:
| I swear this kind of stuff is only going to get worse as people
| come to use and rely on ChatGPT and its ilk more broadly.
|
| Dystopia? Idiocracy? I don't know, but I don't like it.
| [deleted]
| bobmaxup wrote:
| Another problem to be solved with more computation and
| human-backed reinforcement learning, surely.
| irrational wrote:
| How? As can be seen from these Citogenesis Incidents, humans
| cannot even tell when other humans are making up stuff that
| sounds like it could be real. How will ChatGPT et al. do it?
| echelon wrote:
| It'll probably be a little bit like life before the web. You'd
| hear something and have no immediate way to verify the veracity
| of the claim. It's one of the reasons teachers pushed students
| to go to the library and use encyclopedias for citations when
| writing research papers.
|
| Our species has obviously managed to make it pretty far without
| facts for the longest time. But we've comfortably lived with
| easily verified facts for 20-30 years and are now faced with a
| return to uncertainty.
|
| If I had to guess, institutions such as Wikipedia will adopt
| stricter controls that rely on credentialism and frequent
| auditing as a means to counter the new at-volume information
| creation capacity. But I don't really have the faintest idea of
| how this will turn out yet. It's wild to think about how much
| things are changing.
| tjohns wrote:
| My teachers were always clear that encyclopedias were not to
| be used as a primary source either. They're not even a
| secondary source. Encyclopedias are _tertiary_ sources.
|
| They're better than Wikipedia... but only barely.
|
| In the end you use Wikipedia and an encyclopedia the same
| way: to get a broad understanding of a topic as a mental
| framework, then look at the article's citations as a starting
| point to find actual, citable primary sources. (Plus the rest
| of the library's catalog/databases.)
| weaksauce wrote:
| exactly. information literacy starts with evaluating the
| sources. I have had numerous chats over the last few years
| where it's evident that people do not do due diligence in
| their information gathering. it seems that either people
| aren't being taught this anymore or that they have given in
| to sloppy thinking.
| chess_buster wrote:
| Imagine a world where children have grown up, relying on
| ChatGPT for each and every question.
| throw124 wrote:
| Imagine that five years from now, ChatGPT or one of its
| competitors reaches 98% factual accuracy in its responses.
| Would you not like to rely on it for answering your
| questions?
| lm28469 wrote:
| Outside of maths and physics there is no such thing as a
| factual truth
| dragonwriter wrote:
| Maths has no factual truths, only logical truths. Physics
| has no more or less factual truths than any other branch
| of science.
| irrational wrote:
| Isn't there? I didn't attend MIT. That is a factual
| truth.
| derekp7 wrote:
| Isn't MIT known for its math and physics?
| rep_lodsb wrote:
| Imagine that in five years, we will have cold fusion, world
| peace and FTL travel. ChatGPT told me, so it must be true!
| VonGallifrey wrote:
| Saying this in a discussion about Citogenesis is funny to
| me. How would you even determine "factual accuracy"? Just
| look at the list. There are many instances where "reliable
| sources" repeated false information which was then used to
| "prove" that the information is reliable.
|
| As far as I am concerned AI responses will never be
| reliable without verification. Same as any human responses,
| but there you can at least verify credentials.
| paisawalla wrote:
| Scroll down in TFA to the section called "terms that became
| real". When trolls or adversaries can use citogenesis to
| bootstrap facts into the mainstream from a cold start, what
| does "98% factual accuracy" mean? At some point, you'll
| have to include the "formerly known as BS" facts.
| ttctciyf wrote:
| Apropos of this, I was tempted to submit
| https://www.youtube.com/watch?v=KfWVdXyPvWQ [1] after
| watching it last night, but maybe it's better to just leave
| it here instead.
|
| 1: How A.I Will Self Destruct The Human Race (Camera
| Conspiracies channel)
| bobmaxup wrote:
| YouTube videos of stock imagery, memes, anecdotes, and
| speculation don't seem much better.
| bawolff wrote:
| A world where children ask questions to unreliable entities
| who guess when they don't know the answer?
|
| Pretty sure we just called it the 90s.
| weaksauce wrote:
| chatgpt outputs everything so confidently since it's
| basically just a bullshit generator. it's markov chain word
| bots on steroids.
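|
| (For anyone who hasn't seen one: here's a minimal sketch of a
| word-level markov chain generator, with a toy corpus and names
| invented purely for illustration. Real LLMs are far more than
| this, but it shows the "predict the next word from what came
| before" idea the comparison is pointing at.)
|
|   # toy word-level markov chain generator (illustrative only)
|   import random
|   from collections import defaultdict
|
|   def train(words):
|       # map each word to the words observed right after it
|       chain = defaultdict(list)
|       for a, b in zip(words, words[1:]):
|           chain[a].append(b)
|       return chain
|
|   def generate(chain, start, n=10):
|       word, out = start, [start]
|       for _ in range(n):
|           if word not in chain:
|               break
|           # sampled choice follows the observed frequencies
|           word = random.choice(chain[word])
|           out.append(word)
|       return " ".join(out)
|
|   corpus = "the coati is also known as the brazilian aardvark"
|   print(generate(train(corpus.split()), "the"))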
| saghm wrote:
| There are people who do this too; I don't think that's a
| sufficient property to be a threat to humanity at large
| vimax wrote:
| It'll be a world where it's important to know the right
| question to ask
| irrational wrote:
| Even then, you have to know how to recognize that ChatGPT
| is feeding you made up information. In the case of these
| Citogenesis Incidents, 99% of the Wikipedia articles are
| legitimate. The trick is knowing what is the false 1%. How
| do you distinguish between the ChatGPT output that is true
| versus made up?
| 93po wrote:
| The 85% fatality rate for the water speed record had me go down a
| rabbit hole. The record hasn't been broken since 1978 (~315mph)
| and someone on reddit said it was because they stopped tracking
| the record due to so many deaths. I can't find any information
| online to corroborate this though.
| greenyoda wrote:
| Here's a fictitious citation that commonly appears on HN -
| "Dunning-Kruger effect":
|
| > _The expression "Dunning-Kruger effect" was created on
| Wikipedia in May 2006, in this edit.[1] The article had been
| created in July 2005 as Dunning-Kruger Syndrome. Neither of these
| terms appeared at that time in scientific literature; the
| "syndrome" name was created to summarise the findings of one 1999
| paper by David Dunning and Justin Kruger. The change to "effect"
| was not prompted by any sources, but by a concern that "syndrome"
| would falsely imply a medical condition. By the time the article
| name was criticised as original research in 2008, Google Scholar
| was showing a number of academic sources describing the Dunning-
| Kruger effect using explanations similar to the Wikipedia
| article._[2]
|
| [1]
| https://en.wikipedia.org/w/index.php?diff=55273744&diffmode=...
|
| [2]
| https://en.wikipedia.org/wiki/Wikipedia:List_of_citogenesis_...
| tesseract wrote:
| The spread of the "ranged weapon"/"melee weapon" classification
| terminology from the roleplaying games world into writings on
| real-world anthropology and military history (without
| acknowledgement of the direction of the borrowing!) is a
| personal pet peeve. I haven't been able to pinpoint Wikipedia,
| much less a specific article, as the source of this but it
| seems to have at the very least accelerated the trend.
| The28thDuck wrote:
| Does that mean the Dunning-Kruger effect has no basis in
| truth?
| JohnBerea wrote:
| I'm from the internet and I can assure you it has no basis in
| truth.
| maxbond wrote:
| ETA: see response
|
| Probably not no basis; Dunning and Kruger really did do
| research & found [retracted] a negative correlation between
| self-rated ability and performance on an aptitude test, afaik
| [/retracted]. But it's often overgeneralized or taken to be
| some kind of law rather than an observation.
| dragonwriter wrote:
| > Dunning and Kruger really did do research & found a
| negative correlation between self-rated ability and
| performance on an aptitude test, afaik
|
| No, they didn't.
|
| They found a positive linear relation between actual
| and self-assessed relative performance, with the
| intersection point at around the 70th percentile. (That is,
| people on average report themselves closer to the 70th
| percentile than they are, those below erring higher and
| those above erring lower.)
|
| The (self-rated rank) - (actual rank) difference goes up as
| actual rank goes down, but that's not self-rated ability
| going up with reduced ability.
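|
| (A toy numeric illustration of that shape; the slope and
| intercept below are invented, chosen only so that self-rating
| crosses actual rank near the 70th percentile:)
|
|   # toy model of the pattern described above (numbers made up)
|   def self_rated(actual_rank):
|       # self-rating still rises with actual rank, just slowly
|       return 0.3 * actual_rank + 49
|
|   for actual in (10, 40, 70, 90):
|       gap = self_rated(actual) - actual
|       print(actual, round(self_rated(actual)), round(gap))
|   # 10 -> 52 (gap +42), 40 -> 61 (+21),
|   # 70 -> 70 (0), 90 -> 76 (-14)
|
| The gap shrinks as actual rank rises (and goes negative above
| the crossover), even though self-rating never decreases with
| ability.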
| maxbond wrote:
| You're right, I misremembered/misspoke. I've edited my
| comment to make that clear. Thank you for the correction.
| malkia wrote:
| Reminds me of
| https://www.reddit.com/r/Jokes/comments/2wpf2h/indian_joke_a...
|
| (sed 's/indian/native american/g')
| shaftoe444 wrote:
| Really hoped that the fact this came from xkcd was itself an
| example of citogenesis.
| renewiltord wrote:
| A friend of mine intercepted one that was actually shared in
| other places:
| https://en.wikipedia.org/w/index.php?title=Said_the_actress_...
| sva_ wrote:
| Somehow this reminds me of large language models. If they're
| trained on data from after the release of, say, GPT-3, they'll
| probably be trained on outputs of that model.
| throwbadubadu wrote:
| Yes, the first thing that comes to mind: how much worse will
| this become with LLMs in the loop? I almost assume this page
| was submitted with that thought in mind.
| 93po wrote:
| Reducing hallucination (or whatever the technically correct
| term is) is a major area of focus for these models. I would
| bet we're pretty close to GPT actually evaluating sources for
| information and making judgements about how to weight those
| sources. I suspect this is going to upset a _lot_ of people,
| and especially those in power.
| r3trohack3r wrote:
| Somehow I don't think this is going to be a problem. I can't
| exactly articulate why, but I'm going to try.
|
| The success of an LLM is quite subjective. We have metrics that
| try to quantitatively measure the performance of an LLM, but
| the "real" test is the users that the LLM does work for. Those
| users are ultimately human, even if there are layers and layers
| of LLMs collaborating under a human interface.
|
| I think what ultimately matters is that the output is
| considered high quality by the end user. I don't think that it
| actually matters if an input is AI generated or human generated
| when training a model, as long as the LLM continues producing
| high quality results. I think implicit in your argument is that
| the _quality_ of the _training set_ is going to deteriorate due
| to LLM generated content. But:
|
| 1) I don't know how much the quality of the input actually impacts
| the outcome. Almost certainly an entire corpus of noise isn't
| going to generate signal when passed through an LLM, but what
| an acceptable signal/noise ratio is seems to be an unanswered
| question.
|
| 2) AI generated content doesn't necessarily mean it is low
| quality content. In fact, if we find a high quality training
| set yields substantially better AI, I'd rather have a training
| set of 100% AI generated content that is human reviewed to be
| high quality vs. one that is 100% human generated content but
| unfiltered for quality.
|
| I don't think this feedback loop, of LLM outputs feeding LLM
| inputs, is necessarily the problem people say it is. But I
| might be wrong!
| aezart wrote:
| LLM output cannot be higher quality than the input (prompt +
| training data). The best possible outcome for an LLM is that
| the output is a correct continuation of the prompt. The
| output will usually be a less-than-perfect continuation.
|
| With small models, at least, you can watch LLM output degrade
| in real time as more text is generated, because the ratio of
| prompt to output in the context gets smaller with each new
| token. So the LLM is trying to imitate itself, more than it
| is trying to imitate the prompt. Bigger models can't fix this
| problem, they can just slow down the rate of degradation.
|
| It's bad enough when the model is stuck trying to imitate its
| output in the current context, but it'll be much worse if
| it's actually fed back in as training data. In that scenario,
| the bad data poisons all future output from the model, not
| just the current context.
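|
| (A back-of-the-envelope illustration of that ratio effect; the
| token counts are made up:)
|
|   # the prompt's share of the context shrinks as tokens are
|   # generated, so later tokens condition mostly on model output
|   prompt_tokens = 200
|   for generated in (0, 200, 600, 1800):
|       share = prompt_tokens / (prompt_tokens + generated)
|       print(generated, "generated ->",
|             f"prompt is {share:.0%} of context")
|   # 0 -> 100%, 200 -> 50%, 600 -> 25%, 1800 -> 10%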
| Kye wrote:
| This is interesting because it's essentially how human
| bullshitters work. The more they know, the longer they can
| convince you they know more than they do.
|
| https://xkcd.com/451/
|
| Imposter sophistication levels might be the way we rank
| these in the future.
| 93po wrote:
| Unless the LLM actually evaluates sources and assigns
| weights to sources.
___________________________________________________________________
(page generated 2023-04-12 23:01 UTC)