[HN Gopher] List of Citogenesis Incidents
       ___________________________________________________________________
        
       List of Citogenesis Incidents
        
       Author : Thevet
       Score  : 120 points
       Date   : 2023-04-12 05:33 UTC (17 hours ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | dszoboszlay wrote:
       | > arbitrary addition to Coati, "also known as....the Brazilian
       | aardvark"
       | 
       | But via citogenesis, the coati really became also known as the
       | Brazilian aardvark. So the original claim is true, and this
        | wasn't really citogenesis after all. More like a
        | self-fulfilling prophecy.
        
         | thaumasiotes wrote:
         | In linguistics a sentence that causes itself to be true is
          | called _performative_. (The standard example being "we are at
         | war", spoken by someone with the authority to declare war.)
         | 
         | That doesn't really fit here, but it's a similar idea.
        
       | Berniek wrote:
        | The references in the comments suggest ChatGPT as a source of
        | this effect. But that is (or should be) unlikely: the
        | "training" or moderation (tweaking?) should actually solve
        | this problem, since it should be relatively easy for a model
        | to separate its own generations from sources. BUT it will
        | happen where multiple instances of these language models
        | compete with each other. ChatGPT quoting Bing or Bard output
        | probably can't be reliably countered with internal training
        | of ChatGPT, and the same goes for Bing & Bard and all the
        | other myriad manifestations of these data mining techniques.
        | (Unless they merge them together?)
        
       | 1970-01-01 wrote:
        | Deadspin saw one of these and wrote a step-by-step summary:
       | https://deadspin.com/how-espn-manufactures-a-story-colin-kae...
        
       | realworldperson wrote:
       | [dead]
        
       | petercooper wrote:
       | I just asked ChatGPT who made the first cardboard box, and it too
       | believes the first story on this list: "The first cardboard box
       | was invented by Sir Malcolm Thornhill in England in 1817. He
       | created a machine that could make sheets of paper and then fold
       | them into boxes."
        
         | TheRealPomax wrote:
         | No, it doesn't. It doesn't believe anything, it's just
         | generating a story for you that sounds credible enough for you
         | to go "yes, this is what an answer would look like". That's its
         | job. That's its _only_ job. Literally everything it says is
          | fabricated, and if it happens to be the truth, _that's a
          | coincidence_.
        
           | petercooper wrote:
           | _Literally everything it says is fabricated_
           | 
           | In what sense do you use the word "fabricated"? In the sense
           | that it invented a falsehood with an intent to deceive, or in
           | that it says things based upon prior exposure?
        
       | irrational wrote:
       | I swear this kind of stuff is only going to get worse as people
       | come to use and rely on ChatGPT and its ilk more broadly.
       | 
       | Dystopia? Idiocracy? I don't know, but I don't like it.
        
         | [deleted]
        
         | bobmaxup wrote:
         | Another problem to be solved with more computation and human
         | backed reinforcement learning, surely.
        
           | irrational wrote:
           | How? As can be seen from these Citogenesis Incidents, humans
           | cannot even tell when other humans are making up stuff that
            | sounds like it could be real. How will ChatGPT et al. do it?
        
         | echelon wrote:
         | It'll probably be a little bit like life before the web. You'd
         | hear something and have no immediate way to verify the veracity
         | of the claim. It's one of the reasons teachers pushed students
         | to go to the library and use encyclopedias for citations when
         | writing research papers.
         | 
         | Our species has obviously managed to make it pretty far without
         | facts for the longest time. But we've comfortably lived with
         | easily verified facts for 20-30 years and are now faced with a
         | return to uncertainty.
         | 
          | If I had to guess, institutions such as Wikipedia will adopt
          | stricter controls, relying on credentialism and frequent
          | auditing, to counter the new at-volume information creation
          | capacity. But I don't really have the faintest idea of
         | how this will turn out yet. It's wild to think about how much
         | things are changing.
        
           | tjohns wrote:
           | My teachers were always clear that encyclopedias were not to
           | be used as a primary source either. They're not even a
           | secondary source. Encyclopedias are _tertiary_ sources.
           | 
           | They're better than Wikipedia... but only barely.
           | 
           | In the end you use Wikipedia and an encyclopedia the same
           | way: to get a broad understanding of a topic as a mental
           | framework, then look at the article's citations as a starting
           | point to find actual, citable primary sources. (Plus the rest
           | of the library's catalog/databases.)
        
             | weaksauce wrote:
             | exactly. information literacy starts with evaluating the
             | sources. I have had numerous chats over the last few years
              | where it's evident that people do not do due diligence in
             | their information gathering. it seems that either people
             | aren't being taught this anymore or that they have given in
             | to sloppy thinking.
        
         | chess_buster wrote:
          | Imagine a world where children have grown up relying on
          | ChatGPT for each and every question.
        
           | throw124 wrote:
            | Imagine that, five years from now, ChatGPT or one of its
            | competitors reaches 98% factual accuracy in its responses.
           | Would you not like to rely on it for answering your
           | questions?
        
             | lm28469 wrote:
             | Outside of maths and physics there is no such thing as
             | factual truths
        
               | dragonwriter wrote:
               | Maths has no factual truths, only logical truths. Physics
               | has no more or less factual truths than any other branch
               | of science.
        
               | irrational wrote:
               | Isn't there? I didn't attend MIT. That is a factual
               | truth.
        
               | derekp7 wrote:
               | Isn't MIT known for its math and physics?
        
             | rep_lodsb wrote:
             | Imagine that in five years, we will have cold fusion, world
              | peace, and FTL travel. ChatGPT told me, so it must be true!
        
             | VonGallifrey wrote:
             | Saying this in a discussion about Citogenesis is funny to
             | me. How would you even determine "factual accuracy"? Just
             | look at the list. There are many instances where "reliable
             | sources" repeated false information which was then used to
             | "prove" that the information is reliable.
             | 
              | As far as I am concerned, AI responses will never be
             | reliable without verification. Same as any human responses,
             | but there you can at least verify credentials.
        
             | paisawalla wrote:
             | Scroll down TFA to the section called "terms that became
             | real". When trolls or adversaries can use citogenesis to
              | bootstrap facts into the mainstream from a cold start, what
             | does "98% factual accuracy" mean? At some point, you'll
             | have to include the "formerly known as BS" facts.
        
           | ttctciyf wrote:
            | Apropos of this, I was tempted to submit
            | https://www.youtube.com/watch?v=KfWVdXyPvWQ [1] after
            | watching it last night, but maybe it's better to just leave
            | it here instead...
           | 
           | 1: How A.I Will Self Destruct The Human Race (Camera
           | Conspiracies channel)
        
             | bobmaxup wrote:
             | YouTube videos of stock imagery, memes, anecdotes, and
             | speculation don't seem much better.
        
           | bawolff wrote:
           | A world where children ask questions to unreliable entities
           | who guess when they don't know the answer?
           | 
           | Pretty sure we just called it the 90s.
        
           | weaksauce wrote:
            | chatgpt outputs everything so confidently since it's
            | basically just a bullshit generator. it's markov chain word
            | bots on steroids.
        
             | saghm wrote:
             | There are people who do this too; I don't think that's a
             | sufficient property to be a threat to humanity at large
        
           | vimax wrote:
            | It'll be a world where it's important to know the right
            | question to ask.
        
             | irrational wrote:
             | Even then, you have to know how to recognize that ChatGPT
             | is feeding you made up information. In the case of these
             | Citogenesis Incidents, 99% of the Wikipedia articles are
              | legitimate. The trick is knowing which 1% is false. How
              | do you distinguish the ChatGPT output that is true from
              | the output that is made up?
        
       | 93po wrote:
       | The 85% fatality rate for the water speed record had me go down a
       | rabbit hole. The record hasn't been broken since 1978 (~315mph)
       | and someone on reddit said it was because they stopped tracking
       | the record due to so many deaths. I can't find any information
       | online to corroborate this though.
        
       | greenyoda wrote:
       | Here's a fictitious citation that commonly appears on HN -
       | "Dunning-Kruger effect":
       | 
       | > _The expression "Dunning-Kruger effect" was created on
       | Wikipedia in May 2006, in this edit.[1] The article had been
       | created in July 2005 as Dunning-Kruger Syndrome. Neither of these
       | terms appeared at that time in scientific literature; the
       | "syndrome" name was created to summarise the findings of one 1999
       | paper by David Dunning and Justin Kruger. The change to "effect"
       | was not prompted by any sources, but by a concern that "syndrome"
       | would falsely imply a medical condition. By the time the article
       | name was criticised as original research in 2008, Google Scholar
       | was showing a number of academic sources describing the Dunning-
       | Kruger effect using explanations similar to the Wikipedia
       | article._[2]
       | 
       | [1]
       | https://en.wikipedia.org/w/index.php?diff=55273744&diffmode=...
       | 
       | [2]
       | https://en.wikipedia.org/wiki/Wikipedia:List_of_citogenesis_...
        
         | tesseract wrote:
         | The spread of the "ranged weapon"/"melee weapon" classification
         | terminology from the roleplaying games world into writings on
         | real-world anthropology and military history (without
         | acknowledgement of the direction of the borrowing!) is a
         | personal pet peeve. I haven't been able to pinpoint Wikipedia,
         | much less a specific article, as the source of this but it
         | seems to have at the very least accelerated the trend.
        
         | The28thDuck wrote:
          | Does that mean the Dunning-Kruger effect has no basis in
          | truth?
        
           | JohnBerea wrote:
           | I'm from the internet and I can assure you it has no basis in
           | truth.
        
           | maxbond wrote:
           | ETA: see response
           | 
            | Probably not no basis; Dunning and Kruger really did do
            | research & found [retracted] a negative correlation between
            | self-rated ability and performance on an aptitude test afaik
            | [/retracted]. But it's often overgeneralized or taken to be
            | some kind of law rather than an observation.
        
             | dragonwriter wrote:
              | > Dunning and Kruger really did do research & found a
              | negative correlation between self-rated ability and
              | performance on an aptitude test afaik
             | 
             | No, they didn't.
             | 
              | They found a positive linear relation between actual
             | and self-assessed relative performance, with the
             | intersection point at around the 70th percentile. (That is,
             | people on average report themselves closer to the 70th
             | percentile than they are, those below erring higher and
             | those above erring lower.)
             | 
             | The (self-rated rank) - (actual rank) difference goes up as
             | actual rank goes down, but that's not self-rated ability
             | going up with reduced ability.
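              | 
              | A minimal sketch of that shape in Python (the 0.3 slope
              | and the 70th-percentile crossover here are illustrative
              | assumptions, not the paper's actual fit):
              | 
              |     for actual in (10, 30, 50, 70, 90):
              |         # Self-assessed rank regresses toward ~the 70th
              |         # percentile: it still rises with actual rank
              |         # (positive slope), just flatter than identity.
              |         self_rated = 70 + 0.3 * (actual - 70)
              |         gap = self_rated - actual  # shrinks, then flips
              |         print(actual, round(self_rated), round(gap))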
        
               | maxbond wrote:
               | You're right, I misremembered/misspoke. I've edited my
               | comment to make that clear. Thank you for the correction.
        
       | malkia wrote:
       | Reminds me of
       | https://www.reddit.com/r/Jokes/comments/2wpf2h/indian_joke_a...
       | 
        | (sed s/indian/native american/g)
        
       | shaftoe444 wrote:
        | Really hoped that the fact that this came from xkcd was itself
        | an example of citogenesis.
        
       | renewiltord wrote:
       | A friend of mine intercepted one that was actually shared in
       | other places:
       | https://en.wikipedia.org/w/index.php?title=Said_the_actress_...
        
       | sva_ wrote:
        | Somehow reminds me of large language models. If they're
        | trained on data from after the release of, say, GPT-3, they'll
        | probably be trained on outputs of that model.
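        | 
        | A toy sketch of that feedback loop, with a fitted uniform
        | distribution standing in for the model (everything here is an
        | illustrative assumption, not how LLM training works):
        | 
        |     import random
        | 
        |     # "Train" by fitting a uniform distribution to the data's
        |     # observed range, then sample the next "training set" from
        |     # the fit. Each generation can only lose the tails: no new
        |     # sample ever falls outside the previous min/max.
        |     data = [random.gauss(0, 1) for _ in range(200)]
        |     for generation in range(10):
        |         lo, hi = min(data), max(data)
        |         print(f"gen {generation}: [{lo:+.2f}, {hi:+.2f}]")
        |         data = [random.uniform(lo, hi) for _ in range(200)]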
        
         | throwbadubadu wrote:
         | Yes, first thing that comes to mind: How much worse will this
         | become with LLMs in the loop - almost assuming that this page
         | was even submitted for that thought?
        
         | 93po wrote:
          | Reducing the model's hallucination (or whatever the
          | technically correct term is) is a major area of focus. I would
          | bet we're pretty close to GPT actually evaluating sources for
          | information and making judgements about how to weight those
          | sources. I suspect this is going to upset a _lot_ of people,
          | especially those in power.
        
         | r3trohack3r wrote:
         | Somehow I don't think this is going to be a problem. I can't
         | exactly articulate why, but I'm going to try.
         | 
         | The success of an LLM is quite subjective. We have metrics that
         | try to quantitatively measure the performance of an LLM, but
         | the "real" test are the users that the LLM does work for. Those
         | users are ultimately human, even if there are layers and layers
         | of LLMs collaborating under a human interface.
         | 
         | I think what ultimately matters is that the output is
         | considered high quality by the end user. I don't think that it
         | actually matters if an input is AI generated or human generated
         | when training a model, as long as the LLM continues producing
         | high quality results. I think implicit in your argument is that
         | the _quality_ of the _training set_ is going to deteriorate due
         | to LLM generated content. But:
         | 
          | 1) I don't know how much the quality of the input actually
          | impacts the outcome. Almost certainly an entire corpus of
          | noise isn't going to generate signal when passed through an
          | LLM, but what an acceptable signal/noise ratio is seems to be
          | an unanswered question.
         | 
         | 2) AI generated content doesn't necessarily mean it is low
         | quality content. In fact, if we find a high quality training
         | set yields substantially better AI, I'd rather have a training
         | set of 100% AI generated content that is human reviewed to be
         | high quality vs. one that is 100% human generated content but
         | unfiltered for quality.
         | 
          | I don't think this feedback loop, of LLM outputs feeding LLM
          | inputs, is necessarily the problem people say it is. But I
          | might be wrong!
        
           | aezart wrote:
           | LLM output cannot be higher quality than the input (prompt +
           | training data). The best possible outcome for an LLM is that
           | the output is a correct continuation of the prompt. The
           | output will usually be a less-than-perfect continuation.
           | 
           | With small models, at least, you can watch LLM output degrade
           | in real time as more text is generated, because the ratio of
           | prompt to output in the context gets smaller with each new
           | token. So the LLM is trying to imitate itself, more than it
           | is trying to imitate the prompt. Bigger models can't fix this
           | problem, they can just slow down the rate of degradation.
           | 
           | It's bad enough when the model is stuck trying to imitate its
           | output in the current context, but it'll be much worse if
           | it's actually fed back in as training data. In that scenario,
           | the bad data poisons all future output from the model, not
           | just the current context.
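            | 
            | A rough illustration of that ratio (the token counts here
            | are made-up assumptions):
            | 
            |     prompt_tokens = 50
            |     for generated in (0, 50, 200, 1000):
            |         # The prompt's share of the context shrinks as
            |         # generated tokens pile up, so later tokens are
            |         # conditioned mostly on the model's own output.
            |         share = prompt_tokens / (prompt_tokens + generated)
            |         print(f"{generated:4d} generated: {share:.0%} prompt")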
        
             | Kye wrote:
             | This is interesting because it's essentially how human
             | bullshitters work. The more they know, the longer they can
             | convince you they know more than they do.
             | 
             | https://xkcd.com/451/
             | 
             | Imposter sophistication levels might be the way we rank
             | these in the future.
        
             | 93po wrote:
              | Unless the LLM actually evaluates sources and assigns
              | them weights.
        
       ___________________________________________________________________
       (page generated 2023-04-12 23:01 UTC)