[HN Gopher] Google says AI generated content is against guidelines
___________________________________________________________________
Google says AI generated content is against guidelines
Author : rolph
Score : 95 points
Date : 2022-04-09 16:50 UTC (6 hours ago)
(HTM) web link (www.searchenginejournal.com)
(TXT) w3m dump (www.searchenginejournal.com)
| ohashi wrote:
| AI for me, not for thee.
|
| Google shouldn't be the arbiter of what's ok or not on the
| internet. They use AI to take away all human recourse with the
| company but want to tell others not to use it? It's a pretty
| laughable position. Good luck trying to detect GPT-3 and the like
| when you compare with non-native speakers of languages. Are you
| suddenly going to just block them too?
|
| If an AI can generate high-quality content, why is that any less
| than human-generated content? Humans generate a ton of trash
| content; it's not inherently better.
|
| Those same models in Copilot are generating useful and often good
| code for me on a daily basis. If someone told me it's unethical or
| wrong to use any Copilot-generated code, I'd laugh in
| their face. It's basically saving me a Google search to find
| snippets/examples of things I wasn't sure of.
|
| Maybe that's the threat? If we have access to AI directly (a la
| Copilot), then I am googling less.
| shadowgovt wrote:
| That's the point. If it _is_ detectable, it'll get downranked
| for being low-quality.
| deevolution wrote:
| What about high-quality AI content? What if someday the
| content generated by AI is actually more useful/higher
| quality than the human-generated content, and we just shoot
| ourselves in the foot because it's AI-generated? Seems
| discriminatory to me!
| hackernewds wrote:
| Which seems fair. I do not want to be bombarded by AI
| generated media and content that can be created at infinite
| volume and speed. Google is taking a massive step here for
| the good.
| jimmaswell wrote:
| I've seen a few AI generated pages in top results that felt
| like having a stroke trying to read. I hope these can be
| eliminated.
| aaaaaaaaaaab wrote:
| Fair enough. Google-generated content is likewise against my
| guidelines.
| Trias11 wrote:
| If Google can randomly penalize people and businesses with no
| chance of recourse, it's certainly a good idea to penalize Google
| with intelligent AI-generated content.
| sva_ wrote:
| So Google would need to build a discriminator that detects
| machine-generated content. It will be interesting to see these
| discriminators fight the generators of other big companies.
|
| I'll be taking a front-row seat to watch that show, if it
| happens. Perhaps in the future, we'll have to deal with
| discriminators that approximate some originality index. It will be
| fun fighting with those algorithms just to use the internet
| as a normal user (to some extent we already do: proving that
| you're human is becoming more and more tedious).
| notahacker wrote:
| In practice I think it's more _Google now has another policy
| reason to banhammer prolific and irritating blogspammers_ than
| an arms race Google has a chance in.
|
| Google isn't yet effective at detecting blogspam generated by
| naive scripts that simply swap words in the source material for
| other words in a thesaurus. I don't think they're going to
| start picking up continuity errors, factual errors or
| "weirdness" in GPT-3 - which often satisfies human readers -
| any time soon.
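| For context, the "naive scripts" that swap words for thesaurus
| entries can be as simple as the sketch below. It is a hypothetical
| illustration: the tiny SYNONYMS table stands in for a real
| thesaurus, and a real tool would also need to handle punctuation
| and casing.

```python
import random

# Hypothetical, tiny thesaurus; a real spinner would load a much larger one.
SYNONYMS = {
    "quick": ["fast", "rapid", "speedy"],
    "big": ["large", "huge", "sizable"],
    "buy": ["purchase", "acquire"],
}


def spin(text: str, rng: random.Random) -> str:
    """Replace each word that has thesaurus entries with a random synonym."""
    out = []
    for word in text.split():
        # Only whole, punctuation-free words are looked up (case-insensitively);
        # the chosen synonym is emitted in lowercase.
        choices = SYNONYMS.get(word.lower())
        out.append(rng.choice(choices) if choices else word)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    print(spin("buy a big quick car", rng))
```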
| notreallyserio wrote:
| Google engineers can't even filter out GH or SO scraped sites
| like gitmemory, nor offer a way to let users block these
| sites. I'm not sure we should expect them to handle more
| advanced techniques like detecting word swaps any time soon.
| omnicognate wrote:
| s/can't/won't/
| notreallyserio wrote:
| Google search has been poor for years. I think it's time
| to say can't.
| zitterbewegung wrote:
| I'm unsure about that, but using GPT-3 in that manner would
| certainly trigger OpenAI's automatic or manual systems, it
| would violate their ToS, and your account would be locked.
| Oras wrote:
| With GPT-3, yes, but what about other generators? There are
| lots of emerging services that spammers can use.
| htrp wrote:
| Does OpenAI actually have those systems? My impression is
| they are happy to take your money and just tell you not to
| do those un-nice things.
| ganzuul wrote:
| You need to be computationally irreducible through self-
| attention and conscious self-interaction.
|
| By the same criterion, we may need to consider the wattage cost
| of intelligence when granting rights to AI. Their kernels need
| to be aligned with evolutionary wisdom encoded in our highest
| motivations.
| mjburgess wrote:
| a conversation for Q and Commander Data
| ganzuul wrote:
| Filmed in front of a live audience! [Applause]
| OliverJones wrote:
| Neal Stephenson's "Fall; or, Dodge in Hell" has a subplot driven
| by a news-story-generating AI run amok. The AI uses reader
| engagement as its metric (along with really great
| natural-language stuff), but it doesn't have any truth metric. So,
| at the time of the book, the AI's widely believed stories have
| constructed an alternate reality for many people, reinforcing the
| present polarization of media in the US.
|
| That presents quite a threat profile. It's far more pernicious
| than SEO script kiddies doing whatever passes for keyword
| stuffing in 2022.
|
| I hope the search crawler folks at Google are working hard to
| detect that sort of thing and prevent it from getting into
| indexes. Let's hope Neal Stephenson isn't as right about that
| threat as Arthur C. Clarke was about geosynchronous
| communications satellites.
| syrrim wrote:
| That's basically what the Q larp, or realrawnews, is already
| doing today. If it's profitable to do, it doesn't take a rogue
| AI to make up nonsense and spread it to the masses.
| bestcoder69 wrote:
| Seems like there should be a carve-out for content clearly marked
| as AI-generated. I wonder if the SEO hit is why I haven't been
| able to find too many others posting funny GPT-3 outputs like
| (Disclosure incoming:) mine.
|
| And if I can peddle my blog's 'content': here's Trump announcing
| he's (not?) trans.
|
| > I have been so concerned about this issue, I've held back from
| telling you that I'm transgender. I'm not transgender, but I'm so
| proud of the transgender community and their rights.
|
| > The Democrats have been so horrible to the transgender
| community. They've made them live in these little closets, ya'
| know? And they've tried to force them to use certain restrooms.
|
| Love messin' with gpt-3.
| mkl95 wrote:
| That AI generated content is automated SEO, which is mostly a
| bunch of heuristics to please Google's ranking algorithms. Blame
| the algorithms, not the people who reverse engineer them.
| admax88qqq wrote:
| Why blame the algorithms rather than the spammers who reverse
| engineer them to get their affiliate-link-littered, AI-generated
| "reviews" to the top of the search results?
|
| In the arms race between Google and spammers, honest websites
| sometimes get caught in the crossfire. For some reason lately
| lots of people want to blame google for this and not the
| spammers.
| sdoering wrote:
| Why must it necessarily be either/or, black/white,
| Google/spammer?
|
| Why couldn't both sides reasonably be to blame?
| Google enables the commodification of search results. Yes,
| they claim that they want to show the best result. But best
| for whom? I don't believe that Google is a neutral party, as
| they earn more if they promote sites that lead to additional
| advertising clicks. How should users know that there are
| often better results on pages 3 or 4 and below?
|
| And then there are the ones trying to make a living with
| minimal effort. No need to create great content; good enough
| is sufficient. As long as the content is optimized and the
| page receives relevant traffic from search, the revenue via
| affiliate links is secured.
|
| In this arms race the other sites lose. They are collateral
| damage between the two fighting parties. And people lose
| too: they lose great content and the diversity
| of the net.
| admax88qqq wrote:
| Thankfully Google operates in a free market where it is
| possible to compete by building a better search algorithm.
|
| I don't buy into the position that all this great content
| and diversity of content is lost because of Google's
| algorithm. It isn't as discoverable as the mainstream
| content that Google search returns but that's no different
| than being in a world where Google didn't exist. Except for
| perhaps in such a world people would have different content
| discovery habits.
|
| Complaining that Google doesn't surface your preferred set
| of "great content" is like complaining that prime time
| cable TV only shows lame sitcom reruns. It's true, but it
| doesn't prevent you from buying HBO, or Prime Video.
| supernovae wrote:
| Because we wouldn't be here if Google had supported other
| means of natural search engine inclusion through quality
| metrics, rather than their current only options: pay to play or
| cheating.
| adhesive_wombat wrote:
| I can't see any other reason for spam clone farms to outrank
| the sites they clone, except either more incompetence than I'd
| ascribe to Google, or because the spam farm is full to the
| gunwales with ads that earn money for Google.
| mkl95 wrote:
| Google make a large chunk of their money from ads. If you
| create some SaaS where users can automate their site's SEO by
| using GPT or whatever, and it is reasonably priced, you could
| end up competing with Google Ads. Google want to prevent
| that.
| endisneigh wrote:
| Can someone describe to me a search engine that's immune to such
| problems and also searches a large variety of old and new sites?
| majormajor wrote:
| We often talk about this problem as if we expect Google to
| solve a social and economic people problem purely through math
| alone.
|
| The appeal of open-to-all search as a way of navigating the web
| was that there was a huge long tail of interesting stuff that
| would be hard to manually index and categorize.
|
| If the long tail of interesting stuff has fully drowned in a
| sea of spam and crap, I'm not sure that it still makes that
| much sense over something smaller but human-curated.
|
| Perhaps the trick would be human curation with extensive and
| always-evolving AI tools to speed up the curation. You have to
| get past the filter to get in, versus being in by default
| unless you're blatant enough to get banned. There is a layer of
| human judgement in addition to the algorithm's score of the
| content, and additionally that gives you an extra scoring
| factor on the algorithms yourself - the humans should be able
| to help direct the development of the algorithm to fight spam
| more preemptively.
|
| Would a mix of that give us a bigger internet than the
| entirely-manual "web directory" days of 1997 or so, but a less-
| shit-filled one than today's?
| ocdtrekkie wrote:
| Oh the irony. The company that lets AI determine what it thinks
| is factual says using AI to generate content is bad.
| peterisza wrote:
| They want to use the internet to train their models. AI generated
| text would contaminate the training data.
| hidroto wrote:
| Perhaps they don't want the datasets that they use to train the
| AI to be watered down by other AI-generated content. After all,
| a lot of the text data was sourced from the internet.
| magicalist wrote:
| > _" For us these would, essentially, still fall into the
| category of automatically generated content which is something
| we've had in the Webmaster Guidelines since almost the
| beginning._
|
| > _And people have been automatically generating content in lots
| of different ways. And for us, if you're using machine learning
| tools to generate your content, it's essentially the same as if
| you're just shuffling words around, or looking up synonyms, or
| doing the translation tricks that people used to do. Those kind
| of things._
|
| > _My suspicion is maybe the quality of content is a little bit
| better than the really old school tools, but for us it's still
| automatically generated content, and that means for us it's still
| against the Webmaster Guidelines. So we would consider that to be
| spam. "_
|
| So are people reacting thinking this is a new policy or...?
| hackernewds wrote:
| "This is okay since it has been policy" is that same vibe as
| "We have to bump you off the plane because it is company
| policy"
| Traster wrote:
| I think it's difficult to marry this policy with the _other_
| stuff Google is claiming to be dramatically transformational.
| If Google came out and said, "Hey, this self-driving stuff
| isn't really dissimilar from traditional driving assist," there
| would be some questions to answer. Which, of course, is what the
| regulators actually say.
| npunt wrote:
| This internal tension between chasing AI tooling and avoiding AI-
| generated content is just a prelude to the bigger shift of search
| engines getting reinvented around generated results instead of
| found results.
|
| Fast forward 10+ years and for knowledge-related queries search
| is going to be more about generated results personalized to our
| level of understanding that at best quote pages, and more likely
| just reference them in footnotes as primary inputs.
|
| These knowledge-related queries are where most content farms, low
| quality blogs, and even many news sites get traffic from today.
| If the balance of power between offense (generating AI content)
| and defense (detecting AI content) continues to favor offense,
| there will be a strong incentive to just throw the whole thing
| out and go all-in on generated results.
|
| The big question is how incentives play out for the people gathering
| the knowledge about the world, which is the basis for generated
| results. Right now many/most make money with advertising, but so
| do content farms, and more generation means more starving of that
| revenue source. Wikipedia is an alternative model for this
| knowledge, but it only covers a certain portion of the factual info
| that people want to know, and if search uses it more, it will
| become more of a single point of failure.
|
| Really interesting stuff ahead.
| asar wrote:
| Is this even technically possible? To me, this reads like an
| empty threat.
| ganzuul wrote:
| Hey would you mind solving some captchas? Nothing personal,
| just a little paranoid.
| asar wrote:
| Great point! I think with captchas, some humans might be able
| to identify obviously made up facts and very poor sentence
| structure to tag a text as low quality. But I don't think
| you'd be able to give a reliable assessment on whether a text
| was AI generated or not. Especially if it's a random group of
| people (unfamiliar with modern content generation) filling
| out the captchas.
| ganzuul wrote:
| Captchas can be made adversarial. Not sure if it's a good
| idea to try that with text since we don't know how humans
| react. Maybe that's what the Phenomenon is about?
| sumoboy wrote:
| So it's OK for Google to generate <title> tags, Google Ads
| headlines, and email assistance with AI, while news agencies have
| for a while robo-generated articles? The real issue is that Google
| knows this is gaming SEO, and they will struggle against it as it
| only gets better.
| [deleted]
| MrPatan wrote:
| AI for me, not for thee?
| ceejayoz wrote:
| Can't wait to be delisted from Google over a false positive with
| zero recourse.
| [deleted]
| Animats wrote:
| Most of the content Google generates is either AI-generated or
| plagiarized.
| Shadonototra wrote:
| it is very important to cover the legal aspect of such thing now
|
| otherwise some dumb people will want ai generated content to be
| allowed everywhere
|
| i'm pretty sure they are making their move now because it causes
| them tons of issues with YouTube
|
| allowing ai generated content would mean allowing ai generated
| comments on youtube, which is already happening and causes a lot of
| issues
|
| if you can't tell what is AI generated and use
| comments/discussions/like/dislike in your algorithms for ranking
| videos, then it'll be very easy for 3rd parties to push and play
| the game, including ad revenue
|
| the inevitable will come sooner rather than later, get ready for
| your online passport!
| fny wrote:
| There's an interesting variant of the Turing test here: develop
| an AI sufficiently intelligent to distinguish human content from
| AI generated content.
|
| I might be wrong, but I think this might be a more difficult task
| than generating convincing dialogue: it's very easy to generate
| text that statistically resembles human writing, and generators
| trained on certain topics (e.g. some science niche) might be
| impossible to flag.
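| To illustrate how cheap "statistically resembles human writing"
| can be: even a bigram Markov chain, far cruder than GPT-3, emits
| text in which every adjacent word pair occurred in its training
| corpus. A minimal Python sketch, with a made-up toy corpus:

```python
import random
from collections import defaultdict


def build_bigrams(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in the corpus."""
    words = corpus.split()
    table: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)
    return dict(table)


def generate(table: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> list[str]:
    """Walk the chain from `start`; duplicates in each follower list make
    more frequent successors proportionally more likely to be picked."""
    out = [start]
    while len(out) < length and out[-1] in table:
        out.append(rng.choice(table[out[-1]]))
    return out


if __name__ == "__main__":
    corpus = "the model writes text and the reader trusts the text the model writes"
    table = build_bigrams(corpus)
    print(" ".join(generate(table, "the", 8, random.Random(42))))
```

| The output is locally fluent but has no model of meaning, which is
| why spotting it requires more than checking word statistics.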
| mensetmanusman wrote:
| In theory GPT-3 could fill the entire internet with approximately
| true but false information. Wikipedia, comment boxes, blogs,
| Wordpress, everything.
|
| When the percentage of human generated content approaches
| 0.0000%, what does that internet look like?
| aaaaaaaaaaab wrote:
| We'll go back to invite-only forums.
| WithinReason wrote:
| "There is another theory which states that this has already
| happened."
| imranq wrote:
| So your post was generated by a bot? And I suppose this reply
| was also generated by a bot? It's bots all the way down.
| ChrisGranger wrote:
| It's a _Hitchhiker's Guide to the Galaxy_ reference.
| EarlKing wrote:
| "It's turtles all the way down."
|
| In the process of remembering that little tidbit, I was
| reminded of a short story I read once that was
| circulated without attribution for a few decades, i.e.
| Terry Bisson's "They're Made Out of Meat"[1], published in
| OMNI in 1990. "They're meat all the way through."
|
| [1] https://web.archive.org/web/20190501130711/http://www.terryb...
| amelius wrote:
| And how would GPT-3 evolve when it starts feeding on its own
| output?
| visarga wrote:
| Use another model to filter out generated text?
| amelius wrote:
| I assume they will use a GAN to evade Google's policy.
| BlueTemplar wrote:
| For the Web, we'll just go back to webrings and directly
| sharing links with people we know to be real?
|
| (Another risk might be government-enforced identification: no
| more pseudonymity!)
| dragonwriter wrote:
| > In theory GPT-3 could fill the entire internet with
| approximately true but false information.
|
| Much of the internet is currently full of not-even-
| approximately true (and often maliciously false) information,
| so I'm not particularly worried about that.
| EarlKing wrote:
| Exactly like the one we have now: A cacophonous cesspit filled
| with the mental diarrhea of a million bots sharting their
| opinions into the void in a desperate bid to either sway your
| opinion to their cause (whether commerce or politics), or bury
| you under such an avalanche of bullshit that you remain
| paralyzed with indecision.
|
| And the result of this? Genuine conversation progressively
| retreats from the internet at large as it becomes overrun with
| the intellectual flatus of the bot wars, and people move
| further and further into silos in which bot behavior can be
| spotted and eliminated.
|
| Welcome to the future. How do you like it, gentlemen? All your
| posts are belong to us.
| mjburgess wrote:
| We find ways to route around it. You're presently using one.
| EarlKing wrote:
| Reread the second paragraph. We're in agreement.
| eslaught wrote:
| Really? I think HN has even fewer safeguards against AI-
| generated content than Google. No offense to dang and co.,
| and I'm sure there's more going on there than I'm aware of.
| But still, I'm pretty sure it would be trivial to set up an
| account and use GPT-3 to produce the content. The only
| reason I'd suspect this isn't happening is because there
| isn't a strong financial incentive to do so. In other
| words, HN avoids the spam because it's still too small to
| matter.
| gus_massa wrote:
| They will be downvoted into oblivion. And then banned,
| shadow-banned, hell-banned and a few more creative ways
| of banning. The account, the IP, the site, and perhaps
| anyone that a bayesian filter put in the same bucket.
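| A "bayesian filter" in this sense is typically a naive Bayes text
| classifier, as popularized for email spam. The sketch below is a
| minimal multinomial version with add-one smoothing; the training
| examples and the "spam"/"ham" labels are hypothetical, and nothing
| in this thread describes what HN actually runs.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        tokens = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior for the label, estimated from training document counts.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        # Add per-token log likelihoods with Laplace (add-one) smoothing.
        for tok in text.lower().split():
            score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text: str) -> str:
        """Return the trained label (the "bucket") with the highest score."""
        return max(self.word_counts, key=lambda lab: self.log_score(text, lab))


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("buy cheap pills now best deal", "spam")
    nb.train("limited offer buy now", "spam")
    nb.train("interesting paper on compiler design", "ham")
    nb.train("I disagree with the parent comment", "ham")
    print(nb.classify("best deal buy now"))
```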
| lordnacho wrote:
| Can't be that hard to train GPT on HN comments. Plus if
| someone were to do it, they probably know of HN.
|
| I could definitely see someone already having trained a
| bot to write HN comments and posting them.
|
| What's anyone going to do about it? It's super hard to
| write a discriminator that works well enough to not
| destroy the site for everyone.
| dragonwriter wrote:
| > Can't be that hard to train GPT on HN comments.
|
| Yes, which will produce things that are stylistically
| similar to HN comments, but without any connection to
| external reality beyond the training data and prompt.
|
| That _might_ provide believable comments, but not things
| likely to be treated as _high-quality_ ones, and not
| virtual posters that respond well to things like
| moderation warnings from dang.
| babyshake wrote:
| We will likely find out. And it may happen in a way that is
| fast enough to be very disorienting.
| Hayarotle wrote:
| Above a certain percentage it's going to poison human-generated
| content too. You will have to discern between ai-generated
| content, ai-influenced-human generated content and genuine
| human-generated content.
|
| One could argue it's already happening. How many of the people
| we talk to everyday get their facts from SEO-spam websites and
| Google instant answers (which often sources its content from
| such websites)? Even if we avoid AI-generated content, we might
| be getting fed it by proxy.
| ganzuul wrote:
| Human filtering of AI creativity might work, but deepfakes
| mismatch with that. Personally I decided to make it a pattern
| that I unsubscribe from channels that use deepfakes since I
| saw Internet Historian using it and possibly adding to the
| already crippling confusion regarding UAPs. - IH is not a
| credible source anyway, but you can easily use the clips they
| produced without attribution.
|
| I think the world will be a better place if everybody follows
| a similar pattern. The only reason to use deepfakes is if the
| victim whose identity is being stolen is not cooperating with
| you. - It's a new way to violate a person's integrity and
| their right to agency in our already fading grasp of reality.
| You could probably gaslight your girlfriend with it, if you
| are incomprehensibly evil.
| DaltonCoffee wrote:
| >You could probably gaslight your girlfriend with it, if
| you are incomprehensibly evil.
|
| I'm honestly a bit more concerned with people gaslighting
| courts.
|
| The technology isn't going away, unfortunately. Society
| will have to adapt to these new invasive norms, as they
| already have time and again in the past.
| ajsnigrutin wrote:
| > approximately true but false
|
| > When the percentage of human generated content approaches
| 0.0000%, what does that internet look like?
|
| ...approximately the same, but less false :)
| josefx wrote:
| Given that the AIs are trained on human-generated content to
| be human-like without any understanding of the content, I
| would think approximately the same but even less correct.
| EarlKing wrote:
| A bot would say that.
| ajsnigrutin wrote:
| Are you accusing me of being a bot?!
|
| This is a lie, i sw
|
| <NUL>
|
| <NUL>
|
| Segmentation fault
| hprotagonist wrote:
| This is a minor plot-point in Anathem.
| thorum wrote:
| If I'm understanding the quoted interview correctly, Google is
| talking about AI generated spam - like when you ask GPT-3 to
| write you an article about XYZ topic and it spits out 500 words
| of well-written, plausible sounding gibberish - that you throw up
| on your website to try to rank in the search engines.
|
| However, they seem to be leaving open the possibility of AI-
| assisted writing, where a human comes up with the information and
| guides the AI as it puts that information into words.
|
| > _From our recommendation we still see it as automatically
| generated content. I think over time maybe this is something that
| will evolve in that it will become more of a tool for people.
| Kind of like you would use machine translation as a basis for
| creating a translated version of a website, but you still work
| through it manually._
|
| > _And maybe over time these AI tools will evolve in that
| direction that you use them to be more efficient in your writing
| or to make sure that you're writing in a proper way like the
| spelling and the grammar checking tools, which are also based on
| machine learning. But I don't know what the future brings there._
|
| In my opinion GPT-3 has already reached the point of being useful
| for this purpose - there are several GPT-3 based apps that do
| exactly what he's describing.
| amelius wrote:
| I'm glad that I'm not one of the human moderators having to
| read GPT-3 gibberish on a daily basis.
| uuyi wrote:
| I was a moderator for a large Internet forum. GPT-3 is far
| more coherent than a lot of humans.
| DonHopkins wrote:
| If it's simply a quality issue, then at the point that AI
| generated content becomes better than human generated
| content, will Google ban human generated content?
| thorum wrote:
| At that point they'll probably just have their own AI
| that generates the perfect response to any search query,
| and return it as the top result.
| MiddleEndian wrote:
| After getting this result
| https://i.redd.it/4wrj2xpp75s81.png a couple days ago, I
| am not confident that will be any time soon.
| [deleted]
| [deleted]
| neilv wrote:
| A friend (university science lab technician, in a field that
| didn't pay like CS) would make approx. $2/hour extra money in
| the evenings, from some Web site that directed her what topics
| to write articles about. Google the topic, rapidly skim,
| distill/rephrase it to a certain word count. (I suggested she
| could make a lot more working in a cafe, but she was physically
| exhausted from being on feet all day in lab, with lots of
| moving around heavy objects.)
|
| I guessed it was used for filler content for SEO sites.
|
| A question is whether that company would save money by using
| "AI" text generation, when they were paying real humans so
| little, for arguably higher quality.
| walrus01 wrote:
| Maybe this wouldn't be a problem if Google did better at
| distinguishing between SEO-stuffing content farms and actual
| websites.
| [deleted]
___________________________________________________________________
(page generated 2022-04-09 23:00 UTC)