[HN Gopher] Google: Auto-translated content not indexed
___________________________________________________________________
Google: Auto-translated content not indexed
Author : luxpir
Score : 37 points
Date : 2021-11-12 11:10 UTC (11 hours ago)
(HTM) web link (languageops.com)
(TXT) w3m dump (languageops.com)
| Aulig wrote:
| Yeah, that's exactly what I'm experiencing at the moment. I
| created translations for my website with DeepL. One language I
| manually corrected, the other I left as it is. You can guess
| which one is ranking really well and bringing in lots of
| customers. The automatic translation basically didn't bring any
| customers at all.
|
| Now I'm hiring translators from Upwork to improve the DeepL
| translations. I pay around $15 per hour. You can go even cheaper
| too if you want a translation to a language that is spoken in
| developing countries.
|
| It's about 50% cheaper to have an existing DeepL translation and
| ask the translators to proofread it, as opposed to having them
| translate from scratch (even though I would've thought that they'd
| base their translation on an automatic translation first
| anyway).
| elif wrote:
| I see this a little differently. When I'm in a different country
| like Japan and I search for something, the results are very
| relevant.
|
| When I go home to the US and search "something Japan" or
| "something site:*.jp" the results are worse than useless and
| completely different from what I get in Japan.
|
| The internet is supposed to be our universal bridge, but here it
| is unwittingly segmenting us into fractured universes presented
| as uniform.
| smnrchrds wrote:
| It's a bit annoying that the default is like that, but you can
| change it. On the Google search results page, click on the gear
| button, go to `Search settings`, and set `Region Settings` to
| Japan.
| The_rationalist wrote:
| How do they even detect it? It sounds computationally very
| expensive and imprecise.
| 1cvmask wrote:
| So Google will penalize you for using their own Google
| translations? And then use humans for it. This is on a blog post
| of a human translation service. I wonder how true this is, given
| the incentives of the source.
|
| How does Google's machine know whether we are using human or
| machine translation?
| londons_explore wrote:
| If I were them, I would keep some kind of hash of any stuff
| they have translated, so that when they index it they know if
| it's the output of their own translator.
|
| They probably also have ML models to try to detect machine
| translations too: if a human can pick up a machine translation
| easily, a machine can probably be trained to detect it.
|
| I can see why they want to do that too - multilingual data is
| very useful for machine language understanding (the computer
| effectively has two independent ways to understand what's being
| said), but contaminating the data with machine translations
| makes it nearly useless.
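| A minimal sketch of the hash idea above, in Python. Everything
| here is hypothetical: the `fingerprint` and `TranslationLog`
| names, the SHA-256 digest, and the whitespace/Unicode
| normalization are illustrative assumptions, not anything Google
| has described. It records a digest of each translator output and
| checks crawled page text against that set:

```python
import hashlib
import unicodedata


def fingerprint(text: str) -> str:
    """Hypothetical digest of lightly normalized text."""
    # Normalize to Unicode NFC and collapse runs of whitespace, so that
    # purely cosmetic spacing differences map to the same digest.
    normalized = " ".join(unicodedata.normalize("NFC", text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TranslationLog:
    """Hypothetical store of digests of everything the translator emitted."""

    def __init__(self) -> None:
        self._seen = set()

    def record(self, translated_text: str) -> None:
        # Called at translation time for each output.
        self._seen.add(fingerprint(translated_text))

    def is_own_output(self, page_text: str) -> bool:
        # Called at indexing time: exact match on the normalized digest.
        return fingerprint(page_text) in self._seen


log = TranslationLog()
log.record("Das ist ein Test.")
print(log.is_own_output("Das  ist ein Test."))  # True: only whitespace differs
print(log.is_own_output("Das ist ein Test!"))   # False: one character changed
```

| Because this is an exact-match lookup, changing a single
| character of the output produces a different digest, so even
| light human post-editing would already defeat it.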
| throwaway2077 wrote:
| >If I were them, I would keep some kind of hash of any stuff
| they have translated, so that when they index it they know if
| it's the output of their own translator.
|
| that would be trivial to work around
| tut-urut-utut wrote:
| Then just don't use Google Translate. There are some that are
| better quality anyway.
| franze wrote:
| I can say of myself that I kind of pioneered "auto-translating at
| scale for SEO traffic" in 2008-2009. At that time I was working
| for a startup incubator, and one of our internal clients was
| tripwolf.com.
|
| We had major success with an automated, aggregated SEO strategy
| for 123people.com and wanted to apply the learnings to the travel
| information space.
|
| So we got high-quality content from a lot of travel guide
| publishing houses and, together with some other aggregated
| yellow pages data, translated the mostly German base content into
| en, fr, es, pt, ...
|
| And it worked.
|
| Like crazy. For a short time we attracted more traffic than
| TripAdvisor and Yelp combined (based on the competitive traffic
| data we had at that time). Traffic (and my ego) exploded. We also
| did not go against any Google guidelines, other than one: if you
| are spammy, you are spammy.
|
| Then the Google guidelines were updated (automatically translated
| content is seen as "pure spam"), and over the following days
| Google's hammer came down on different sections of the website,
| different markets, and on and on. (The company much later migrated
| to a native app business model and was quite successful for a few
| years.) The Portuguese content lasted the longest and still got
| substantial traffic even a few years later.
|
| To my knowledge we were the biggest auto-translation player at
| that time, and we did it better than anyone else.
|
| But all in all, it was 2008/09, and at that time, for online
| startups, the difference between a traffic channel and a product
| was not yet well understood. Getting traffic via Google + ads was
| seen as a sustainable business case. It was not. We had no real
| product and too much focus on SEO. So, all in all, that strategy
| had negative long-term value.
|
| Nowadays I refuse to take any clients who do not have at least an
| MVP in place, because even more traffic than your servers can
| handle will not save your startup, ever.
| skinkestek wrote:
| Now that you've admitted this, I'll admit I feel a mighty urge to
| downvote and/or flag you, since I hate this kind of content
| deeply.
|
| That said, I won't do it, and I'd rather people tell these
| stories than keep quiet.
|
| That said: everyone else, stop now before we need to dust off
| the venerable LOIC technology and nuke you from orbit ;-)
|
| (just joking, I won't do that but I also won't do business with
| anyone who pollutes the Internet with that kind of cr@p if I
| have alternatives.)
| jimmaswell wrote:
| Why do you hate content made available in another language
| through automatic means? I've read lots of things auto-
| translated into English and it's always at least acceptable.
| skinkestek wrote:
| Translation from languages I don't know to English is fine
| with me.
|
| Translation from well written text in languages I know to
| absolute garbage in my native language, that is what I
| hate.
|
| Edit: I see you have a point. I missed the fact that GP was
| translating German to English.
|
| I guess people who know German slightly better than I do (I
| can make myself understood and have used it as a working
| language for a very short while) will hate auto-translated
| German almost as much as I and others hate auto-translated
| English, but not quite as much:
|
| A lot of the problem with auto-translated text is that the
| physical thing or software we are supposed to use still has
| English user interfaces, so we are left trying to decode
| what the original text said.
| ghaff wrote:
| There are obviously translation tools out there if you want
| to use them for yourself. But it seems fairly obvious to me
| that it's not desirable to flood the web and search results
| with a bunch of barely/sort-of understandable text that is
| the output of ML translation.
| franze wrote:
| It hasn't worked like that for about a decade now, and that's a
| good thing.
|
| It was at a time when I saw search (engine optimisation)
| mostly as a technical challenge. It's not, not anymore, if it
| ever was.
| Zanfa wrote:
| Definitely a welcome change. Low-quality auto-translated results
| are one of the things that have become a major issue when trying
| to find anything in my native language. For a lot of searches,
| literally most of the results end up being an incoherent mess of
| spammy auto-translated links.
| aaron695 wrote:
| > Google: Auto-translated content not indexed
|
| The title is wrong. The article says it is indexed.
|
| This has been talked about before, and Google have admitted/said
| they cannot simply detect machine-translated content.
|
| This just says that if they catch you, humans might de-index you.
|
| Where exactly that war stands, this article does not reveal.
|
| The article is based on this conversation:
| https://www.youtube.com/watch?v=qoISMxlhNTI&t=165s Other videos
| have also talked about this, with more hints.
|
| This is basically the future: GPT-3 nonsense that machines (and
| some humans) can't recognize as nonsense. At least with
| translation, the output currently has value.
| patrakov wrote:
| Finally!
|
| Note that the problem with auto-translated StackOverflow clones
| in Russia is so severe that browser extensions and adblock lists
| were created just to deal with it. For example:
| https://github.com/Nebula-Mechanica/Anti-AutoTranslation-Lis...
|
| And this kind of spam is one of the reasons why I switched from
| Google to DuckDuckGo for web search.
| ebanana wrote:
| I think it would be interesting to translate 98% of any website
| into the native tongue of the person viewing the content, but
| leave the remaining 2% in the original language of the website
| author. The reason would be to eventually have everyone
| understand some key words in each other's languages. It's a
| wild concept. Eventually the internet will have one main mashed-up
| universal tongue.
| mattowen_uk wrote:
| > _The best practice as of 2021 is to take your existing, best-
| performing content and get it professionally translated or re-
| written._
|
| Yeah, because we've all got money to burn on hiring translators.
| Another example of small websites being eradicated from the
| [discoverable] Web.
| TonyTrapp wrote:
| I have to admit that I have never seen a small website using
| auto-translated content. I see it more frequently on the
| content mills that no one wants, on Microsoft documentation (oh
| the horrors!) and most recently on various shopping websites
| that want to show you offers from other countries (eBay fails
| terribly at this). I don't think small website owners are in
| trouble; the big ones are.
| iggldiggl wrote:
| > I see that more frequently on the content mills that no one
| wants
|
| Some of those Wikipedia-republishing SEO spam sites seem to
| take the English Wikipedia and then auto-translate that into
| other languages instead of directly mirroring each country's
| native Wikipedia. At least they're good for a laugh because
| whatever translation provider they're using still tends
| towards those hilariously literal translations that no longer
| happen that frequently with Google.
| ghaff wrote:
| Machine translation (and transcription) is better than nothing,
| especially for personal use. But it's pretty mediocre at
| best. So if you publish it as finished material I'd expect it
| to be treated like any other low quality content. You don't get
| some special pass because you're a "small website."
| karatinversion wrote:
| I don't really understand this complaint. You are running a
| small website, and you want it to be promoted to users in
| another language, but you don't want to spend any resources in
| translating or localizing the content?
| mattowen_uk wrote:
| Maybe I don't have the spare capital as I'm a fledgling
| start-up, but still want to be able to reach as many people
| as possible?
| karatinversion wrote:
| But do the people have any interest in seeing text that is
| machine translated without editing? MT is good, but you can
| still tell that text has gone through it; for me, reading
| it when I haven't triggered the translation myself
| elicits the same reaction as spam emails with obvious
| typos.
| 6gvONxR4sf7o wrote:
| Users are sick of scale at all costs. Global scale with crappy
| service isn't a birthright.
| numpad0 wrote:
| _Cheer_ Please _vacate_ leave your _location_ site _almost it
| had been_ as it is. MT output isn't as good as it might look
| when applied twice, and it often needs to be compared against
| the original text to make out the intention. It also "sounds"
| robotic and creepy.
|
| Once I encountered a bunch of products advertised as "applies
| delight scam". Take a guess at what it means.
| withinboredom wrote:
| Thanks for your comment. I like it very much. It seems
| strange to me that people think that machine translated text
| is perfectly acceptable. Maybe people will appreciate it more
| after reading this post.
___________________________________________________________________
(page generated 2021-11-12 23:02 UTC)