[HN Gopher] Google: Auto-translated content not indexed
       ___________________________________________________________________
        
        
       Author : luxpir
       Score  : 37 points
       Date   : 2021-11-12 11:10 UTC (11 hours ago)
        
 (HTM) web link (languageops.com)
 (TXT) w3m dump (languageops.com)
        
       | Aulig wrote:
       | Yeah, that's exactly what I'm experiencing at the moment. I
       | created translations for my website with DeepL. One language I
       | manually corrected, the other I left as it is. You can guess
       | which one is ranking really well and bringing in lots of
       | customers. The automatic translation basically didn't bring any
       | customers at all.
       | 
       | Now I'm hiring translators from Upwork to improve the DeepL
       | translations. I pay around $15 per hour. You can go even cheaper
       | too if you want a translation to a language that is spoken in
       | developing countries.
       | 
       | It's about 50% cheaper to have an existing DeepL translation and
       | ask the translators to proofread it than to have them translate
       | from scratch (even though I would've thought that they'd base
       | their translation on an automatic translation first anyway).
        
       | elif wrote:
       | I see this a little differently. When I'm in a different country
       | like Japan and I search for something, the results are very
       | relevant.
       | 
       | When I go home to the US and search "something Japan" or
       | "something site:*.jp" the results are worse than useless and
       | completely different from what I get in Japan.
       | 
       | The internet is supposed to be our universal bridge, but here it
       | is unwittingly segmenting us into fractured universes presented
       | as uniform.
        
         | smnrchrds wrote:
         | It's a bit annoying that the default is like that, but you can
         | change it. On the Google search results page, click on the gear
         | button, go to `Search settings`, and set `Region Settings` to
         | Japan.
        
       | The_rationalist wrote:
       | How do they even detect it? It sounds computationally very
       | expensive and imprecise.
        
       | 1cvmask wrote:
       | So Google will penalize you for using their own Google Translate
       | output? And you have to use humans for it instead? This is a blog
       | post from a human translation service. I wonder how true this is
       | given the incentives of the source.
       | 
       | How does Google's machine know whether we are using humans or
       | machine translation?
        
         | londons_explore wrote:
         | If I were them, I would keep some kind of hash of any stuff
         | they have translated, so that when they index it they know if
         | it's the output of their own translator.
         | 
         | They probably also have ML models to try to detect machine
         | translations too; if a human can pick up a machine translation
         | easily, a machine can probably be trained to detect it.
         | 
         | I can see why they want to do that too: multilingual data is
         | very useful for machine language understanding (the computer
         | effectively has two independent ways to understand what's being
         | said), but contaminating the data with machine translations
         | makes it nearly useless.
        
           | throwaway2077 wrote:
           | >If I were them, I would keep some kind of hash of any stuff
           | they have translated, so that when they index it they know if
           | it's the output of their own translator.
           | 
           | that would be trivial to work around
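The hashing idea above, and why it would be easy to work around, can be sketched roughly as follows. This is a hypothetical illustration only, not anything Google has described: the `fingerprint` normalization, the `TranslationLedger` class and the sample German sentence ("Please leave your site as it is.") are all assumptions. The translator records a hash of every output, and the indexer does an exact-match lookup. Collapsing whitespace survives trivial reformatting, but editing a single word produces a different hash, which is why exact hashing alone would be easy to evade.

```python
import hashlib
import unicodedata


def fingerprint(text: str) -> str:
    """Hypothetical fingerprint: SHA-256 of NFC-normalized, whitespace-collapsed text."""
    normalized = " ".join(unicodedata.normalize("NFC", text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TranslationLedger:
    """Hypothetical record of every output a translation service has produced."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def record(self, translated_text: str) -> None:
        # Assumed hook: the translator calls this each time it returns a result.
        self._seen.add(fingerprint(translated_text))

    def is_own_output(self, page_text: str) -> bool:
        # Assumed hook: the indexer calls this; it is an exact-match lookup only.
        return fingerprint(page_text) in self._seen


ledger = TranslationLedger()
mt_output = "Bitte lassen Sie Ihre Seite so, wie sie ist."
ledger.record(mt_output)

print(ledger.is_own_output(mt_output))                                 # True
print(ledger.is_own_output(mt_output + "  "))                          # True: whitespace is normalized away
print(ledger.is_own_output(mt_output.replace("Seite", "Webseite")))   # False: one edited word defeats it
```

Catching lightly edited output would need fuzzier matching or a trained classifier, which is closer to the ML detection idea in the same comment.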
        
         | tut-urut-utut wrote:
         | Then just don't use Google Translate. There are some that are
         | better quality anyway.
        
       | franze wrote:
       | i can say of myself that I kinda pioneered "auto translating on
       | scale for SEO traffic" in 2008, 2009. at that time I was working
       | for a startup incubator and one of our internal clients was
       | tripwolf.com.
       | 
       | we had major success with an automated, aggregated SEO strategy
       | for 123people.com and wanted to apply the learnings to the travel
       | information space.
       | 
       | so we got high-quality content from a lot of travel guide
       | publishing houses and together with some other aggregation of
       | yellow pages we translated the mostly german base content into
       | en, fr, es, pt, .....
       | 
       | and it worked.
       | 
       | like crazy. for a short time we attracted more traffic than
       | tripadvisor and yelp together (based on the competitive traffic
       | data we had at that time). traffic (and my ego) exploded. we also
       | did not go against any google guidelines, other than one: if you
       | are spammy, you are spammy.
       | 
       | the google guidelines were updated (automatically translated
       | content was classed as "pure spam") and within days the google
       | hammer came down, on different sections of the websites, in
       | different markets, and on and on. (the company much later migrated
       | to a native app business model and was quite successful for a few
       | years.) the portuguese content worked the longest; even a few
       | years later it still got substantial traffic.
       | 
       | to my knowledge we were the biggest auto translate player at that
       | time and we did it better than anyone else.
       | 
       | but all in all, it was 2008/09, and at that time online startups
       | did not yet understand the difference between a traffic channel
       | and a product. getting traffic via google + ads was seen as a
       | sustainable business case. it was not. we had no real product,
       | too much focus on SEO. so all in all, that strategy was long-term
       | negative value.
       | 
       | nowadays I refuse to take any clients who do not have at least an
       | MVP in place, 'cause even more traffic than your servers can
       | handle will not save your startup, ever.
        
         | skinkestek wrote:
         | Now that you admitted this I'll admit I feel a mighty urge to
         | downvote and/or flag you since I hate this kind of content
         | deeply.
         | 
         | That said, I won't do it, and I'd rather have people tell
         | these stories than keep quiet.
         | 
         | That said: everyone else, stop now before we need to dust off
         | the venerable LOIC technology and nuke you from orbit ;-)
         | 
         | (just joking, I won't do that but I also won't do business with
         | anyone who pollutes the Internet with that kind of cr@p if I
         | have alternatives.)
        
           | jimmaswell wrote:
           | Why do you hate content made available in another language
           | through automatic means? I've read lots of things auto-
           | translated into English and it's always at least acceptable.
        
             | skinkestek wrote:
             | Translation from languages I don't know to English is fine
             | with me.
             | 
             | Translation from well written text in languages I know to
             | absolute garbage in my native language, that is what I
             | hate.
             | 
             | Edit: I see you have a point. I missed the fact that GP was
             | translating German to English.
             | 
             | I guess people who know German slightly better than me (I
             | can make myself understood and have used it as a working
             | language for a very short while) will hate auto-translated
             | German almost as much as I and others hate auto-translated
             | English, but not quite as much:
             | 
             | A lot of the problem with auto-translated text is that the
             | physical thing or software we are supposed to use still has
             | English user interfaces, so we are left trying to decode
             | what the original text said.
        
             | ghaff wrote:
             | There are obviously translation tools out there if you want
             | to use them for yourself. But it seems fairly obvious to me
             | that it's not desirable to flood the web and search results
             | with a bunch of barely/sort-of understandable text that is
             | the output of ML translation.
        
           | franze wrote:
           | it hasn't worked like that for about a decade now, and that's a
           | good thing.
           | 
           | it was at a time when I saw search (engine optimisation)
           | mostly as a technical challenge. it's not - anymore, if it
           | ever was.
        
       | Zanfa wrote:
       | Definitely a welcome change, low quality auto-translated results
       | are one of the things that has become a major issue when trying
       | to find anything in my native language. For a lot of searches,
       | literally most of the results end up being an incoherent mess of
       | spammy auto-translated links.
        
       | aaron695 wrote:
       | > Google: Auto-translated content not indexed
       | 
       | The title is wrong. The article says it is indexed.
       | 
       | This has been talked about before, and Google have admitted
       | they cannot reliably detect machine-translated content.
       | 
       | This just says that if they catch you, humans might de-index you.
       | 
       | This article doesn't reveal where exactly the front line of that
       | war is.
       | 
       | The article is this conversation -
       | https://www.youtube.com/watch?v=qoISMxlhNTI&t=165s other videos
       | have also talked about this with more hints.
       | 
       | This is basically the future of GPT-3 nonsense that machines (and
       | some humans) can't see is nonsense. At least with translation,
       | the output currently has value.
        
       | patrakov wrote:
       | Finally!
       | 
       | Note that the problem with auto-translated StackOverflow clones
       | in Russia is so severe that browser extensions and adblock lists
       | were created just for this purpose. E.g.
       | https://github.com/Nebula-Mechanica/Anti-AutoTranslation-Lis...
       | 
       | And this kind of spam is one of the reasons why I switched from
       | Google to DuckDuckGo for web search.
        
       | ebanana wrote:
       | i think it would be interesting to translate 98% of any website
       | into the native tongue of the person viewing the content but
       | leave the remaining 2% in the original language (of the website
       | author). the reason would be to eventually have everyone
       | understand some key words in each other's languages. it's a wild
       | concept. eventually the internet will have 1 main mashed-up
       | universal tongue.
        
       | mattowen_uk wrote:
       | > _The best practice as of 2021 is to take your existing, best-
       | performing content and get it professionally translated or re-
       | written._
       | 
       | Yeah because we've all got money to burn on hiring translators.
       | Another example of small websites being eradicated from the
       | [discoverable] Web.
        
         | TonyTrapp wrote:
         | I have to admit that I have never seen a small website using
         | auto-translated content. I see that more frequently on the
         | content mills that no one wants, on Microsoft documentation (oh
         | the horrors!) and most recently on various shopping websites
         | that want to show you offers from other countries (eBay fails
         | terribly at this). I don't think small website owners are in
         | trouble; the big ones are.
        
           | iggldiggl wrote:
           | > I see that more frequently on the content mills that no one
           | wants
           | 
           | Some of those Wikipedia-republishing SEO spam sites seem to
           | take the English Wikipedia and then auto-translate that into
           | other languages instead of directly mirroring each country's
           | native Wikipedia. At least they're good for a laugh because
           | whatever translation provider they're using still tends
           | towards those hilariously literal translations that no longer
           | happen that frequently with Google.
        
         | ghaff wrote:
         | Machine translation (and transcription) is better than nothing,
         | especially for personal use. But it's pretty mediocre at
         | best. So if you publish it as finished material, I'd expect it
         | to be treated like any other low-quality content. You don't get
         | some special pass because you're a "small website."
        
         | karatinversion wrote:
         | I don't really understand this complaint. You are running a
         | small website, and you want it to be promoted to users in
         | another language, but you don't want to spend any resources in
         | translating or localizing the content?
        
           | mattowen_uk wrote:
           | Maybe I don't have the spare capital as I'm a fledgling
           | start-up, but still want to be able to reach as many people
           | as possible?
        
             | karatinversion wrote:
             | But do people have any interest in seeing text that is
             | machine-translated without editing? MT is good, but you can
             | still tell that text has gone through it; for me, reading
             | it when I haven't triggered the translation myself
             | elicits the same reaction as spam emails with obvious
             | typos.
        
         | 6gvONxR4sf7o wrote:
         | Users are sick of scale at all costs. Global scale with crappy
         | service isn't a birthright.
        
         | numpad0 wrote:
         | _Cheer_ Please _vacate_ leave your _location_ site _almost it
         | had been_ as it is. MT outputs aren't as good as they might look
         | when applied twice, and they often need to be compared against
         | the original text to make out the intention. They also "sound"
         | robotic and creepy.
         | 
         | Once I encountered a bunch of products advertised as "applies
         | delight scam". Make a guess at it.
        
           | withinboredom wrote:
           | Thanks for your comment. I like it very much. It seems
           | strange to me that people think that machine translated text
           | is perfectly acceptable. Maybe people will appreciate it more
           | after reading this post.
        
       ___________________________________________________________________
       (page generated 2021-11-12 23:02 UTC)