[HN Gopher] Playing Around with Machine Translation
       ___________________________________________________________________
        
       Playing Around with Machine Translation
        
       Author : Thevet
       Score  : 20 points
       Date   : 2023-09-13 20:09 UTC (1 day ago)
        
 (HTM) web link (davidabell.substack.com)
 (TXT) w3m dump (davidabell.substack.com)
        
       | ZeroGravitas wrote:
       | I wonder if anyone at Project Gutenberg or similar is looking to
       | autogenerate translations of out of copyright classics.
       | 
        | I've seen a few people here recommend against some copyright-free
        | translations because a more recent translation is better.
       | 
       | Possibly the AI tools aren't yet as good as the best human
       | translators, but are they already better than what's available
       | copyright free?
        
         | OkayPhysicist wrote:
          | Fundamentally, translation is a harder problem than it gets
          | credit for. At the dictionary level, you're mostly alright, as
          | words and concepts tend to have pretty close direct
          | translations (though you'll often need context, since homonyms
          | exist). A step above that, forming full statements, you run
          | into some difficulty translating the subtleties of word
          | choice, since you've got a fair amount of cultural context to
          | take into account: "Forgive me, Father, for I have sinned"
          | does not carry the same connotations as "Sorry, Daddy, I've
          | been naughty". But a step up from there, translating entire
          | texts, you need a pretty complex theory of mind, because you
          | have to take into account the author's perspective, their
          | interpretation of their intended audience's perspective, and
          | your own audience's perspective.
         | 
          | For example, my "Forgive/Sorry" joke earlier relies on the
          | context of both you and me being somewhat aware of A) the
          | Catholic Church's practice of confession and B) the relatively
          | modern use of "Daddy" and "naughty" with sexual connotations,
          | which, never mind being language-specific, is culturally
          | specific: the joke would break if you tried telling it in,
          | say, a place that still uses "Daddy" as a perfectly normal way
          | to address one's father, or that lacks the cultural norm of
          | sexualizing authority.
         | 
         | If someone was trying to translate this comment to another
         | language, they might have to completely alter that joke in
         | order for it to make any sense, at which point that entire last
         | paragraph would have to change, etc. Modern AI tools have
         | largely reached a point where the homonym problem isn't
         | crippling them anymore, but haven't really reached much beyond
         | that.
        
       | l0new0lf-G wrote:
        | Don't forget that English and French have much in common
        | lexically and grammatically, and even some slang must be
        | similar because of geographical proximity and cultural
        | exchange.
       | 
       | I am nearly certain that no machine will ever be able to
       | accurately translate between languages with significant
       | linguistic distance (e.g. Japanese and Swedish).
       | 
        | I experience this first-hand whenever I translate from my
        | native Greek to English, especially if there is slang
        | involved. Whenever Google Translate encounters long phrases in
        | my texts, the result is comical, not to mention that the
        | emotions are not properly conveyed.
       | 
       | I can only begin to imagine the inaccuracies in translations from
       | Mandarin.
       | 
        | Nevertheless, I never expected even fairly accurate translations
        | between related languages such as French and English. It
       | indeed sends chills down the spine. It feels like there is some
       | form of actual intelligence involved.
        
         | acomjean wrote:
          | In the late 90s my Mom (bilingual, having migrated to the US)
          | did some work on the side for a translation agency (she worked
          | doing internationalization for DataGeneral, Parametrics and
          | others). She would evaluate translations by prospective
          | translators. She got a batch that was terrible. Turns out they
          | were machine translated. We have come a long way.
        
         | tralarpa wrote:
         | > I am nearly certain that no machine will ever be able to
         | accurately translate between languages with significant
         | linguistic distance (e.g. Japanese and Swedish).
         | 
         | Is that really the reason? Or rather the fact that there is
         | much less training data available?
        
           | naniwaduni wrote:
           | Languages even moderately distant tend to strain the concept
           | of an accurate translation in the first place for any
           | nontrivial utterance.
        
             | bugglebeetle wrote:
             | Eh, I speak both English and Japanese and I would say that
             | what counts as accurate translation is what is most
             | proximate to that threshold of fundamental dissimilarity.
             | Measuring translation accuracy for all languages the same
              | way is more the problem here. There is no such thing as a
              | 1-to-1 translation. It's more like 1-to-(1+n), where n
              | accounts for
             | said distance. For languages with shared origins, n can be
             | fairly small, while for those with entirely separate ones,
             | it can be quite large.
             | 
             | That being said, Japanese to English translation in things
             | like popular culture tends to take far too many liberties,
             | I expect because the culture around Japanese translation in
             | America has a very annoying, Orientalist bent, with people
             | getting off on their "expertise" about a fake exoticism.
        
               | naniwaduni wrote:
               | The fact that you have to have this conversation strongly
               | implies that the notion of an accurate translation is, no
               | surprise, already heavily strained; you've simply chosen
               | to aim for/accept "most accurate possible translation" as
               | the best you can do, while punting on choice of distance
               | metric and its scale.
        
       | og_kalu wrote:
        | It's weirdly flown under the radar, but GPT-style models as
        | translators are a lot better than state-of-the-art machine
        | translators (DeepL, NLLB, Google, etc.).
        |
        | Like above, you can already see the difference with close
        | language pairs, where Google etc. are already very good.
        |
        | For pairs like English and Japanese, Google et al. will
        | happily devolve into half-gibberish, so the difference is even
        | more stark.
        |
        | I did a number of examples a couple of months back with
        | English/Chinese, before 4 was released. Even then you could
        | see it was a lot better, and 4 is, as usual, a lot better than
        | 3.5.
       | 
       | https://github.com/ogkalu2/Human-parity-on-machine-translati...
        
         | luxpir wrote:
          | It's not under the radar because that's sadly not the case,
          | although it looks like it should work better on the surface.
          | Neural machine translation is just more consistent, doesn't
          | hallucinate, and can be easily and cheaply trained over time.
          |
          | LLMs have some benefits, but as a lot of LLM research has
          | found, they are not production-ready. Yet.
        
           | og_kalu wrote:
            | >It's not under the radar because that's sadly not the case
            |
            | It is the case. Haven't seen anyone who uses both who thinks
            | otherwise.
            |
            | Also, https://arxiv.org/abs/2301.13294 benchmarks 3.5 (which
            | is quite a bit worse than 4) against Google, NLLB and DeepL.
            |
            | And here, https://arxiv.org/pdf/2304.02210.pdf, GPT wins
            | overwhelmingly in human evaluations. Seems like the typical
            | evaluation metrics aren't really cutting it anymore,
            | especially BLEU.
            |
            | >Neural machine translation is just more consistent, doesn't
            | hallucinate
            |
            | It's not more consistent, and the second bit is just wrong
            | lol.
            |
            | One of the biggest complaints about DeepL is its tendency to
            | make stuff up to make translations seem more natural.
            |
            | Summarization and translation are the tasks GPT models
            | hallucinate on the least.
        
             | cj wrote:
              | I wrote a much longer reply, but it looks like you deleted
              | the downvoted comment and reposted it.
              |
              | > It's not more consistent.
              |
              | I think the parent was probably saying that, for a given
              | input, Google MT provides the same output.
              |
              | What is the value of temperature/variability in an
              | LLM-powered MT model?
              |
              | I'd assume that, given the same inputs, you should only be
              | given the best output.
        
               | og_kalu wrote:
                | >I think the parent was probably saying that, for a given
                | input, Google MT provides the same output.
                |
                | I don't care about being given the exact same output
                | (you're not getting deterministic translations from
                | people either). I care about quality translations.
                | Variability in GPT-style translations is much more about
                | word choice and style than about wrong or wildly
                | different translations. And if I really wanted to, I
                | could guide both (word choice, style) with instructions
                | or examples.
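                |
                | A minimal sketch of what I mean, assuming the plain HTTP
                | chat-completions endpoint (the model name and the
                | glossary/style note here are just illustrative): pin the
                | temperature to 0 so repeated runs mostly agree, and put
                | the word-choice/style constraints in the system prompt.
                |
                |  import os, requests
                |
                |  # Illustrative style/word-choice constraint.
                |  STYLE = ("Formal register. Render 'kaizen' as "
                |           "'continuous improvement'.")
                |
                |  def translate(text, src="Japanese", tgt="English"):
                |      system = ("Translate " + src + " to " + tgt + ". "
                |                + STYLE + " Return only the translation.")
                |      r = requests.post(
                |          "https://api.openai.com/v1/chat/completions",
                |          headers={"Authorization":
                |                   "Bearer " + os.environ["OPENAI_API_KEY"]},
                |          json={"model": "gpt-4",  # assumed model name
                |                "temperature": 0,  # repeatable output
                |                "messages": [
                |                    {"role": "system", "content": system},
                |                    {"role": "user", "content": text}]},
                |          timeout=60)
                |      r.raise_for_status()
                |      return r.json()["choices"][0]["message"]["content"]
                |
                | Raising the temperature or swapping the system prompt is
                | how you'd trade repeatability for stylistic range.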
        
               | naniwaduni wrote:
               | Good news! Per your second link, GPT-4 is a stunning
               | improvement up to "borderline passes quality control"!
        
             | yorwba wrote:
             | In my experience, ChatGPT tends to produce more fluent
             | output, but is less likely to closely follow the input. For
             | some high-resource language pairs, complete mistranslations
             | are rare, but for other languages, not so much. Of the ones
             | I can evaluate, Burmese is particularly error-prone:
             | 
              | ChatGPT translates ng[?]'kiu duttiy akh[?]ng'[?]re: ttc[?]khu
              | pe:khai'tty[?] // as "I have received a second warning.",
              | which is incorrect. akh[?]ng'[?]
              | (https://en.wiktionary.org/wiki/%E1%80%A1%E1%80%81%E1%80%BD%E...)
              | does not mean "warning", even though that is a likely
              | completion of "I have received a second " in English.
             | 
             | Google Translate gives me "Gave me a second chance.", which
             | closely matches the Burmese sentence down to dropping the
              | subject (common in Burmese, rare in English), which makes
             | the translation sound weird.
             | 
             | So any claim that ChatGPT is better/worse at translating
             | really needs to specify the languages involved and what
             | your goal for the translation is. (E.g. the benchmark paper
             | you link seems to focus on the ability to steer the
             | translation by providing additional context.)
        
               | og_kalu wrote:
                | I'm not making a claim for ChatGPT so much as I am making
                | a claim for GPT-style models.
                |
                | It's not really a question of high-resource vs.
                | low-resource languages so much as which languages ended
                | up in the training corpus.
                |
                | 1. There's a lot of transfer learning going on with
                | predict-the-next-token LLMs. A model trained on 500B
                | tokens of English and 50B tokens of French will speak
                | French far better than if it were trained on only 50B
                | tokens of French.
                |
                | 2. You don't need parallel corpora for every single pair
                | you want to translate between. This means that GPT LLMs
                | only need monolingual text data for the vast majority of
                | languages. To train most NMT models you would need
                | Burmese/English parallel data.
                |
                | Both of the above combine to mean that not only is
                | quality demonstrably better, the amount of data needed is
                | lower too.
                |
                | GPT's Burmese isn't worse because Burmese is
                | low-resource. It's because OpenAI made no specific
                | attempt to include Burmese text.
                |
                | They're not even trying. GPT-3's training data was 93%
                | English, with the second-biggest language at less than 2%.
        
               | FLSurfer wrote:
               | I used GPT-4 and this was the result:
               | 
               | Please translate this:
               | 
               | ng[?]'kiu duttiy akh[?]ng'[?]re: ttc[?]khu pe:khai'tty[?]
               | //
               | 
               | The sentence "ng[?]'kiu duttiy akh[?]ng'[?]re: ttc[?]khu
               | pe:khai'tty[?] // " translates to "They gave me a second
               | chance." in English.
        
             | luxpir wrote:
              | Your reply is not passing my sniff test. Your hype and bias
              | are showing.
              |
              | It may present more fluent text, but if it doesn't know
              | it's strayed from the source text and you can't tell either
              | (because you don't understand the source language), then
              | you'll end up with error-laden pseudo-translations. At
              | least with NMT you know the errors are consistent.
              |
              | I don't know who you know who thinks GPT is ahead, but
              | nobody in the very well-funded translation industry has a
              | GPT-powered translation engine, for the key reason that
              | it's not ready for production. For a human post-editing MT,
              | we're mainly talking about fixing broken vocab. You'd never
              | present raw MT to a client. It needs editing. Heavily. To
              | think LLM translation doesn't need editing is either coming
              | from someone not in the industry, or from someone blinded
              | by hype. And the kinds of editing issues are more
              | insidious, like those found in voice-dictated texts.
              | Homophones aren't flagged by QA software because they are
              | real words. Just like LLMs make real sentences, except when
              | they don't, but good luck detecting that and editing out
              | the additional meaning the model has decided to inject.
             | 
              | Have you tried to run the GPT-4 API on a segmented XLIFF at
              | all? If the segmentation is bad, and full of tags, GPT-4
              | breaks completely. It tries to close sentences that run
              | across segments, and it can't handle tags inline (the
              | ChatGPT interface can, but you can't use that at scale).
             | 
             | It can do some impressive work, don't get me wrong, but I'm
             | not sure how hands-on you've really been if you think it's
             | a solved problem.
             | 
              | Production translation is a non-trivial problem. The entire
              | industry hasn't released an LLM solution for translation
              | yet (excepting the rewording mini-features). What makes you
              | think you know more than those on the ground? Or have you
              | developed something that's still in stealth?
              |
              | EDIT: Oh wow, all of your 107 submissions to HN in the 6
              | months your account has existed have been about AI and
              | LLMs. I guess I got the hype part right. As for industry
              | knowledge, the jury is still out, but this could well be
              | the classic HN "I understand tech so obviously I understand
              | everything" play. Keep us posted!
        
               | naniwaduni wrote:
                | > It may present more fluent text, but if it doesn't know
                | it's strayed from the source text and you can't tell
                | either (because you don't understand the source
                | language), then you'll end up with error-laden
                | pseudo-translations. At least with NMT you know the
                | errors are consistent.
                |
                | To be fair, this is an infamous failure mode of neural MT
                | too, and a big part of what makes the discourse around
                | GPT so ... evocative of the discourse in 2017.
        
           | [deleted]
        
           | benbreen wrote:
           | I've tried using ChatGPT to translate Latin texts from the
           | Renaissance. I know enough Latin (intermediate, but can
           | figure it out with a dictionary) to check it, and it was
           | very, very impressive. What blew me away was not just the
           | fluency of the translation but the fact that I could drop in
           | highly imperfect OCR'd text from Google Books, and it didn't
           | have any trouble making sense of garbled passages. This
           | ability makes it a really distinct advance on Google
           | Translate and the like, at least for my purposes.
           | 
           | Also, another underrated feature: I asked it to summarize
           | each page in a single sentence, while also picking out the
           | passages most relevant to my research question. It did a
           | great job.
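            |
            | In case it's useful: the kind of per-page prompt this implies
            | looks roughly like the sketch below (the wording and names
            | are illustrative, not the exact prompt I used):
            |
            |  # Illustrative per-page prompt; page_text and
            |  # research_question are placeholders for your own OCR'd
            |  # page and question.
            |  def build_prompt(page_text, research_question):
            |      return (
            |          "The following is imperfect OCR of a Renaissance "
            |          "Latin text.\n"
            |          "1. Translate it into English, repairing obvious "
            |          "OCR errors.\n"
            |          "2. Summarize the page in one sentence.\n"
            |          "3. Quote the passages most relevant to this "
            |          "question: " + research_question + "\n\n" + page_text
            |      )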
        
             | luxpir wrote:
             | That is a truly exceptional use case, and I'm impressed
             | too.
             | 
              | I should have specified above that I'm referring to the
              | practicalities of professional translation workflows as
              | they currently exist, for things like high-volume flows in
              | dozens of formats, translation-memory leveraging, etc.
        
         | jug wrote:
          | Yeah, GPT-4 translates so well that Iceland is using it for
          | language preservation, letting them enrich Icelandic with new
          | works.
          |
          | Like you, I've definitely often thought "Come ON!" about the
          | increasingly archaic Google Translate in light of DeepL etc.
          | It has really stagnated over the years.
        
       | edgarvaldes wrote:
        | Google Translate and Google Search have a preference for
        | acronyms.
        |
        | I translate a lot of subtitles using GT, and every time a
        | character asks "Who?" GT gives me the version for "World
        | Health Organization?" If a character is named "Mia", GT gives
        | the hilarious "Missing in action", etc.
        |
        | Still, the combo of WhisperX, Google Translate and Subtitle
        | Edit is the Holy Grail I dreamed of just a year ago.
        
       ___________________________________________________________________
       (page generated 2023-09-14 23:01 UTC)