[HN Gopher] Playing Around with Machine Translation
___________________________________________________________________
Playing Around with Machine Translation
Author : Thevet
Score : 20 points
Date   : 2023-09-13 20:09 UTC (1 day ago)
(HTM) web link (davidabell.substack.com)
(TXT) w3m dump (davidabell.substack.com)
| ZeroGravitas wrote:
| I wonder if anyone at Project Gutenberg or similar is looking to
| autogenerate translations of out of copyright classics.
|
| I've seen a few people here recommend against some
| copyright-free translations because a more recent translation
| is better.
|
| Possibly the AI tools aren't yet as good as the best human
| translators, but are they already better than what's available
| copyright free?
| OkayPhysicist wrote:
| Fundamentally, translation is a harder problem than it gets
| credit for. At the dictionary level, you're mostly alright, as
| words and concepts tend to have pretty close direct
| translations (though you'll often need context, as homonyms
| exist). A step above that, forming full statements, you run
| into trouble translating the subtleties of word choice, as
| you've got a fair amount of cultural context to take into
| account. "Forgive me, Father, for I have sinned" does not
| carry the same connotations as "Sorry, Daddy,
| I've been naughty". But a step up from there, translating
| entire texts, you need a pretty complex theory of mind, as
| you need to take into account the context of the author's
| perspective, their interpretation of their intended audience's
| perspective, and your audience's perspective.
|
| For example, my "Forgive/Sorry" joke earlier relies on the
| context of both you and I being somewhat aware of A) the
| Catholic church's process of confessional and B) the relatively
| modern use of "Daddy" and "naughty" with sexual connotations,
| which, forget language specific, is culturally specific, where
| the joke would break if you tried telling it to, say, a place
| that still used "Daddy" as a perfectly normal way to address
| one's father, or lacked the cultural norm of sexualizing
| authority.
|
| If someone was trying to translate this comment to another
| language, they might have to completely alter that joke in
| order for it to make any sense, at which point that entire last
| paragraph would have to change, etc. Modern AI tools have
| largely reached a point where the homonym problem isn't
| crippling them anymore, but haven't really reached much beyond
| that.
| l0new0lf-G wrote:
| Don't forget that English and French have much in common
| lexically and grammatically, and even some slang must be
| similar because of geographical proximity and cultural exchange.
|
| I am nearly certain that no machine will ever be able to
| accurately translate between languages with significant
| linguistic distance (e.g. Japanese and Swedish).
|
| I experience this firsthand whenever I translate from my native
| Greek to English, especially if there is slang involved. When
| Google Translate encounters long phrases in my texts, the result
| is comical, not to mention that the emotions are not properly
| conveyed.
|
| I can only begin to imagine the inaccuracies in translations from
| Mandarin.
|
| Nevertheless, I never expected even fairly accurate translations
| between related languages such as French and English. It
| indeed sends chills down the spine. It feels like there is some
| form of actual intelligence involved.
| acomjean wrote:
| In the late 90s my mom (bilingual, having migrated to the US)
| did some work on the side for a translation agency (she worked
| doing internationalization for Data General, Parametrics, and
| others). She would evaluate translations for prospective
| translators. She got a batch that was terrible. Turns out they
| were machine translated. We have come a long way.
| tralarpa wrote:
| > I am nearly certain that no machine will ever be able to
| accurately translate between languages with significant
| linguistic distance (e.g. Japanese and Swedish).
|
| Is that really the reason? Or rather the fact that there is
| much less training data available?
| naniwaduni wrote:
| Languages even moderately distant tend to strain the concept
| of an accurate translation in the first place for any
| nontrivial utterance.
| bugglebeetle wrote:
| Eh, I speak both English and Japanese and I would say that
| what counts as accurate translation is what is most
| proximate to that threshold of fundamental dissimilarity.
| Measuring translation accuracy for all languages the same
| way is more the problem here. There is no such thing as 1-to-1
| translation. It's more like 1-to-(1+n), where n accounts for
| said distance. For languages with shared origins, n can be
| fairly small, while for those with entirely separate ones,
| it can be quite large.
|
| That being said, Japanese to English translation in things
| like popular culture tends to take far too many liberties,
| I expect because the culture around Japanese translation in
| America has a very annoying, Orientalist bent, with people
| getting off on their "expertise" about a fake exoticism.
| naniwaduni wrote:
| The fact that you have to have this conversation strongly
| implies that the notion of an accurate translation is, no
| surprise, already heavily strained; you've simply chosen
| to aim for/accept "most accurate possible translation" as
| the best you can do, while punting on choice of distance
| metric and its scale.
| og_kalu wrote:
| It's weirdly flown under the radar but GPT style models as
| translators are a lot better than state-of-the-art machine
| translation systems (DeepL, NLLB, Google, etc.).
|
| Like above, you can already see the difference with close
| language pairs where Google etc are already very good.
|
| For pairs like English and Japanese etc, google et al will
| happily devolve into half gibberish so the difference is even
| more stark.
|
| I did a number of examples a couple months back with
| English/Chinese before 4 was released. Even then you can see it's
| a lot better and 4 is as usual a lot better than 3.5.
|
| https://github.com/ogkalu2/Human-parity-on-machine-translati...
| luxpir wrote:
| It's not under the radar because that's sadly not the case,
| although it looks like it should work better on the surface.
| Neural machine translation is just more consistent, doesn't
| hallucinate and can be easily and cheaply trained over time.
|
| They have some benefits, but as a lot of LLM research has
| found, they are not production ready. Yet.
| og_kalu wrote:
| >It's not under the radar because that's sadly not the case
|
| It is the case. Haven't seen anyone who uses both who thinks
| otherwise.
|
| also https://arxiv.org/abs/2301.13294. This benchmarks 3.5
| (which is quite a bit worse than 4) against Google, NLLB, and
| DeepL.
|
| here https://arxiv.org/pdf/2304.02210.pdf, GPT wins
| overwhelmingly with human evaluations. Seems like the typical
| evaluation metrics aren't really cutting it anymore,
| especially BLEU.
|
| >Neural machine translation is just more consistent, doesn't
| hallucinate
|
| It's not more consistent. The 2nd bit is just wrong lol.
|
| One of the biggest complaints about DeepL is the tendency to
| make stuff up to make translations seem more natural.
|
| Summarization and Translation are the tasks GPT models
| hallucinate the least.
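(Editor's aside: the BLEU criticism above is easier to see with a concrete sketch. Below is a toy, unsmoothed sentence-level BLEU in Python, not the sacrebleu implementation (which adds smoothing and corpus-level statistics); it shows how a perfectly adequate paraphrase can score zero because the metric only rewards surface n-gram overlap.)

```python
from collections import Counter
from math import exp, log

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty. No smoothing, so a single
    empty n-gram order zeroes the whole score."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        ref_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(cnt, ref_ngrams[ng]) for ng, cnt in cand_ngrams.items())
        if overlap == 0:
            return 0.0
        precisions.append(overlap / sum(cand_ngrams.values()))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(c) >= len(r) else exp(1 - len(r) / len(c))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

ref = "they gave me a second chance"
print(bleu("they gave me a second chance", ref))  # identical: 1.0
print(bleu("i was given another chance", ref))    # fine paraphrase: 0.0
```

The paraphrase conveys the same meaning yet scores 0.0, which is why the thread's point about human evaluation (or learned metrics) stands.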
| cj wrote:
| I wrote a much longer reply but looks like you deleted the
| downvoted comment and reposted it.
|
| > It's not more consistent.
|
| I think the parent was probably saying for a given input,
| Google MT provides the same output.
|
| What is the value of temperature/variability in a LLM
| powered MT model?
|
| I'd assume given the same inputs, you should only be given
| the best output.
| og_kalu wrote:
| >I think the parent was probably saying for a given
| input, Google MT provides the same output.
|
| I don't care about being given the exact same output
| (you're not getting deterministic translations from
| people either). I care about quality translations.
| Variability for GPT style translations is much more about
| word choice and style than wrong or extremely different
| translations. And if I really wanted to, I could guide
| both (word choice, style) either with instructions or
| examples.
| naniwaduni wrote:
| Good news! Per your second link, GPT-4 is a stunning
| improvement up to "borderline passes quality control"!
| yorwba wrote:
| In my experience, ChatGPT tends to produce more fluent
| output, but is less likely to closely follow the input. For
| some high-resource language pairs, complete mistranslations
| are rare, but for other languages, not so much. Of the ones
| I can evaluate, Burmese is particularly error-prone:
|
| ChatGPT translates ng[?]'kiu duttiy akh[?]ng'[?]re:
| ttc[?]khu pe:khai'tty[?] // as "I have received a second
| warning.", which is incorrect. akh[?]ng'[?]
| (https://en.wiktionary.org/wiki/%E1%80%A1%E1%80%81%E1%80%BD%E...) does not
| mean "warning", even though that is a likely completion of
| "I have received a second " in English.
|
| Google Translate gives me "Gave me a second chance.", which
| closely matches the Burmese sentence down to dropping the
| subject (common in Burmese, rare in English) which makes
| the translation sound weird.
|
| So any claim that ChatGPT is better/worse at translating
| really needs to specify the languages involved and what
| your goal for the translation is. (E.g. the benchmark paper
| you link seems to focus on the ability to steer the
| translation by providing additional context.)
| og_kalu wrote:
| I'm not making a claim for chatGPT so much as I am making
| a claim for GPT style models.
|
| It's not really a question of high resource vs low
| resource languages so much as what languages ended up in
| the training corpus.
|
| 1. There's a lot of transfer learning going on with
| predict-the-next-token LLMs. A model trained on 500B
| tokens of English and 50B tokens of French will speak
| French far better than if it were trained on only 50B
| tokens of French.
|
| 2. You don't need parallel corpora for every single pair
| you want to translate between. This means that GPT LLMs
| only need monolingual text data for the vast majority of
| languages, whereas training most NMT models would require
| Burmese/English parallel data.
|
| Both of the above combine to mean that not only is
| quality demonstrably better, the amount of data needed is
| lower too.
|
| GPT's Burmese isn't worse because it's low resource. It's
| because OpenAI made no specific attempt to include
| Burmese text.
|
| They're not even trying. GPT-3's training data was 93%
| English, with the 2nd-largest language at less than 2%.
| FLSurfer wrote:
| I used GPT-4 and this was the result:
|
| Please translate this:
|
| ng[?]'kiu duttiy akh[?]ng'[?]re: ttc[?]khu pe:khai'tty[?]
| //
|
| The sentence "ng[?]'kiu duttiy akh[?]ng'[?]re: ttc[?]khu
| pe:khai'tty[?] // " translates to "They gave me a second
| chance." in English.
| luxpir wrote:
| Your reply is not passing my sniff test. Your hype and bias
| are showing.
|
| It may present more fluent text, but if it doesn't know
| it's strayed from the source text and you can't tell either
| (because you don't understand source language) then you'll
| end up with error-laden pseudo translations. At least with
| NMT you know the errors are consistent.
|
| I don't know who you know who thinks GPT is ahead, but
| nobody in the very well funded translation industry has a
| GPT powered translation engine for the key reason that it's
| not ready for production. For a human post-editing MT,
| we're mainly talking about fixing broken vocab. You'd never
| present raw MT to a client. It needs editing. Heavily. To
| think LLM translation doesn't need editing is either coming
| from someone not in the industry, or blinded by hype. And
| the kind of editing issues are more insidious, like those
| found in voice dictated texts. Homophones aren't flagged by
| QA software because they are real words. Just like LLMs
| make real sentences, except when they don't, but good luck
| detecting that, and editing out the additional meaning it
| has decided to inject.
|
| Have you tried to run the GPT-4 API on a segmented XLIFF at
| all? If the segmentation is bad, and full of tags, GPT-4
| breaks completely. It tries to close sentences that run
| across segments, and it can't handle tags inline (the ChatGPT
| interface can, but you can't use that at scale).
|
| It can do some impressive work, don't get me wrong, but I'm
| not sure how hands-on you've really been if you think it's
| a solved problem.
|
| Production translation is a non-trivial output. The entire
| industry hasn't released an LLM solution yet for
| translation (excepting the rewording mini features). What
| makes you think you know more than those on the ground? Or
| have you developed something that's still in stealth?
|
| EDIT: Oh wow, all of your 107 submissions to HN from the 6
| months your account has existed have been about AI and
| LLMs. I guess I got the hype part right. As for industry
| knowledge the jury is still out, but this could well be the
| classic HN "I understand tech so obviously I understand
| everything" play. Keep us posted!
| naniwaduni wrote:
| > It may present more fluent text, but if it doesn't know
| it's strayed from the source text and you can't tell
| either (because you don't understand source language)
| then you'll end up with error-laden pseudo translations.
| At least with NMT you know the errors are consistent.
|
| To be fair, this is an infamous failure mode of neural
| MT too, and a big part of what makes the discourse
| around GPT so ... evocative of the discourse in 2017.
| [deleted]
| benbreen wrote:
| I've tried using ChatGPT to translate Latin texts from the
| Renaissance. I know enough Latin (intermediate, but can
| figure it out with a dictionary) to check it, and it was
| very, very impressive. What blew me away was not just the
| fluency of the translation but the fact that I could drop in
| highly imperfect OCR'd text from Google Books, and it didn't
| have any trouble making sense of garbled passages. This
| ability makes it a really distinct advance on Google
| Translate and the like, at least for my purposes.
|
| Also, another underrated feature: I asked it to summarize
| each page in a single sentence, while also picking out the
| passages most relevant to my research question. It did a
| great job.
| luxpir wrote:
| That is a truly exceptional use case, and I'm impressed
| too.
|
| I should have specified above that I'm referring to the
| practicalities of professional translation workflows as
| they currently exist for things like high volume flows in
| dozens of formats, and translation memory leveraging etc.
| jug wrote:
| Yeah, GPT-4 translates Icelandic so well that Iceland is
| using it for language preservation, letting them enrich the
| language with new works.
|
| Like you, I've definitely often thought "Come ON!" about the
| increasingly archaic Google Translate in light of DeepL etc. It
| has really stagnated over the years.
| edgarvaldes wrote:
| Google Translate and Google Search have a preference for
| acronyms.
|
| I translate a lot of subtitles using GT, and every time a
| character asks "Who?", GT gives me the version for "World Health
| Organization?" If a character is named "Mia", GT gives the
| hilarious "Missing in action", etc.
|
| Still, the combo of WhisperX, Google Translate, and Subtitle
| Edit is the Holy Grail I dreamed of just a year ago.
___________________________________________________________________
(page generated 2023-09-14 23:01 UTC)