[HN Gopher] Show HN: Localization and translations should be cod...
___________________________________________________________________
Show HN: Localization and translations should be code, not data
Author : LeviticusMB
Score : 12 points
Date : 2022-07-05 20:15 UTC (2 hours ago)
(HTM) web link (github.com)
| verdverm wrote:
| The problem I see with this is that every language would need to
| replicate the code & logic.
|
| With data / config, the translations are recorded in one place
| and all consumers can get the update without code changes.
|
| The big thing I've been wondering / looking for is a shared, open
| source translation database. Anyone have links?
| capableweb wrote:
| > The big thing I've been wondering / looking for is a shared,
| open source translation database. Anyone have links?
|
| That's a neat idea. It'll be super useful for the 80% of cases
| where context isn't that important. But for the other 20%, the
| context in which the translation will be used is as important
| as the word itself. So you cannot always reuse the same
| translation in different contexts, as it'll sound unnatural
| then.
|
| Still, if there were an easy way to switch between different
| options for a translation, a shared open source translation
| database for projects to use would be very valuable and useful.
| verdverm wrote:
| The (surmountable) problem is tree-shaking so you only
| include the translations you use
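| A minimal sketch of that tree-shaking step in TypeScript,
| assuming a hypothetical shared database shaped as
| key -> locale -> text and a list of the keys the app actually
| references (in practice a bundler plugin or a source scan would
| have to produce that list):

```typescript
// Hypothetical shape of a shared translation database: key -> locale -> text.
type SharedDb = Record<string, Record<string, string>>;

// Keep only the keys the app uses, restricted to the locales it ships.
// Keys absent from the database are reported instead of silently dropped.
function shakeTranslations(
  db: SharedDb,
  usedKeys: readonly string[],
  locales: readonly string[],
): { bundle: SharedDb; missing: string[] } {
  const bundle: SharedDb = {};
  const missing: string[] = [];
  for (const key of usedKeys) {
    const entry = db[key];
    if (entry === undefined) {
      missing.push(key);
      continue;
    }
    const subset: Record<string, string> = {};
    for (const locale of locales) {
      const text = entry[locale];
      if (text !== undefined) subset[locale] = text;
    }
    bundle[key] = subset;
  }
  return { bundle, missing };
}

// Usage: a tiny "shared" database; the app only references two keys.
const sharedDb: SharedDb = {
  "common.save": { en: "Save", de: "Speichern", fr: "Enregistrer" },
  "common.cancel": { en: "Cancel", de: "Abbrechen", fr: "Annuler" },
  "common.delete": { en: "Delete", de: "Löschen", fr: "Supprimer" },
};
const { bundle, missing } = shakeTranslations(
  sharedDb,
  ["common.save", "common.undo"],
  ["en", "de"],
);
// bundle keeps only "common.save" (en and de); missing is ["common.undo"].
console.log(bundle, missing);
```

| Reporting missing keys rather than failing lets an app fall
| back to its own strings for anything the shared database lacks.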
| capableweb wrote:
| If I can manage to store all the data from HN comments and
| submissions in 99 GB (31993925 "items", stored in a very naive
| way), we should be able to build a DB with the most common
| translations for most web apps that comes in way below that,
| closer to 1 GB, if some clever people do it :)
| lwouis wrote:
| Context-less translation can be done quite successfully these
| days with online services. You could simply make a few hundred
| calls to something like Google Translate and get good quality
| translations in multiple languages.
|
| This is built into some of the top software translation
| platforms to "seed" the initial translation: a bulk kickstart
| that human translators can optionally refine later.
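| A rough sketch of that seeding step, where `translate` is a
| hypothetical stand-in for whatever machine-translation service
| is called (no real service API is assumed). Seeded entries are
| tagged as machine output so they can be routed to a human
| post-editor instead of being shipped as-is:

```typescript
// Hypothetical translator signature; in practice this would wrap an MT service.
type Translate = (text: string, targetLocale: string) => Promise<string>;

interface Entry {
  text: string;
  status: "human" | "machine"; // "machine" entries still need human review
}

type Catalog = Record<string, Entry>;

// Fill in keys present in the source strings but missing from the target
// catalog. Existing entries (e.g. human-edited ones) are never overwritten.
async function seedCatalog(
  source: Record<string, string>,
  target: Catalog,
  targetLocale: string,
  translate: Translate,
): Promise<Catalog> {
  const seeded: Catalog = { ...target };
  for (const [key, text] of Object.entries(source)) {
    if (seeded[key] !== undefined) continue;
    seeded[key] = { text: await translate(text, targetLocale), status: "machine" };
  }
  return seeded;
}

// Keys that should go to a human translator before release.
function needsReview(catalog: Catalog): string[] {
  return Object.entries(catalog)
    .filter(([, entry]) => entry.status === "machine")
    .map(([key]) => key);
}

// Usage with a fake translator that just tags the text with the locale.
const fakeTranslate: Translate = async (text, locale) => `[${locale}] ${text}`;
seedCatalog(
  { "nav.home": "Home", "nav.settings": "Settings" },
  { "nav.home": { text: "Startseite", status: "human" } },
  "de",
  fakeTranslate,
).then((catalog) => {
  // "nav.home" keeps its human translation; only "nav.settings" was seeded.
  console.log(needsReview(catalog));
});
```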
| antaviana wrote:
| As someone in the localization business, let me assure you
| that, with the current state of the art, using machine
| translation without any kind of human post-editing for UI is
| a terrible idea.
|
| That the UI is not in English does not mean that a non-English
| speaker will be able to understand it and use it successfully.
|
| You can only do it if you do not have any kind of support for
| those international users and if those users are not your
| real customers but merely statistics in the usage dashboard
| of a free product.
| samuelstros wrote:
| Since I am working on an open source localization solution
| (that makes localization of software effortless), having an
| open source "translation memory" database makes sense. I will
| keep this idea in my mind! :)
| msbarnett wrote:
| It's a neat idea, but by intermixing code, presentation, and data
| you're going to run into a bunch of issues that the "traditional"
| approach avoids.
|
| For one thing, we get our translations by handing a YAML file to
| external contractors. They don't need to squint at a file full of
| code to distinguish the bits of English that need translating
| from the bits that don't; they just have to translate the right
| side of every key, and there's specialized tooling to help them
| with this.
|
| And for another, even in your toy example in the README you've
| now lost a Single Source of Truth for certain presentation
| decisions. So now when some stakeholder comes to you and says
| they hate the italicization in the intro paragraph and to lose it
| ASAP, instead of taking the markup out of a common template that
| different data gets inserted into, you have to edit each
| language's version of the code to remove the markup (with all of
| the attendant ease of making errors that comes along when you
| lack a SPOT - easy to miss one language, etc). I'd expect these
| kinds of multiplication-of-edit problems to grow increasingly
| complex when you scale this approach beyond toy examples.
|
| Basically, this seems really hard to scale to large products, and
| it doesn't play well with division of labour.
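| To illustrate the "common template" approach described above,
| a hypothetical TypeScript sketch: the markup lives in one
| template and only plain strings vary per language, so dropping
| the italics is a single edit:

```typescript
// Per-language data: plain strings only, no markup (this is what translators edit).
const intro = {
  en: { lead: "Welcome!", body: "Localization is hard." },
  es: { lead: "¡Bienvenido!", body: "La localización es difícil." },
  fr: { lead: "Bienvenue !", body: "La localisation est difficile." },
} as const;

type Locale = keyof typeof intro;

// Minimal escaping for text inserted into HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The single presentational template: removing <em> here removes the
// italics for every language at once.
function renderIntro(locale: Locale): string {
  const t = intro[locale];
  return `<p><em>${escapeHtml(t.lead)}</em> ${escapeHtml(t.body)}</p>`;
}

console.log(renderIntro("es")); // <p><em>¡Bienvenido!</em> La localización es difícil.</p>
```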
| bananarchist wrote:
| > Single Source of Truth for certain presentation decisions.
|
| You can't have a single source of truth for presentation
| decisions in a multilingual product. Different languages have
| different typographic traditions, will demand different minimum
| container sizes based on word lengths and maybe this is
| shocking but they sometimes run in different directions. If you
| are not integrating the dev, design and localized copy editing
| roles on your team, your product is going to look like trash
| except where the primary language of the team is concerned.
|
| Translation can scale for large products, but localization
| cannot: until further notice, you can only do it the hard way,
| or the wrong way.
| msbarnett wrote:
| > You can't have a single source of truth for presentation
| decisions in a multilingual product. Different languages have
| different typographic traditions, will demand different
| minimum container sizes based on word lengths and maybe this
| is shocking but they sometimes run in different directions.
|
| Maybe this is shocking but I'm fluent in a language that is
| sometimes written vertically.
|
| "You can't have one single common presentation for every
| translation" is true in an absolute sense but often not true
| in practice. E.g., we hit most of Europe and North, Central,
| and South America with ~10 static translations rendered into
| one common presentational template, none of which run into
| any of the truly complex layout differences that right-to-left
| or vertical presentations would bring. We extensively QA
| all of the languages we _do_ support, and presentation issues
| are truly pretty damn rare. It's your classic "80% of the
| result for 20% of the effort" tradeoff.
|
| Now, if you truly do need to localize in every language under
| the sun, then yeah, something like this can make sense, as it
| gives you maximum flexibility with respect to varying your
| layout alongside the translation.
|
| But if you have _any_ simpler use case (e.g., supporting just
| English, Spanish, French and Portuguese will give you an
| enormous chunk of the planet with minimal overhead, as they
| have very similar word lengths and presentation requirements),
| then the approach here is just taking on all of the effort
| and maintenance overhead of the maximally complex case when
| you have absolutely no need to.
| olodus wrote:
| "You tasked me with translating this scene, so since you gave me
| a general programming language I used a buffer overflow to break
| out into the animation engine and animate your characters to use
| sign language."
|
| Jokes aside, I don't hate the idea and am actually quite positive
| about writing translations in code. I do question why you would
| need a new language for it, though. Why not use an existing
| programming language?
|
| As others pointed out here, the biggest downside I can see is
| that it would be harder to outsource.
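| One answer to "why not an existing language": a sketch of
| messages as ordinary TypeScript functions that lean on the
| standard `Intl.PluralRules` API to pick the plural category
| for a locale. The message functions are hypothetical; Polish is
| used because it has more plural categories than English:

```typescript
// Each message is an ordinary function, so plural logic is plain code.
// Intl.PluralRules returns the CLDR plural category for a number in a locale.
const enRules = new Intl.PluralRules("en");
const plRules = new Intl.PluralRules("pl");

function filesEn(n: number): string {
  return enRules.select(n) === "one" ? `${n} file` : `${n} files`;
}

// Polish integers use three categories: "one", "few" and "many".
function filesPl(n: number): string {
  switch (plRules.select(n)) {
    case "one":
      return `${n} plik`;
    case "few":
      return `${n} pliki`;
    default: // "many" (fractional "other" would need its own form)
      return `${n} plików`;
  }
}

for (const n of [1, 2, 5, 22]) {
  console.log(filesEn(n), "|", filesPl(n));
}
```

| Fractional counts fall into the "other" category, which this
| sketch does not handle separately.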
| LeviticusMB wrote:
| Making localized web apps is such a pain and too often an
| afterthought. But what if it took almost no extra effort to make
| the app localized from the start?
|
| What if you could get static type checking, key documentation and
| code completion right in VS Code?
|
| And what if the translations could be generated using an actual
| programming language, and even represent HTML markup and not just
| plain strings?
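| A rough sketch of what static checking of translations can look
| like in plain TypeScript (hypothetical names, not the linked
| project's API): one interface lists every key and its
| arguments, and each locale object must implement it, so a
| missing key or a wrongly typed argument is a compile error.

```typescript
// Hypothetical message interface: every locale must provide every key,
// with the same argument types.
interface Messages {
  greeting(name: string): string; // plain text
  unread(count: number): string; // plain text
  terms(url: string): string; // HTML fragment containing a link
}

// Minimal escaping for values placed inside HTML attributes or text.
function esc(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const en: Messages = {
  greeting: (name) => `Hello, ${name}!`,
  unread: (count) => (count === 1 ? "1 unread message" : `${count} unread messages`),
  terms: (url) => `By continuing you accept our <a href="${esc(url)}">terms</a>.`,
};

// Leaving out any key, or using `count` as anything but a number, fails type checking.
const de: Messages = {
  greeting: (name) => `Hallo, ${name}!`,
  unread: (count) => (count === 1 ? "1 ungelesene Nachricht" : `${count} ungelesene Nachrichten`),
  terms: (url) =>
    `Wenn du fortfährst, akzeptierst du unsere <a href="${esc(url)}">Nutzungsbedingungen</a>.`,
};

const messages: Record<"en" | "de", Messages> = { en, de };
console.log(messages.de.unread(3)); // 3 ungelesene Nachrichten
```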
| capableweb wrote:
| Sounds like a great idea for translators who are also
| programmers, or who at least know HTML (and syntax for logic,
| judging by your examples). But I haven't worked at any
| companies where the translators/the people doing localization
| have been programmers; they have just been translators. This
| will be more or less impossible for them to use efficiently, if
| at all.
| withinboredom wrote:
| One solution is to use your native language as the key. Bam,
| you have context in the code and when testing. No need for
| shenanigans. This is how it was done until someone decided to
| popularize opaque keys in the last decade or so; in fact, most
| battle-hardened and old libraries expect it to be done that
| way. You can translate English to English (or whatever) if you
| want to be able to change the wording without having to
| retranslate everything... but if you are changing the wording
| for the native language, don't you have to retranslate
| everything anyway?
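| A minimal sketch of the source-text-as-key style (the
| convention GNU gettext uses, where the original string is the
| msgid). The catalog and `t` function are hypothetical;
| untranslated strings fall back to the source text:

```typescript
// Catalog keyed by the English source text itself.
type Catalog = Record<string, string>;

const de: Catalog = {
  "Save changes": "Änderungen speichern",
  "Delete {name}?": "{name} löschen?",
};

// Look up the source string, fall back to it when no translation exists,
// then substitute {placeholders} from vars (unknown ones are left as-is).
function t(catalog: Catalog, source: string, vars: Record<string, string> = {}): string {
  const template = catalog[source] ?? source;
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => vars[name] ?? match);
}

console.log(t(de, "Save changes")); // Änderungen speichern
console.log(t(de, "Delete {name}?", { name: "a.txt" })); // a.txt löschen?
console.log(t(de, "Not yet translated")); // Not yet translated
```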
| duskwuff wrote:
| > One solution is to use your native language as the key.
|
| That fails pretty badly in two cases:
|
| 1) If significant changes to the English (or whatever)
| version need to be made, keeping the original text may be
| more confusing than useful.
|
| 2) When the native-language version is ambiguous in a way
| that doesn't apply to other languages, e.g. when translating
| to languages with grammatical gender, or when a single
| English word can be used in multiple unrelated ways.
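| A common mitigation for case 2 is to attach a disambiguating
| context to the source string, similar in spirit to gettext's
| msgctxt. A hypothetical sketch in which one English word maps
| to different translations depending on context:

```typescript
// Entries are keyed by context plus source text, so one English word
// can have several translations.
type ContextCatalog = Map<string, string>;

function key(context: string, source: string): string {
  return `${context}\u0004${source}`; // separator chosen to be unlikely in real text
}

const fr: ContextCatalog = new Map([
  [key("file menu", "Open"), "Ouvrir"], // verb: open a file
  [key("shop status", "Open"), "Ouvert"], // adjective: the shop is open
]);

// Translate with context; fall back to the source text if nothing matches.
function tc(catalog: ContextCatalog, context: string, source: string): string {
  return catalog.get(key(context, source)) ?? source;
}

console.log(tc(fr, "file menu", "Open")); // Ouvrir
console.log(tc(fr, "shop status", "Open")); // Ouvert
```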
| layer8 wrote:
| ...then translators need to be programmers, or vice versa. That
| may not scale to many languages/large products.
|
| What would be useful is the ability to interactively see a
| systematic set of examples of what the templates one is editing
| evaluate to.
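| A rough sketch of that idea, with a hypothetical template and
| sample inputs: render the template against a fixed grid of
| inputs so the person editing it sees every variant at once:

```typescript
// A message template under edit: a function of its inputs.
type Template<A> = (args: A) => string;

// Render a template for every sample input as "input => output" lines.
function previewAll<A>(template: Template<A>, samples: A[]): string[] {
  return samples.map((args) => `${JSON.stringify(args)} => ${template(args)}`);
}

interface InviteArgs {
  host: string;
  guests: number;
}

const invite: Template<InviteArgs> = ({ host, guests }) =>
  guests === 0
    ? `${host} has no guests yet`
    : `${host} invited ${guests} ${guests === 1 ? "guest" : "guests"}`;

// Edge cases worth seeing side by side: zero, one, many.
const samples: InviteArgs[] = [
  { host: "Ana", guests: 0 },
  { host: "Ana", guests: 1 },
  { host: "Ana", guests: 12 },
];

for (const line of previewAll(invite, samples)) console.log(line);
```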
| azeirah wrote:
| The localization library I use supports most of this. Not all of
| it, since it's not a general-purpose programming language of
| course, but it supports variables and conditionals, which is
| basically enough to do almost anything.
|
| https://formatjs.io/docs/react-intl/api#message-syntax
| samuelstros wrote:
| For months I have been working on an open source localization
| solution that tackles both developer- and translator-facing
| problems. Treating translations as code completely leaves out
| translators, who in most cases cannot code.
|
| I am working on making localization effortless via dev tools and
| a dedicated editor for translators. Both pillars have one common
| denominator: translations as data in source code. Treating
| translations as code would break that denominator and prevent a
| coherent end-to-end solution.
|
| Take a look at the repository https://github.com/inlang/inlang.
| The IDE extension already solves type safety, inline annotations,
| and (partially) extraction of hardcoded strings.
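| For contrast, a minimal sketch of "translations as data" that
| still gets some type safety in TypeScript: the source-language
| catalog is plain data, other locales must have the same keys,
| and lookups only accept known keys. The names are hypothetical
| and not inlang's API:

```typescript
// Source-language strings: plain data a translator can edit.
const en = {
  "nav.home": "Home",
  "nav.settings": "Settings",
} as const;

type Key = keyof typeof en;

// Every other locale must supply the same keys (checked at compile time).
const nl: Record<Key, string> = {
  "nav.home": "Start",
  "nav.settings": "Instellingen",
};

const catalogs = { en, nl };
type Locale = keyof typeof catalogs;

// Only known keys compile: t("nl", "nav.hom") would be a type error.
function t(locale: Locale, key: Key): string {
  return catalogs[locale][key];
}

console.log(t("nl", "nav.settings")); // Instellingen
```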
| rakshithbellare wrote:
| What would the process be for the handoff from translators to
| programmers?
| eternityforest wrote:
| I'm not quite sure I agree with the title. Having access to code
| when you need it is probably a good thing.
|
| But I think code is, in general, something to be avoided when
| declarative approaches are available.
|
| Declarative is easier for a computer to understand: it restricts
| the inputs to one domain the computer can deal with.
|
| You don't get the same classes of bugs with declarative. You
| could even do things like double checking with machine
| translation and flagging anything that doesn't match for human
| review.
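| Because declarative catalogs are plain data, they can be
| checked mechanically. The sketch below runs a simpler check
| than the machine-translation comparison mentioned above,
| assuming {name}-style placeholders: it flags keys missing from
| a translation and translations whose placeholders differ from
| the source, so a human can review them. Names are hypothetical:

```typescript
type Catalog = Record<string, string>;

interface Issue {
  key: string;
  problem: "missing" | "placeholder mismatch";
}

// Collect the set of {placeholder} names used in a message.
function placeholders(message: string): Set<string> {
  const names = new Set<string>();
  const re = /\{(\w+)\}/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) names.add(m[1] ?? "");
  return names;
}

function sameSet(a: Set<string>, b: Set<string>): boolean {
  return a.size === b.size && Array.from(a).every((x) => b.has(x));
}

// Compare a translation against the source catalog, key by key.
function validate(source: Catalog, translation: Catalog): Issue[] {
  const issues: Issue[] = [];
  for (const key of Object.keys(source)) {
    const translated = translation[key];
    if (translated === undefined) {
      issues.push({ key, problem: "missing" });
    } else if (!sameSet(placeholders(source[key] ?? ""), placeholders(translated))) {
      issues.push({ key, problem: "placeholder mismatch" });
    }
  }
  return issues;
}

const en: Catalog = { greet: "Hi {name}", bye: "Bye", count: "{n} items" };
const es: Catalog = { greet: "Hola {nombre}", count: "{n} elementos" };
// Flags "greet" ({nombre} vs {name}) and "bye" (missing); "count" passes.
console.log(validate(en, es));
```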
|
| Plus, you don't need a programmer to do it. Security issues go
| away. You often achieve very good reuse with code only existing
| in one place without language variants.
|
| I'm sure there are great uses for this, but I have trouble
| thinking of even a single case where I'd prefer code to data in
| general.
___________________________________________________________________
(page generated 2022-07-05 23:01 UTC)