[HN Gopher] Mozilla releases local machine translation tools as ...
___________________________________________________________________
Mozilla releases local machine translation tools as part of Project
Bergamot
Author : Vinnl
Score : 355 points
Date : 2022-06-02 16:23 UTC (6 hours ago)
(HTM) web link (blog.mozilla.org)
(TXT) w3m dump (blog.mozilla.org)
| boberoni wrote:
| For i18n in my own projects, I typically use tools like gettext
| and involve lots of volunteers to do the translations. I might
| try out these neural machine translation tools to see how they
| fare. I also wonder if these machine translation tools are
| trained on a corpus of gettext datasets.
| bogwog wrote:
| This is awesome, but...
|
| > This set of requirements posed a number of technological
| challenges to the team: the translation engine was entirely
| written in programming languages that compile to native code. We
| needed a way to streamline the distribution of the project in
| order to avoid the overhead involved in providing builds
| compatible with all platforms supported by Firefox -- that would
| be impracticable to scale and maintain.
|
| Does Firefox really support so many different platforms and archs
| that CI builds are unrealistic?
| jelmervdl wrote:
| The upside of using WASM is that the extension itself can be
| easily ported to other browsers and platforms. The UI uses
| Firefox specific APIs but the parts that take the HTML from a
| page and push it through the translation engine would also work
| in any Chrome-based browser.
|
| (Edit: also free sandboxing of a blob of C++ code that needs to
| handle arbitrary input from the web!)
| dblohm7 wrote:
| (Former Mozilla employee, here)
|
| I'm completely speculating, but it's probably a matter of not
| wanting to complicate iterating on the translation engine by
| introducing a bunch of cruft from the Firefox build system
| (which, though it uses GNU make under the hood, is very much
| bespoke and complicated).
|
| Since the translation engine is intended to run on a product
| that hosts WASM, they might as well just build to that.
| whinvik wrote:
| Can we use this on mobile?
| jeroenhd wrote:
| The demo site works on mobile if you let it load the necessary
| content so if you're speaking from a web dev point of view:
| definitely.
|
| As for the addon, on Android you'll need to install an unstable
| version of Firefox and configure a custom addon list in an
| addons.mozilla.org account that includes it so you can download
| it.
|
| On iOS there isn't any option to download addons as far as I'm
| aware. On mobile Linux environments everything should work like
| on desktop.
| djvdq wrote:
| You can't download any addon for Firefox on iOS because it's
| almost Safari, only looking a bit different. All browsers on
| iOS have to use WebKit so FF is not really FF here on iOS.
| jelmervdl wrote:
| I think the Firefox extension might not work on mobile
| because it hooks into some undocumented addon apis to draw
| that translation bar UI. Those might not be available on
| mobile.
|
| The translation code itself should work on mobile. It's just
| some javascript & wasm (albeit with SIMD instructions not
| implemented in Safari's WASM vm...)
| Vinnl wrote:
| I just installed the extension on Fenix Nightly and indeed,
| it does not work.
| option wrote:
| What's wrong with using cloud without sending any user id with
| the request?
| vanilla_nut wrote:
| If local translation can help me use a website without a query
| to some cloud server... who needs the cloud? No backend that
| will experience downtime, and someday be decommissioned. No
| money sink of cloud processing pressuring the product to
| advertise or monetise in unscrupulous ways.
|
| I'm sure cloud processing is better in many ways. But if this
| is "good enough" I'd rather just do it all locally.
| [deleted]
| chungy wrote:
| There's likely far more identifiable information in the actual
| text than a user ID provides.
| no_time wrote:
| In the current status quo you either make use of an api by
| identifying yourself with a key or a browser session that is
| fingerprintable in a gazillion ways. There is no such thing as
| "not sending user ID" or if there is, it has a totally
| negligible reach.
| drewzero1 wrote:
| Cloud assumes a constant, reliable internet connection, which
| is not the reality in most of the world. (Nor is it always
| desirable.)
| jffry wrote:
| If the data never leaves your device, then a third-party
| service never gets the opportunity to leak or misuse it. This
| is far more private.
|
| How many stories have you heard about breaches due to
| accidentally mis-configured logging in web services? Also in
| the news lately was Twitter misusing 2fa phone numbers for
| advertising purposes.
| toper-centage wrote:
| What if what I'm trying to translate is sensitive information
| in itself?
| kevin_thibedeau wrote:
| "The telescreen received and transmitted simultaneously. Any
| sound that Winston made, above the level of a very low whisper,
| would be picked up by it; moreover, so long as he remained
| within the field of vision which the metal plaque commanded, he
| could be seen as well as heard. There was of course no way of
| knowing whether you were being watched at any given moment."
| 0des wrote:
| "How often, or on what system, the Thought Police plugged in
| on any individual wire was guesswork. It was even conceivable
| that they watched everybody all the time."
| no_time wrote:
| This is incredible and super important. For all the blunders of
| Mozilla in the last decade, they still have some great projects.
| I am also grateful to them for not scrapping Common Voice.
| _trampeltier wrote:
| Also important, because it now seems that, at least in Germany,
| the translate-website button is missing from Google Translate.
| From Switzerland I saw the button recently when I tried. I don't
| know if it is because I go to censored (Russian) sites. My
| company blocks Google Translate anyway, probably for the same
| reason.
| no_time wrote:
| Try copying the url into the translator's text field. It's
| how I've been using it for years.
| croes wrote:
| Are you sure?
|
| Under Google Übersetzer I see three buttons: Text, Dokumente,
| Websites
| [deleted]
| coding123 wrote:
| In the long run, I am a super huge fan of Mozilla and Firefox.
| I am using it right now. After a 10 year stint of using Chrome
| exclusively I now use Firefox as my main driver. Unfortunately
| I still need to keep Chrome around for weird situations where
| the website developer only tested in Chrome (Yes this still
| exists. A shopping cart in a popular website - cough Home Depot
| cough cough - that recently failed me in Firefox worked in
| Chrome. I haven't tried in a couple weeks hopefully that is
| fixed.)
| Shadonototra wrote:
| it's not their project, all they did was to write a form in JS
|
| the whole project is an EU-funded one, all done at the
| University of Edinburgh
|
| https://cordis.europa.eu/project/id/825303
|
| you giving full credit to Mozilla is dishonest, to say the
| least
|
| it aligns with their past projects, including using Mullvad and
| slapping a Mozilla sticker on top of it to claim it as their
| own
|
| also it is super funny to read this:
|
| > H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in
| enabling and industrial technologies - Information and
| Communication Technologies (ICT) MAIN PROGRAMME
|
| little do they know, the EU never learns
| stuartd wrote:
| All they did was 'write a form in JS'???
|
| > Our solution to that was to develop a high-level API around
| the machine translation engine, port it to WebAssembly, and
| optimize the operations for matrix multiplication to run
| efficiently on CPUs.
| nyanpasu64 wrote:
| It's unfortunate that it doesn't translate Japanese, and reading
| Japanese-only resources is a common hurdle in the retro game
| modding/development community.
| simonmales wrote:
| Great that you can contribute your own language pairs.
| obert wrote:
| The sooner we move AI to 127.0.0.1 the better, enough with The
| Cloud powerhouses.
|
| Yes there's work to be done, resilience, power efficiency,
| responsiveness, but it's the right direction for everything that
| involves private computing.
| Vinnl wrote:
| I've been using the extension [1] for a bit and, while it doesn't
| support too many languages, for the ones it does it's pretty cool
| to have it all running locally.
|
| [1] https://addons.mozilla.org/firefox/addon/firefox-
| translation...
| baobob wrote:
| Do you know what the pipeline looks like for new language pairs
| being added? This is really, really, really awesome
|
| I'm also immediately curious about using it headless outside
| the browser
| jelmervdl wrote:
| The training pipeline is also on Github! [1]
|
| I was experimenting with running the wasm version of
| bergamot-translator (the translation engine used by the
| addon) in node [2].
|
| However, if you want more performance, using the Python
| library [3] or the native C++ interface [4] gets you further
| because the wasm build is limited to a single thread and thus
| a blocking interface, and can't use all the processor
| specific optimisations that are in the native builds.
|
| EDIT: Another option is using translateLocally [5], which is
| a Qt desktop app based on bergamot-translator. It has a
| native messaging API that is designed as a much faster
| alternative to the wasm build for browser extensions, but it
| can also be used from Python [6].
|
| [1] https://github.com/mozilla/firefox-translations-training
|
| [2] https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd5
| 1c6...
|
| [3] https://colab.research.google.com/drive/1AHpgewVJBFaupwAb
| Zq0...
|
| [4] https://github.com/browsermt/bergamot-
| translator/blob/main/a...
|
| [5] https://github.com/XapaJIaMnu/translateLocally
|
| [6] https://github.com/XapaJIaMnu/translateLocally/blob/maste
| r/s...
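|
| For readers curious what a native messaging client looks like at
| the wire level, here is a minimal Python sketch. It assumes the
| standard browser native messaging framing (a 4-byte length in
| native byte order, followed by UTF-8 JSON). The request fields
| shown are hypothetical placeholders for illustration, not
| translateLocally's documented message schema; check the linked
| repository for the real one.

```python
import io
import json
import struct
from typing import BinaryIO, Optional


def encode_message(payload: dict) -> bytes:
    """Frame a JSON message: 4-byte native-endian length, then UTF-8 JSON."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("=I", len(body)) + body


def decode_message(stream: BinaryIO) -> Optional[dict]:
    """Read one framed message from a binary stream; None on clean EOF."""
    header = stream.read(4)
    if len(header) == 0:
        return None
    if len(header) < 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack("=I", header)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated message body")
    return json.loads(body.decode("utf-8"))


# Hypothetical request shape, for illustration only.
request = {"id": 1, "command": "Translate", "data": {"text": "Hallo Welt"}}
framed = encode_message(request)

# Round-trip through an in-memory stream instead of a real subprocess pipe.
assert decode_message(io.BytesIO(framed)) == request
```

| In real use the framed bytes would be written to the stdin of
| the native app and replies read from its stdout with the same
| decoder.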
| clairity wrote:
| neat, but it looks like it was just released, so how were you
| using it before?
|
| as an aside, pretty sad to see the project page,
| https://browser.mt/ , requiring not just javascript but
| specifically google connections to work. to 5 different google
| properties, no less.
| Vinnl wrote:
| I work at Mozilla, so got a sneak preview (and also the first
| bugs) :)
|
| (Of course technically the work was out there in the open
| already, since it's Mozilla.)
|
| Agreed about the Bergamot website. I suspect it's not by
| Mozilla, but I'll see if I can ask someone to take a look, as
| I don't think all those connections should be necessary.
| edko wrote:
| Do you know if this will be open-sourced, or if the repo is
| already available?
| space_fountain wrote:
| I think this is probably the source
| https://github.com/mozilla/firefox-translations
|
| edit: and for the actual translations
| https://github.com/mozilla/bergamot-translator
| clairity wrote:
| awesome, thanks! i also suspect it's by the EU coalition
| behind bergamot, so probably beyond mozilla's jurisdiction,
| but it doesn't hurt to ask.
| 0des wrote:
| .mt huh thats a new one for me
| lovelearning wrote:
| Stuck at "Loading translation engine..." for a long time. Tried
| German and Spanish. Can't tell if it's downloading some model
| data or something's failed. I suggest some kind of progress
| indicator.
| ainar-g wrote:
| Weird. I had a numeric progress indicator, and the model got
| downloaded in just a couple of seconds.
| Vinnl wrote:
| You might want to report that here, if it's not reported
| already: https://github.com/mozilla/firefox-translations/issues
| jeroenhd wrote:
| Looks lovely! Offline translations are very welcome in a world
| where the most important translation engines are also run by the
| world's biggest data hoarders.
|
| Sadly, the extension either doesn't work on mobile or Mozilla
| couldn't be bothered to add it to the whitelist.
| andrenatal1 wrote:
| Hi, I am part of the team who developed this and the author of
| the article. You can ask me anything about it if you have
| questions.
| msdrigg wrote:
| Chinese language is what I most commonly want to translate. Is
| there any planned support for this?
| unicornporn wrote:
| Google Translate code is present on many web sites to provide
| automatic translations of text. Could your translate code be
| uploaded to a server and embedded in a web page to provide the
| same functionality?
| jelmervdl wrote:
| I'm not aware of any actively maintained projects that give
| you this out of the box, but these two could be starting
| points for such a project.
|
| Mozilla implemented a REST service based on (an earlier
| version of) bergamot-translator [1]. You could use that as a
| replacement for the WASM component in the addon's code.
|
| I also know of some full-page translation demo code that uses
| the Python bindings of bergamot-translator [2]. That's
| basically a web proxy a la Google Translate.
|
| Lastly, marian, the translation software that's being used,
| has a web server as well [3]. It does not support HTML
| though.
|
| EDIT: see also my earlier comment for using it with Node or
| Python [4], which you could use to implement a simple web
| API.
|
| [1] https://github.com/mozilla/translation-service
|
| [2] https://github.com/jerinphilip/tagtransfer
|
| [3] https://marian-nmt.github.io/docs/#web-server
|
| [4] https://news.ycombinator.com/item?id=31599231
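|
| As a rough illustration of the "simple web API" idea, here is a
| minimal Python sketch using only the standard library. The
| translate() function is a hypothetical placeholder that you
| would replace with a call into whichever engine binding you use;
| the JSON field names ("text", "from", "to") are likewise
| assumptions, not the format of any of the projects linked above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def translate(text: str, source: str, target: str) -> str:
    """Placeholder: swap in a real engine call here (hypothetical hook)."""
    return f"[{source}->{target}] {text}"


def handle_translate(raw_body: bytes) -> Tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, response dict)."""
    try:
        req = json.loads(raw_body.decode("utf-8"))
        text, src, tgt = req["text"], req["from"], req["to"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with 'text', 'from', 'to'"}
    return 200, {"translation": translate(text, src, tgt)}


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the declared body length, then delegate to the
        # pure function above so the logic stays testable without a socket.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_translate(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), TranslateHandler).serve_forever()
```

| Calling serve() starts the server on localhost; a POST with a
| JSON body then returns the (placeholder) translation as JSON.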
| dnc wrote:
| Hi,
|
| This is an awesome project, congratulations!
|
| Could you share details about the machine translation engine
| that is used (or where to find out more about it)? Are there
| any plans to open source the extension code (with the
| WebAssembly optimizations that are mentioned in the article)?
|
| Thanks.
| jphilip wrote:
| A fork of marian-dev[1] is the underlying machine-translation
| engine:
|
| - https://github.com/browsermt/marian-dev
|
| Development of the higher-level code that wraps marian-dev to
| make it suitable for the browser extension happens at:
|
| - https://github.com/browsermt/bergamot-translator
|
| Some of the WebAssembly optimizations are available in
| bergamot-translator/marian-dev. The rest are in the Firefox
| source code. A starting point could be
| https://bugzilla.mozilla.org/show_bug.cgi?id=1720747.
|
| Extension code is open-source, and linked already in other
| comments: - https://github.com/mozilla/firefox-translations
|
| [1] https://github.com/marian-nmt/marian-dev
| baobob wrote:
| At least the code parts seem to be on GitHub:
| https://github.com/browsermt
| jarrell_mark wrote:
| really great stuff! any plans for this on firefox mobile?
| ainar-g wrote:
| Thanks for the extension!
|
| Are you planning on adding a "select some text - right click -
| Translate in a tooltip" feature? It'd be extremely useful for
| language learners.
| HellsMaddy wrote:
| +1. This was the first thing I tried to do and was surprised
| this feature doesn't exist. Most often, I don't encounter
| entire webpages in foreign languages, but rather small
| snippets of text.
|
| It seems there is an open issue for this:
| https://github.com/mozilla/firefox-translations/issues/358
| maxloh wrote:
| What is the dataset used for training the model? Where did the
| data come from?
| jelmervdl wrote:
| All of them are freely available. Most of them through mtdata
| [1]. The exact list of the datasets is in the firefox-
| translations-training pipeline configuration file [2].
|
| [1] https://pypi.org/project/mtdata/
|
| [2] https://github.com/mozilla/firefox-translations-
| training/blo...
| coder543 wrote:
| Is this open source? I don't see a github link anywhere, and
| I'm not sure if the models are freely usable.
|
| EDIT: maybe this is it: https://github.com/mozilla/firefox-
| translations-models
|
| also some info here: https://github.com/mozilla/firefox-
| translations-training
| jelmervdl wrote:
| Extension Github page: https://github.com/mozilla/firefox-
| translations
| ashkhn wrote:
| Hi! This is an amazing project and will be really useful! Thank
| you! I understand that the project is funded by the EU so the focus
| is on European languages but are there any plans to add CJK or
| other languages ?
| cf wrote:
| What can we do as users or contributors to help improve the
| accuracy of this extension? It's already amazing and I would
| love to see it get even better.
| schroeding wrote:
| It passes the "Turkey" <=> "turkey" test: "In _Turkey_ they
| sometimes eat _turkey_. " => "In der _Türkei_ essen sie manchmal
| _Truthahn_. " :D
|
| Super cool! Real-time translation, in the browser, running
| locally! And sure, not state of the art / on the level of deepl,
| but on the level of Google Translate, 2015ish, maybe? Amazing!
| mahmutc wrote:
| You should find another test case :)
| https://www.aljazeera.com/news/2022/6/2/un-registers-turkiye...
| riedel wrote:
| They actually put an umlaut into the official name to really
| make sure it won't be used correctly internationally?
| mahmutc wrote:
| I was thinking about the ISO 3166 part, but it seems the standard
| already contains some names with special characters, e.g.
| Réunion. https://en.m.wikipedia.org/wiki/List_of_ISO_3166_c
| ountry_cod...
| BiteCode_dev wrote:
| Just reformulate:
|
| "Turkiye quit being called Turkey cold turkey"
| Erlangen wrote:
| However, "Turkey is not a common food in Turkey." != "Die
| Türkei ist kein gemeinsames Essen in der Türkei."
| mathstuf wrote:
| Well, that's technically ambiguous in English too. I don't
| think many people are eating their own country ;) .
| refulgentis wrote:
| Hmm, is it ambiguous then? Seems there's only one
| interpretation
| Lukas_Skywalker wrote:
| Also, „common“ should be translated to „üblich“ instead of
| „gemeinsam“. „Gemeinsam“ is more like „collective“ as in
| „a collective effort“.
| tralarpa wrote:
| As usual, deepl doesn't disappoint.
| [deleted]
| jordemort wrote:
| I love it. I wish it could translate Chinese to English.
| collsni wrote:
| Wow awesome!
| simlevesque wrote:
| I wonder why French is absent.
|
| Meanwhile they have Persian which is not even in the EU.
| cassepipe wrote:
| Should the EU languages get preferential treatment ?
| Mizza wrote:
| "This project has received funding from the European Union's
| Horizon 2020 research and innovation programme under grant
| agreement No 825303 ."
| geraltofrivia wrote:
| I wouldn't comment on the absence of French vis-a-vis other
| languages. It's just slightly surprising because English <->
| French is honestly a very widely studied translation sub-task,
| with an enormous amount of parallel corpora available for
| training these models.
| simlevesque wrote:
| Well, there's this and the fact that it is the second most
| popular language in the European Union, which sponsors the
| project.
| coffeeblack wrote:
| tclancy wrote:
| How do you think translations work, exactly?
| mikevm wrote:
| That's funny. I've just tried to translate "fuck you" to Russian
| and I got "trakhat' tebia" while Google Translate gives the more
| accurate "poshel na khui".
| ainar-g wrote:
| In my experience, DeepL is still the undefeated leader when it
| comes to translating Russian obscenity, heh.
| spitfire wrote:
| Try "Russian warship, go fuck yourself!" instead. It should
| work better.
| numpad0 wrote:
| Machine translations are as accurate as a trebuchet past 300
| yards, just better than nothing. But they're a great tool so
| long as the user is aware.
| guerrilla wrote:
| What I need from this is to be able to select text and just have
| it translated in a tooltip (or whatever.) This is what I'm using
| the Simple Translate Firefox add-on for but unfortunately it
| sends data to Google.
| filoleg wrote:
| It would be nice to have something like that for desktop, but
| on mobile, iOS handles it amazingly.
|
| You can select text almost anywhere (from browser to even from
| a screenshot/image; literally anywhere you are able to select
| text), and in a tooltip above the word, one of the few options
| is translate. I love the UX of it, as it is super intuitive and
| unobtrusive, and works pretty much instantaneously. It runs
| fully locally, no connection required. Slides a native OS pane
| over the page to show possible translations along with
| pronunciations and other extra info.
|
| Sidenote: other features in that tooltip are pretty nifty too.
| Aside from the obvious copy/cut/paste/share, i found "look up"
| to be quite useful when i see a word I've not encountered
| before. It pulls another native OS pane that shows dictionary
| definitions and extra info like the wikipedia link. And the
| actual dictionary definitions are local too afaik.
| jeroenhd wrote:
| Android has the same feature, assuming apps don't disable the
| tooltip. Selecting text on my phone brings a nice context
| menu for cut/copy/paste/search/translate/encrypt (that last
| one was added by OpenKeychain, a PGP app).
|
| It doesn't come with a dictionary built in, but the search
| button becomes an online dictionary in a pinch. Any
| dictionary app could extend the menu to add a local
| dictionary, of course.
| rahimnathwani wrote:
| Android has this too. BUT:
|
| When my phone is in portrait mode (almost always), I don't
| see the translate option until I tap on the three dots.
|
| The translation isn't instant. It takes a second to show up,
| and then takes up the top of the screen.
|
| I'd much prefer a UI similar to the Zhongwen Chrome
| extension.
| guerrilla wrote:
| > It takes a second to show up, and then takes up the top
| of the screen.
|
| I think this could have to do with it not being local...
| guerrilla wrote:
| Yeah, Android has the same.
| baobob wrote:
| Awesome, tested the German model on dw.com, surprisingly fast and
| accurate.
___________________________________________________________________
(page generated 2022-06-02 23:00 UTC)