[HN Gopher] Using aligned word vectors for instant translations ...
___________________________________________________________________
Using aligned word vectors for instant translations with Python and
Rust
Author : beau
Score : 54 points
Date : 2021-06-10 20:22 UTC (2 hours ago)
(HTM) web link (instantdomainsearch.com)
(TXT) w3m dump (instantdomainsearch.com)
| PaulHoule wrote:
| Nice example.
|
| The short text and the fact that your application would tolerate
| or celebrate catchy neologisms play to fastText's strengths.
| beau wrote:
| Thank you!
| beau wrote:
| We've released the underlying Rust implementation here:
| https://github.com/InstantDomain/instant-distance with Python
| bindings at https://pypi.org/project/instant-distance -- feedback
| welcome!
| Fiahil wrote:
| I've not much to say on the actual lib, it seems great!
| However, don't feel compelled to put all your rust code into a
| single lib.rs. You can split your work into several files and
| use 'pub use' and 'mod' in lib.rs to re-export your functions &
| types into a public API of your choosing.
|
| cargo check and format time might also slightly improve!
| [deleted]
| denysvitali wrote:
| > For example, here are the results of translating the English
| word "hello":
|
| > Language: fr, Translation: bonjours
|
| > Language: fr, Translation: bonsoir
|
| > Language: fr, Translation: salutations
|
| > Language: it, Translation: buongiorno
|
| > Language: it, Translation: buonanotte
|
| > Language: fr, Translation: rebonjour
|
| > Language: it, Translation: auguri
|
| > Language: fr, Translation: bonjour,
|
| > Language: it, Translation: buonasera
|
| > Language: it, Translation: chiamatemi
|
| Is it just me, or are these machine translations worse than ...
| Google Translate?
| beau wrote:
| These results are less accurate than Google Translate. But they
| are far faster to get, and far less expensive to generate:
| https://cloud.google.com/translate/pricing -- our goal here is
| speed. We want to search through many possibilities as quickly as
| possible.
|
| The word vectors have been aligned in multiple languages. Using
| an approximate nearest neighbor search we are able to find the
| nearest vector to the input in multiple languages very quickly.
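The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 3-dimensional vectors are invented toy values (not taken from the fastText .vec files), and an exact brute-force cosine search stands in for the approximate nearest neighbor index. It does not use the instant-distance API.

```python
import math

# Toy "aligned" vectors, invented for illustration only. Because the
# vectors are aligned, words from different languages share one space.
VECTORS = {
    ("en", "hello"): [0.9, 0.1, 0.0],
    ("fr", "bonjour"): [0.8, 0.2, 0.1],
    ("fr", "fromage"): [0.0, 0.1, 0.9],
    ("it", "buongiorno"): [0.85, 0.15, 0.05],
    ("it", "formaggio"): [0.05, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, k=2):
    """Return the k entries from other languages closest to `query`.

    Exact search: every candidate is scored. An approximate index
    trades a little accuracy to avoid scoring every candidate.
    """
    q = VECTORS[query]
    scored = [
        (cosine(q, vec), key)
        for key, vec in VECTORS.items()
        if key[0] != query[0]  # skip words in the query's own language
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


print(nearest(("en", "hello")))
```

With real data the brute-force scan is the part that becomes too slow, which is where an approximate index earns its keep.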
|
| To keep the example simple, we did not try to filter the data
| through hand-built language dictionaries. In fact, we simply
| drop words in other languages that also appear in the English
| .vec file. A word like "ciao" appears frequently enough in
| otherwise English sentences that the example code drops it from
| Italian, so it is not shown in the results:
|
| % curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
| 50393:ciao 0.0120 ...
|
| One improvement would be to filter out any words that do not
| appear in a hand-curated dictionary instead of filtering out
| words that already appear in English. We decided not to show
| how to do this because we'd already introduced a few concepts,
| such as aligned word vectors and approximate nearest neighbour
| searches, and wanted to keep the example as simple as possible.
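The difference between the two filtering strategies mentioned above can be shown with a small sketch. Everything here is hypothetical: the function names, the word lists, and the tiny "dictionary" are made up for illustration and are not the article's code.

```python
def drop_english_overlap(candidates, english_vocab):
    """Strategy described in the comment: drop any foreign candidate
    that also appears in the English vocabulary."""
    return [word for word in candidates if word not in english_vocab]


def keep_dictionary_words(candidates, dictionary):
    """Suggested improvement: keep only candidates found in a
    hand-curated dictionary for the target language."""
    return [word for word in candidates if word in dictionary]


# Hypothetical data: "ciao" shows up in the English vocabulary because
# it appears in otherwise English sentences.
english_vocab = {"hello", "ciao", "pizza"}
italian_candidates = ["ciao", "buongiorno", "chiamatemi", "hello"]
italian_dictionary = {"ciao", "buongiorno", "buonasera"}

print(drop_english_overlap(italian_candidates, english_vocab))
print(keep_dictionary_words(italian_candidates, italian_dictionary))
```

On this toy input, the overlap filter loses "ciao" but lets "chiamatemi" through, while the dictionary filter keeps "ciao" and drops the stray English "hello".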
| toxik wrote:
| Google Translate is state of the art, so I'm not sure why that
| would be surprising. That said, is there something wrong with
| the translations offered?
| dataflow wrote:
| > That said, is there something wrong with the translations
| offered?
|
| I think in French hello = "bonjour" and hi = "salut"... not
| sure where "bonjours" and "salutations" came from.
| T-A wrote:
| The Italian "auguri" means "best wishes"; "chiamatemi"
| means "call me". Neither is a plausible translation of
| "hello". The obvious one, "ciao", is missing.
| ampdepolymerase wrote:
| It would be better to run the vectors through an attention
| layer if you want sentence-to-sentence translation.
| aitk wrote:
| At first glance at the title, I thought it was translating Python
| code to Rust code.
| [deleted]
___________________________________________________________________
(page generated 2021-06-10 23:00 UTC)