[HN Gopher] High-Fidelity Simultaneous Speech-to-Speech Translation
___________________________________________________________________
High-Fidelity Simultaneous Speech-to-Speech Translation
Author : Bluestein
Score : 30 points
Date : 2025-07-03 20:27 UTC (2 hours ago)
(HTM) web link (arxiv.org)
| benlivengood wrote:
| Now to get the model to run in an earbud...
| lapink wrote:
| The model can actually run on an iPhone 16 Pro, so if the
| earbud is connected to one that could work!
| Bluestein wrote:
| That would be insane.-
|
| Thinking of it, the _whole_ "stack" from earbuds to phone to
| _cloud_ - even in just something so "commonplace" as
| Assistant or Alexa ...
|
| ... Is amazing: _All that computing power_ at our disposal.-
| wedn3sday wrote:
| For anyone else looking for examples:
| https://huggingface.co/spaces/kyutai/hibiki-samples
| AIorNot wrote:
| This is amazing, I'd love to play with it. What about other
| languages besides French to English?
| lapink wrote:
| Adding more languages is definitely planned! This was Tom's (the
| first author's) master's internship project with Kyutai, and it
| was easier to prototype the idea with a single pair. Also, he
| will be presenting this work at ICML in two weeks if anyone is
| around and wants to learn more.
| iambateman wrote:
| This is why I wonder about the value of language learning for
| reasons other than "I'm really passionate about it."
|
| We are so close to interfaces that reduce the language barrier by
| a lot...
| rafale wrote:
| What about brain development and general intelligence?
| Knowledge will always have a value, or else we become slaves to
| the machine.
| cs702 wrote:
| Nice. I'm impressed.
|
| Translator jobs are going to go poof! overnight.
|
| Just sayin'.
| gagabity wrote:
| Yandex Browser has been doing this for Russian for a while. If
| you go to YouTube, it offers to translate videos into Russian,
| and it handles multiple speakers and voices from what I remember.
| Not sure if all the technicalities are the same.
| Grosvenor wrote:
| This is so cool. The future is cool!
|
| I wonder how it will work on languages that have a different
| grammatical structure than French/English, like Finno-Ugric
| languages, which have sort of a Yoda speech to them. Edit: In
| Finno-Ugric languages, words later on in a sentence can completely
| change the meaning. Will be interesting to look at.
|
| It's considerate of them to name it after my favourite whisky.
| lapink wrote:
| The alignment between source and target is automatically
| inferred, basically by searching when the uncertainty over a
| given output word reduces the most once enough input words are
| seen. This is then lifted to the audio domain. In theory the
| same trick should work even with longer grammatical inversions
| between languages, although this will lead to larger delays. To
| be tested!
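|
| A rough illustration of the alignment idea described above, not
| the Hibiki code. It assumes a hypothetical text translation
| model has already produced logp[j][i], the log-probability of
| target word j after seeing the first i source words; the
| function name and the toy numbers are made up. Each target word
| is placed where its log-probability jumps the most, and a
| running maximum keeps the alignment monotonic. The lifting to
| the audio domain is not modeled.

```python
def infer_alignment(logp):
    """Return, for each target word, how many source words must be seen first.

    logp[j][i] is an assumed input: the log-probability of target word j
    given the first i source words, for i = 0..n.
    """
    alignment = []
    earliest = 0  # running maximum, so later words never precede earlier ones
    for row in logp:
        # Uncertainty reduction when the i-th source word is revealed.
        gains = [row[i] - row[i - 1] for i in range(1, len(row))]
        best = max(range(len(gains)), key=gains.__getitem__) + 1
        earliest = max(earliest, best)
        alignment.append(earliest)
    return alignment


# Toy scores (invented): same word order in source and target.
same_order = [
    [-4.0, -0.5, -0.4, -0.3],
    [-6.0, -5.9, -1.0, -0.9],
    [-5.0, -4.8, -4.5, -0.2],
]

# Toy scores (invented): the first target word depends on the last source word.
inverted = [
    [-5.0, -4.8, -4.5, -0.2],
    [-4.0, -0.5, -0.4, -0.3],
    [-6.0, -5.9, -1.0, -0.9],
]

print(infer_alignment(same_order))  # [1, 2, 3]
print(infer_alignment(inverted))    # [3, 3, 3]
```

| In the inverted case every target word has to wait for the last
| source word, which matches the remark that longer grammatical
| inversions should still work but with larger delays.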
| jauntywundrkind wrote:
| Link to repo: https://github.com/kyutai-labs/hibiki
| totetsu wrote:
| All these Japanese project names and no Japanese support (ToT)
___________________________________________________________________
(page generated 2025-07-03 23:00 UTC)