[HN Gopher] High-Fidelity Simultaneous Speech-to-Speech Translation
       ___________________________________________________________________
        
       High-Fidelity Simultaneous Speech-to-Speech Translation
        
       Author : Bluestein
       Score  : 30 points
       Date   : 2025-07-03 20:27 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | benlivengood wrote:
       | Now to get the model to run in an earbud...
        
         | lapink wrote:
         | The model can actually run on an iPhone 16 Pro, so if the
         | earbud is connected to one that could work!
        
           | Bluestein wrote:
           | That would be insane.
           | 
           | Thinking of it, the _whole_ "stack" from earbuds to phone to
           | _cloud_, even in something as "commonplace" as Assistant or
           | Alexa, is amazing: _all that computing power_ at our disposal.
        
       | wedn3sday wrote:
       | For anyone else looking for examples:
       | https://huggingface.co/spaces/kyutai/hibiki-samples
        
       | AIorNot wrote:
       | This is amazing. I'd love to play with this. What about other
       | languages besides French to English?
        
         | lapink wrote:
         | Adding more languages is definitely planned! This was the
         | master's internship project of Tom (the first author) with
         | Kyutai, and it was easier to prototype the idea with a single
         | pair. Also, he will be presenting this work at ICML in two
         | weeks, if anyone is around and wants to learn more.
        
       | iambateman wrote:
       | This is why I wonder about the value of language learning for
       | reasons other than "I'm really passionate about it."
       | 
       | We are so close to interfaces that reduce the language barrier by
       | a lot...
        
         | rafale wrote:
         | What about brain development and general intelligence?
         | Knowledge will always have value, or else we become slaves to
         | the machine.
        
       | cs702 wrote:
       | Nice. I'm impressed.
       | 
       | Translator jobs are going to go poof! overnight.
       | 
       | Just sayin'.
        
       | gagabity wrote:
       | Yandex Browser has been doing this for Russian for a while. If
       | you go to YT, it offers to translate to Russian, and it handles
       | multiple speakers and voices, from what I remember. Not sure if
       | all the technicalities are the same.
        
       | Grosvenor wrote:
       | This is so cool. The future is cool!
       | 
       | I wonder how it will work on languages that have a different
       | grammatical structure than French/English, like the Finno-Ugric
       | languages, which have sort of a Yoda speech to them. Edit: In
       | Finno-Ugric languages, words later in a sentence can completely
       | change the meaning. It will be interesting to look at.
       | 
       | It's considerate of them to name it after my favourite whisky.
        
         | lapink wrote:
         | The alignment between source and target is automatically
         | inferred, basically by searching for the point at which the
         | uncertainty over a given output word drops the most as more
         | input words are seen. This is then lifted to the audio domain.
         | In theory the same trick should work even with longer
         | grammatical inversions between languages, although this will
         | lead to larger delays. To be tested!
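
[Editor's sketch] The alignment idea lapink describes can be illustrated roughly as follows. This is a toy sketch, not the authors' implementation: the function name `infer_alignment`, the `logp` table layout, and the monotonicity step are all assumptions made for illustration. It assumes some text translation model has already produced, for each output word, its log-probability given the first i input words.

```python
from typing import List


def infer_alignment(logp: List[List[float]]) -> List[int]:
    """Toy alignment from uncertainty drops (hypothetical helper).

    logp[j][i] is assumed to be the log-probability of output word j
    when only the first i input words are visible (i = 0 .. n_inputs).
    Returns, for each output word, how many input words must be seen
    before it is emitted.
    """
    alignment = []
    prev = 0
    for row in logp:
        # Gain in log-probability from revealing input word i (1-based).
        gains = [row[i] - row[i - 1] for i in range(1, len(row))]
        # Input position where uncertainty drops the most.
        best = 1 + max(range(len(gains)), key=lambda i: gains[i])
        # Assumption: output words are emitted in order, so the
        # alignment is forced to be non-decreasing.
        prev = max(prev, best)
        alignment.append(prev)
    return alignment


# Made-up numbers: 3 output words, 3 input words (columns i = 0..3).
example = [
    [-5.0, -0.1, -0.1, -0.1],  # word 0 becomes likely after input 1
    [-5.0, -5.0, -4.9, -0.2],  # word 1 needs input 3 (an inversion)
    [-5.0, -4.0, -0.3, -0.2],  # word 2 needs input 2, but waits for word 1
]
print(infer_alignment(example))
```

In this toy example the second output word depends on the last input word, so everything after it is delayed as well, which is the "larger delays" effect mentioned for languages with longer inversions.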
        
       | jauntywundrkind wrote:
       | Link to repo: https://github.com/kyutai-labs/hibiki
        
       | totetsu wrote:
       | All these Japanese project names and no Japanese support (ToT)
        
       ___________________________________________________________________
       (page generated 2025-07-03 23:00 UTC)