[HN Gopher] Kokoro WebGPU: Real-time text-to-speech 100% locally...
       ___________________________________________________________________
        
       Kokoro WebGPU: Real-time text-to-speech 100% locally in the browser
        
       Author : xenova
       Score  : 163 points
       Date   : 2025-02-07 15:30 UTC (7 hours ago)
        
 (HTM) web link (huggingface.co)
 (TXT) w3m dump (huggingface.co)
        
       | xenova wrote:
       | It took some time, but we finally got Kokoro TTS (v1.0) running
       | in-browser w/ WebGPU acceleration! This enables real-time text-
       | to-speech without the need for a server. Looking forward to your
       | feedback!
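
        For context, here is a minimal sketch of how this could be
        driven from a page via the kokoro-js package. The package name,
        model id, dtype/device options, and voice name below are
        assumptions based on the public demo rather than details stated
        in this thread, so treat it as a sketch, not the demo's actual
        source.

          // Sketch only: load Kokoro once, then synthesize and play audio.
          import { KokoroTTS } from "kokoro-js";

          const tts = await KokoroTTS.from_pretrained(
            "onnx-community/Kokoro-82M-v1.0-ONNX",
            { dtype: "q8", device: "webgpu" },  // try "wasm" if WebGPU is unavailable
          );

          // Generate speech for a short piece of text with one of the
          // bundled voices.
          const result = await tts.generate("Hello from the browser!", {
            voice: "af_heart",
          });

          // Assumes the result exposes Float32Array samples plus a sample
          // rate; play them through the Web Audio API.
          const ctx = new AudioContext();
          const buf = ctx.createBuffer(1, result.audio.length, result.sampling_rate);
          buf.copyToChannel(result.audio, 0);
          const src = ctx.createBufferSource();
          src.buffer = buf;
          src.connect(ctx.destination);
          src.start();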
        
         | deivid wrote:
          | Amazing! I'm interested in models running locally, and
          | Kokoro looks great. Are you aware of similar models, but for
          | speech-to-text?
        
           | Ono-Sendai wrote:
           | whisper
        
           | xenova wrote:
            | We have released a bunch of speech recognition demos
            | (using Whisper, Moonshine, and others). For example:
           | 
           | - https://huggingface.co/spaces/Xenova/whisper-web
           | 
           | - https://huggingface.co/spaces/Xenova/whisper-webgpu
           | 
            | - https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu
           | 
           | - https://huggingface.co/spaces/webml-community/moonshine-web
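
        For context, a minimal sketch of in-browser speech recognition
        with transformers.js, along the lines of the demos above. The
        package name, model id, and options are assumptions taken from
        the public demos, not something spelled out in this thread.

          // Sketch only: run Whisper in the browser with WebGPU.
          import { pipeline } from "@huggingface/transformers";

          const transcriber = await pipeline(
            "automatic-speech-recognition",
            "onnx-community/whisper-base",   // illustrative model id
            { device: "webgpu" },
          );

          // Transcribe an audio file by URL (a Float32Array of samples
          // also works).
          const { text } = await transcriber("https://example.com/sample.wav");
          console.log(text);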
        
         | amelius wrote:
         | Now that's what I call "server-less" computing!
        
         | sebastiennight wrote:
          | This is brilliant. All we need now is for someone to code a
          | frontend for it so we can input an article's URL and have
          | this voice read it out loud; the built-in local voices on
          | macOS are not even close to this Kokoro model.
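
        A rough sketch of that "paste an article URL and have it read
        aloud" idea, reusing the assumed kokoro-js API from the sketch
        above; real article extraction (e.g. Readability), sentence
        chunking, and CORS handling are left out for brevity.

          import { KokoroTTS } from "kokoro-js";  // assumed package name, as above

          const tts = await KokoroTTS.from_pretrained(
            "onnx-community/Kokoro-82M-v1.0-ONNX",
            { dtype: "q8", device: "webgpu" },
          );

          // Fetch the page and strip it down to plain text (the target
          // site must allow cross-origin requests for this to work).
          const html = await (await fetch("https://example.com/article")).text();
          const doc = new DOMParser().parseFromString(html, "text/html");
          const text = doc.body.textContent.replace(/\s+/g, " ").trim().slice(0, 500);

          // Synthesize the excerpt; playback works the same way as in
          // the earlier Web Audio sketch.
          const result = await tts.generate(text, { voice: "af_heart" });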
        
       | waynenilsen wrote:
        | Incredible work! I have listened to several TTS systems, and
        | having this be free and completely under the customer's
        | control is absolutely incredible. This will unlock new use
        | cases.
        | 
        | I made https://app.readaloudto.me/ as a hobby project, and now
        | it could be enhanced with a local TTS option!
        
       | rado wrote:
       | Crashes the iPad Safari tab
        
         | zamadatix wrote:
          | Mobile Safari (including on iPad) does not like to dish out
          | large amounts of memory.
        
           | dindresto wrote:
            | Same on macOS Safari (Sequoia, Safari 18.3, M3 Pro, 18 GB
            | RAM).
        
         | oliwary wrote:
         | Worked on my Pixel 6a, albeit quite slowly (~30s for 4s audio).
         | Still really impressed.
        
           | darkwater wrote:
            | Yep, same here: Pixel 6a and Firefox. It takes a while,
            | but it sounds pretty good.
        
       | reach-vb wrote:
        | Brilliant job! Love how fast it is. I'm sure that if the rapid
        | pace of speech ML continues, we'll have speech-to-speech
        | models running directly in our browsers!
        
         | dust42 wrote:
          | It's already here: Hibiki by Kyutai.org was released
          | yesterday with speech-to-speech, French-to-English
          | translation on iPhone:
         | 
         | https://x.com/neilzegh/status/1887498102455869775
         | 
         | https://github.com/kyutai-labs/hibiki
        
       | fallinditch wrote:
        | Brave browser on a Samsung Galaxy S22 Ultra gives horrible
        | screeching noises.
        
         | Guillaume86 wrote:
          | Same with Chrome on a Zenfone 8.
        
         | shaneofalltrad wrote:
          | Same on an Intel Mac in the Chrome browser.
        
         | magicalhippo wrote:
          | Firefox on a Samsung S21 worked fine, albeit slowly: around
          | 20-25 s for the demo text.
         | 
         | Quality sounded good compared to a lot of other small TTS
         | models I've tried.
        
           | nnadams wrote:
            | Yeah, this only worked with Firefox on my phone. All other
            | browsers generated a screechy noise instead.
        
       | djeastm wrote:
        | FYI, I tried this on my Galaxy S21 with both the Brave and
        | Chrome browsers and just got screeching noises in the audio.
        
         | mewse-hn wrote:
          | The mere idea of voice software's error mode being
          | uncontrollable screeching is the most hilarious thing to me.
        
       | Asmod4n wrote:
        | Sounds horrible in Chrome with an AMD GPU. Why is that?
        
         | mdaniel wrote:
         | Are you somehow implying that everyone in the AI arms race
         | believes that only CUDA exists?! /s
         | 
          | But on a more serious note: the story I hear about AMD GPUs
          | is that they are, in fact, shittier because AMD themselves
          | give fewer shits. GIGO.
        
           | CyberDildonics wrote:
            | What is this comment saying? You think the results are
            | different just because of the AMD hardware? If there is a
            | difference, it would be a software bug.
        
       | bentt wrote:
       | Sounded perfect for me. Brave/Win11/3090
        
       | SubiculumCode wrote:
        | Kokoro gives pretty good voices and is quite light, making it
        | useful despite its lack of voice-cloning capability. However,
        | I haven't figured out how to run it as a TTS server without
        | homebrewing the server... which maybe is easy? IDK.
        
         | phildougherty wrote:
         | https://github.com/remsky/Kokoro-FastAPI
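
        For reference, a rough sketch of calling such a server from a
        client, assuming Kokoro-FastAPI exposes an OpenAI-compatible
        /v1/audio/speech route on its default local port; the route,
        port, and field names here are assumptions, so check the repo's
        README for the real ones.

          // Sketch only: request speech from a locally running server.
          const res = await fetch("http://localhost:8880/v1/audio/speech", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
              model: "kokoro",
              voice: "af_heart",
              input: "Hello from a local Kokoro server!",
            }),
          });

          // Raw audio bytes, ready to save or feed to an <audio> element.
          const audioBytes = await res.arrayBuffer();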
        
       | yawnxyz wrote:
        | Holy cow, how did they get the OpenAI voices like Alloy and
        | Echo generated in-browser and sounding 99% the same?
       | 
       | this is astounding
        
       | butz wrote:
        | Generating audio takes a bit, but wow: a 92 MB model for
        | really decent-sounding speech. Is there a way to plug this
        | thing into Speech Dispatcher on Linux and use it for
        | accessibility?
        
       | realsid wrote:
        | Amazing! This is my first time witnessing a model of such
        | prowess run in the browser. Curious about the quantization and
        | WebML setup.
        
       | moralestapia wrote:
       | This is great but far from real-time.
       | 
        | (I get the joke that, for some definition of real-time, this
        | is real-time.)
       | 
        | The reason I use an API is that time to first byte is the most
        | important metric in the apps I'm working on.
       | 
        | That aside, kudos for the great work, and I'm sure one day the
        | latency on this will be super low as well.
        
       ___________________________________________________________________
       (page generated 2025-02-07 23:01 UTC)