[HN Gopher] Show HN: OWhisper - Ollama for realtime speech-to-text
       ___________________________________________________________________
        
       Show HN: OWhisper - Ollama for realtime speech-to-text
        
        Hello everyone. This is Yujong from the Hyprnote team
        (https://github.com/fastrepl/hyprnote). We built OWhisper for two
        reasons (also outlined in
        https://docs.hyprnote.com/owhisper/what-is-this):
         
        (1) While working on on-device, realtime speech-to-text, we found
        there was no practical tooling for downloading and running these
        models.
         
        (2) We also got frequent requests for a way to plug custom STT
        endpoints into the Hyprnote desktop app, just like you can with
        OpenAI-compatible LLM endpoints.
         
        Part (2) is still a work in progress, but we spent some time
        writing docs, so you'll get a good idea of what it will look like
        if you skim through them.
         
        For (1), you can try it now
        (https://docs.hyprnote.com/owhisper/cli/get-started):
         
            brew tap fastrepl/hyprnote && brew install owhisper
            owhisper pull whisper-cpp-base-q8-en
            owhisper run whisper-cpp-base-q8-en
         
        If you're tired of Whisper, we also support Moonshine :) Give it a
        shot (owhisper pull moonshine-onnx-base-q8)
         
        We're here and looking forward to your comments!
        
       Author : yujonglee
       Score  : 84 points
       Date   : 2025-08-14 15:47 UTC (7 hours ago)
        
 (HTM) web link (docs.hyprnote.com)
 (TXT) w3m dump (docs.hyprnote.com)
        
       | yujonglee wrote:
       | Happy to answer any questions!
       | 
        | These are the local models it supports:
       | 
       | - whisper-cpp-base-q8
       | 
       | - whisper-cpp-base-q8-en
       | 
       | - whisper-cpp-tiny-q8
       | 
       | - whisper-cpp-tiny-q8-en
       | 
       | - whisper-cpp-small-q8
       | 
       | - whisper-cpp-small-q8-en
       | 
       | - whisper-cpp-large-turbo-q8
       | 
       | - moonshine-onnx-tiny
       | 
       | - moonshine-onnx-tiny-q4
       | 
       | - moonshine-onnx-tiny-q8
       | 
       | - moonshine-onnx-base
       | 
       | - moonshine-onnx-base-q4
       | 
       | - moonshine-onnx-base-q8
        
         | phkahler wrote:
         | I thought whisper and others took large chunks (20-30 seconds)
         | of speech, or a complete wave file as input. How do you get
         | real-time transcription? What size chunks do you feed it?
         | 
         | To me, STT should take a continuous audio stream and output a
         | continuous text stream.
        
           | yujonglee wrote:
           | I use VAD to chunk audio.
           | 
            | Whisper and Moonshine both work on chunks, but for
            | Moonshine:
           | 
           | > Moonshine's compute requirements scale with the length of
           | input audio. This means that shorter input audio is processed
           | faster, unlike existing Whisper models that process
           | everything as 30-second chunks. To give you an idea of the
           | benefits: Moonshine processes 10-second audio segments 5x
           | faster than Whisper while maintaining the same (or better!)
           | WER.
           | 
            | Also, for Kyutai, we can feed continuous audio in and get
            | continuous text out.
            | 
            | - https://github.com/moonshine-ai/moonshine
            | - https://docs.hyprnote.com/owhisper/configuration/providers/k...
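            | 
            | To make the chunking idea concrete, here is a minimal
            | sketch in Python (not our actual implementation; webrtcvad
            | and the transcribe() stub are just for illustration):
            | 
            |     import webrtcvad
            | 
            |     # 16 kHz, 16-bit mono PCM; webrtcvad accepts
            |     # 10/20/30 ms frames
            |     RATE = 16000
            |     FRAME_MS = 30
            |     FRAME_BYTES = RATE * FRAME_MS // 1000 * 2  # 960
            | 
            |     vad = webrtcvad.Vad(2)  # aggressiveness 0-3
            | 
            |     def chunks(frames, max_silence=10):
            |         """Buffer speech frames; yield a chunk once we
            |         see ~300 ms of trailing silence."""
            |         buf, silence = [], 0
            |         for frame in frames:  # FRAME_BYTES of PCM each
            |             if vad.is_speech(frame, RATE):
            |                 buf.append(frame)
            |                 silence = 0
            |             elif buf:
            |                 silence += 1
            |                 if silence >= max_silence:
            |                     yield b"".join(buf)
            |                     buf, silence = [], 0
            |         if buf:
            |             yield b"".join(buf)
            | 
            |     # each chunk then goes to the STT model:
            |     # for chunk in chunks(mic_frames()):
            |     #     print(transcribe(chunk))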
        
             | mijoharas wrote:
              | Something like that in a CLI tool that just writes text
              | to stdout would be perfect for a lot of my use cases!
             | 
             | (maybe with an `owhisper serve` somewhere else to start the
             | model running or whatever.)
        
               | yujonglee wrote:
               | Are you thinking about the realtime use-case or batch
               | use-case?
               | 
               | For just transcribing file/audio,
               | 
               | `owhisper run <MODEL> --file a.wav` or
               | 
                | `curl https://something.com/audio.wav | owhisper run
                | <MODEL>`
                | 
                | might make sense.
        
               | mijoharas wrote:
                | Agreed, both of those make sense, but I was thinking
                | realtime. (Pipes can stream data; I'd find it useful
                | to have something that streams STT output to stdout
                | in realtime.)
        
               | yujonglee wrote:
                | It's open-source. Happy to review & merge if you can
                | send us a PR!
                | 
                | https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
        
         | alkh wrote:
          | Sorry, maybe I missed it, but I didn't see this list on your
          | website. I think it would be a good idea to add this info
          | there. Besides that, thank you for the effort and your work!
          | I will definitely give it a try.
        
           | yujonglee wrote:
            | Got it. FYI, if you run `owhisper pull --help`, this info
            | is printed.
        
       | JP_Watts wrote:
       | I'd like to use this to transcribe meeting minutes with multiple
       | people. How could this program work for that use case?
        
         | yujonglee wrote:
         | If your use-case is meeting,
         | https://github.com/fastrepl/hyprnote is for you. OWhisper is
         | more like a headless version of it.
        
           | JP_Watts wrote:
            | Can you describe how it picks out different voices? Does
            | it need separate audio channels, or does it recognize
            | different voices on the same audio input?
        
             | yujonglee wrote:
              | It separates mic and speaker into 2 channels, so you can
              | reliably get "what you said" vs "what you heard".
              | 
              | For splitting speakers within a channel, we need an AI
              | model to do that. It is not implemented yet, but I think
              | we'll be in good shape somewhere in September.
              | 
              | Also, we have a transcript editor where you can easily
              | split segments and assign speakers.
        
         | sxp wrote:
          | If you want to transcribe meeting notes, Whisper isn't the
          | best tool because it doesn't separate the transcript by
          | speaker.
         | There are some other tools that do that, but I'm not sure what
         | the best local option is. I've used Google's cloud STT with the
         | diarization option and manually renamed "Speaker N" after the
         | fact.
        
       | solarkraft wrote:
       | Wait, this is cool.
       | 
        | I just spent last week researching the options (especially for
        | my M1!) and was left wishing for a standard, full-service
        | (live) transcription server for Whisper, like Ollama has been
        | for LLMs.
        | 
        | I'm excited to try this out and see your API (there seems to
        | be a standards vacuum here due to OpenAI not having a realtime
        | transcription service, which I find to be a bummer)!
       | 
        | Edit: They seem to emulate the Deepgram API
        | (https://developers.deepgram.com/reference/speech-to-text-api...),
        | which seems like a solid choice. I'd definitely like to see a
        | standard emerge here.
        
         | yujonglee wrote:
          | Correct. More on the Deepgram compatibility:
          | https://docs.hyprnote.com/owhisper/deepgram-compatibility
         | 
         | Let me know how it goes!
        
       | clickety_clack wrote:
        | Please find a way to add speaker diarization, with a way to
        | remember the speakers. You can do it with pyannote and get a
        | vector embedding of each speaker that can be compared between
        | audio samples, but that's a year old now, so I'm sure there
        | are better options by now!
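        | 
        | For reference, the basic pyannote flow looks roughly like this
        | (a sketch against pyannote.audio 3.x; the model name and the
        | Hugging Face auth step may have changed since):
        | 
        |     from pyannote.audio import Pipeline
        | 
        |     # gated model on Hugging Face; needs an access token
        |     pipeline = Pipeline.from_pretrained(
        |         "pyannote/speaker-diarization-3.1",
        |         use_auth_token="HF_TOKEN",
        |     )
        | 
        |     diarization = pipeline("meeting.wav")
        |     for turn, _, spk in diarization.itertracks(yield_label=True):
        |         print(f"{turn.start:.1f}s-{turn.end:.1f}s {spk}")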
        
         | yujonglee wrote:
          | Yeah, that is on the roadmap!
        
       | mijoharas wrote:
       | Ok, cool! I was actually one of the people on the hyprnote HN
       | thread asking for a headless mode!
       | 
        | I was actually integrating some whisper tools yesterday. I was
        | wondering if there was a way to get a streaming response, and
        | was thinking it'd be nice if you could.
       | 
        | I'm on Linux, so I don't think I can test out owhisper right
        | now, but is that a thing that's possible?
       | 
        | Also, it looks like the `owhisper run` command gives its
        | output as a TUI. Is there an option for a plain-text response
        | so that we can just pipe it to other programs? (Maybe just
        | `kill`/`CTRL+C` to stop the recording and finalize the words.)
       | 
        | Same question for streaming: is there a way to get streaming
        | text output from owhisper? (It looks like you said you expose
        | a Deepgram-compatible API. I had a quick look at the API docs,
        | but I don't know how easy it is to hook into it and get some
        | nice streaming text while speaking.)
       | 
        | Oh yeah, and diarisation (available with a flag?) would be
        | awesome; it's one of the things missing from most of the
        | easiest-to-run tools I can find.
        
         | mijoharas wrote:
          | Oh wait, maybe you do support Linux for owhisper:
         | https://github.com/fastrepl/homebrew-hyprnote/blob/main/Form...
         | 
          | Can you help me find where the code you've built is? I can
          | see the folder on GitHub[0], but I can't see the code for
          | the CLI, for instance, unless I'm blind.
         | 
         | [0] https://github.com/fastrepl/hyprnote/tree/main/owhisper
        
           | yujonglee wrote:
           | This is CLI entry point:
           | 
            | https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
        
         | yujonglee wrote:
          | > I'm on Linux
          | 
          | I haven't tested on Linux yet, but we have a Linux build:
          | http://owhisper.hyprnote.com/download/latest/linux-x86_64
          | 
          | > Also, it looks like the `owhisper run` command gives its
          | output as a TUI. Is there an option for a plain-text
          | response
          | 
          | `owhisper run` is more of a way to quickly try it out, but I
          | think piping is definitely something that should work.
         | 
         | > Same question for streaming, is there a way to get a
         | streaming text output from owhisper?
         | 
          | You can use a Deepgram client to talk to `owhisper serve`
          | (https://docs.hyprnote.com/owhisper/deepgram-compatibility),
          | so the best resource might be the Deepgram client SDK docs.
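          | 
          | For example, something along these lines should work against
          | `owhisper serve` (a rough sketch using the Python
          | `websockets` library; the host/port and query params here
          | are assumptions, so check the docs for the actual values):
          | 
          |     import asyncio, json, sys
          |     import websockets
          | 
          |     # assumed local Deepgram-style endpoint
          |     URL = ("ws://localhost:8080/v1/listen"
          |            "?encoding=linear16&sample_rate=16000")
          | 
          |     async def main():
          |         async with websockets.connect(URL) as ws:
          |             async def send_audio():
          |                 # raw 16 kHz 16-bit PCM on stdin,
          |                 # e.g. piped from sox or arecord
          |                 while chunk := sys.stdin.buffer.read(3200):
          |                     await ws.send(chunk)
          |                 await ws.send(
          |                     json.dumps({"type": "CloseStream"}))
          | 
          |             async def recv_text():
          |                 async for msg in ws:
          |                     data = json.loads(msg)
          |                     alts = (data.get("channel", {})
          |                             .get("alternatives", [{}]))
          |                     text = alts[0].get("transcript", "")
          |                     if text and data.get("is_final"):
          |                         print(text, flush=True)
          | 
          |             await asyncio.gather(send_audio(), recv_text())
          | 
          |     asyncio.run(main())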
         | 
         | > diarisation
         | 
         | yeah on the roadmap
        
           | mijoharas wrote:
            | Nice stuff! Had a quick test on Linux and it works (built
            | directly; I didn't try the brew package). I ran into a
            | small issue with Moonshine and opened an issue on GitHub.
            | 
            | Great work on this! Excited to keep an eye on things.
        
       | DiabloD3 wrote:
        | I suggest you don't brand this "Ollama for X". They've become
        | a commercial operation that tries to FOSS-wash its actions by
        | using llama.cpp's code and then throws its users under the bus
        | when it can't support them.
        | 
        | I see that you are also using llama.cpp's code? That's cool,
        | but make sure you become a member of that community, not an
        | abuser.
        
         | yujonglee wrote:
          | Yeah, we use whisper.cpp for Whisper inference. This is more
          | of a community-focused project than a commercial product!
        
       | wanderingmind wrote:
       | Thank you for taking the time to build something and share it.
        | However, what is the advantage of using this over whisper.cpp's
        | stream example, which can also do realtime transcription?
       | 
       | https://github.com/ggml-org/whisper.cpp/tree/master/examples...
        
       ___________________________________________________________________
       (page generated 2025-08-14 23:00 UTC)