[HN Gopher] Show HN: OWhisper - Ollama for realtime speech-to-text
___________________________________________________________________
Show HN: OWhisper - Ollama for realtime speech-to-text
Hello everyone. This is Yujong from the Hyprnote team
(https://github.com/fastrepl/hyprnote). We built OWhisper for 2
reasons: (Also outlined in https://docs.hyprnote.com/owhisper/what-
is-this) (1). While working with on-device, realtime speech-to-
text, we found there was no existing tooling to download and run
models in a practical way. (2). Also, we got frequent requests
to provide a way to plug in custom STT endpoints to the Hyprnote
desktop app, just like with OpenAI-compatible LLM endpoints.
Part (2) is still a WIP, but we spent some time writing docs, so
you'll get a good idea of what it will look like if you skim
through them. For (1), you can try it now:
(https://docs.hyprnote.com/owhisper/cli/get-started)
        brew tap fastrepl/hyprnote && brew install owhisper
        owhisper pull whisper-cpp-base-q8-en
        owhisper run whisper-cpp-base-q8-en
If you're tired of Whisper, we also support Moonshine :) Give it a
shot (owhisper pull moonshine-onnx-base-q8). We're here and looking
forward to your comments!
Author : yujonglee
Score : 84 points
Date : 2025-08-14 15:47 UTC (7 hours ago)
(HTM) web link (docs.hyprnote.com)
(TXT) w3m dump (docs.hyprnote.com)
| yujonglee wrote:
| Happy to answer any questions!
|
 | Here is the list of local models it supports:
|
| - whisper-cpp-base-q8
|
| - whisper-cpp-base-q8-en
|
| - whisper-cpp-tiny-q8
|
| - whisper-cpp-tiny-q8-en
|
| - whisper-cpp-small-q8
|
| - whisper-cpp-small-q8-en
|
| - whisper-cpp-large-turbo-q8
|
| - moonshine-onnx-tiny
|
| - moonshine-onnx-tiny-q4
|
| - moonshine-onnx-tiny-q8
|
| - moonshine-onnx-base
|
| - moonshine-onnx-base-q4
|
| - moonshine-onnx-base-q8
| phkahler wrote:
 | I thought Whisper and others took large chunks (20-30 seconds)
 | of speech, or a complete wave file, as input. How do you get
 | real-time transcription? What size chunks do you feed it?
|
| To me, STT should take a continuous audio stream and output a
| continuous text stream.
| yujonglee wrote:
| I use VAD to chunk audio.
|
 | Whisper and Moonshine both work on chunks, but for
 | Moonshine:
|
| > Moonshine's compute requirements scale with the length of
| input audio. This means that shorter input audio is processed
| faster, unlike existing Whisper models that process
| everything as 30-second chunks. To give you an idea of the
| benefits: Moonshine processes 10-second audio segments 5x
| faster than Whisper while maintaining the same (or better!)
| WER.
|
 | Also, for Kyutai, we can feed continuous audio in and get
 | continuous text out.
 |
 | - https://github.com/moonshine-ai/moonshine
 |
 | - https://docs.hyprnote.com/owhisper/configuration/providers/k...
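 |
 | Roughly, the chunking idea looks like this (a minimal sketch in
 | Python, not our exact code; it assumes the `webrtcvad` package
 | and 16 kHz, 16-bit mono PCM frames):
 |
 |     import webrtcvad
 |
 |     vad = webrtcvad.Vad(2)  # aggressiveness 0 (loose) to 3 (strict)
 |     SAMPLE_RATE = 16000
 |     FRAME_MS = 30  # webrtcvad accepts 10/20/30 ms frames
 |     FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono
 |
 |     def speech_chunks(frames, max_silence_frames=10):
 |         """Group PCM frames into speech chunks, flushing after
 |         ~300 ms of silence; each chunk goes to the STT model."""
 |         buf, silence = [], 0
 |         for frame in frames:  # each frame is FRAME_BYTES long
 |             if vad.is_speech(frame, SAMPLE_RATE):
 |                 buf.append(frame)
 |                 silence = 0
 |             elif buf:
 |                 silence += 1
 |                 if silence >= max_silence_frames:
 |                     yield b"".join(buf)
 |                     buf, silence = [], 0
 |         if buf:
 |             yield b"".join(buf)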
| mijoharas wrote:
 | Something like that in a CLI tool that just writes text to
 | stdout would be perfect for a lot of my use cases!
|
| (maybe with an `owhisper serve` somewhere else to start the
| model running or whatever.)
| yujonglee wrote:
| Are you thinking about the realtime use-case or batch
| use-case?
|
| For just transcribing file/audio,
|
| `owhisper run <MODEL> --file a.wav` or
|
 | `curl https://something.com/audio.wav | owhisper run
 | <MODEL>`
 |
 | might make sense.
| mijoharas wrote:
 | Agreed, both of those make sense, but I was thinking
 | realtime. (Pipes can stream data; I'd like, and would find
 | useful, something that can stream STT output to stdout in
 | realtime.)
| yujonglee wrote:
 | It's open-source. Happy to review & merge if you can send us
 | a PR!
 |
 | https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
| alkh wrote:
| Sorry, maybe I missed it but I didn't see this list on your
| website. I think it is a good idea to add this info there.
 | Besides that, thank you for the effort and your work! I will
 | definitely give it a try.
| yujonglee wrote:
 | Got it. FYI, if you run `owhisper pull --help`, this info is
 | printed.
| JP_Watts wrote:
| I'd like to use this to transcribe meeting minutes with multiple
| people. How could this program work for that use case?
| yujonglee wrote:
 | If your use-case is meetings,
| https://github.com/fastrepl/hyprnote is for you. OWhisper is
| more like a headless version of it.
| JP_Watts wrote:
 | Can you describe how it picks different voices? Does it need
| separate audio channels, or does it recognize different
| voices on the same audio input?
| yujonglee wrote:
 | It separates mic/speaker into 2 channels, so you can reliably
 | get "what you said" vs "what you heard".
 |
 | For splitting speakers within a channel, we need an AI model
 | to do that. It is not implemented yet, but I think we'll be
 | in good shape sometime in September.
 |
 | Also, we have a transcript editor in which you can easily
 | split segments and assign speakers.
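 |
 | Conceptually, the channel split gives you speaker labels for
 | free (a simplified Python sketch, not the actual
 | implementation):
 |
 |     def merge_channels(mic_segments, speaker_segments):
 |         """Merge two per-channel transcript streams into one
 |         timeline. Each segment is (start_time_s, text); the
 |         channel itself says who spoke, so no diarization
 |         model is needed."""
 |         tagged = [(t, s, "you") for t, s in mic_segments]
 |         tagged += [(t, s, "them") for t, s in speaker_segments]
 |         for start, text, who in sorted(tagged):
 |             yield f"[{who} @ {start:.1f}s] {text}"
 |
 |     for line in merge_channels(
 |             [(0.0, "Hi, can you hear me?")],
 |             [(1.2, "Yes, loud and clear.")]):
 |         print(line)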
| sxp wrote:
 | If you want to transcribe meeting notes, Whisper isn't the best
 | tool because it doesn't separate the transcript by speaker.
| There are some other tools that do that, but I'm not sure what
| the best local option is. I've used Google's cloud STT with the
| diarization option and manually renamed "Speaker N" after the
| fact.
| solarkraft wrote:
| Wait, this is cool.
|
 | I just spent last week researching the options (especially for my
 | M1!) and was left wishing for a standard, full-service (live)
 | transcription server for Whisper like Ollama has been for LLMs.
 |
 | I'm excited to try this out and see your API (there seems to be a
 | standards vacuum here due to OpenAI not having a realtime
 | transcription service, which I find to be a bummer)!
 |
 | Edit: They seem to emulate the Deepgram API
 | (https://developers.deepgram.com/reference/speech-to-text-api...),
 | which seems like a solid choice. I'd definitely like to see a
 | standard emerging here.
| yujonglee wrote:
| Correct. About the deepgram-compatibility:
| https://docs.hyprnote.com/owhisper/deepgram-compatibility
|
| Let me know how it goes!
| clickety_clack wrote:
| Please find a way to add speaker diarization, with a way to
| remember the speakers. You can do it with pyannote, and get a
| vector embedding of each speaker that can be compared between
 | audio samples, but that's a year old now so I'm sure there are
 | better options now!
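 |
 | For reference, the pyannote embedding flow looks roughly like
 | this (a Python sketch based on the public docs; the distance
 | threshold is a heuristic, not a tuned value):
 |
 |     from pyannote.audio import Inference, Model
 |     from scipy.spatial.distance import cdist
 |
 |     # Speaker-embedding model; comparing embeddings lets you
 |     # match "Speaker N" across recordings.
 |     model = Model.from_pretrained(
 |         "pyannote/embedding", use_auth_token="HF_TOKEN")
 |     inference = Inference(model, window="whole")
 |
 |     emb_a = inference("meeting_a_speaker1.wav")
 |     emb_b = inference("meeting_b_speaker3.wav")
 |
 |     # Small cosine distance suggests the same speaker.
 |     distance = cdist([emb_a], [emb_b], metric="cosine")[0, 0]
 |     print("same speaker?", distance < 0.5)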
| yujonglee wrote:
 | Yeah, that is on the roadmap!
| mijoharas wrote:
| Ok, cool! I was actually one of the people on the hyprnote HN
| thread asking for a headless mode!
|
| I was actually integrating some whisper tools yesterday. I was
| wondering if there was a way to get a streaming response, and was
 | thinking it'd be nice if you could.
|
 | I'm on Linux, so I don't think I can test out owhisper right
 | now, but is that a thing that's possible?
|
 | Also, it looks like the `owhisper run` command gives its output
 | as a TUI. Is there an option for a plain text response so that we
| can just pipe it to other programs? (maybe just `kill`/`CTRL+C`
| to stop the recording and finalize the words).
|
 | Same question for streaming: is there a way to get a streaming
 | text output from owhisper? (It looks like you said you created a
 | Deepgram-compatible API; I had a quick look at the API docs, but
 | I don't know how easy it is to hook into it and get some nice
 | streaming text while speaking.)
|
 | Oh yeah, and diarisation (available with a flag?) would be
 | awesome; it's one of the things missing from most of the
 | easiest-to-run tools I can find.
| mijoharas wrote:
 | Oh wait, maybe you do support Linux for owhisper:
| https://github.com/fastrepl/homebrew-hyprnote/blob/main/Form...
|
 | Can you help me find where the code you've built is? I can
 | see the folder on GitHub[0], but I can't see the code for
 | the CLI, for instance? Unless I'm blind.
|
| [0] https://github.com/fastrepl/hyprnote/tree/main/owhisper
| yujonglee wrote:
| This is CLI entry point:
|
 | https://github.com/fastrepl/hyprnote/blob/8bc7a5eeae0fe58625...
| yujonglee wrote:
| > I'm on linux
|
 | I haven't tested on Linux yet, but we have a Linux build:
| http://owhisper.hyprnote.com/download/latest/linux-x86_64
|
 | > Also, it looks like the `owhisper run` command gives its
 | output as a TUI. Is there an option for a plain text response?
|
 | `owhisper run` is more a way to quickly try it out. But I
 | think piping is definitely something that should work.
|
| > Same question for streaming, is there a way to get a
| streaming text output from owhisper?
|
 | You can use a Deepgram client to talk to `owhisper serve`
 | (https://docs.hyprnote.com/owhisper/deepgram-compatibility), so
 | the best resource might be the Deepgram client SDK docs.
|
| > diarisation
|
 | Yeah, on the roadmap.
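 |
 | For example, streaming to the Deepgram-style endpoint can look
 | roughly like this (a minimal Python sketch; the local port and
 | path here are illustrative assumptions, check what `owhisper
 | serve` prints):
 |
 |     import asyncio, json, websockets
 |
 |     async def transcribe(pcm_path):
 |         # Deepgram's streaming shape: /v1/listen, raw PCM
 |         # described by query params. Port 8080 is assumed.
 |         url = ("ws://localhost:8080/v1/listen"
 |                "?encoding=linear16&sample_rate=16000")
 |         async with websockets.connect(url) as ws:
 |             async def feed():
 |                 with open(pcm_path, "rb") as f:
 |                     # ~100 ms of 16 kHz 16-bit mono per chunk
 |                     while chunk := f.read(3200):
 |                         await ws.send(chunk)
 |                         await asyncio.sleep(0.1)  # pace as live
 |                 await ws.send(json.dumps({"type": "CloseStream"}))
 |             async def read():
 |                 async for msg in ws:
 |                     data = json.loads(msg)
 |                     alts = (data.get("channel", {})
 |                                 .get("alternatives", []))
 |                     if alts and alts[0].get("transcript"):
 |                         print(alts[0]["transcript"])
 |             await asyncio.gather(feed(), read())
 |
 |     asyncio.run(transcribe("audio.raw"))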
| mijoharas wrote:
 | Nice stuff, had a quick test on Linux and it works (built
 | directly, I didn't check out the brew formula). I ran into a
 | small issue with Moonshine and opened an issue on GitHub.
 |
 | Great work on this! Excited to keep an eye on things.
| DiabloD3 wrote:
| I suggest you don't brand this "Ollama for X". They've become a
 | commercial operation that is trying to FOSS-wash their actions
 | by using llama.cpp's code and then throwing their users under
 | the bus when they can't support them.
|
| I see that you are also using llama.cpp's code? That's cool, but
| make sure you become a member of that community, not an abuser.
| yujonglee wrote:
 | Yeah, we use whisper.cpp for Whisper inference. This is more
 | like a community-focused project, not a commercial product!
| wanderingmind wrote:
| Thank you for taking the time to build something and share it.
 | However, what is the advantage of using this over the whisper.cpp
 | stream example, which can also do real-time transcription?
|
| https://github.com/ggml-org/whisper.cpp/tree/master/examples...
___________________________________________________________________
(page generated 2025-08-14 23:00 UTC)