[HN Gopher] Whisper.api: Open-source, self-hosted speech-to-text...
___________________________________________________________________
Whisper.api: Open-source, self-hosted speech-to-text with fast
transcription
Author : innovatorved
Score : 101 points
Date : 2023-08-22 17:48 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Dig1t wrote:
| >Get Your token
|
| If it's completely self-hosted why do I need to get a token?
| Where does the actual model run?
| innovatorved wrote:
| getToken is just an authentication layer for your requests. If
| you want to self-host it, just clone the repo and check the
| .env.example file.
| 1024core wrote:
| Does Android OS come with ASR?
| mkl wrote:
| Not one that you can feed arbitrary audio files to without an
| app for that.
| pizzafeelsright wrote:
| This is not fully self-hosted so much as middleware, no?
| innovatorved wrote:
| It is completely self-hosted, but it currently supports only
| the tiny and base models. You can soon expect support for large
| models. For any requests, you can create an issue.
| [deleted]
| edgarvaldes wrote:
| Related to whisper: whisperX is a god send. I can finally watch
| old or uncommon tv series with subtitles.
| jcims wrote:
| Oh dang, diarization? How well does it work?
| geekodour wrote:
| Nice! This will be very useful for me. I think I can run this
| locally and spin up a basic Telegram bot around it for personal
| use.
|
| One issue I faced with all the whisper-based transcript
| generators is that there seems to be no good way to edit or
| correct the generated text while keeping word-level timestamps.
| I created a small web-based tool[0] for that.
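|
| To give an idea of what that means, word-level output from
| whisper-based tools has roughly this shape (illustrative field
| names only, not the exact schema of any particular tool):
|
|     # Illustrative only: the rough shape of word-timestamped output
|     segment = {
|         "text": "hello world",
|         "start": 0.00,
|         "end": 1.20,
|         "words": [
|             {"word": "hello", "start": 0.00, "end": 0.52},
|             {"word": "world", "start": 0.58, "end": 1.20},
|         ],
|     }
|
| An editor has to keep those per-word times consistent while the
| text is corrected, which is what plain-text editing loses.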
|
| If anyone happens to be looking to edit transcripts generated
| using whisper, you'd probably find it useful.
|
| [0] https://github.com/geekodour/wscribe-editor
| distantsounds wrote:
| how is this open source, or self-hosted, when it requires an API
| key and a login from a third party?
| innovatorved wrote:
| No, it is not a third party. It is just a PostgreSQL database
| for logging everything. You can simply visit the /docs
| endpoint. It is just for authentication so that you can work
| with different users. Once again, it's completely self-hosted.
| d-lisp wrote:
| What's the point of logging everything? I don't understand;
| isn't it possible to just deal with authentication locally?
| freedomben wrote:
| I'm guessing the installation instructions just need a bit of
| love. It could seem confusing to see a token request to
| https://innovatorved-whisper-api.hf.space/api/v1/users/get_t...
| innovatorved wrote:
| https://innovatorved-whisper-api.hf.space/docs
|
| Just visit the Swagger UI, create an account, and then use
| getToken to grab a token.
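|
| From a script, that flow looks roughly like this (a minimal
| sketch against a self-hosted instance; the endpoint paths and
| field names below are assumptions, so verify them in /docs on
| your own deployment):
|
|     import requests
|
|     BASE = "http://localhost:8000"  # your self-hosted instance
|     creds = {"email": "me@example.com", "password": "secret"}
|
|     # Hypothetical endpoint and field names; verify them in /docs.
|     requests.post(f"{BASE}/api/v1/users/register", json=creds)
|     resp = requests.post(f"{BASE}/api/v1/users/get_token",
|                          json=creds)
|     token = resp.json()["token"]  # field name is an assumption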
| nchudleigh wrote:
| This is awesome.
|
| For anyone confused about the project, it is using whisper.cpp,
| a C/C++ port of the open Whisper model from OpenAI. It is built
| by the team behind GGML and llama.cpp.
| https://github.com/ggerganov
|
| You can fork this code, run it on your own server, and hit the
| API. The server itself will use FFmpeg to convert the audio file
| into the required format and run the C translation of the whisper
| model against the file.
|
| By doing this you can separate yourself from the requirement of
| paying the fee that OpenAI charges for their Whisper service and
| fully own your transcriptions. The models that the author has
| supplied here are rather small but should run decently on a CPU.
| If you want to go to larger model sizes you would likely need to
| change the compilation options and use a server with a GPU.
|
| Similar to this project, my product https://superwhisper.com is
| using these whisper.cpp models to provide really good Dictation
| on macOS.
|
| It runs really fast on the M-series chips. Most of this message
| was dictated using superwhisper.
|
| Congrats to the author of this project. Seems like a useful
| implementation of the whisper.cpp project.
|
| I wonder if they would accept it upstream in the examples.
| LeoPanthera wrote:
| So is "real time" translation a thing yet? I've long wanted to
| be able to watch non-English television and have the audio
| translated into English subtitles. It's doable for pre-recorded
| things, but not for live broadcasts.
|
| An iPhone app that could do this from the microphone would also
| be amazing. Google Translate and its various competitors from
| Microsoft/Apple are nearly there, but they all stop listening in
| between sentences. Something that just listened constantly,
| printing translated text onto the screen, would be amazing.
| videogreg93 wrote:
| I've been using the Microsoft Speech API for an app, and so far
| it's been surprisingly good for real-time speech-to-text.
| innovatorved wrote:
| Just wait for a couple of weeks. I am working on speech-to-
| speech translation. Instead of subtitles, you can listen to it
| directly. I am also working on subtitles.
| LeoPanthera wrote:
| But I don't want that. I just want a live stream of
| translated text.
| xigency wrote:
| You can do this with PowerPoint, actually. I bumped something
| once and, to my confusion, Japanese subtitles popped up
| following what I was saying.
| innovatorved wrote:
| Many of you are asking if the project is completely self-hosted
| and does not rely on any third-party services. Yes, it is
| completely self-hosted and does not rely on any third-party
| services. The user accounts are only for authentication, so no
| one can use the service without authenticating.
| Animats wrote:
| Huh?
|
| _" This project provides an API with user level access support
| to transcribe speech to text using a finetuned and processed
| Whisper ASR model."_
|
| Why is this a service at all? Why not just a library? Or a
| subprocess?
| devmor wrote:
| From what I can see, it runs in a docker container and uses
| an HTTP server to handle interaction.
| mkl wrote:
| Getting an authentication token does rely on a third-party
| service, if the README instructions are correct. It requires
| sending an email address to that third party.
| ChrisArchitect wrote:
| Not to be confused with
|
| _Whisper - open source speech recognition by OpenAI_
| https://news.ycombinator.com/item?id=34985848
| 3abiton wrote:
| I thought that was the same. I still don't see the difference.
| innovatorved wrote:
| [flagged]
| devmor wrote:
| It is the same model; this is a self-hosted solution built
| around it.
| innovatorved wrote:
| https://openai.com/research/whisper
| v7n wrote:
| Many live streamers, and platforms, would love to have custom
| real-time transcription elements. I actually looked into this
| exact project of yours when I thought about creating such a
| thing.
|
| Even if it meant delaying the broadcast for a second while
| transcribing, the accessibility value could be immense.
| innovatorved wrote:
| Whisper API - Speech to Text Transcription
|
| This open source project provides a self-hostable API for speech
| to text transcription using a finetuned Whisper ASR model. The
| API allows you to easily convert audio files to text through HTTP
| requests. Ideal for adding speech recognition capabilities to
| your applications.
|
| Key features:
|
| - Uses a finetuned Whisper model for accurate speech recognition
| - Simple HTTP API for audio file transcription
| - User level access with API keys for managing usage
| - Self-hostable code for your own speech transcription service
| - Quantized model optimization for fast and efficient inference
| - Open source implementation for customization and transparency
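|
| A minimal request might look like this (a sketch; the endpoint
| path, field names, and auth header format are assumptions, so
| check the /docs endpoint of your own deployment):
|
|     import requests
|
|     BASE = "http://localhost:8000"  # your self-hosted instance
|     TOKEN = "..."  # token obtained from the users endpoint
|
|     # Hypothetical endpoint and field names; verify them in /docs.
|     with open("audio.wav", "rb") as f:
|         resp = requests.post(
|             f"{BASE}/api/v1/transcribe",
|             headers={"Authorization": f"Bearer {TOKEN}"},
|             files={"file": f},
|         )
|     print(resp.json())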
| rrsp wrote:
| Are you able to provide more information on the fine-tuning?
| Any improvement in WER? What language was it fine-tuned on, and
| how large was the dataset used?
| brianjking wrote:
| What was the fine tune?
|
| How does this compare to what is possible using
| https://goodsnooze.gumroad.com/l/macwhisper for example?
|
| Thanks!
| stavros wrote:
| This looks great. Does recognition use the GPU? What speed do
| you get on it?
| atajwala wrote:
| Any plans to add phrase timestamps, channel separation and
| other equivalent ASR features to make this API more
| approachable?
| innovatorved wrote:
| I am working on the timestamp feature. You will see the option
| for timestamps soon.
| atajwala wrote:
| Appreciate it! All the best.
___________________________________________________________________
(page generated 2023-08-22 23:01 UTC)