[HN Gopher] Whisper.api: Open-source, self-hosted speech-to-text...
       ___________________________________________________________________
        
       Whisper.api: Open-source, self-hosted speech-to-text with fast
       transcription
        
       Author : innovatorved
       Score  : 101 points
       Date   : 2023-08-22 17:48 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Dig1t wrote:
       | >Get Your token
       | 
       | If it's completely self-hosted why do I need to get a token?
       | Where does the actual model run?
        
         | innovatorved wrote:
         | getToken is just an authentication layer for authenticating
         | your request. If you want to self-host it, just clone the repo
         | and please check the .env.example file.
        
       | 1024core wrote:
       | Does Android OS come with ASR?
        
         | mkl wrote:
         | Not that you can feed arbitrary audio files to without an app
         | for that.
        
       | pizzafeelsright wrote:
       | This is not fully self-hosted so much as middle-ware, no?
        
         | innovatorved wrote:
         | It is completely self-hosted, but it currently supports only
         | the tiny and base models. You can soon expect support for large
         | models. For any requests, you can create an issue.
        
         | [deleted]
        
       | edgarvaldes wrote:
       | Related to Whisper: whisperX is a godsend. I can finally watch
       | old or uncommon tv series with subtitles.
        
         | jcims wrote:
         | Oh dang, diarization? How well does it work?
        
       | geekodour wrote:
       | Nice! This will be very useful for me. I think I can run this
       | locally and spin up a basic Telegram bot around it for personal use.
       | 
       | One issue I faced with all the Whisper-based transcript
       | generators is that there seems to be no good way to edit or
       | correct the generated text using word-level timestamps.
       | I created a small web-based tool[0] for that.
       | 
       | If anyone is looking to edit transcripts generated using
       | Whisper, you'd probably find it useful.
       | 
       | [0] https://github.com/geekodour/wscribe-editor
        
       | distantsounds wrote:
       | how is this open source, or self-hosted, when it requires an API
       | key and a login from a third party?
        
         | innovatorved wrote:
         | No, it is not a third party. It is just a PostgreSQL database
         | for logging everything. You can simply visit the /docs
         | endpoint. It is just for authentication so that you can work
         | with different users. Once again, it's completely self-hosted.
        
           | d-lisp wrote:
           | What's the point of logging everything? I don't understand;
           | isn't it possible to just deal with authentication locally?
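
The question above, whether authentication can be handled entirely on the local machine, can be illustrated with a minimal sketch. This is not the project's implementation (which, per the author, uses a PostgreSQL database); the in-memory store and the function names `issue_token` and `verify_token` are hypothetical and use only the Python standard library.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store: user name -> SHA-256 hex digest of the token.
# A real deployment would persist this locally (a file or SQLite would do).
_token_hashes = {}


def issue_token(user):
    """Create a random token for `user`, keep only its hash, return the token."""
    token = secrets.token_urlsafe(32)
    _token_hashes[user] = hashlib.sha256(token.encode()).hexdigest()
    return token


def verify_token(user, token):
    """Compare a presented token against the stored hash in constant time."""
    expected = _token_hashes.get(user)
    if expected is None:
        return False
    presented = hashlib.sha256(token.encode()).hexdigest()
    return hmac.compare_digest(expected, presented)
```

Storing only the hash means a leaked store does not reveal usable tokens, and nothing here needs a network call or an email address.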
        
           | freedomben wrote:
           | I'm guessing the installation instructions just need a bit of
           | love. It could seem confusing to see a token request to
           | https://innovatorved-whisper-api.hf.space/api/v1/users/get_t...
        
             | innovatorved wrote:
             | https://innovatorved-whisper-api.hf.space/docs
             | 
             | Just visit the Swagger page, create an account, and then use
             | gettoken to grab a token.
        
       | nchudleigh wrote:
       | This is awesome.
       | 
       | For anyone confused about the project, it is using whisper.cpp, a
       | C-based runner and translation of the open whisper model from
       | OpenAI. It is built by the team behind GGML and llama.cpp.
       | https://github.com/ggerganov
       | 
       | You can fork this code, run it on your own server, and hit the
       | API. The server itself will use FFmpeg to convert the audio file
       | into the required format and run the C translation of the whisper
       | model against the file.
       | 
       | By doing this you can separate yourself from the requirement of
       | paying the fee that OpenAI charges for their Whisper service and
       | fully own your transcriptions. The models that the author has
       | supplied here are rather small but should run decently on a CPU. If
       | you want to go to larger model sizes you would likely need to
       | change the compilation options and use a server with a GPU.
       | 
       | Similar to this project, my product https://superwhisper.com is
       | using these whisper.cpp models to provide really good Dictation
       | on macOS.
       | 
       | It runs really fast on the M series chips. Most of this message
       | was dictated using superwhisper.
       | 
       | Congrats to the author of this project. Seems like a useful
       | implementation of the whisper.cpp project.
       | 
       | I wonder if they would accept it upstream in the examples.
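
The two-step flow described in the comment above (FFmpeg converts the upload, then whisper.cpp runs on the result) might look roughly like the following sketch. It is not taken from the Whisper.api repository. The binary path `./main` and the model path `models/ggml-base.en.bin` are assumptions based on the whisper.cpp examples of the time, and the helper names are hypothetical. The conversion targets 16 kHz mono 16-bit PCM WAV, which is the input the whisper.cpp example binary expects.

```python
import subprocess


def ffmpeg_to_wav_cmd(src, dst):
    """Build an FFmpeg command converting `src` to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", dst]


def whisper_cmd(binary, model, wav):
    """Build a whisper.cpp command line: -m selects the model, -f the audio file."""
    return [binary, "-m", model, "-f", wav]


def transcribe(src, binary="./main", model="models/ggml-base.en.bin"):
    """Convert `src` with FFmpeg, run whisper.cpp on it, and return its stdout.

    The default binary and model paths are assumptions; adjust to your build.
    """
    wav = src + ".16k.wav"
    subprocess.run(ffmpeg_to_wav_cmd(src, wav), check=True)
    result = subprocess.run(whisper_cmd(binary, model, wav),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Calling `transcribe("talk.mp3")` requires both `ffmpeg` and a compiled whisper.cpp binary to be present; the two command builders can be inspected without either.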
        
       | LeoPanthera wrote:
       | So is "real time" translation a thing yet? I've long wanted to be
       | able to watch non-English television and have the audio translated
       | into English subtitles. It's doable for pre-recorded things, but
       | not for live.
       | 
       | An iPhone app that could do this from the microphone would also
       | be amazing. Google Translate and its various competitors from
       | Microsoft/Apple are nearly there, but they all stop listening
       | in between sentences. Something that just listened constantly,
       | printing translated text onto the screen, would be amazing.
        
         | videogreg93 wrote:
         | I've been using the Microsoft Speech api for an app and so far
         | it's been surprisingly very good for realtime speech to text.
        
         | innovatorved wrote:
         | Just wait for a couple of weeks. I am working on speech-to-
         | speech translation. Instead of subtitles, you can listen to it
         | directly. I am also working on subtitles.
        
           | LeoPanthera wrote:
           | But I don't want that. I just want a live stream of
           | translated text.
        
             | xigency wrote:
             | You can do this with PowerPoint actually. I bumped
             | something once and Japanese subtitles popped up following
             | what I was saying in my confusion.
        
       | innovatorved wrote:
       | Many of you are asking if the project is completely self-hosted
       | and does not rely on any third-party services. Yes, it is
       | completely self-hosted and does not rely on any third-party
       | services. The user account exists for authentication, so no one can
       | use the service without authenticating.
        
         | Animats wrote:
         | Huh?
         | 
         |  _"This project provides an API with user level access support
         | to transcribe speech to text using a finetuned and processed
         | Whisper ASR model."_
         | 
         | Why is this a service at all? Why not just a library? Or a
         | subprocess?
        
           | devmor wrote:
           | From what I can see, it runs in a docker container and uses
           | an HTTP server to handle interaction.
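
To make the "service versus library or subprocess" distinction concrete, here is a minimal sketch of a thin HTTP layer over a transcription function, using only the Python standard library. It is not the project's server. `fake_transcribe`, `handle_upload`, and `TranscribeHandler` are hypothetical names, and the placeholder stands in for whatever actually runs the model (for example, a subprocess call to whisper.cpp).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_transcribe(audio):
    """Placeholder for the real model call; returns a dummy string."""
    return "<%d bytes of audio>" % len(audio)


def handle_upload(audio, transcribe=fake_transcribe):
    """Pure request logic: return (HTTP status, JSON body) for uploaded bytes."""
    if not audio:
        return 400, json.dumps({"error": "empty body"})
    return 200, json.dumps({"text": transcribe(audio)})


class TranscribeHandler(BaseHTTPRequestHandler):
    """Reads the POST body and delegates to handle_upload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_upload(self.rfile.read(length))
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Starting it would be `HTTPServer(("127.0.0.1", 8000), TranscribeHandler).serve_forever()`. The model logic lives in one function either way; the HTTP layer mainly buys language-independent access and a place to hang per-user authentication.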
        
         | mkl wrote:
         | Getting an authentication token does rely on a third-party
         | service, if the README instructions are correct. It requires
         | sending an email address to that third party.
        
       | ChrisArchitect wrote:
       | Not to be confused with
       | 
       |  _Whisper - open source speech recognition by OpenAI_
       | https://news.ycombinator.com/item?id=34985848
        
         | 3abiton wrote:
         | I thought that was the same. I still don't see the difference.
        
           | innovatorved wrote:
           | [flagged]
        
           | devmor wrote:
           | It is the same, this is a self-hosted solution.
        
         | innovatorved wrote:
         | https://openai.com/research/whisper
        
       | v7n wrote:
       | Many live streamers, and platforms, would love to have custom
       | real-time transcription elements. I actually looked into this
       | exact project of yours when I thought about creating such a
       | thing.
       | 
       | Even if it meant delaying the broadcast for a second while
       | transcribing, the accessibility value could be immense.
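
One common way to approximate the "delay by about a second" idea above is to cut the incoming audio into short overlapping windows and transcribe each as it fills. The sketch below shows only the windowing step. It is an illustration under assumed parameters (16 kHz audio, 1 s windows, 0.2 s overlap), not something this project is known to implement, and `fixed_windows` is a hypothetical name.

```python
def fixed_windows(samples, sample_rate=16000, window_s=1.0, overlap_s=0.2):
    """Yield (start_time_seconds, chunk) pairs covering `samples`.

    Each chunk is at most window_s long. Consecutive chunks share overlap_s
    seconds, so a word cut at one boundary is more likely to appear whole
    in a neighbouring chunk.
    """
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    for start in range(0, len(samples), step):
        yield start / sample_rate, samples[start:start + window]
        if start + window >= len(samples):
            break  # the final (possibly short) chunk has been emitted
```

The window length sets the minimum latency: each chunk can only be transcribed once it is complete, so shorter windows reduce delay at the cost of giving the model less context.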
        
       | innovatorved wrote:
       | Whisper API - Speech to Text Transcription
       | 
       | This open source project provides a self-hostable API for speech
       | to text transcription using a finetuned Whisper ASR model. The
       | API allows you to easily convert audio files to text through HTTP
       | requests. Ideal for adding speech recognition capabilities to
       | your applications.
       | 
       | Key features:
       | 
       | - Uses a finetuned Whisper model for accurate speech recognition
       | - Simple HTTP API for audio file transcription
       | - User level access with API keys for managing usage
       | - Self-hostable code for your own speech transcription service
       | - Quantized model optimization for fast and efficient inference
       | - Open source implementation for customization and transparency
        
         | rrsp wrote:
         | Are you able to provide more information on the fine tuning?
         | Any improvement in WER and what language it was fine tuned in
         | and the size of the dataset used?
        
         | brianjking wrote:
         | What was the fine tune?
         | 
         | How does this compare to what is possible using
         | https://goodsnooze.gumroad.com/l/macwhisper for example?
         | 
         | Thanks!
        
         | stavros wrote:
         | This looks great, does recognition use the GPU? What's the
         | speed you get on it?
        
         | atajwala wrote:
         | Any plans to add phrase timestamps, channel separation and
         | other equivalent ASR features to make this API more
         | approachable?
        
           | innovatorved wrote:
           | I am working on the timestamp feature. You will be able to
           | see the option for timestamp soon.
        
             | atajwala wrote:
             | Appreciate it! All the best.
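
For the phrase timestamps discussed in this sub-thread, a typical consumer is a subtitle file. The sketch below turns timestamped segments into SubRip (SRT) text. The `(start_seconds, end_seconds, text)` tuple shape is an assumption for illustration, not the API's actual output format, and both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT cue blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([(0, 1.25, "Hello")])` produces a single cue numbered 1 that runs from `00:00:00,000` to `00:00:01,250`.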
        
       ___________________________________________________________________
       (page generated 2023-08-22 23:01 UTC)