[HN Gopher] OTranscribe: A free and open tool for transcribing a...
       ___________________________________________________________________
        
       OTranscribe: A free and open tool for transcribing audio interviews
        
       Author : zerojames
       Score  : 368 points
       Date   : 2024-08-09 07:31 UTC (15 hours ago)
        
 (HTM) web link (otranscribe.com)
 (TXT) w3m dump (otranscribe.com)
        
       | jagermo wrote:
       | fantastic tool; I used it a lot to transcribe interviews during
       | plane travels where there was no internet, and I needed to fill
       | the time. Really useful to have if you do a lot of interviews
        
         | dotancohen wrote:
         | From the homepage:
         | 
         | > A free web app to take the pain out of transcribing recorded
         | interviews
         | 
         | How did you use a web app on the plane with no internet?
        
           | Havoc wrote:
           | It's MIT licensed so presumably self hosted
        
           | grandfunction wrote:
           | Ran the server on his or her laptop...
           | 
           | You don't need the internet to use a web browser
        
           | tampueroc wrote:
           | The web app saves an offline copy for use the first time you
           | open it.
           | https://otranscribe.com/help/#can_i_use_otranscribe_offline
        
           | jagermo wrote:
           | it works offline if you preload the website :)
        
       | TrojanHookworm wrote:
       | Use this a lot. It's nice and simple and has exactly the tools
       | you need (playback speed control, easy pause/play) and nothing
       | more. Greatly prefer it over automatic transcription tools that
       | give you 40 pages of 'umm's and 'ahhhh's to filter through and
       | edit.
        
         | stavros wrote:
         | Can you not give the transcript to an LLM to remove the umms
         | and ahhs?
        
           | BiteCode_dev wrote:
           | People not used to AI have blind spots that prevent them from
           | seeing evident use cases like this.
           | 
           | I'm always surprised at the amazed look of my friends when
           | they see me concretely use the tool. They just didn't picture
           | it until they saw it in action.
        
             | stavros wrote:
             | It's not even people not used to AI, I developed a tool
             | that uses AI to do something, and then kind of couldn't be
             | bothered to fix some of the output manually. It only
             | occurred to me days later that I can ask the AI to fix it.
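
For the plain 'umm'/'ahhh' case raised above, a regex pass may already
be enough before handing the transcript to an LLM. This is a minimal
Python sketch of that swapped-in, non-LLM approach; the filler list and
the name strip_fillers are illustrative assumptions, not part of any
tool mentioned in this thread.

```python
import re

# Hypothetical filler list (um, uhm, uh, ah, hmm and stretched variants).
# Extend it for your own recordings.
FILLERS = re.compile(r"\b(?:u+h*m+|u+h+|a+h+|h+m+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and tidy the leftover whitespace."""
    cleaned = FILLERS.sub("", text)
    # Drop any space left dangling in front of punctuation.
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_fillers("Umm, so I think, ahhh, it works"))
```

It only matches whole words, so "umbrella" is left alone, but it does
not re-capitalize a sentence that used to start with a filler.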
        
       | phoronixrly wrote:
       | https://github.com/oTranscribe/oTranscribe
        
       | cube2222 wrote:
       | I needed to do this this week (transcribe an interview with
       | multiple speakers) and used
       | https://github.com/MahmoudAshraf97/whisper-diarization
       | 
       | Worked excellently.
       | 
       | It generates both a file that just contains a line per
       | uninterrupted speaker speech prefixed with the speaker number, as
       | well as a file with timestamps which I believe would be used as
       | subtitles.
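
If the timestamped file is meant for subtitles, SRT is a likely target
(it is one of the export formats named elsewhere in this thread). A
rough Python sketch of writing SRT cues follows; it assumes the
segments are plain (start_seconds, end_seconds, text) tuples, which is
an illustrative shape and not the actual output of whisper-diarization.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "Hello there."), (1.5, 3.0, "Hi!")]))
```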
        
         | RamblingCTO wrote:
         | I had better success with whisperx, as whisper-dia does
         | sometimes have weird issues I couldn't resolve:
         | https://github.com/m-bain/whisperX
        
           | cube2222 wrote:
           | iirc whisper-diarization uses whisperx under the hood.
           | 
           | I'll be honest, I haven't dived much into this as I just
           | needed something transcribed quickly, but when I was looking
           | at WhisperX I couldn't find a CLI that would just out of the
           | box give me a text file with a line per speaker statement
           | (not per word).
        
             | RamblingCTO wrote:
             | I use it like this:
             | 
             | whisperx $file --compute_type int8 --min_speakers 3
             | --max_speakers 3 --language de --hf_token $token --diarize
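
For the "one line per speaker statement" output asked about above, a
small post-processing step over diarized segments may do. This is a
hedged Python sketch: it assumes each segment is a dict with "speaker"
and "text" keys, as you might load from a diarized JSON export, and the
function name speaker_turns is made up for illustration.

```python
def speaker_turns(segments):
    """Merge consecutive segments from the same speaker into one line each."""
    turns = []
    for seg in segments:
        # Assumed keys; segments without a speaker label fall back to UNKNOWN.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if not text:
            continue
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [f"{speaker}: {' '.join(parts)}" for speaker, parts in turns]


if __name__ == "__main__":
    demo = [
        {"speaker": "SPEAKER_00", "text": " Hello there."},
        {"speaker": "SPEAKER_00", "text": "How are you?"},
        {"speaker": "SPEAKER_01", "text": "Fine."},
    ]
    print("\n".join(speaker_turns(demo)))
```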
        
             | stavros wrote:
             | > iirc whisper-diarization uses whisperx under the hood.
             | 
             | It seems like it does:
             | 
             | https://github.com/MahmoudAshraf97/whisper-
             | diarization/blob/...
        
         | adipasquale wrote:
         | I have had very good results using Spectropic [1], a hosted
         | Whisper Diarization API service as a platform. I found it cheap
         | and way easier and faster than setting up and using whisper-
         | diarization on my M1. Audiogest [2] is a web service built upon
         | Spectropic, I have not yet used it.
         | 
         | disclaimer : I am not affiliated in any way, just a happy
         | customer! I had some nice mail exchanges after bug reports with
         | the (I believe solo-)developer behind these tools.
         | 
         | ---
         | 
         | [1] https://spectropic.ai/
         | 
         | [2] https://audiogest.app/
        
           | thomasmol wrote:
           | Thanks for the shout-out and kind words!
           | 
           | Thomas here, maker of Spectropic and Audiogest. I am indeed
           | focused on building a simple and reliable Whisper +
           | diarization API. Also working on providing fine-tuned
           | versions of Whisper of non-English languages through the API.
           | 
           | Feel free to reach out to me if anyone is interested in this!
        
             | dchuk wrote:
             | Great looking API. Are you able to, or do you have plans,
             | for there to be automatic speaker identification based on
             | labeled samples of their voices? It would be great to
             | basically have a library of known speakers that are auto
             | matched when transcribing
        
               | thomasmol wrote:
               | Thanks! That is something I might offer in the future and
               | is definitely possible with a library like pyannote.
               | Would be really cool to add for sure.
               | 
               | I am also experimenting with post-processing transcripts
               | with LLMs to infer speaker names from a transcript. It
               | works pretty decent already but it's still a bit
               | expensive. I have this feature available under the
               | 'enhanced' model if you want to check it out:
               | https://docs.spectropic.ai/models/transcribe/enhanced
        
             | ukuina wrote:
             | Hi! Any plans to support streaming transcription with
             | diarization?
        
         | H8crilA wrote:
         | I often subtitle old, obscure, foreign language movies with
         | Whisper. Or random clips found on foreign Telegram/Twitter
         | channels. Paired up with some GPT for translation it works
         | great!
         | 
         | You can do this locally if you have enough (V)RAM, but I prefer
         | the OpenAI API, as I usually don't have enough at hand. And the
         | various Llamas aren't really on par with GPT-4 in quality. If
         | you only need Whisper, and no translation, then local execution
         | is indeed very viable. High quality Whisper fits in 4GB of
         | (V)RAM.
        
         | wanderingmind wrote:
         | The problem with using OpenAI Whisper is that it's too slow on
         | CPU-only machines. Whisper.CPP is blazing fast compared to
         | Whisper and I wish people would build better diarization on top
         | of that.
        
           | stavros wrote:
           | What's OpenAI Whisper vs whisper.cpp? Do you mean whisper-
           | diarization uses the API?
        
             | Zambyte wrote:
             | https://github.com/openai/whisper
             | 
             | vs
             | 
             | https://github.com/ggerganov/whisper.cpp
             | 
             | They are two inference engines for running the whisper ASR
             | model, each with their own API AFAIK.
        
               | stavros wrote:
               | Ah I see, thanks. Hm, I would imagine that it's not hard
               | to make something that works with both (the surface area
               | of the API should be fairly small, I imagine), odd that
               | projects use the former and not the latter.
        
           | aidenn0 wrote:
           | Another advantage of Whisper.CPP is that it can use cublas to
           | accelerate models too large for your GPU memory; I can run
           | the medium and large models with cublas on my 1050, but only
           | the small if I use the pure GPU mode.
        
         | hubraumhugo wrote:
         | Fascinating how traditionally very complex and hard ML problems
         | are slowly becoming commodities with AI:
         | 
         | - transcription
         | 
         | - machine translation
         | 
         | - OCR
         | 
         | - image recognition
        
         | terribleperson wrote:
         | Does it hallucinate when there's dead air?
        
       | choya-love wrote:
       | Any new language support in the future? Fingers crossed for
       | Japanese
        
         | fabianmg wrote:
         | Am I missing something? From what I checked, it supports every
         | language, as you are the one transcribing by hand. This is just
         | a UI to watch the video or audio while you're typing it.
        
         | comradesmith wrote:
         | https://tactiq.io is made for meetings, but also does uploaded
         | transcripts and supports Japanese!
        
       | ilt wrote:
       | I currently use Aiko's free iOS app which does offline
       | transcription using OpenAI's Whisper model. It has been working
       | pretty well for me so far. It can export in formats like SRT,
       | TXT, CSV, JSON and text with timestamps too.
       | https://sindresorhus.com/aiko
        
       | nullbar wrote:
       | Maybe it isn't perfectly clear, but OTranscribe isn't an
       | automatic speech-to-text tool, but instead, a UI for assisting in
       | manual transcribing.
       | 
       | So no AI here, folks.
        
         | space_oddity wrote:
         | Yep, it's designed to assist with manual transcription
        
       | BetterWhisper wrote:
       | If you are looking for something automatic that also allows you
       | to interact with your transcripts chatgpt style then I would
       | recommend https://www.videototextai.com/
        
         | Terretta wrote:
         | That cookies box though... Dark pattern (accept lots + accept
         | all, fake drag affordance, covering a quarter of the page) for
         | cookies doesn't bode well for privacy protections around the
         | transcripts.
        
           | BetterWhisper wrote:
           | You are allowed to delete any transcription you make and with
           | that we do not keep any copy of the transcripts :) . The
           | cookie banner is there to comply with the EU laws.
        
       | kimoz wrote:
       | Does anyone know a free tool for generating subtitles for movies
       | and series videos?
        
         | doug_life wrote:
         | https://github.com/McCloudS/subgen worked very well for me. I
         | had a TV series where somehow the last few seasons timestamps
         | did not match up with subtitle files I could find online. I
         | used subgen and it worked surprisingly well.
        
         | BrunoJo wrote:
         | You can try https://www.transcripo.com/ for free
        
         | drtgh wrote:
         | SubtitleEdit is one of the most complete and has many online
         | tutorials from users.
         | 
         | Make sure they are recent tutorials because they will probably
         | mention how to use the automated generation tools/plugins that
         | weren't available years ago.
         | 
         | https://github.com/SubtitleEdit/subtitleedit
        
       | teddyh wrote:
       | See also _TranscriberAG_ : <https://transag.sourceforge.net/>
        
       | dmitrykan wrote:
       | I'm working on a tool that includes AI. My original target is
       | to test it on my https://www.youtube.com/c/VectorPodcast by
       | offering something that Lex Fridman does for his episodes.
       | 
       | Current features:
       | 
       | 1. Download from YT
       | 
       | 2. Transcribe using Vosk (output has time codes included)
       | 
       | 3. Speaker diarization using pyannote - this isn't perfect and
       | needs a bit more ironing out.
       | 
       | What needs to be done:
       | 
       | 4. Store the transcription in a search engine (can include
       | vectors)
       | 
       | 5. Implement a webapp
       | 
       | If anyone here is interested in joining forces, let me know.
        
       | jrochkind1 wrote:
       | Kinda surprised to not have AI integration.
       | 
       | You do still need to proof and QA even AI results, if you want a
       | publication quality result, and do things like attribute who is
       | speaking when (at least Whisper can't do that), and correct
       | "unusual" last names and things. So I feel like people using AI
       | still need good tools for the correcting/finishing/proofing too,
       | that would be similar to the tools for non-assisted
       | transcription.
        
         | MattieTK wrote:
         | This was written a really long time ago by a former WSJ
         | Graphics reporter (Elliot Bentley) who is now at Datawrapper.
         | 
         | It is now operated by Muckrock and hasn't seen changes made to
         | it in a while.
         | 
         | That's why it doesn't have any of these integrations, the
         | technology just didn't exist.
        
           | jrochkind1 wrote:
           | Aha, good to know! That's actually important context, that
           | this is not a recent release, and doesn't necessarily have a
           | lot of ongoing development.
        
       | ciaran00 wrote:
       | Talio.ai allows you to do this with chatGPT style chat with the
       | transcript plus numerous other features https://talio.ai
        
       | bcherny wrote:
       | Looks cool! Unclear from the docs, but does it support non-
       | English languages? How about mixed-language interviews?
        
         | avodonosov wrote:
         | Yes! Any language you understand is supported!
        
       | avodonosov wrote:
       | I made a similar tool for making tables of contents for youtube
       | videos: https://youtoc.by/
       | 
       | Not developing it actively after I created tables of contents for
       | the several videos I needed, years ago. If I ever need it again,
       | I will probably work on mobile UI (aka responsive)
        
       | tkgally wrote:
       | I was curious how good a transcription I could get from what may
       | be the best multimodal LLM currently, Gemini-1.5-Pro-
       | Experiment-0801, so I had it transcribe five minutes of an
       | interview between Ezra Klein and Nancy Pelosi from earlier today.
       | The results are here:
       | 
       | https://www.gally.net/temp/20240809geminitranscription/index...
       | 
       | Aside from some minor punctuation and capitalization issues,
       | Gemini's transcription looks nearly perfect to me. There were
       | only one or two words that I think it misheard. If I had
       | transcribed the audio myself, I would have made more mistakes
       | than that.
       | 
       | One passage struck me in particular:
       | 
       |     And then he comes up with "weird," which becomes viral and
       |     the rest, and here he is.
       | 
       | How did Gemini know to put "weird" in quotation marks, to
       | indicate--correctly--that the speaker was referring to Walz's use
       | of the word as a word? According to Politico, Walz first used the
       | word in that context in the media on July 23.
       | 
       | https://www.politico.com/news/2024/07/26/trump-vance-weird-0...
        
       | kgdiem wrote:
       | I started making an open source macOS app to do this with whisper
       | and potentially pyannote.
       | 
       | It is functional but a bit slow. I think using whisper directly
       | instead of swift bindings will help a lot.
       | 
       | Really interested in adding diarisation but having a lot of
       | trouble converting Pyannote to CoreML. Pyannote runs so slowly
       | with torch on CPU. Haven't gotten around to putting my latest
       | work for that on GitHub yet.
       | 
       | Happy to accept contributions --
       | 
       | Some priorities right now:
       | 
       | * Fixing signing for local builds
       | 
       | * Replace swift whisper with whisper cpp
       | 
       | * Allowing users to provide their own models
       | 
       | https://github.com/Stack-Studio-Digital-Collective/Auditif
        
       | accidbuddy wrote:
       | Does anyone know one with transcription and translation in real
       | time?
       | 
       | Nowadays, I use libretranslate/libretranslate and pluja/whishper
       | to do this, but not in real time.
        
         | Bayko wrote:
         | Ah this brings back memories. When I was in college with
         | limited money, I used to pirate movies and most of them didn't
         | have subtitles and I used to daydream of writing a VLC plug-in
         | which would real time generate subtitles. But I had better
         | things to do like play video games...
        
           | space_oddity wrote:
           | Many of us have had those ambitious tech ideas...
        
       | leiferik wrote:
       | You're always welcome to try my service TurboScribe
       | https://turboscribe.ai/ if you need a transcript of an
       | audio/video file. It's 100% free up to 3 files per day (30
       | minutes per file) and the paid plan is unlimited and transcribes
       | files up to 10 hours long each. It also supports speaker
       | recognition, common export formats (TXT, DOCX, PDF, SRT, CSV), as
       | well as some AI tools for working with your transcript.
        
         | rsingel wrote:
         | This looks great. Do you have an API or plan to release one?
        
           | leiferik wrote:
           | Thanks! Nothing to announce on the API front right now, but
           | appreciate you asking :)
        
       | justinclift wrote:
       | From their FAQ:
       | 
       |     Does oTranscribe automatically convert audio into text?
       | 
       |     Sorry! It doesn't. oTranscribe makes the manual task of
       |     transcribing audio a lot less painful. But you still have to
       |     do the transcription.
        
       | btown wrote:
       | Are there any open-source or paid apps/shareware/freeware that
       | can:
       | 
       | - Transcribe word-by-word in real time as audio is recorded
       | 
       | - Work entirely locally
       | 
       | - Use relatively recent open-source local models?
       | 
       | I've been using otter.ai for real-time meeting transcriptions -
       | letting me multitask and instantly catch up if I'm asked a
       | question by skimming the most recent few seconds worth of the
       | transcript - but it's far from perfect and occasionally their
       | real-time service has significant transcription delays, not to
       | mention it requires internet connectivity.
       | 
       | Most of the Whisper-based apps out there, though, as well as
       | (when I last checked) the whisper.cpp demo code, require an
       | entire recording to be ingested at once. There are others that
       | rely on e.g. Apple's dictation frameworks, which are a bit dated
       | in capability at the moment.
       | 
       | Anything folks are using out there?
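
One common way to approximate streaming with a model that wants a whole
recording is to feed it overlapping windows of the audio buffer as it
grows. A minimal Python sketch of just the windowing part follows. The
window and hop lengths are arbitrary assumptions, 16 kHz is the sample
rate Whisper expects, and this is not how any specific app in this
thread is known to work.

```python
def sliding_windows(num_samples, sample_rate=16_000, window_s=10.0, hop_s=5.0):
    """Yield (start, end) sample indices of overlapping windows over a buffer."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        yield start, end
        if end == num_samples:
            break  # the last window already reaches the end of the buffer
        start += hop


if __name__ == "__main__":
    # 25 seconds of 16 kHz audio, cut into 10 s windows every 5 s.
    for start, end in sliding_windows(25 * 16_000):
        print(start / 16_000, end / 16_000)
```

Each window would then go to the transcriber on its own. Stitching the
overlapping text back together is the hard part and is left out here.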
        
         | uohzxela wrote:
         | I have built my own local-first solution to transcribe entirely
         | locally in real time word by word, driven by a different need
         | (I'm hard of hearing). It's my daily driver for transcribing
         | meetings, interviews, etc. Because of its local-first
         | capability, I do not have to worry about privacy concerns when
         | transcribing meetings at work as all data stays on my machine.
         | It's about as fast as Otter.ai although there's definitely room
         | for improvements in terms of UX and speed. Caveat is that it
         | only works on MacBooks with Apple silicon. Happy to chat over
         | email (see my HN profile).
        
           | WaitWaitWha wrote:
           | I have some staff with combined hearing and visual needs.
           | Have you researched the one-, two-, and all-party consent
           | requirements? Asking because I hope to identify transcription
           | as "non-recording".
        
             | btown wrote:
             | California has an exception for hearing aids and other
             | similar devices, but it's unclear if transcription aids
             | count, or if this has been tested in court.
             | https://codes.findlaw.com/ca/penal-code/pen-sect-632/ (Not
             | a lawyer, this is not legal advice.)
        
               | noah_buddy wrote:
               | If it were ephemeral? Would that change this? Say,
               | recording the meeting locally in a 5-minute frame, then
               | updating a meeting summary?
        
               | smeej wrote:
               | Do you mean ephemeral, or are you actually wondering
               | about something implanted under the skin? I'd think/hope
               | if it goes under the skin, it ends up in "hearing aid"
               | territory. I'm less sure about if it doesn't persist.
        
               | noah_buddy wrote:
               | Yup, typo, sorry
        
               | brimwats wrote:
               | it's more likely in cochlear implant territory (there are
               | different laws and regulations for implants vs aids
               | depending on locale)
        
             | hansvm wrote:
             | Two/all-party consent are hacky workarounds for the actual
             | harm being inflicted (valid goals including not having your
             | microwave inform Google's ad servers, not recording out-of-
             | context jokes as evidence to imprison people, ... --
             | invalid goals caught up in the collateral damage include
             | topics like the current one about hearing issues (note that
             | a sufficiently accurate transcription service has all the
             | same privacy problems 2-party consent tries to protect
             | against, maybe more since it's more easily searchable)).
             | 
             | I'd be in favor of some startup pulling an Uber or AirBnB
             | and blatantly violating those laws to the benefit of the
             | deaf or elderly if it meant we could get something better
             | on the books.
        
             | CyberDildonics wrote:
             | What did your own research turn up?
        
           | smeej wrote:
           | I was so excited until the very end. I have the wrong
           | hardware.
        
         | baby_souffle wrote:
         | > Are there any open-source or paid apps/shareware/freeware
         | 
         | Google Pixel phones have this feature and it works _very_ well.
        
           | neves wrote:
           | Have you tried it for non-English languages?
           | 
           | New Microsoft Surfaces have this feature but it only works
           | for English
        
           | ericjmorey wrote:
           | How is that feature accessed? Or what does Google call it so
           | I can search for it.
        
             | Groxx wrote:
             | There's a captioning button under the volume slider, and I
             | think it's called "live captions" or something in settings.
             | Just tap the button and it'll start.
             | 
             | https://support.google.com/accessibility/android/answer/935
             | 0...
        
             | abecedarius wrote:
             | Live Transcribe in the accessibility settings. AFAIK it's
             | available on any fairly recent Android phone. I bought a
             | Pixel tablet for no other reason but to run it -- nothing
             | else I've tried comes close for local-only continuous
             | transcribe-as-they-speak. (iOS has a similar feature also
             | under accessibility; it's good but not at the same level.
             | Of course I'd love to see an open-source solution.)
             | 
             | This was for English. One problem it took me a while to
             | realize: when I switched it to transcribe a secondary
             | language, it was not doing it on-device anymore. You can
             | tell the difference by setting airplane mode.
        
         | andrei-akopian wrote:
         | futo.org has a FOSS voice input Android app
         | (voiceinput.futo.org) and live captions
         | (https://github.com/abb128/LiveCaptions) for Linux. They
         | specifically developed their own model that does fast real-time
         | transcriptions.
         | 
         | Not sure if that helps for your specific usecase.
        
         | smeej wrote:
         | I've been using Transcribro[0] on Android/GrapheneOS. It's FOSS
         | and only local, and while it's not word-for-word real-time, it
         | doesn't have to wait for the whole audio to be uploaded before
         | it can work. This is on a Pixel 5a, so hardly impressive
         | hardware.
         | 
         | It works well enough that I use it with Telegram to shove
         | messages over to my Linux machine when I don't feel like typing
         | them out, which is such an unsophisticated hack, but is getting
         | the job done. I spent a couple hours trying to find a Linux-
         | native alternative, or even get this running in Waydroid, and
         | couldn't find anything that worked as well, so I decided not to
         | let the "smooth" become the enemy of the "good enough to get
         | the job done."
         | 
         | [0] https://github.com/soupslurpr/Transcribro
        
         | alfredgg wrote:
         | I helped code oTranscribe+ [0], which does something similar to
         | what you are asking for. It is a desktop application built with
         | ElectronJS on what was then the current version of oTranscribe.
         | It also exists as a web version and a PWA [1].
         | 
         | Language models were those from BSC (Barcelona Supercomputing
         | Center) at the time. The transcription is done via WASM, using
         | Vosk [2] as base.
         | 
         | I hope it fits.
         | 
         | [0] https://github.com/projecte-aina/oTranscribe-plus
         | 
         | [1] https://otranscribe.bsc.es/
         | 
         | [2] https://github.com/alphacep/vosk-api
        
           | smeej wrote:
           | Is there a way to get it to punctuate? Or does it only jot
           | down words?
        
             | alfredgg wrote:
             | It does not punctuate. It just transcribes what is in an
             | audio stream.
        
               | smeej wrote:
               | Punctuating is enough of the hard work of generating a
               | transcript that it's not very useful to me without this,
               | unfortunately. What use cases are you planning for where
               | the words themselves are the desired output?
        
         | ukuina wrote:
         | Yes! WhisperKit's TestFlight app does all three on Apple
         | Silicon: https://www.takeargmax.com/blog/whisperkit
         | 
         | I wish they had Speaker Diarization, but they are waiting for
         | upstream Whisper to add it:
         | https://github.com/argmaxinc/WhisperKit/issues/31
        
       | ulrischa wrote:
       | Pretty amazing what a webapp can do. I wish there were more like
       | them and not all these native apps
        
       | matejmecka wrote:
       | Just pitching in a transcription tool that lets you transcribe
       | video and audio files using Whisper and WASM in your browser, and
       | get a .txt, .srt, .vtt file. Maybe in the future support for
       | Whisper Turbo?
       | 
       | https://video2srt.ccextractor.org/
       | 
       | Disclaimer: Working on this project.
        
       | neves wrote:
       | Has anybody tested it with Brazilian Portuguese? It is a hard
       | problem, since we have too many accents.
        
         | dmd wrote:
         | I don't understand what the issue is. You don't know how to
         | type the different diacritical marks? Or the textbox isn't
         | accepting them? (Which seems like it would be a browser issue,
         | not an issue with the site.)
        
           | gjvnq wrote:
           | I think that the commenter meant accents as in regional
           | dialects.
        
             | dmd wrote:
             | What would that have to do with anything? Isn't that a
             | problem for the person doing the transcribing?
        
       | space_oddity wrote:
       | oTranscribe is a free option for transcription but in many cases
       | it's just too simple
        
       | bilater wrote:
       | If you just want quick transcriptions of YouTube videos, this
       | works pretty well https://www.you-tldr.com/
        
       ___________________________________________________________________
       (page generated 2024-08-09 23:00 UTC)