[HN Gopher] OTranscribe: A free and open tool for transcribing a...
___________________________________________________________________
OTranscribe: A free and open tool for transcribing audio interviews
Author : zerojames
Score : 368 points
Date : 2024-08-09 07:31 UTC (15 hours ago)
(HTM) web link (otranscribe.com)
(TXT) w3m dump (otranscribe.com)
| jagermo wrote:
| Fantastic tool; I used it a lot to transcribe interviews on flights
| where there was no internet and I needed to fill the time. Really
| useful to have if you do a lot of interviews.
| dotancohen wrote:
| From the homepage:
|
| > A free web app to take the pain out of transcribing recorded
| interviews
|
| How did you use a web app on the plane with no internet?
| Havoc wrote:
| It's MIT licensed so presumably self hosted
| grandfunction wrote:
| Ran the server on his or her laptop...
|
| You don't need the internet to use a web browser
| tampueroc wrote:
| The web app saves an offline copy the first time you open it.
| https://otranscribe.com/help/#can_i_use_otranscribe_offline
| jagermo wrote:
| it works offline if you preload the website :)
| TrojanHookworm wrote:
| Use this a lot. It's nice and simple and has exactly the tools
| you need (playback speed control, easy pause/play) and nothing
| more. Greatly prefer it over automatic transcription tools that
| give you 40 pages of 'umm's and 'ahhhh's to filter through and edit.
| stavros wrote:
| Can you not give the transcript to an LLM to remove the umms
| and ahhs?
| BiteCode_dev wrote:
| People not used to AI have blind spots that prevent them from
| seeing evident use cases like this.
|
| I'm always surprised at the amazed look of my friends when
| they see me concretely use the tool. They just didn't picture
| it until they saw it in action.
| stavros wrote:
| It's not even just people not used to AI: I developed a tool
| that uses AI to do something, and then kind of couldn't be
| bothered to fix some of the output manually. It only
| occurred to me days later that I could ask the AI to fix it.
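The suggestion above is to hand the transcript to an LLM. A cheaper, deterministic first pass (a different technique, swapped in here rather than taken from the thread) is a regular expression that strips common fillers before any LLM or human review. The filler list and the `strip_fillers` name are hypothetical; tune the list for your speakers and language.

```python
import re

# Hypothetical filler list (regex fragments); extend it for your speakers.
FILLERS = ["um+", "u+h+", "a+h+", "e+r+m*", "hm+"]

# Match a filler as a whole word, plus a trailing comma/ellipsis and
# whitespace that would otherwise be left dangling. Periods are not
# consumed, so sentence endings survive.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,…]*\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove filler words and tidy up the spacing they leave behind."""
    cleaned = _FILLER_RE.sub("", text)
    # Collapse runs of spaces or tabs created by the removal.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    # Drop a space left before punctuation, e.g. "word ," -> "word,".
    cleaned = re.sub(r"\s+([,.?!])", r"\1", cleaned)
    return cleaned.strip()
```

A regex cannot tell a hesitation "er" from a meaningful one, so an LLM or a human pass is still useful afterwards; this just shrinks the 40 pages first.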
| phoronixrly wrote:
| https://github.com/oTranscribe/oTranscribe
| cube2222 wrote:
| I needed to do this this week (transcribe an interview with
| multiple speakers) and used
| https://github.com/MahmoudAshraf97/whisper-diarization
|
| Worked excellently.
|
| It generates both a file with one line per uninterrupted stretch
| of speech, prefixed with the speaker number, and a file with
| timestamps, which I believe is meant to be used as subtitles.
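A minimal sketch of the "one line per uninterrupted speaker turn" output described above, assuming diarization has already tagged each word with a speaker index. The `Word` record and `speaker_turns` function are hypothetical illustrations, not whisper-diarization's actual internals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Word:
    speaker: int  # speaker index assigned by diarization
    text: str     # the transcribed word (with any punctuation attached)


def speaker_turns(words: Iterable[Word]) -> List[str]:
    """Merge consecutive words from the same speaker into one line per turn."""
    lines: List[str] = []
    current: Optional[int] = None
    buffer: List[str] = []
    for w in words:
        # A speaker change closes the previous turn.
        if w.speaker != current and buffer:
            lines.append(f"Speaker {current}: {' '.join(buffer)}")
            buffer = []
        current = w.speaker
        buffer.append(w.text)
    # Flush the final turn.
    if buffer:
        lines.append(f"Speaker {current}: {' '.join(buffer)}")
    return lines
```

The same grouping applies to per-word output from other tools, as long as each word carries a speaker label.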
| RamblingCTO wrote:
| I had better success with whisperx, as whisper-dia does
| sometimes have weird issues I couldn't resolve:
| https://github.com/m-bain/whisperX
| cube2222 wrote:
| iirc whisper-diarization uses whisperx under the hood.
|
| I'll be honest, I haven't dived much into this as I just
| needed something transcribed quickly, but when I was looking
| at WhisperX I couldn't find a CLI that would just out of the
| box give me a text file with a line per speaker statement
| (not per word).
| RamblingCTO wrote:
| I use it like this:
|
|   whisperx "$file" --compute_type int8 --min_speakers 3 \
|     --max_speakers 3 --language de --hf_token "$token" --diarize
| stavros wrote:
| > iirc whisper-diarization uses whisperx under the hood.
|
| It seems like it does:
|
| https://github.com/MahmoudAshraf97/whisper-diarization/blob/...
| adipasquale wrote:
| I have had very good results using Spectropic [1], a hosted
| Whisper + diarization API. I found it cheap, and much easier and
| faster than setting up and running whisper-diarization on my M1.
| Audiogest [2] is a web service built on Spectropic; I have not
| used it yet.
|
| Disclaimer: I am not affiliated in any way, just a happy
| customer! I had some nice email exchanges after bug reports with
| the (I believe solo) developer behind these tools.
|
| ---
|
| [1] https://spectropic.ai/
|
| [2] https://audiogest.app/
| thomasmol wrote:
| Thanks for the shout-out and kind words!
|
| Thomas here, maker of Spectropic and Audiogest. I am indeed
| focused on building a simple and reliable Whisper +
| diarization API. I am also working on providing fine-tuned
| versions of Whisper for non-English languages through the API.
|
| Feel free to reach out to me if anyone is interested in this!
| dchuk wrote:
| Great looking API. Do you support, or plan to support,
| automatic speaker identification based on labeled samples
| of their voices? It would be great to have a library of
| known speakers that are automatically matched when
| transcribing.
| thomasmol wrote:
| Thanks! That is something I might offer in the future and
| is definitely possible with a library like pyannote.
| Would be really cool to add for sure.
|
| I am also experimenting with post-processing transcripts
| with LLMs to infer speaker names from a transcript. It
| already works pretty decently, but it's still a bit
| expensive. This feature is available under the
| 'enhanced' model if you want to check it out:
| https://docs.spectropic.ai/models/transcribe/enhanced
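Matching against "a library of known speakers" is commonly framed as comparing voice embeddings. A minimal sketch, assuming you already have one embedding vector per diarized speaker and one per enrolled person (for example from a pyannote embedding model; that step is not shown). The `identify_speaker` helper and its 0.7 cosine-similarity threshold are hypothetical and would need tuning on real recordings.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    embedding: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # hypothetical cut-off; tune on real data
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if nobody is close enough."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in library.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` below the threshold lets unknown voices keep generic labels instead of being forced onto the nearest enrolled speaker.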
| ukuina wrote:
| Hi! Any plans to support streaming transcription with
| diarization?
| H8crilA wrote:
| I often subtitle old, obscure, foreign language movies with
| Whisper. Or random clips found on foreign Telegram/Twitter
| channels. Paired up with some GPT for translation it works
| great!
|
| You can do this locally if you have enough (V)RAM, but I prefer
| the OpenAI API, as I usually don't have enough at hand. And the
| various Llamas aren't really on par with GPT-4 in quality. If you
| only need Whisper, and no translation, then local execution is
| indeed very viable. High-quality Whisper fits in 4GB of (V)RAM.
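For the subtitle workflow above, the last step is turning timed segments into an `.srt` file. A minimal sketch, assuming segments shaped like openai-whisper's output (dicts with `start` and `end` in seconds plus `text`); the function names are hypothetical.

```python
from typing import Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Build an SRT document: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

With the `openai-whisper` package, `whisper.load_model("medium").transcribe(path)["segments"]` yields dicts of this shape. Passing `task="translate"` to `transcribe` makes Whisper itself translate into English, which is a different route from the GPT translation step described above; a GPT pass would instead rewrite each segment's `text` before calling `segments_to_srt`.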
| wanderingmind wrote:
| The problem with using OpenAI Whisper is that it's too slow on
| CPU-only machines. Whisper.cpp is blazing fast compared to it,
| and I wish people would build better diarization on top of
| that.
| stavros wrote:
| What's OpenAI Whisper vs whisper.cpp? Do you mean whisper-
| diarization uses the API?
| Zambyte wrote:
| https://github.com/openai/whisper
|
| vs
|
| https://github.com/ggerganov/whisper.cpp
|
| They are two inference engines for running the whisper ASR
| model, each with their own API AFAIK.
| stavros wrote:
| Ah I see, thanks. Hm, I would imagine it's not hard to make
| something that works with both (the API surface should be
| fairly small); odd that projects use the former and not the
| latter.
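The "small surface area" point can be made concrete with an interface the pipeline depends on instead of a specific engine. A sketch, assuming a segment-list return type; the `Transcriber` protocol, `Segment`, and the class names are hypothetical. Only `OpenAIWhisperEngine` wraps a real API (`whisper.load_model` and `model.transcribe`); a whisper.cpp adapter would implement the same `transcribe` method via its CLI or bindings, which is not shown.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


class Transcriber(Protocol):
    """The only thing a downstream diarization step needs from an engine."""

    def transcribe(self, audio_path: str) -> List[Segment]: ...


def transcript_text(engine: Transcriber, audio_path: str) -> str:
    """Engine-agnostic helper: callers never touch backend-specific APIs."""
    return " ".join(seg.text.strip() for seg in engine.transcribe(audio_path))


class OpenAIWhisperEngine:
    """Adapter for the openai-whisper package (imported lazily on construction)."""

    def __init__(self, model_name: str = "base") -> None:
        import whisper  # openai-whisper

        self._model = whisper.load_model(model_name)

    def transcribe(self, audio_path: str) -> List[Segment]:
        result = self._model.transcribe(audio_path)
        return [Segment(s["start"], s["end"], s["text"]) for s in result["segments"]]


class FakeEngine:
    """Stand-in backend for tests, returning canned segments."""

    def __init__(self, segments: List[Segment]) -> None:
        self._segments = segments

    def transcribe(self, audio_path: str) -> List[Segment]:
        return list(self._segments)
```

Because `Transcriber` is a structural `Protocol`, any class with a matching `transcribe` method fits without inheriting from it.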
| aidenn0 wrote:
| Another advantage of Whisper.CPP is that it can use cublas to
| accelerate models too large for your GPU memory; I can run
| the medium and large models with cublas on my 1050, but only
| the small if I use the pure GPU mode.
| hubraumhugo wrote:
| Fascinating how traditionally very complex and hard ML problems
| are slowly becoming commodities with AI:
|
| - transcription
|
| - machine translation
|
| - OCR
|
| - image recognition
| terribleperson wrote:
| Does it hallucinate when there's dead air?
| choya-love wrote:
| Any new language support in the future? Fingers crossed for
| Japanese.
| fabianmg wrote:
| Am I missing something? From what I checked, it supports every
| language, since you yourself are the one transcribing by hand. This
| is just a UI to watch the video or listen to the audio while you type.
| comradesmith wrote:
| https://tactiq.io is made for meetings, but also does uploaded
| transcripts and supports Japanese!
| ilt wrote:
| I currently use Aiko's free iOS app which does offline
| transcription using OpenAI's Whisper model. It has been working
| pretty well for me so far. It can export in formats like SRT,
| TXT, CSV, JSON and text with timestamps too.
| https://sindresorhus.com/aiko
| nullbar wrote:
| Maybe it isn't perfectly clear, but OTranscribe isn't an
| automatic speech-to-text tool, but instead, a UI for assisting in
| manual transcribing.
|
| So no AI here, folks.
| space_oddity wrote:
| Yep, it's designed to assist with manual transcription
| BetterWhisper wrote:
| If you are looking for something automatic that also lets you
| interact with your transcripts ChatGPT-style, then I would
| recommend https://www.videototextai.com/
| Terretta wrote:
| That cookies box though... Dark pattern (accept lots + accept
| all, fake drag affordance, covering a quarter of the page) for
| cookies doesn't bode well for privacy protections around the
| transcripts.
| BetterWhisper wrote:
| You can delete any transcription you make, and once you do,
| we do not keep any copy of it :) The cookie banner is there
| to comply with EU law.
| kimoz wrote:
| Does anyone know a free tool for generating subtitles for movies
| and TV series?
| doug_life wrote:
| https://github.com/McCloudS/subgen worked very well for me. I
| had a TV series where the timestamps of the last few seasons
| somehow did not match the subtitle files I could find online. I
| used subgen and it worked surprisingly well.
| BrunoJo wrote:
| You can try https://www.transcripo.com/ for free
| drtgh wrote:
| SubtitleEdit is one of the most complete and has many online
| tutorials from users.
|
| Make sure the tutorials are recent, because those will probably
| mention how to use the automated generation tools/plugins that
| weren't available years ago.
|
| https://github.com/SubtitleEdit/subtitleedit
| teddyh wrote:
| See also _TranscriberAG_ : <https://transag.sourceforge.net/>
| dmitrykan wrote:
| I'm working on a tool that includes AI. My original goal is to
| test it on my https://www.youtube.com/c/VectorPodcast by
| offering something like what Lex Fridman does for his episodes.
|
| Current features:
|
| 1. Download from YT
|
| 2. Transcribe using Vosk (output includes time codes)
|
| 3. Speaker diarization using pyannote; this isn't perfect and
| needs a bit more ironing out.
|
| What needs to be done:
|
| 4. Store the transcription in a search engine (can include
| vectors)
|
| 5. Implement a webapp
|
| If anyone here is interested to join forces, let me know.
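One way to combine steps 2 and 3 is to label each Vosk word with the pyannote speaker turn that overlaps it most. A sketch, assuming words as dicts with `word`, `start` and `end` (similar to what Vosk returns when word timestamps are enabled) and speaker turns already flattened into `(start, end, label)` tuples; `label_words` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(
    words: List[Dict],
    turns: List[Tuple[float, float, str]],
) -> List[Tuple[Optional[str], str]]:
    """Tag each word with the speaker whose turn overlaps it the most.

    Words that fall outside every turn get None, so gaps in the
    diarization stay visible instead of being silently assigned.
    """
    labelled = []
    for w in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for start, end, speaker in turns:
            ov = overlap(w["start"], w["end"], start, end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labelled.append((best_speaker, w["word"]))
    return labelled
```

This is a plain O(words × turns) scan; for long episodes, sorting both lists and sweeping them together would avoid rescanning every turn for each word.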
| jrochkind1 wrote:
| Kinda surprised it doesn't have AI integration.
|
| You do still need to proof and QA even AI results, if you want a
| publication quality result, and do things like attribute who is
| speaking when (at least Whisper can't do that), and correct
| "unusual" last names and things. So I feel like people using AI
| still need good tools for the correcting/finishing/proofing too,
| that would be similar to the tools for non-assisted
| transcription.
| MattieTK wrote:
| This was written a really long time ago by a former WSJ
| Graphics reporter (Elliot Bentley) who is now at Datawrapper.
|
| It is now operated by Muckrock and hasn't seen changes made to
| it in a while.
|
| That's why it doesn't have any of these integrations, the
| technology just didn't exist.
| jrochkind1 wrote:
| Aha, good to know! That's actually important context, that
| this is not a recent release, and doesn't necessarily have a
| lot of ongoing development.
| ciaran00 wrote:
| Talio.ai lets you do this, with ChatGPT-style chat over the
| transcript plus numerous other features: https://talio.ai
| bcherny wrote:
| Looks cool! Unclear from the docs, but does it support non-
| English languages? How about mixed-language interviews?
| avodonosov wrote:
| Yes! Any language you understand is supported!
| avodonosov wrote:
| I made a similar tool for making tables of contents for youtube
| videos: https://youtoc.by/
|
| I stopped developing it actively years ago, after I created tables
| of contents for the several videos I needed. If I ever need it
| again, I will probably work on a mobile (aka responsive) UI.
| tkgally wrote:
| I was curious how good a transcription I could get from what may
| be the best multimodal LLM currently,
| Gemini-1.5-Pro-Experiment-0801, so I had it transcribe five
| minutes of an interview between Ezra Klein and Nancy Pelosi from
| earlier today.
| The results are here:
|
| https://www.gally.net/temp/20240809geminitranscription/index...
|
| Aside from some minor punctuation and capitalization issues,
| Gemini's transcription looks nearly perfect to me. There were
| only one or two words that I think it misheard. If I had
| transcribed the audio myself, I would have made more mistakes
| than that.
|
| One passage struck me in particular:
|
|   And then he comes up with "weird," which becomes viral and the
|   rest, and here he is.
|
| How did Gemini know to put "weird" in quotation marks, to
| indicate--correctly--that the speaker was referring to Walz's use
| of the word as a word? According to Politico, Walz first used the
| word in that context in the media on July 23.
|
| https://www.politico.com/news/2024/07/26/trump-vance-weird-0...
| kgdiem wrote:
| I started making an open source macOS app to do this with whisper
| and potentially pyannote.
|
| It is functional but a bit slow. I think using whisper directly
| instead of swift bindings will help a lot.
|
| Really interested in adding diarisation but having a lot of
| trouble converting Pyannote to CoreML. Pyannote runs so slowly
| with torch on CPU. Haven't gotten around to putting my latest work
| for that on GitHub yet.
|
| Happy to accept contributions --
|
| Some priorities right now:
|
| * Fixing signing for local builds
|
| * Replace swift whisper with whisper cpp
|
| * Allowing users to provide their own models
|
| https://github.com/Stack-Studio-Digital-Collective/Auditif
| accidbuddy wrote:
| Does anyone know one that does transcription and translation in
| real time?
|
| Nowadays, I use libretranslate/libretranslate and pluja/whishper
| to do this, but not in real time.
| Bayko wrote:
| Ah this brings back memories. When I was in college with
| limited money, I used to pirate movies and most of them didn't
| have subtitles, and I used to daydream of writing a VLC plug-in
| that would generate subtitles in real time. But I had better
| things to do like play video games...
| space_oddity wrote:
| Many of us have had those ambitious tech ideas...
| leiferik wrote:
| You're always welcome to try my service TurboScribe
| https://turboscribe.ai/ if you need a transcript of an
| audio/video file. It's 100% free up to 3 files per day (30
| minutes per file) and the paid plan is unlimited and transcribes
| files up to 10 hours long each. It also supports speaker
| recognition, common export formats (TXT, DOCX, PDF, SRT, CSV), as
| well as some AI tools for working with your transcript.
| rsingel wrote:
| This looks great. Do you have an API, or plan to release one?
| leiferik wrote:
| Thanks! Nothing to announce on the API front right now, but
| appreciate you asking :)
| justinclift wrote:
| From their FAQ:
|
|   Does oTranscribe automatically convert audio into text?
|
|   Sorry! It doesn't. oTranscribe makes the manual task of
|   transcribing audio a lot less painful. But you still have to do
|   the transcription.
| btown wrote:
| Are there any open-source or paid apps/shareware/freeware that
| can:
|
| - Transcribe word-by-word in real time as audio is recorded
|
| - Work entirely locally
|
| - Use relatively recent open-source local models?
|
| I've been using otter.ai for real-time meeting transcriptions -
| letting me multitask and instantly catch up if I'm asked a
| question by skimming the most recent few seconds worth of the
| transcript - but it's far from perfect and occasionally their
| real-time service has significant transcription delays, not to
| mention it requires internet connectivity.
|
| Most of the Whisper-based apps out there, though, as well as
| (when I last checked) the whisper.cpp demo code, require an
| entire recording to be ingested at once. There are others that
| rely on e.g. Apple's dictation frameworks, which are a bit dated
| in capability at the moment.
|
| Anything folks are using out there?
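A common workaround for models that expect a whole recording is to re-run them on a sliding window of recent audio. A sketch of the buffering side only, assuming audio arrives as chunks of samples; `sliding_windows` is a hypothetical helper, and both the transcription call and de-duplication of text from overlapping windows are left out.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_windows(
    chunks: Iterable[List[float]],
    window: int,
    step: int,
) -> Iterator[List[float]]:
    """Yield the most recent `window` samples after every `step` new samples.

    Consecutive windows overlap by window - step samples, so a word cut at
    one window's edge appears whole in the next. Early windows are shorter
    while the buffer fills up.
    """
    if not 0 < step <= window:
        raise ValueError("need 0 < step <= window")
    buffer: deque = deque(maxlen=window)  # old samples fall off automatically
    since_last = 0
    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            since_last += 1
            if since_last == step:
                yield list(buffer)
                since_last = 0
```

For example, at 16 kHz (the sample rate Whisper models expect), `window=16_000 * 10` and `step=16_000 * 2` would re-transcribe the last ten seconds every two seconds; those values are illustrative, not tuned. Samples after the last full step are not emitted until more audio arrives.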
| uohzxela wrote:
| I have built my own local-first solution to transcribe entirely
| locally in real time word by word, driven by a different need
| (I'm hard of hearing). It's my daily driver for transcribing
| meetings, interviews, etc. Because of its local-first
| capability, I do not have to worry about privacy concerns when
| transcribing meetings at work as all data stays on my machine.
| It's about as fast as Otter.ai although there's definitely room
| for improvements in terms of UX and speed. Caveat is that it
| only works on MacBooks with Apple silicon. Happy to chat over
| email (see my HN profile).
| WaitWaitWha wrote:
| I have some staff with combined hearing and visual needs.
| Have you researched one-, two-, and all-party consent
| requirements? Asking because I hope to classify transcription
| as "non-recording".
| btown wrote:
| California has an exception for hearing aids and other
| similar devices, but it's unclear if transcription aids
| count, or if this has been tested in court.
| https://codes.findlaw.com/ca/penal-code/pen-sect-632/ (Not
| a lawyer, this is not legal advice.)
| noah_buddy wrote:
| What if it were ephemeral? Would that change this? Say,
| recording the meeting locally in 5-minute frames, then
| updating a meeting summary?
| smeej wrote:
| Do you mean ephemeral, or are you actually wondering
| about something implanted under the skin? I'd think/hope
| if it goes under the skin, it ends up in "hearing aid"
| territory. I'm less sure about if it doesn't persist.
| noah_buddy wrote:
| Yup, typo, sorry
| brimwats wrote:
| it's more likely in cochlear implant territory (there are
| different laws and regulations for implants vs aids
| depending on locale)
| hansvm wrote:
| Two/all-party consent are hacky workarounds for the actual
| harm being inflicted (valid goals including not having your
| microwave inform Google's ad servers, not recording out-of-
| context jokes as evidence to imprison people, ... --
| invalid goals caught up in the collateral damage include
| topics like the current one about hearing issues (note that
| a sufficiently accurate transcription service has all the
| same privacy problems 2-party consent tries to protect
| against, maybe more since it's more easily searchable)).
|
| I'd be in favor of some startup pulling an Uber or AirBnB
| and blatantly violating those laws to the benefit of the
| deaf or elderly if it meant we could get something better
| on the books.
| CyberDildonics wrote:
| What did your own research turn up?
| smeej wrote:
| I was so excited until the very end. I have the wrong
| hardware.
| baby_souffle wrote:
| > Are there any open-source or paid apps/shareware/freeware
|
| Google Pixel phones have this feature and it works _very_ well.
| neves wrote:
| Have you tried it with non-English languages?
|
| New Microsoft Surface devices have this feature, but it only
| works for English.
| ericjmorey wrote:
| How is that feature accessed? Or what does Google call it so
| I can search for it.
| Groxx wrote:
| There's a captioning button under the volume slider, and I
| think it's called "live captions" or something in settings.
| Just tap the button and it'll start.
|
| https://support.google.com/accessibility/android/answer/9350...
| abecedarius wrote:
| Live Transcribe in the accessibility settings. AFAIK it's
| available on any fairly recent Android phone. I bought a
| Pixel tablet for no other reason but to run it -- nothing
| else I've tried comes close for local-only continuous
| transcribe-as-they-speak. (iOS has a similar feature also
| under accessibility; it's good but not at the same level.
| Of course I'd love to see an open-source solution.)
|
| This was for English. One problem it took me a while to
| realize: when I switched it to transcribe a secondary
| language, it was not doing it on-device anymore. You can
| tell the difference by setting airplane mode.
| andrei-akopian wrote:
| FUTO (futo.org) has a FOSS voice input Android app
| (voiceinput.futo.org) and live captions
| (https://github.com/abb128/LiveCaptions) for Linux. They
| specifically developed their own model that does fast real-time
| transcription.
|
| Not sure if that helps for your specific use case.
| smeej wrote:
| I've been using Transcribro[0] on Android/GrapheneOS. It's FOSS
| and only local, and while it's not word-for-word real-time, it
| doesn't have to wait for the whole audio to be uploaded before
| it can work. This is on a Pixel 5a, so hardly impressive
| hardware.
|
| It works well enough that I use it with Telegram to shove
| messages over to my Linux machine when I don't feel like typing
| them out, which is such an unsophisticated hack, but is getting
| the job done. I spent a couple hours trying to find a Linux-
| native alternative, or even get this running in Waydroid, and
| couldn't find anything that worked as well, so I decided not to
| let the "smooth" become the enemy of the "good enough to get
| the job done."
|
| [0] https://github.com/soupslurpr/Transcribro
| alfredgg wrote:
| I helped code oTranscribe+ [0], which does something similar to
| what you are asking for. It is a desktop application built with
| ElectronJS on top of the version of oTranscribe that was current
| at the time. It also exists as a web version and a PWA [1].
|
| Language models were those from BSC (Barcelona Supercomputing
| Center) at the time. The transcription is done via WASM, using
| Vosk [2] as base.
|
| I hope it fits.
|
| [0] https://github.com/projecte-aina/oTranscribe-plus
|
| [1] https://otranscribe.bsc.es/
|
| [2] https://github.com/alphacep/vosk-api
| smeej wrote:
| Is there a way to get it to punctuate? Or does it only jot
| down words?
| alfredgg wrote:
| It does not punctuate. It just transcribes what is in an
| audio stream.
| smeej wrote:
| Punctuating is enough of the hard work of generating a
| transcript that it's not very useful to me without this,
| unfortunately. What use cases are you planning for where
| the words themselves are the desired output?
| ukuina wrote:
| Yes! WhisperKit's TestFlight app does all three on Apple
| Silicon: https://www.takeargmax.com/blog/whisperkit
|
| I wish they had Speaker Diarization, but they are waiting for
| upstream Whisper to add it:
| https://github.com/argmaxinc/WhisperKit/issues/31
| ulrischa wrote:
| Pretty amazing what a webapp can do. I wish there were more
| like them, and not all these native apps.
| matejmecka wrote:
| Just pitching in a transcription tool that lets you transcribe
| video and audio files using Whisper and WASM in your browser, and
| get a .txt, .srt, .vtt file. Maybe in the future support for
| Whisper Turbo?
|
| https://video2srt.ccextractor.org/
|
| Disclaimer: Working on this project.
| neves wrote:
| Has anybody tested it with Brazilian Portuguese? It is a hard
| problem, since we have so many accents.
| dmd wrote:
| I don't understand what the issue is. You don't know how to
| type the different diacritical marks? Or the textbox isn't
| accepting them? (Which seems like it would be a browser issue,
| not an issue with the site.)
| gjvnq wrote:
| I think that the commenter meant accents as in regional
| dialects.
| dmd wrote:
| What would that have to do with anything? Isn't that a
| problem for the person doing the transcribing?
| space_oddity wrote:
| oTranscribe is a free option for transcription but in many cases
| it's just too simple
| bilater wrote:
| If you just want quick transcriptions of YouTube video this works
| pretty well https://www.you-tldr.com/
___________________________________________________________________
(page generated 2024-08-09 23:00 UTC)