[HN Gopher] YouTubeTranscript.com
___________________________________________________________________
YouTubeTranscript.com
Author : fragmede
Score : 174 points
Date : 2022-12-18 16:38 UTC (6 hours ago)
(HTM) web link (youtubetranscript.com)
(TXT) w3m dump (youtubetranscript.com)
| EGreg wrote:
| Who built this?
|
| We want to partner with you on a tool that autogenerates clips of
| any video based on where a topic starts and ends.
| [deleted]
| dukeofdoom wrote:
| Something like this would be nice to be able to search local
| videos for specific keywords spoken too.
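| A minimal sketch of the search half in Python. It assumes each local video has already been run through some speech-to-text tool and stored as timed segments. The `Segment` type, the file names and `search_videos` are all hypothetical, not any particular tool's output format:

```python
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One timed piece of a transcript (hypothetical format)."""
    start: float  # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for long videos."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def search_videos(library: Dict[str, List[Segment]],
                  keyword: str) -> List[Tuple[str, str]]:
    """Case-insensitive substring search over every transcript in `library`.

    Returns (video path, timestamp) pairs so you can jump straight
    to the moment the keyword is spoken.
    """
    needle = keyword.lower()
    hits = []
    for path, segments in library.items():
        for seg in segments:
            if needle in seg.text.lower():
                hits.append((path, format_timestamp(seg.start)))
    return hits


library = {
    "talk.mp4": [Segment(4.5, "today we talk about Kubernetes"),
                 Segment(3725.0, "one more thing about kubernetes")],
    "vlog.mkv": [Segment(12.0, "back from the beach")],
}
print(search_videos(library, "kubernetes"))
# [('talk.mp4', '0:04'), ('talk.mp4', '1:02:05')]
```

| A linear scan like this is fine for a personal library. A much larger collection would want an index instead.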
| breck wrote:
| https://youtubetranscript.com/?v=DvxxdZpMFHg
|
| "Error: transcripts disabled for that video"
|
| Why?
| arboles wrote:
| Youtube didn't generate captions for that video
| banana_giraffe wrote:
| If you want a CLI version of a similar idea, you can use yt-dlp
| and some simple jq to pull down the captions for a video:
|   curl `\
|     yt-dlp -j "https://www.youtube.com/watch?v=aeWyp2vXxqA" | \
|     jq -r '.automatic_captions.en[] | select(.ext=="json3") | .url'`
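| The json3 file that command points at can then be flattened into timestamped lines. Here is a minimal Python sketch. It assumes the caption JSON has an "events" list whose entries carry "tStartMs" and a "segs" list of {"utf8": ...} pieces. json3 is undocumented, so treat those field names as an assumption and check them against a real download:

```python
import json
from typing import List, Tuple


def json3_to_lines(raw: str) -> List[Tuple[float, str]]:
    """Flatten a json3 caption payload into (start_seconds, text) pairs.

    Assumes the undocumented layout {"events": [{"tStartMs": ...,
    "segs": [{"utf8": ...}, ...]}, ...]}; events without "segs" are
    treated as timing-only entries and skipped.
    """
    data = json.loads(raw)
    lines = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            continue
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if text:  # skip events that contain only whitespace or newlines
            lines.append((event.get("tStartMs", 0) / 1000, text))
    return lines


sample = json.dumps({"events": [
    {"tStartMs": 0, "dDurationMs": 5000},
    {"tStartMs": 1200, "segs": [{"utf8": "hello"}, {"utf8": " world"}]},
    {"tStartMs": 3400, "segs": [{"utf8": "\n"}]},
]})
print(json3_to_lines(sample))  # [(1.2, 'hello world')]
```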
| ptspts wrote:
| Not all YouTube videos with spoken text have automatic
| captions.
| arboles wrote:
| https://news.ycombinator.com/item?id=34041455
| modeless wrote:
| A supremely useful site that searches YouTube transcripts is
| https://youglish.com. It shows you pronunciations in context for
| any word or name.
| arboles wrote:
| Thanks for the link! This site actually has a database of
| youtube transcripts unlike OP. Shame you can't search fixed
| strings, like two words in exact order. Though it seems
| genuinely useful for learning pronunciation as advertised.
| c7DJTLrn wrote:
| Pretty nice. The sliver of content still worth watching on
| YouTube doesn't have repetitive stuff or padding to make it to
| the 10 min mark though.
|
| If you go to the homepage with clear cookies it's just endless
| amounts of utterly dogshit cookie cutter content. Same clickbait
| thumbnails with a person pulling an idiotic expression. Even the
| videos masquerading as educational are entertainment at best. If
| I had kids I'd do everything in my power to keep them away from
| YouTube.
| SkeuomorphicBee wrote:
| Why is it hard-coded to English? When I try to transcribe a video
| in any other language it throws the error:
|
| > No transcripts were found for any of the requested language
| codes: ('en',) For this video ([...]) transcripts are available
| in the following languages: [...]
|
| It even knows what languages are available, so why not dump that
| instead?
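| The requested fallback takes only a few lines. Here is a minimal Python sketch. It assumes the list of available language codes is already known, as the error message suggests. `pick_language` is a hypothetical helper, not the site's actual code:

```python
from typing import Optional, Sequence


def pick_language(available: Sequence[str],
                  preferred: Sequence[str] = ("en",)) -> Optional[str]:
    """Return the first preferred language that is available; otherwise
    fall back to whatever the video does have instead of raising an error.

    Regional variants count as matches, so "en" accepts "en-GB".
    """
    for want in preferred:
        for code in available:
            if code == want or code.startswith(want + "-"):
                return code
    return available[0] if available else None


print(pick_language(["de", "fr"]))     # de
print(pick_language(["fr", "en-GB"]))  # en-GB
print(pick_language([]))               # None
```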
| aardvarkr wrote:
| Probably because it's a hackathon style project that was
| slapped together and isn't intended to support every use case.
| I'd recommend reaching out to the author with your feedback
| darepublic wrote:
| What I've wanted is to search the transcripts of past videos I've
| watched. With something like this it seems reasonable to imagine
| a setup where every video you navigate to gets
| transcribed and the text is indexed for later search.
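| The indexing half of that idea is small. Here is a minimal Python sketch of an in-memory inverted index. It assumes each watched video's transcript arrives as plain text keyed by video ID. `TranscriptIndex` and the sample IDs are hypothetical, and a real setup would want a persistent full-text index:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class TranscriptIndex:
    """In-memory inverted index: word -> IDs of videos whose transcript has it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Lowercase, then keep runs of ASCII letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def add(self, video_id: str, transcript: str) -> None:
        """Record every word of a newly watched video's transcript."""
        for word in self._tokenize(transcript):
            self._postings[word].add(video_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of videos whose transcripts contain every query word."""
        words = self._tokenize(query)
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


index = TranscriptIndex()
index.add("vid-a", "Today we compile the kernel from source.")
index.add("vid-b", "Baking bread from scratch, step by step.")
print(index.search("from source"))  # {'vid-a'}
```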
| amelius wrote:
| How can this be so fast? I tried it with two random URLs, and the
| transcripts were instant, like less than 100ms.
| charcircuit wrote:
| YouTube already creates transcripts for accessibility and for
| feeding into other ML models.
| samanator wrote:
| Likely cached. Try with a long video with few views.
|
| Edit: after reading other comments it seems this may be using
| an undocumented api to retrieve the data.
| FinalDestiny wrote:
| It appears to be using the YouTube auto-generated captions. The
| output, spacing, and punctuation are identical.
| seydor wrote:
| This is great and works well. What is the copyright status of
| transcripts?
| kube-system wrote:
| They are owned by the copyright owner of the underlying audio.
| seydor wrote:
| But, for example, is it fair use to reproduce them? What about
| indexing?
| [deleted]
| kube-system wrote:
| Depends on why it is being done.
| wantlotsofcurry wrote:
| Not sure on the transcript front, but the owner may want to
| consider removing 'youtube' from their name.
| arboles wrote:
| This UI and YouTube's UI for transcripts are really nice. When
| I'm looking for a particular piece of information I can just
| Ctrl+F and click on the match to play from there. YouTube used to
| auto-generate subtitles; now it also formats them as
| transcripts. I wish offline media players had this functionality:
| if I get distracted for a few seconds, I don't have to watch those
| seconds again, I can speed-read over the past couple of lines.
| arboles wrote:
| Call it "panoramic subtitles"
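| One way a player could implement "panoramic subtitles" is to keep the current cue and the few before it on screen. Here is a minimal Python sketch. It assumes subtitles are already parsed into (start_seconds, text) cues sorted by start time. `panorama` and the sample cues are hypothetical:

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Cue = Tuple[float, str]  # (start time in seconds, subtitle text)


def panorama(cues: Sequence[Cue], now: float, history: int = 3) -> List[str]:
    """Return the most recent cue at `now` plus up to `history` cues before it.

    `cues` must be sorted by start time. bisect_right counts how many
    cues have started by `now`; a real player would cache the list of
    start times rather than rebuild it on every frame.
    """
    starts = [start for start, _ in cues]
    started = bisect_right(starts, now)
    first = max(0, started - history - 1)
    return [text for _, text in cues[first:started]]


cues = [
    (0.0, "So the first thing to know"),
    (2.0, "is that the cache is per process,"),
    (4.0, "which means two workers"),
    (6.0, "can disagree about the value."),
    (8.0, "Let's fix that."),
]
# At 6.5 s, after a few seconds of distraction, the last four lines are still visible.
print(panorama(cues, 6.5))
```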
| politelemon wrote:
| alpb wrote:
| Fwiw YouTube already has a feature for this. Click the "..." next
| to the Share button and click Show Transcript. There are also extensions
| like https://chrome.google.com/webstore/detail/youtube-captions-s...
| that make it easy to search them in a popup.
| dobladov wrote:
| They seem to have moved the functionality to the end of the
| description, and there you can find the "Show captions" button.
|
| The extension I made to export the transcript was based on this
| YouTube functionality, I should update the instructions now.
|
| https://chrome.google.com/webstore/detail/youtube2anki/boebb...
| modeless wrote:
| Regular "find in page" works to search the transcript on
| YouTube. I use it often.
| cavisne wrote:
| This script for whisper.cpp works really well
|
| https://github.com/ggerganov/whisper.cpp/blob/master/example...
|
| for my purposes I changed the output from subtitles to txt (so I
| could pipe the result into chatgpt)
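| If you need the same conversion for existing subtitle files, here is a minimal Python sketch that strips SRT down to plain text. It assumes standard SRT blocks: a cue number, a `-->` timing line, text lines and a blank separator. `srt_to_text` is a hypothetical helper, not part of the linked script:

```python
import re

# Matches an SRT timing line such as "00:00:02,500 --> 00:00:05,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_to_text(srt: str) -> str:
    """Strip cue numbers, timing lines and blank separators from SRT,
    returning the spoken text as one paragraph.

    Caveat: a subtitle line consisting only of digits is indistinguishable
    from a cue number here and is dropped too.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


sample = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Today: transcripts
and summaries.
"""
print(srt_to_text(sample))  # Hello and welcome. Today: transcripts and summaries.
```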
| codetrotter wrote:
| > so I could pipe the result into chatgpt
|
| Tell us more :)
| cavisne wrote:
| Nothing too exciting, just "summarize this" followed by the
| transcript in quotes, it works very well
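| Long transcripts may exceed what a chatbot accepts in one message, so one workaround is to split them and summarize piece by piece. Here is a minimal Python sketch of building those prompts. It assumes a budget counted in words; real limits are in tokens, so the 2,000-word default is an arbitrary placeholder. `build_prompts` is hypothetical, and no particular chatbot API is implied:

```python
from typing import List


def build_prompts(transcript: str, max_words: int = 2000) -> List[str]:
    """Split a transcript into word-bounded chunks and wrap each one in a
    'summarize this' prompt, with the transcript text in quotes."""
    words = transcript.split()
    prompts = []
    for i in range(0, len(words), max_words):
        chunk = " ".join(words[i:i + max_words])
        prompts.append(f'Summarize this: "{chunk}"')
    return prompts


print(build_prompts("one two three four five", max_words=2))
# ['Summarize this: "one two"', 'Summarize this: "three four"', 'Summarize this: "five"']
```

| Each prompt can then be sent separately, and the partial summaries combined and summarized once more.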
| gbertb wrote:
| Is this utilizing whisper to transcribe?
| arboles wrote:
| Youtube already auto-generates transcripts that you can see in
| the ... menu in most videos. This website just seems like an
| alternative frontend?
| EGreg wrote:
| Or maybe it processes the video with its own backend? How do
| you tell?
| arboles wrote:
| Just minutes ago, I compared two transcripts for the same
| video and they were the exact same. Also on
| YouTubeTranscript.com swearing was redacted with [_], which
| is something I've only ever seen on youtube captions.
| kristianheljas wrote:
| The first indication is the processing speed: there's no known
| machine in the world that could transcribe videos at such
| speed.
| EGreg wrote:
| How about a cluster in parallel?
| codetrotter wrote:
| The simplest explanation is often the most probable one.
|
| Why would you reach for a cluster of machines working in
| parallel, when you could retrieve the already auto-
| created transcript from YouTube servers?
|
| Also, other comments have pointed out that the
| transcripts are identical with the ones created by
| YouTube, which would be unlikely to happen if this
| service was creating transcripts of their own.
| 88stacks wrote:
| this will be dead soon due to having youtube in the name
| lukeasch21 wrote:
| Don't worry, the website solved this issue:
|
| > "Probably Won't Fail: Featuring the latest build of an
| undocumented API."
|
| This will work as long as YouTube doesn't change anything. And
| since when has YouTube changed anything?
| seydor wrote:
| People can switch domains
| kristianheljas wrote:
| Hehe, they might need to switch cloud providers as well. The
| domain and the underlying content are currently served by none
| other than Google Cloud.
| arcturus17 wrote:
| The copy on your website is pure fire my dude.
| maybelsyrup wrote:
| I've been dreaming about something like this for years. Huge deal
| for me. Thank you for your work!
| faikuygur wrote:
| Here is how to extract Youtube video transcript to an Excel file
| with Robomotion:
|
| https://demo.robomotion.io/designer/shared/6j984jBCQqYVBCaQk...
| joosters wrote:
| My only complaint is with the layout of the site - could you
| please make the transcripts span across the whole width of the
| page, not just to the right of the video?
|
| My one gripe with Youtube's own transcript box is that it is too
| narrow, so it is a shame that a website designed to specifically
| make the transcripts more readable _also_ displays the
| transcripts in a narrow box.
| is0tope wrote:
| Maybe this is a bit off topic, but does anyone know the legal
| footing of having a business with another business's name in it?
| For instance, this tool uses the word "YouTube" in its name,
| though it is used as only a part of it, and it is not a
| competitor. I've always wondered how this works.
| kube-system wrote:
| Broadly speaking, it would be trademark infringement if it is
| used in a way that may confuse others about the source of the
| product. It doesn't necessarily have to be a specific product
| that Alphabet has a direct competitor for.
| thaumasiotes wrote:
| https://en.wikipedia.org/wiki/Nominative_use
|
| > is a legal doctrine that provides an affirmative defense to
| trademark infringement as enunciated by the United States Ninth
| Circuit, by which a person may use the trademark of another as
| a reference to describe the other product, or to compare it to
| their own.
| tmpburning wrote:
| chiefalchemist wrote:
| Not sure about YouTube, but WordPress does not allow the use of
| the name. WP in, e.g., your domain name is OK. WordPress is
| not.
|
| I'd imagine it's very similar for others. Often a company will
| pursue a violation if only to be consistent in showing the
| courts they actively defend their copyright.
| thaumasiotes wrote:
| > Not sure about YouTube but WordPress does not allow the use
| of the name.
|
| They may not like it, but they don't have the power to
| disallow you from using their name to refer to them. That's
| allowed.
| chiefalchemist wrote:
| Actually, they do. It's copyright. Plenty of legal
| precedent. They defend WordPress, but are willing to allow
| WP.
|
| The law is on their side.
| bdcravens wrote:
| Most corporations regularly search for such domains, and submit
| cease-and-desist. I received one related to an eBay-related
| domain, but in my case, I hadn't built a business around it so
| it was easy enough to just take the site offline.
| johnlk wrote:
| Take video > transcribe > ask gpt to summarize > be genius in 2
| mins
| janandonly wrote:
| The burning hate I feel for all information to be locked away in
| a YouTube video. This will solve that real world problem. I love
| reading (or, skimming) through a long read.
| xuhu wrote:
| Just checked: Google also includes YouTube captions in
| search results.
| motoboi wrote:
| Not sure if you know this, but YouTube has had a transcript feature
| for years now. It's somewhat hidden in the interface,
| but it lets you search the transcript with Ctrl-F (or Command-F).
| cratermoon wrote:
| Yeah, this website just extracts the transcript that exists and
| displays it alongside the video. It's nice, but it's not
| doing the transcribing itself.
| zbrozek wrote:
| I use this for city council meetings to figure out who said
| what. It's not easy, but it's better than nothing. YouTube
| doesn't appear to do so well with multiple speakers.
| alpb wrote:
| > I feel for all information to be locked away in a YouTube
| video.
|
| Google Search actually indexes transcripts of a video and shows
| you some YouTube results based on that even though the
| title/description of the video doesn't match the search query.
| RBerenguel wrote:
| I had a huge backlog of tech videos, so I wrote me this (also
| to play a bit with Haskell, the base idea can be replicated
| easily in any language though):
| https://github.com/rberenguel/glancer
| arboles wrote:
| Heh, this basically makes a storyboard
| Random_Person wrote:
| I've published almost 1,800 video diaries and this is a game
| changer for me. I've been wanting to do more with the back
| catalog, but don't have transcripts.
| thomassmith65 wrote:
| The ratio of information to misinformation on Youtube seems
| pretty bad.
|
| To make transcripts easier to access might create more problems
| than it solves.
|
| Granted I can't make a bullet-proof argument; there's no clear
| way to quantify that ratio.
| neilv wrote:
| Hook this up to a language model, and maybe a user could
| instantly get the _one sentence worth of information_ that the
| YouTube video creator buries in 10 minutes of monetized noise.
|
| And also save yourself time when the creator teases that they
| provide the info, but it turns out they don't, they're just
| trying to get views.
| greggsy wrote:
| I put something like this together to collect transcripts for
| uni videos. It dumps all transcripts into a directory, with
| URL links, so I can just search the whole directory to find the
| keyword I need.
|
| Helped a lot with take home exams.
| nostromo wrote:
| YouTube created that problem by incentivizing longer videos.
| And now we have videos with tons of fluff.
|
| Similarly Google incentivizes longer webpages, so now we have
| recipes that start with a novella about grandma's cooking
| before showing the actual recipe.
|
| It used to be nice to see a video's thumbs up to thumbs down
| ratio to know if you've been click baited or not before
| watching the whole video. But that signal has been removed now
| too.
| anticristi wrote:
| As a
|
| recipe reader
|
| I want to
|
| dismiss cookies, have a video ad follow me down the page, and
| read why this cake conjures up memories of the author's
| childhood, before reaching the actual recipe
|
| so that
|
| I feel connected to the author, before fully committing to
| mixing ingredients
| slipmagic wrote:
| Tom Redman had this idea but he took feedback from Twitter.
| https://digg.com/2021/one-main-character-tom-redman-
| recipeas...
| neilv wrote:
| That "user story" is like a tragically misinterpreted
| comment by someone at a prospective customer, speaking of a
| special time with their grandmother, but garbled through N
| layers of field sales, marketing, product managers,
| engineering hierarchy, and Agile task management.
|
| Including the part about declining more cookies offered (to
| save room for grandma's lasagna).
| 12907835202 wrote:
| Are you sure Google prefers longer pages? I find (annoyingly)
| that Google likes the search version of my page for lots of
| things. E.g. for a page called "best x of the y", the page for
| searching comments on that page, called "best x of y search",
| where the only text is the title and a search input, will
| rank really well.
| kristianheljas wrote:
| Try searching for recipes :) I also see long novels which
| seem to disguise the ridiculous amount of ads, which Google
| seems to like as well (these are mostly provided by none
| other than themselves!).
| Topgamer7 wrote:
| youtube-dl had the ability to rip just subtitles. I once used
| this to grep for some information I wanted after downloading
| all of the transcripts.
| InCityDreams wrote:
| ...or, just follow decent creators.
|
| No snark intended, but I just gave up on the dross. And even
| some of them, of late, are getting a bit crafty. But, creators
| get one chance from me now - give me decent content, or even
| with the fancy chapters, you're not getting my eyeballs past
| two minutes. What I have found is that leaving the decent stuff
| on, what auto-plays after is 'generally' of similar quality. A
| quick set of back-buttoning and bookmarking has fairly often
| got me some interesting results.
| neilv wrote:
| Good idea, but I don't follow anyone on YouTube. I was
| thinking about searching the Web for a bit of info, where the
| search hits include YouTube videos (but at no finer resolution
| than "this entire video").
|
| A search engine could narrow in on the few sentences in the
| video that it thinks correspond to what I was searching for,
| summarize that, and also link me to the AV start timepoint in
| case I also want to watch the video.
|
| This might change the economics of some YouTube video content
| creation.
| LelouBil wrote:
| Google does exactly that: if a video shows up in the search
| results, it shows you only the relevant small part.
| neilv wrote:
| I'd never seen this before, but I just got a Google
| search results page with a kind of table-of-contents
| index on _one_ of the video hits. (These TOC
| entries _don't_ correspond to the marked segments on the
| timeline. I don't know whether this is something YouTube
| is doing, or something the content creator did.)
|
| Is this what you mean? (Pardon if I'm not familiar with
| the latest Google Search features; I've mostly been using
| DDG lately, so don't have occasion to see all the
| features that exhibit only occasionally.)
| svat wrote:
| This is a great idea; I really enjoy all these "two channels
| simultaneously" (side-by-side translations, video with subtitles,
| and in this case video with a readable transcript, where you can
| scroll in the video or scroll in the transcript, and be
| synchronized).
|
| I had done something like this a couple of years ago for some
| specific set of videos (e.g.
| https://shreevatsa.net/tex/program/videos/s10/ -- compare with
| https://youtubetranscript.com/?v=_0Cv1G_s4gQ for the same video),
| but never got around to making it general; glad someone has done
| it. It takes just a few lines of JavaScript, using the YouTube
| API, to do this, i.e. keep the video and text in sync (just
| view source on either page to see the JS at the bottom).
|
| Something like this can also help with audio recordings
| (generating the alignment automatically is called "forced
| alignment" and there are tools like "aeneas" for this). In case
| anyone's interested or wants to help (for Sanskrit texts): see
| https://github.com/shreevatsa/web-align-audio-text deployed at
| https://shreevatsa.net/ramayana/sarga/ and better version at
| https://github.com/avinashvarna/audio_alignment deployed at
| https://avinashvarna.github.io/audio_alignment/
| unangst wrote:
| Expect an email from Google lawyers early this week about the
| domain name.
| antman wrote:
| I think "transcriptsforyoutube" would be passable? I remember
| something about a case using "for" and being ok but not any
| details.
| bdcravens wrote:
| They generally don't get into nuance. If someone's trademark
| is in your domain name, expect a C&D.
| TheCaptain4815 wrote:
| Funny, was just looking for a tool like this.
|
| Any chance timestamps could be added?
| cm2187 wrote:
| With youtube-dl you can download the subtitle tracks, which
| should have timestamps. Last time I checked, though, they were
| broken (showing the whole text on the first timestamp), but
| perhaps they've fixed it.
| chiefalchemist wrote:
| For PowerPoint and screen-share based videos, a screenshot
| every 15 seconds or so would be great.
|
| Often enough I'd rather read than watch. Reading is faster.
| Having corresponding visuals would be a big plus.
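| Pairing screenshots with the transcript is mostly a bucketing problem. Here is a minimal Python sketch. It assumes one screenshot every 15 seconds and a transcript of (start_seconds, text) pairs. `group_by_interval` is hypothetical, and it does not show how to capture the frames themselves (for example with ffmpeg):

```python
from typing import Dict, List, Sequence, Tuple


def group_by_interval(segments: Sequence[Tuple[float, str]],
                      interval: float = 15.0) -> Dict[float, str]:
    """Group transcript text under the screenshot time that precedes it.

    Key k*interval collects the text of segments starting in
    [k*interval, (k+1)*interval), so each screenshot gets the words
    spoken while that frame was on screen. Intervals with no speech
    get no entry.
    """
    groups: Dict[float, List[str]] = {}
    for start, text in segments:
        slot = (start // interval) * interval
        groups.setdefault(slot, []).append(text)
    return {slot: " ".join(texts) for slot, texts in sorted(groups.items())}


segments = [(2.0, "Intro slide."), (9.5, "Agenda."),
            (16.0, "Architecture overview."), (44.0, "Questions?")]
print(group_by_interval(segments))
# {0.0: 'Intro slide. Agenda.', 15.0: 'Architecture overview.', 30.0: 'Questions?'}
```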
| breck wrote:
| This is amazing! The speed and simplicity makes me happy. Thank
| you!
___________________________________________________________________
(page generated 2022-12-18 23:00 UTC)