[HN Gopher] Show HN: I create a free website for download YouTub...
___________________________________________________________________
Show HN: I create a free website for download YouTube transcript,
subtitle
Author : trungnx2605
Score : 97 points
Date : 2024-02-18 09:41 UTC (13 hours ago)
(HTM) web link (www.downloadyoutubesubtitle.com)
(TXT) w3m dump (www.downloadyoutubesubtitle.com)
| tomcam wrote:
| What a great service. Thanks!
| rpastuszak wrote:
| How are you getting the transcripts? Using the private YT API
| like in https://www.npmjs.com/package/youtube-transcript?
| trungnx2605 wrote:
| youtube_transcript_api
| santamex wrote:
| I also liked this one:
|
| https://filmot.com/
|
| Here you can search the subtitles of YouTube videos.
| foobarqux wrote:
| How do I actually search files with timestamps (preferably from
| the CLI)?
|
| I can use rg if the search terms happen to be on the same line
| but if the terms span multiple lines the interleaved timestamp
| metadata will prevent the query from being matched.
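One way to approach the multi-line search problem described above is to strip the timing metadata into a side table, join the cue text into one string, and map each match back to the cue it starts in. The following is a minimal sketch, assuming the subtitles are in SRT-style form (a counter line, a `-->` timing line, then text); `parse_cues` and `search` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import re
from typing import List, Tuple

# Timing line such as "00:00:01,000 --> 00:00:03,500" (SRT) or with "." (VTT).
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[.,]\d{3})\s*-->\s*\d{2}:\d{2}:\d{2}[.,]\d{3}"
)


def parse_cues(srt_text: str) -> List[Tuple[str, str]]:
    """Return (start_timestamp, text) pairs from SRT-style subtitle text."""
    cues: List[Tuple[str, str]] = []
    start, lines = None, []
    for raw in srt_text.splitlines():
        line = raw.strip()
        m = TIMING.match(line)
        if m:
            # A new timing line closes the previous cue, if it had any text.
            if start is not None and lines:
                cues.append((start, " ".join(lines)))
            start, lines = m.group(1), []
        elif line and not line.isdigit() and start is not None:
            # Digit-only lines are treated as cue counters and skipped,
            # so a caption consisting only of digits is lost (a known limit).
            lines.append(line)
    if start is not None and lines:
        cues.append((start, " ".join(lines)))
    return cues


def search(srt_text: str, phrase: str) -> List[str]:
    """Return the start timestamp of the cue in which each match begins."""
    words = phrase.split()
    if not words:
        return []
    joined, offsets = "", []
    for start, text in parse_cues(srt_text):
        offsets.append((len(joined), start))  # where this cue's text begins
        joined += text + " "
    # Let any run of whitespace separate the words of the phrase.
    pattern = re.compile(r"\s+".join(map(re.escape, words)), re.IGNORECASE)
    hits = []
    for m in pattern.finditer(joined):
        # Last cue whose text starts at or before the match position.
        hits.append([s for off, s in offsets if off <= m.start()][-1])
    return hits
```

Wrapped in a small script that reads a file and prints `search(text, query)`, this could be used from the CLI alongside rg for the cases where the terms span cue boundaries.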
| ldenoue wrote:
| I'm also doing this, but mine adds punctuation, paragraphs, and
| chapter headers, because most raw YouTube transcripts lack proper
| punctuation.
|
| https://www.appblit.com/scribe
| BigElephant wrote:
| How are you deriving the punctuation?
| undershirt wrote:
| Wow! This is great
| mmh0000 wrote:
| yt-dlp[1] can also do this:
|
| ```
|
| $ yt-dlp --write-sub --sub-lang "en.*" --write-auto-sub --skip-download 'https://www.youtube.com/watch?v=...'
|
| ```
|
| [1] https://github.com/yt-dlp/yt-dlp
| tarasglek wrote:
| Here is mine: https://www.val.town/v/taras/scrape2md
|
| Use it like
| https://taras-scrape2md.web.val.run/https://youtu.be/TJqeCpx...
|
| This is meant to be a general purpose content-to-markdown tool
| for llm interactions in https://chatcraft.org
| hn_acker wrote:
| What's the copyright license on your scrape2md code?
| tarasglek wrote:
| Updated the description with the license (MIT) and a link to the
| more fully featured version.
| wahnfrieden wrote:
| Is there any way to extract the transcripts from JS state on
| YouTube, instead of making API reqs for them?
| rspoerri wrote:
| I use this script because automatically generated subtitles are
| badly formatted for reading as a transcript (they are only good as
| subtitles). It works pretty well for archiving videos together
| with the transcript and subtitles.
|
| ```
|
| #!/bin/zsh
|
| # download as mp4, get normal subtitles
|
| yt-dlp -f mp4 "$@" --write-auto-sub --sub-format best --write-sub
|
| # download subtitles and convert them to transcript
|
| yt-dlp --skip-download --write-subs --write-auto-subs --sub-lang en -k \
|   --sub-format ttml --convert-subs srt \
|   --exec before_dl:"sed -e '/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] --> [0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]$/d' -e '/^[[:digit:]]\\{1,3\\}$/d' -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' -i '' %(requested_subtitles.:.filepath)#q" \
|   "$@"
|
| ```
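For anyone who would rather do the cleanup step outside of sed, the four expressions in the script above (drop timing lines, drop 1 to 3 digit counter lines, strip `<...>` tags, drop blank lines) can be sketched in Python. This is only an approximation of the sed pipeline, not the commenter's code; `srt_to_transcript` is a hypothetical name, and like the original it leaves cue counters above 999 in place.

```python
import re

# Same shape as the sed pattern: "." matches the "," or "." before the millis.
TIMING_LINE = re.compile(
    r"^[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3}$"
)
COUNTER_LINE = re.compile(r"^[0-9]{1,3}$")  # cue numbers 1..999
TAG = re.compile(r"<[^>]*>")  # inline markup such as <i> or <font ...>


def srt_to_transcript(srt_text: str) -> str:
    """Strip SRT timing, counters, tags, and blank lines; keep caption text."""
    kept = []
    for line in srt_text.splitlines():
        if TIMING_LINE.match(line) or COUNTER_LINE.match(line):
            continue
        line = TAG.sub("", line)
        if not line.strip():
            continue
        kept.append(line)
    return "\n".join(kept)
```

The order of the checks follows the order of the `-e` expressions, so a line that becomes empty only after its tags are removed is still dropped.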
| araes wrote:
| Checking online, this [1] appears to be one of the most heavily
| referenced on StackOverflow for downloading both user entered and
| automatically generated transcripts. (Python based)
|
| [1] https://github.com/jdepoix/youtube-transcript-api
|
| Notably, Google really needs to have an obvious API endpoint for
| this kind of call. If thousands of programmers are all rolling
| their own implementation, there's probably a huge number that
| constantly download the full video and transcribe it for data
| harvesting.
|
| Kind of surprised, honestly, that it's taken this long for YouTube
| to fall prey to massive data harvesting campaigns. From this
| article [2] and this paper on YouTube data statistics [3] there are
| ~14,000,000,000 videos on YouTube with a mean length of 615
| seconds (~10 minutes).
|
| You'd think people would be interested in:
|
|   8,610,000,000,000 seconds
|   143,500,000,000 minutes
|   2,391,666,666 hours
|   3,274,083 months
|   272,840 years
|   27,284 decades
|   2,728 centuries
|   273 millennia
|
| Of live action video on nearly every single subject in human
| existence.
|
| Also, the paper's really cool, and extremely sobering about being
| a "content creator", given that the top 1% get all the views.
|
| [2] "What We Discovered on 'Deep YouTube'",
| https://www.theatlantic.com/technology/archive/2024/01/how-m...
|
| [3] "Dialing for Videos: A Random Sample of YouTube",
| https://journalqd.org/article/view/4066/3766
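The totals in the comment above follow directly from the two quoted inputs (~14,000,000,000 videos, 615 seconds mean length). A small sketch to reproduce them, assuming a 365.25-day year; the comment does not say which calendar convention it used, so the year figure here lands near its 272,840 rather than exactly on it. `total_runtime` is a hypothetical name.

```python
VIDEOS = 14_000_000_000  # approximate video count quoted in the comment
MEAN_SECONDS = 615       # mean video length quoted in the comment


def total_runtime(videos: int = VIDEOS, mean_seconds: int = MEAN_SECONDS) -> dict:
    """Convert a video count and mean length into total runtime units."""
    seconds = videos * mean_seconds
    minutes = seconds / 60
    hours = minutes / 60
    years = hours / (24 * 365.25)  # assumption: 365.25-day years
    return {"seconds": seconds, "minutes": minutes, "hours": hours, "years": years}


if __name__ == "__main__":
    for unit, value in total_runtime().items():
        print(f"{value:,.0f} {unit}")
```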
| numpad0 wrote:
| I see lots of yt-dlp commands here so...
|
| PSA: yt-dlp exits non-zero if the destination filename or any of
| the intermediate files' names are too long for the filesystem. Use
| `-o "%(title).150B [%(id)s].%(ext)s"` to limit filename length (to
| 150 bytes in this example). "--trim-filenames" doesn't work.
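The reason the limit above is given in bytes rather than characters is that filesystems cap names by byte length, and non-ASCII titles use several bytes per character. A rough sketch of the same idea in Python, for scripts that rename files themselves; this is not yt-dlp's implementation, and `truncate_utf8` and `safe_filename` are hypothetical helpers.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be >= 0")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing, partially cut multi-byte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def safe_filename(title: str, video_id: str, ext: str,
                  max_title_bytes: int = 150) -> str:
    """Build 'TITLE [ID].EXT' with only the title portion truncated."""
    return f"{truncate_utf8(title, max_title_bytes)} [{video_id}].{ext}"
```

Only the title is shortened, so the id and extension always survive, which mirrors the shape of the output template in the comment.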
| trungnx2605 wrote:
| It uses youtube_transcript_api.
| trungnx2605 wrote:
| Hi guys, I still get an error that says "Client Error: Too Many
| Requests for URL". So YouTube blocked the IP, right?
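"Too Many Requests" is the HTTP 429 status, which signals rate limiting; whether it amounts to an outright IP block here cannot be told from the thread. A common client-side response is to retry with exponential backoff and jitter. The sketch below is generic and makes assumptions: `TooManyRequests` is a hypothetical stand-in for whatever exception the real HTTP client or library raises, and `fetch` is any zero-argument callable. It will not help if the IP is blocked outright.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def with_backoff(fetch: Callable[[], T], retries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on TooManyRequests with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TooManyRequests:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("retries must be >= 0")
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.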
___________________________________________________________________
(page generated 2024-02-18 23:01 UTC)