[HN Gopher] Reverse-Engineering YouTube: Revisited
___________________________________________________________________
Reverse-Engineering YouTube: Revisited
Author : tyrrrz
Score : 158 points
Date : 2023-02-04 12:01 UTC (11 hours ago)
(HTM) web link (tyrrrz.me)
(TXT) w3m dump (tyrrrz.me)
| pixl97 wrote:
| While not fully related to the code itself, my daughter has a
| school provided Chromebook that blocks almost all Youtube video
| content. You can browse the YT site, but the thumbnails and
| videos won't load. I'm assuming there is some kind of content
| block occurring here based on some part of the URL.
|
| Well, kids being clever figured out the Chromebook browser shows
| a preview video if you hit the 'share' button and go to embed
| video. The preview is not content blocked. I didn't dig in to see
| whether it would play age-restricted content, as I assume all
| access is being logged somewhere and I want to minimize future
| fallout.
| [deleted]
| amelius wrote:
| Does anyone know if YouTube runs FFmpeg internally?
| randomifcpfan wrote:
| Circumstantial evidence that they used to:
| https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/
|
| These days they use special hardware accelerators:
| https://gwern.net/doc/cs/hardware/2021-ranganathan.pdf
| latchkey wrote:
| I built software to efficiently run a large number of GPUs
| (>120k) in data centers. That second link is fantastic, but
| it really gives me PTSD. =)
| rhn_mk1 wrote:
| > There is one thing that developers like more than building
| things -- and that is breaking things built by other people.
|
| Haha. This is not as universal as the author thinks. Every time I
| need to reverse-engineer something obscured on purpose, I wish we
| could just get along.
|
| Every time I have to reverse-engineer something obscured by
| accident, I call it debugging.
|
| But even if I solve the puzzle, it's like solving crosswords: I
| just defeated a human mind, the victory is transient, and will
| soon be forgotten. I'd prefer my victories to be against the
| frontier of knowledge, and to win universal truths. That means
| building things rather than tearing down those humans built.
|
| I just wish there was more mathematical certainty and fewer human
| vices in programming.
| pixl97 wrote:
| Unfortunately, the halting problem takes all your mathematical
| certainty and throws it out the window. It's very easy to turn an
| application that halts in a finite amount of time into one that
| doesn't. You'll find most programmers and companies won't spend
| the massive amount of time needed to ensure their logic is
| correct; instead, they ship the application quickly and fix it
| based on crashes and feedback.
| rhn_mk1 wrote:
| Mathematical certainty is what to leverage, not what to
| fight. You'd use it _before_ you run into the halting
| problem, not after, just as mathematics was used to discover
| the halting problem in the first place.
|
| And what you're describing as happening in practice is
| precisely the disappointing part of programming.
| philipphutterer wrote:
| > I'd prefer my victories to be against the frontier of
| knowledge, and to win universal truths.
|
| You wouldn't need to tear down barriers if the people that
| built them thought the same in the first place. Nonetheless,
| keep up that attitude.
| nyanpasu64 wrote:
| Interestingly I found that YouTube's web UI actually requests
| range _URLs_ rather than range HTTP headers, allowing it to seek
| around the video faster than mpv with yt-dlp (and conveniently
| avoiding throttling as well). I suspect this may be related to
| DASH: https://github.com/mpv-player/mpv/issues/10601
|
| Unfortunately mpv and ffmpeg do not currently have mature DASH
| support and cannot benefit from fast seeks:
| https://github.com/mpv-player/mpv/issues/7033 (didn't look
| deeply)
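The difference between a byte range carried in the URL and one carried in an HTTP `Range` header can be made concrete by building both request forms with Python's standard library. This is a minimal sketch under stated assumptions: the `range=start-end` query-parameter format is assumed from the comment above rather than taken from any documentation, the helper names are hypothetical, and `example.com/videoplayback?itag=137` is a placeholder, not a real stream URL. Neither request is sent.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
from urllib.request import Request


def range_via_header(stream_url: str, start: int, end: int) -> Request:
    """Request bytes [start, end] (inclusive) with a standard HTTP Range header."""
    return Request(stream_url, headers={"Range": f"bytes={start}-{end}"})


def range_via_url(stream_url: str, start: int, end: int) -> Request:
    """Encode the same byte span as a `range` query parameter instead.

    ASSUMPTION: the server honours `range=start-end` in the query string.
    This sketch also assumes the URL has no repeated query keys.
    """
    parts = urlparse(stream_url)
    query = dict(parse_qsl(parts.query))
    query["range"] = f"{start}-{end}"
    return Request(urlunparse(parts._replace(query=urlencode(query))))


if __name__ == "__main__":
    url = "https://example.com/videoplayback?itag=137"  # placeholder URL
    print(range_via_header(url, 0, 1023).get_header("Range"))
    print(range_via_url(url, 0, 1023).full_url)
```

With the URL form, each chunk is a distinct URL and the request carries no `Range` header at all; whether that is what makes seeking faster in the web client is the commenter's suspicion, not something this sketch demonstrates.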
| 2h wrote:
| > and conveniently avoiding throttling as well
|
| Throttling is not avoided. The YouTube web client generates a
| JavaScript signature that disables the throttling, the same thing
| the code in the article does.
| kyberias wrote:
| Well, that link to the Prolog introduction video is not a
| really good starting point.
| ape4 wrote:
| It's too bad such an important resource (YouTube) has a secret
| API that changes all the time.
| thrdbndndn wrote:
| Why? YouTube has a proper public API, that doesn't change all
| the time.
| anamexis wrote:
| Can you retrieve videos with it?
| squarefoot wrote:
| However, a few years ago they started forcing API users to
| authenticate, so when I had to spend months in bed after a
| bad road accident and later a heart attack, I could no longer
| watch my favorite electronics channels with the Kodi YT
| extension unless I authenticated. I guess they still allow
| anonymous use from a browser only because that lets them
| profile more people.
| kfarr wrote:
| The proper public API notably does not provide access to the
| raw video stream, making it useless for many use cases.
| Genghis_Khan wrote:
| > secret API
|
| In their client-side code, they provide a worked example of how
| to use their API. That's hardly the way to keep a secret.
| morgannewman wrote:
| Furthermore, the economics of video hosting sites like YouTube
| are such that you have truly incredible storage, server, and
| bandwidth growth, basically forever. I don't think it's
| feasible for there to be a "free" API that lets people use
| YouTube as they please, build clones of the site with no ads,
| etc.
| thrdbndndn wrote:
| This is also a good place to learn about it:
| https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extracto...
| 2h wrote:
| I've said it before on GitHub: I don't think
| `TVHTML5_SIMPLY_EMBEDDED_PLAYER` is the great solution everyone
| thinks it is. Yeah, you can get the age-restricted videos
| anonymously. However, you can also get those videos by logging
| in, which the author doesn't mention:
|
|     POST /youtubei/v1/player HTTP/1.1
|     Host: www.youtube.com
|     Authorization: Bearer ya29.a0AVvZVsqRwNWFI3R0MSxnugyNlxbqIOXcwXkeA6NMOcpv_...
|
|     {
|       "contentCheckOk": true,
|       "context": {
|         "client": {
|           "clientName": "ANDROID",
|           "clientVersion": "18.04.35"
|         }
|       },
|       "racyCheckOk": true,
|       "videoId": "Cr381pDsSsA"
|     }
|
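The authenticated player request in the comment above can be rebuilt with Python's standard library as a sketch. It mirrors the endpoint, headers, and JSON body shown in the comment; the `Content-Type` header and the helper name `build_player_request` are assumptions added here, the bearer token must come from an OAuth flow that is not shown, and the `clientVersion` value dates from early 2023 and may no longer be accepted. The request is only constructed, not sent.

```python
import json
from urllib.request import Request

PLAYER_ENDPOINT = "https://www.youtube.com/youtubei/v1/player"


def build_player_request(video_id: str, access_token: str,
                         client_name: str = "ANDROID",
                         client_version: str = "18.04.35") -> Request:
    """Build (but do not send) a POST to the player endpoint.

    The body mirrors the one shown in the comment above. The Content-Type
    header is an assumption; the access token is obtained elsewhere.
    """
    body = {
        "contentCheckOk": True,
        "context": {"client": {"clientName": client_name,
                               "clientVersion": client_version}},
        "racyCheckOk": True,
        "videoId": video_id,
    }
    return Request(
        PLAYER_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen` with a valid token; the endpoint's behaviour may well have changed since this thread was written.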
| Also, `TVHTML5_SIMPLY_EMBEDDED_PLAYER` comes with strong
| drawbacks: some videos under that client require a JavaScript
| signature for BOTH downloading and unthrottling. Everyone is
| entitled to their own opinion, but I just don't think it's worth
| the complexity of parsing arbitrary JavaScript with Python when
| you can just log in (programmatically, as above). Personally, I
| use the ANDROID client, which avoids all JavaScript signatures.
|
| Also not mentioned in the article: you can take the throttled
| URLs as is and download pieces concurrently for a pretty good
| result. Each piece still downloads slowly, but with on the order
| of 99 connections you get decent speed. You would think you'd get
| IP-blocked or something for this, but I downloaded quite a bit
| this way as a test and the YouTube server allowed it. The
| combined speed was only about 2 MB/s, so in the big picture it
| doesn't seem like abuse. My YouTube OAuth code is here for anyone
| interested:
|
| http://2a.pages.dev/mech
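The concurrent-pieces approach described in the comment above can be sketched with `concurrent.futures`. This is an illustration under stated assumptions, not the commenter's code: the helper names are hypothetical, the total size must already be known (for example from a `Content-Length` header), the server is assumed to honour standard `Range` headers, and there is no retry logic. The fetch function is injected so the splitting and reassembly can be checked without network access; 99 workers matches the figure in the comment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple
from urllib.request import Request, urlopen


def byte_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte pairs."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]


def download_concurrently(fetch: Callable[[int, int], bytes], total_size: int,
                          chunk_size: int, workers: int = 99) -> bytes:
    """Fetch every range in parallel and join the pieces in order."""
    ranges = byte_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so pieces reassemble correctly.
        pieces = list(pool.map(lambda r: fetch(*r), ranges))
    return b"".join(pieces)


def http_range_fetcher(url: str) -> Callable[[int, int], bytes]:
    """Return a fetch function that requests one byte range per call.

    ASSUMPTION: the server answers Range requests with 206 Partial Content.
    """
    def fetch(start: int, end: int) -> bytes:
        req = Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urlopen(req) as resp:
            if resp.status != 206:
                raise ValueError(f"expected 206 Partial Content, got {resp.status}")
            return resp.read()
    return fetch
```

For a real download, `download_concurrently(http_range_fetcher(url), size, 1 << 20)` would request 1 MiB pieces; each individual connection is still throttled, and only the aggregate is faster.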
| philipphutterer wrote:
| > However you can also get those videos by logging in, which
| the author doesn't mention.
|
| > Also not mentioned in the article is that you can actually
| take the throttled URLs as is, and download pieces concurrently
| for a pretty good result.
|
| The author mentioned both, the login option as well as the
| chunking mechanism. Sorry, but did you actually read the blog
| post?
| 2h wrote:
| They mention cookies. That's not the correct method for
| authenticating to the API; OAuth is.
| tyrrrz wrote:
| If you can afford to always be logged in, then sure, but it's
| not always an option. Especially if you need a general
| solution.
| 2h wrote:
| Who said anything about always? You log in as needed. Most
| videos are open.
| nirav72 wrote:
| I don't particularly have anything to add about the article. But
| I do enjoy using your desktop YouTube downloader, as well as a
| couple of your .NET libraries, especially CliWrap. Amazing work.
| Just wanted to say thanks!
| tyrrrz wrote:
| Glad to hear that :)
| zxcvbn4038 wrote:
| If you really want to understand how streaming video works, it
| definitely takes you down a couple of rabbit holes, but it's
| worth it. I think more people and companies should try to stream
| their own video content rather than be at the mercy of Google,
| their algorithms, and their censorship. You don't have to "be"
| YouTube and host other users' content, but you should be able to
| host your content without YouTube's approval.
| petra wrote:
| There are probably many paid video hosting platforms. You can't
| save that much by hosting it yourself.
|
| Anyone who is hosting on YouTube is looking for a free service.
| mikae1 wrote:
| _> Anyone who is hosting on YouTube is looking for a free
| service._
|
| https://archive.org/help/video.php is also "free".
| paulpauper wrote:
| And now many of these bypasses and tricks will stop working.
| [deleted]
| jscipione wrote:
| Why do the comment counts almost never match the actual number of
| comments? I know the answer is censorship but why doesn't YouTube
| shadow-ban the comment count when they shadow-ban comments?
___________________________________________________________________
(page generated 2023-02-04 23:01 UTC)