[HN Gopher] Reverse-Engineering YouTube: Revisited
       ___________________________________________________________________
        
       Reverse-Engineering YouTube: Revisited
        
       Author : tyrrrz
       Score  : 158 points
       Date   : 2023-02-04 12:01 UTC (11 hours ago)
        
 (HTM) web link (tyrrrz.me)
 (TXT) w3m dump (tyrrrz.me)
        
       | pixl97 wrote:
       | While not fully related to the code itself, my daughter has a
       | school-provided Chromebook that blocks almost all YouTube video
       | content. You can browse the YT site, but the thumbnails and
       | videos won't load. I'm assuming there is some kind of content
       | block occurring here based on some part of the URL.
       | 
       | Well, kids being clever figured out the Chromebook browser shows
       | a preview video if you hit the 'share' button and go to embed
       | video. This is not content blocked. I didn't dig in to see if it
       | would play age restricted content as I assume all access is being
       | logged somewhere and want to minimize future fallout.
        
         | [deleted]
        
       | amelius wrote:
       | Does anyone know if YouTube runs Ffmpeg internally?
        
         | randomifcpfan wrote:
         | Circumstantial evidence that they used to:
         | https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/
         | 
         | These days they use special hardware accelerators:
         | https://gwern.net/doc/cs/hardware/2021-ranganathan.pdf
        
           | latchkey wrote:
           | I built software to efficiently run a large number of GPUs
           | (>120k) in data centers. That second link is fantastic, but
           | it really gives me PTSD. =)
        
       | rhn_mk1 wrote:
       | > There is one thing that developers like more than building
       | things -- and that is breaking things built by other people.
       | 
       | Haha. This is not as universal as the author thinks. Every time I
       | need to reverse-engineer something obscured on purpose, I wish we
       | could just get along.
       | 
       | Every time I have to reverse-engineer something obscured by
       | accident, I call it debugging.
       | 
       | But even if I solve the puzzle, it's like solving crosswords: I
       | just defeated a human mind, the victory is transient, and will
       | soon be forgotten. I'd prefer my victories to be against the
       | frontier of knowledge, and to win universal truths. That means
       | building things rather than tearing down those humans built.
       | 
       | I just wish there were more mathematical certainty and fewer
       | human vices in programming.
        
         | pixl97 wrote:
         | Unfortunately the halting problem takes all your mathematical
         | certainty and throws it out the window. It's very easy to turn
         | an application that halts within a finite amount of time into
         | one that does not. You'll find most programmers and
         | companies are not going to spend the massive amount of time to
         | ensure their logic is correct, but instead throw the
         | application out there quickly and fix it based on crashes and
         | feedback.
        
           | rhn_mk1 wrote:
           | Mathematical certainty is what to leverage, not what to
           | fight. You'd use it _before_ you run into the halting
           | problem, not after. Just like mathematics was used to
           | discover the halting problem in the first place.
           | 
           | And what you're describing as happening in practice is
           | precisely the disappointing part of programming.
        
         | philipphutterer wrote:
         | > I'd prefer my victories to be against the frontier of
         | knowledge, and to win universal truths.
         | 
         | You wouldn't need to tear down barriers if the people that
         | built them thought the same in the first place. Nonetheless,
         | keep up that attitude.
        
       | nyanpasu64 wrote:
       | Interestingly I found that YouTube's web UI actually requests
       | range _URLs_ rather than range HTTP headers, allowing it to seek
       | around the video faster than mpv with yt-dlp (and conveniently
       | avoiding throttling as well). I suspect this may be related to
       | DASH: https://github.com/mpv-player/mpv/issues/10601
       | 
       | Unfortunately mpv and ffmpeg do not currently have mature DASH
       | support and cannot benefit from fast seeks:
       | https://github.com/mpv-player/mpv/issues/7033 (didn't look
       | deeply)
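
The difference between the two request styles described above can be sketched in a few lines of Python. This is only an illustration: the query parameter name `range` and the example URL are assumptions, not something confirmed in the comment, and the helper names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def range_header(start: int, end: int) -> dict[str, str]:
    """Standard HTTP approach: the byte range travels in a request header."""
    return {"Range": f"bytes={start}-{end}"}


def range_url(url: str, start: int, end: int, param: str = "range") -> str:
    """URL approach: the byte range is encoded in the query string.

    The parameter name "range" is an assumption made for illustration.
    Any existing value of that parameter is replaced.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, f"{start}-{end}"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical stream URL, used only to show the two shapes side by side.
    stream = "https://example.com/videoplayback?itag=18"
    print(range_header(0, 1023))
    print(range_url(stream, 0, 1023))
```

With the header style the URL stays the same for every piece, while with the URL style each piece is a distinct URL, which is closer to how segment-based streaming clients address media.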
        
         | 2h wrote:
         | > and conveniently avoiding throttling as well
         | 
         | throttling is not avoided. the YouTube web client generates a
         | JavaScript signature that disables the throttling, same as what
         | the code in the article does.
        
       | kyberias wrote:
       | Well, that link to the introduction-to-Prolog video is not
       | really a good starting point.
        
       | ape4 wrote:
       | It's too bad such an important resource (YouTube) has a secret API
       | - that changes all the time.
        
         | thrdbndndn wrote:
         | Why? YouTube has a proper public API, that doesn't change all
         | the time.
        
           | anamexis wrote:
           | Can you retrieve videos with it?
        
           | squarefoot wrote:
           | However, a few years ago they started forcing API users to
           | authenticate, so when I had to spend months in bed after a
           | bad road accident and later a heart attack, I could no
           | longer watch my favorite electronics channels using the Kodi
           | YT extension unless I authenticated. I guess they still
           | allow anonymous use with a browser only because that lets
           | them profile more people.
        
           | kfarr wrote:
           | The proper public API notably does not provide access to the
           | raw video stream, making it useless for many use cases.
        
         | Genghis_Khan wrote:
         | > secret API
         | 
         | In their client-side code, they provide a worked example of how
         | to use their API. That's hardly the way to keep a secret.
        
         | morgannewman wrote:
         | Furthermore, the economics of video hosting sites like YouTube
         | are such that you have truly incredible storage, server, and
         | bandwidth growth, basically forever. I don't think it's
         | feasible for there to be a "free" API that lets people use
         | YouTube as they please, build clones of the site with no ads,
         | etc.
        
       | thrdbndndn wrote:
       | This is also a good place to learn about it:
       | https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extracto...
        
       | 2h wrote:
       | I've said it before on GitHub: I don't think
       | `TVHTML5_SIMPLY_EMBEDDED_PLAYER` is the great solution everyone
       | thinks it is. Yeah, you can get the age-restricted videos
       | anonymously. However you can also get those videos by logging in,
       | which the author doesn't mention:
       | 
       |     POST /youtubei/v1/player HTTP/1.1
       |     Host: www.youtube.com
       |     Authorization: Bearer ya29.a0AVvZVsqRwNWFI3R0MSxnugyNlxbqIOXcwXkeA6NMOcpv_...
       | 
       |     {
       |      "contentCheckOk": true,
       |      "context": {
       |       "client": {
       |        "clientName": "ANDROID",
       |        "clientVersion": "18.04.35"
       |       }
       |      },
       |      "racyCheckOk": true,
       |      "videoId": "Cr381pDsSsA"
       |     }
       | 
       | and `TVHTML5_SIMPLY_EMBEDDED_PLAYER` comes with strong drawbacks.
       | Some videos under that client require a JavaScript signature for
       | BOTH downloading and unthrottling. Each person is welcome to
       | their own opinion, but I just don't think it's worth the complexity
       | of parsing some arbitrary JavaScript with Python when you can
       | just log in (programmatically as above). Personally I use the
       | ANDROID client, which avoids all JavaScript signatures. Also not
       | mentioned in the article is that you can actually take the
       | throttled URLs as is, and download pieces concurrently for a
       | pretty good result. So each piece is still downloading slowly,
       | but if you use on the order of 99 connections, you get decent
       | speed. You would think you get IP blocked or something for this,
       | but I downloaded quite a bit using this method as a test and the
       | YouTube server allowed it. The combined resultant speed was only
       | something like 2 MB/s, so big picture it doesn't seem like an
       | abuse. My YouTube OAuth code is here for any interested:
       | 
       | http://2a.pages.dev/mech
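
The idea of downloading pieces of a throttled stream concurrently can be sketched roughly as follows. This is a minimal illustration, not the commenter's code: `split_ranges`, `download_concurrently`, and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever performs the actual ranged request. The default of 99 connections simply mirrors the figure mentioned in the comment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_ranges(total_size: int, pieces: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into up to `pieces` inclusive (start, end) ranges."""
    if total_size <= 0 or pieces <= 0:
        return []
    pieces = min(pieces, total_size)  # never create empty pieces
    base, extra = divmod(total_size, pieces)
    ranges = []
    start = 0
    for index in range(pieces):
        # The first `extra` pieces carry one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def download_concurrently(
    fetch: Callable[[int, int], bytes],
    total_size: int,
    connections: int = 99,
) -> bytes:
    """Fetch every piece in parallel and join the results in order.

    `fetch(start, end)` is a hypothetical callback that returns the bytes
    for one inclusive range, for example via a ranged HTTP request.
    """
    ranges = split_ranges(total_size, connections)
    with ThreadPoolExecutor(max_workers=len(ranges) or 1) as pool:
        # map() yields results in input order, so the pieces line up.
        parts = pool.map(lambda piece: fetch(*piece), ranges)
        return b"".join(parts)


if __name__ == "__main__":
    # Stand-in for a remote file, so the sketch runs without network access.
    data = bytes(range(256)) * 4
    result = download_concurrently(lambda s, e: data[s:e + 1], len(data))
    print(result == data)
```

Each piece may still arrive slowly, but because the pieces are fetched in parallel the combined throughput can be much higher than that of a single connection. This sketch holds the whole result in memory; for large files, writing each piece to its offset in a file would be the more practical choice.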
        
         | philipphutterer wrote:
         | > However you can also get those videos by logging in, which
         | the author doesn't mention.
         | 
         | > Also not mentioned in the article is that you can actually
         | take the throttled URLs as is, and download pieces concurrently
         | for a pretty good result.
         | 
         | The author mentioned both, the login option as well as the
         | chunking mechanism. Sorry, but did you actually read the blog
         | post?
        
           | 2h wrote:
           | they mention cookies. that's not the correct method for
           | authenticating to the API, OAuth is.
        
         | tyrrrz wrote:
         | If you can afford to always be logged in, then sure, but it's
         | not always an option. Especially if you need a general
         | solution.
        
           | 2h wrote:
           | who said anything about always? you log in as needed. most
           | videos are open.
        
       | nirav72 wrote:
       | I don't particularly have anything to add about the article. But
       | I do enjoy using your desktop YouTube downloader, as well as
       | a couple of your .NET libraries. Especially CliWrap. Amazing work.
       | Just wanted to say thanks!
        
         | tyrrrz wrote:
         | Glad to hear that :)
        
       | zxcvbn4038 wrote:
       | If you really want to understand how streaming video works then
       | it definitely takes you down a couple rabbit holes - but it's
       | worth it. I think more people and companies should try to stream
       | their own video content rather than be at the mercy of Google,
       | their algorithms, and their censorship. You don't have to "be"
       | YouTube and host other users' content but you should be able to
       | host your content without YouTube's approval.
        
         | petra wrote:
         | There are probably many paid video hosting platforms. You can't
         | save that much by hosting it yourself.
         | 
         | Anyone who is hosting on YouTube is looking for a free service.
        
           | mikae1 wrote:
           | _> Anyone who is hosting on YouTube is looking for a free
           | service._
           | 
           | https://archive.org/help/video.php is also "free".
        
       | paulpauper wrote:
       | And now many of these bypasses and tricks will stop working.
        
         | [deleted]
        
       | jscipione wrote:
       | Why do the comment counts almost never match the actual number of
       | comments? I know the answer is censorship but why doesn't YouTube
       | shadow-ban the comment count when they shadow-ban comments?
        
       ___________________________________________________________________
       (page generated 2023-02-04 23:01 UTC)