[HN Gopher] Yark: Advanced and easy YouTube archiver now stable
       ___________________________________________________________________
        
       Yark: Advanced and easy YouTube archiver now stable
        
       Author : Owez
       Score  : 141 points
       Date   : 2023-01-05 18:45 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | toomuchtodo wrote:
        | If you want to provide an option to upload artifacts to the
        | Internet Archive, you could crib off of
        | https://github.com/bibanon/tubeup - it too relies on yt-dlp
        | for extraction.
        | 
        | Importantly, pay close attention to which artifacts are
        | uploaded to the item that gets created, and what metadata is
        | set as part of the upload process.
        
       | newsclues wrote:
       | Why not use https://youtube-dl.org ?
        
         | alwayslikethis wrote:
          | youtube-dl has basically been unusable for a while now. It
          | gets only tens of KB/s on my 1 Gbps connection. yt-dlp is a
          | more actively maintained alternative, I think; it was
          | getting great speeds the last time I checked.
        
         | hbn wrote:
         | ytdl is one of its dependencies
         | 
         | You clearly didn't even skim the readme to see what this does
        
           | blowski wrote:
           | From the HN guidelines:
           | 
           | > Please don't comment on whether someone read an article.
           | "Did you even read the article? It mentions that" can be
           | shortened to "The article mentions that".
        
           | naavis wrote:
           | To be fair, the readme does not mention ytdl.
        
       | nomilk wrote:
        | Occasionally my personal/literature/academic/tech notes link
        | to a YouTube source, but when I click on it I find the video
        | has vanished, with no way of knowing what it was or even what
        | it was called (it's sometimes impossible to track down an
        | identical/replacement source). I've lost many valuable
        | references that way.
        | 
        | The Wayback Machine solves this for webpages, but nothing I'm
        | aware of (short of youtube-dl-ing the video yourself and
        | storing it somewhere openly, probably at risk of various
        | infringements) solves it for video. Quite a lot of hassle for
        | something rather simple.
       | 
       | It would be great to be able to immortalise them on a per-video
       | basis, so if it's important enough, we can be _sure_ that
       | references made to the content will still be there in the future
       | when needed.
        
         | svnpenn wrote:
          | For any video you care about, you need to make it your own
          | responsibility to back up the metadata and/or streams. If
          | you're lucky, you can internet-search the video ID to get
          | the metadata, even after deletion.
        
         | 0cf8612b2e1e wrote:
         | I do not see how that becomes possible without the Internet
         | archive effectively mirroring a large percentage of YouTube. I
         | recall at one point, IA wanted to archive just the video
         | metadata and realized even that would be technically
         | challenging.
        
       | swyx wrote:
       | does anyone have recs on how to run this on a continuous basis in
       | the cloud? this obviously will take a lot more storage than like
       | a normal heroku setup (not that I would use heroku). should i use
       | Railway or Render or is that overkill compared to something else?
       | 
       |  _gasp_ can i run it as a github action???
        
       | iforgotpassword wrote:
       | Did you write your own YouTube scraper, which would be quite a
       | task, or is this based on something like ytdl? Might be worth
       | mentioning in the readme.
        
         | amarshall wrote:
         | Indeed it appears to use yt-dlp
         | https://github.com/Owez/yark/blob/676074ee3d9e379d15e52ffe2e...
        
       | HEHENE wrote:
       | I'm a big fan of the historical information that Yark shows.
       | 
        | Arrimus 3D recently replacing a large chunk of their 3D
        | modeling tutorials with religious content was a pretty big
        | lightbulb moment for me: so much of the content I rely on -
        | not just for the initial learning of a new skill, but as a
        | continual reference when I forget something - is fragile.
        | 
        | I immediately bought a NAS and, using a similar project,
        | TubeArchivist[0], began backing up everything that I glean
        | even the tiniest bit of learning from. Projects like this are
        | really important for preserving all of the great knowledge on
        | the web.
       | 
       | [0] https://github.com/tubearchivist/tubearchivist
        
       | 2kwatts wrote:
        | This is really slick. I was figuring it was just a simple
        | wrapper for yt-dlp that scraped some additional things
        | (comments, views, etc.), but you went above and beyond with
        | the web interface. Nice job!
        
       | wahnfrieden wrote:
       | Anyone know if Apple bans this kind of lib from use in the app
       | store?
        
       | Owez wrote:
        | I've been working on polishing my YouTube archiver project for
        | a while now, and I've finally released a solid version of it.
        | It has an offline web viewer/visualiser for archived channels
        | and is managed using a CLI. Most importantly, it's easy to use
        | :)
        
         | bityard wrote:
         | I have a cron job that uses yt-dlp to download just the audio
         | tracks of any videos that I have saved to a public playlist,
         | can this be a replacement for that?
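          | 
          | For reference, the job boils down to something like this
          | (PLAYLIST_URL is a placeholder):
          | 
          |   yt-dlp -x --audio-format mp3 \
          |     --download-archive archive.txt "PLAYLIST_URL"
          | 
          | --download-archive records already-fetched video IDs, so
          | each run only grabs what's new.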
        
         | theandrewbailey wrote:
         | Can you edit your submission to add a "Show HN:" before the
         | title? Like these: https://news.ycombinator.com/show
        
       | prometheus76 wrote:
       | yt-dlp and a batch file that runs via Task Scheduler has been
       | doing this for me for a couple of years now. I also grab the
        | captions and throw them into a database so that I can search
        | transcripts for a clip that I remember but can't recall which
        | video it's in. It was a fun weekend project.
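        | 
        | The caption half is basically one yt-dlp call (a sketch from
        | memory; CHANNEL_URL is a placeholder):
        | 
        |   yt-dlp --skip-download --write-auto-subs --sub-langs en \
        |     --convert-subs srt -o "%(id)s.%(ext)s" "CHANNEL_URL"
        | 
        | The resulting .srt files then get parsed and inserted into the
        | database for full-text search.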
        
         | NegativeLatency wrote:
          | I run mine with cron and it puts files in a special folder
          | for Plex:
          | https://github.com/nburns/utilities/blob/master/youtube.fish
          | 
          | It pulls from my Watch Later playlist, which is quite handy.
        
         | 2OEH8eoCRo0 wrote:
         | How do you deal with file numbering?
         | 
          | I prefer filenames prefixed with a number that indicates
          | "air date", with 01 being the first uploaded video. The
          | default is by playlist index, where number 01 is the top of
          | the channel or playlist, i.e. the most recent video.
        
           | prometheus76 wrote:
           | I just use the publish date in the format of YYYY-MM-DD at
           | the beginning of the filename so that they sort properly.
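            | 
            | With yt-dlp that's just an output template, something like
            | (untested):
            | 
            |   yt-dlp -o "%(upload_date>%Y-%m-%d)s %(title)s.%(ext)s" URL
            | 
            | upload_date is YYYYMMDD by default; the >%Y-%m-%d part
            | reformats it with dashes so names sort correctly.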
        
       | salutonmundo wrote:
       | "Downloading things from YouTube" "Stable"
       | 
       | Ha
       | 
       | Hahahaha
       | 
       | Hahahahahahahahahahaha
        
       | pablo24602 wrote:
        | I'd suggest adding some more flags/options for downloading
        | specific videos or updating the library of downloaded content.
        | I don't really want to download all the videos from one
        | specific channel - instead I want to download only the last 10
        | videos, for example.
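        | 
        | In the meantime, since Yark wraps yt-dlp, something like this
        | should approximate it (CHANNEL_URL is a placeholder; channel
        | pages list the newest uploads first):
        | 
        |   yt-dlp --playlist-items 1:10 "CHANNEL_URL"
        | 
        | which grabs only the ten most recent uploads.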
        
       | causality0 wrote:
        | Does this have the ability to be set to "grab highest
        | available resolution" instead of specifying one? A lot of the
        | material I'd like to archive is from well before and after
        | YouTube started supporting HD resolutions.
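        | 
        | (If it exposes yt-dlp's selectors, I'd guess what I want is
        | something like:
        | 
        |   yt-dlp -f "bestvideo+bestaudio/best" URL
        | 
        | which falls back to the best combined stream when separate
        | tracks aren't available.)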
        
         | 2OEH8eoCRo0 wrote:
         | What if the highest available resolution does not have an audio
         | stream?
        
           | NegativeLatency wrote:
           | grab the audio stream from something else and stitch them
           | together with ffmpeg (like youtube-dl and others do)
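            | 
            | e.g.:
            | 
            |   ffmpeg -i video.mp4 -i audio.m4a -c copy merged.mp4
            | 
            | -c copy remuxes the two streams into one container without
            | re-encoding.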
        
           | rollcat wrote:
           | Normally in modern adaptive streaming, every video variant is
           | muxed into a separate stream without audio, and different
           | audio variants are muxed into their own individual streams.
        
           | TonyTrapp wrote:
            | Only some formats, which I guess are meant for older
            | browsers, contain both video and audio. These days, video
            | and audio are generally delivered through separate streams
            | on YouTube.
        
         | svnpenn wrote:
          | It's just a wrapper for yt-dlp
        
       | eats_indigo wrote:
       | Bit of a noob question here. What's an archiver for?
       | 
       | Is it a library for things you've watched and want to store
       | outside of youtube? Or is this for storing content you've created
       | / managing your own portfolio of content?
        
         | pessimizer wrote:
         | There are coded hints in the link, like:
         | 
         | > Yark lets you continuously archive all videos and metadata
         | for YouTube channels. You can also view your archive as a
         | seamless offline website
        
           | eats_indigo wrote:
           | Snarky and not answering the question, well done.
        
         | iforgotpassword wrote:
          | Personally, for me it's archiving, in case I want to go back
          | to something. Videos just keep disappearing from YouTube:
          | channels get deleted by YouTube or by their owners, videos
          | get copystriked, geoblocked, privated, and so on. As I'm
          | lazy as f### I didn't create anything as sophisticated as
          | OP's, just a simple 10-line PHP script on my home server
          | that pretends to be Kodi well enough to fool Yatse (an
          | Android remote for Kodi). So every time I watch a video on
          | YouTube (on my phone) that I want to keep, I tap "share" and
          | then "play on Kodi"; my PHP script gets the video URL from
          | the POST data and launches youtube-dl. It sucks that I never
          | get feedback on whether it worked or when it's finished, but
          | I log all the URLs, and at some point I'll eventually add a
          | cronjob that checks the list and sends reports and whatnot.
          | Some day.
        
           | eats_indigo wrote:
           | Great, thanks for the insight!
        
         | cocacola1 wrote:
         | I think it's the latter. I've no issue with most things being
         | one-and-done. But some channels have phenomenal content that
         | I'd like to keep for the long term. Something might happen to
         | their channel that makes it difficult to get, so I'll regularly
         | update my downloads with new videos, pictures, etc.
         | 
         | This applies to ripping, too. Funimation removed _Drifters_
         | years ago, but I'll always have a copy of it because I ripped
         | it. Of course, I need to store it so it still costs money. But
         | I can be content that I have the content.
        
       | j1elo wrote:
        | After reading the description, this project seems to be solely
        | focused on downloading all of a specific channel's videos.
        | 
        | I've been taking my first steps at running a home server, and
        | one of the things I'd love to do with it is keep an archive of
        | the videos I have saved in my private playlists on YouTube. In
        | my mind, the service would periodically check all my
        | playlists, compare them with what exists locally, and download
        | any missing videos. Maybe even with a nice web UI so it's
        | easier to configure and use visually.
       | 
       | Does such a service already exist so I can self-host it?
        
         | aquova wrote:
         | I haven't used it personally, but Tube Archivist might be what
         | you're looking for.
         | 
         | https://www.tubearchivist.com/
        
         | erinnh wrote:
         | Look at tubearchivist. It can do what you want.
         | 
         | You can subscribe to playlists, as well as automatically update
         | and download videos.
        
         | Fang_ wrote:
          | Nice web UI aside, if I'm not mistaken youtube-dl already
          | supports this kind of usage. You can run `youtube-dl
          | --download-archive archive.txt
          | https://youtu.be/your-playlist` and it'll keep track in
          | archive.txt of everything it's already downloaded.
          | Supplement with authentication options as necessary, set up
          | a cronjob, done.
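          | 
          | One gotcha: cron treats bare % signs specially (they become
          | newlines), so wrap the command in a script rather than
          | inlining it. A sketch, with placeholder paths:
          | 
          |   # yt-sync.sh
          |   youtube-dl --download-archive "$HOME/yt/archive.txt" \
          |     -o "$HOME/yt/%(title)s.%(ext)s" \
          |     "https://youtu.be/your-playlist"
          | 
          | plus a crontab entry like `0 4 * * * $HOME/yt-sync.sh` to
          | run it daily at 4am.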
        
         | _0ffh wrote:
          | I think youtube-dl as well as yt-dlp can both download
          | playlists. You can create a script that downloads all your
          | playlists and make it a cron job. Videos that already exist
          | in the target folder are skipped automatically.
        
         | mozman wrote:
          | I was doing this for a while but it became expensive - tens
          | of TBs on an expensive NAS just to hoard data.
        
           | romwell wrote:
           | FSM almighty, how much video are you watching to have that
           | many _favorite_ ones?
        
       | amelius wrote:
       | How cool would it be if everyone had IPFS running in their
       | browser, and everyone dedicated some time to filling it with a
       | backup of the internet, including YouTube.
        
         | KMnO4 wrote:
          | I did some napkin math. If 1 billion people each backed up
          | 10 GB, we'd almost have enough to store a copy of YouTube
          | with _zero_ data redundancy.
         | 
         | Google is massive.
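          | 
          | (Unpacking that: 10^9 people x 10 GB each = 10^10 GB = 10
          | exabytes of raw capacity, before any redundancy.)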
        
           | CamelCaseName wrote:
           | It's unbelievable that YouTube was as free as it was for as
           | long as it was.
           | 
            | We got too great a deal for so long that many people can't
            | see things any other way.
        
             | rchaud wrote:
             | It's unbelievable that Wikipedia is free and survives on
             | donations.
             | 
             | Youtube sells ads and is subsidized by one of the biggest
             | ad companies in the world that happens to have a lot of
             | cheap cloud storage available.
        
             | yesco wrote:
              | It was probably easier when videos had a time limit and
              | didn't support 4K (or even 1080p).
        
               | judge2020 wrote:
               | For reference, 1080p was in 2009:
               | https://blog.youtube/news-and-events/1080p-hd-comes-to-
               | youtu... while ads came out much before then:
               | https://blog.youtube/news-and-events/partner-program-
               | expands...
        
             | wintermutestwin wrote:
             | It is not "free" at all if you are paying with your data.
             | My data privacy is worth way more than the cost of
             | streaming some video with crap discovery.
        
           | swyx wrote:
           | wait, is the total number of videos on YouTube a known
           | number? thats fascinating, i'd love to see your assumptions
           | for napkin math
        
           | okasaki wrote:
            | But most people have more than 10 GB of free disk space,
            | and most people also have more than one device. I have 2
            | phones, 3 laptops, a desktop and a NAS, with some 40TB
            | between them.
           | 
           | My workplace has a private cloud with some 70PB of storage,
           | plus tape, and tons of desktops and laptops.
        
             | masukomi wrote:
             | a) i seriously doubt your "most" claim about free disk
             | space. Maybe "most privileged white folks in rich
             | countries" but not "most people"
             | 
             | b) just because i HAVE 10Gb of free disk space doesn't mean
             | i'm going to offer it up for archiving of random internet
             | crap
             | 
             | c) if it was on my phone it'd cost money and now we're very
             | actively ignoring just how expensive it is to be online in
             | 3rd world countries if you're not a rich expat.
             | 
             | > My workplace has a private cloud with some 70PB of
             | storage, plus tape, and tons of desktops and laptops.
             | 
              | Sure. How much of that do you think they'd be willing to
              | contribute, for free, to backing up random crap from
              | people
             | on the internet that may or may not be legal and could open
             | them up to litigation because they're not an ISP / platform
             | and thus not protected by the Shield laws?
        
           | amelius wrote:
           | We're currently at around $0.02 per GB, so that would be
           | $0.20 per person. A bargain.
           | 
           | (From a random source on the internet, [1])
           | 
           | [1] https://www.petercai.com/storage-prices-have-sort-of-
           | stopped...
        
       ___________________________________________________________________
       (page generated 2023-01-05 23:00 UTC)