[HN Gopher] Yark: Advanced and easy YouTube archiver now stable
___________________________________________________________________
Yark: Advanced and easy YouTube archiver now stable
Author : Owez
Score : 141 points
Date : 2023-01-05 18:45 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| toomuchtodo wrote:
| If you want to provide an option to upload artifacts to the
| Internet Archive, you could crib off of
| https://github.com/bibanon/tubeup It too relies on yt-dlp for
| extraction.
|
| Importantly, pay close attention to what artifacts are uploaded
| to an item created, and what metadata is set as part of the
| upload process.
| newsclues wrote:
| Why not use https://youtube-dl.org ?
| alwayslikethis wrote:
| youtube-dl has basically been unusable for a while now. It gets
| only tens of KB/s on my 1 Gbps connection. yt-dlp is a more
| actively maintained alternative, I think, and it was getting
| great speeds the last time I checked.
| hbn wrote:
| ytdl is one of its dependencies
|
| You clearly didn't even skim the readme to see what this does
| blowski wrote:
| From the HN guidelines:
|
| > Please don't comment on whether someone read an article.
| "Did you even read the article? It mentions that" can be
| shortened to "The article mentions that".
| naavis wrote:
| To be fair, the readme does not mention ytdl.
| nomilk wrote:
| Occasionally my personal/literature/academic/tech notes link to a
| youtube source, but when clicking on it I find the video has
| vanished with no way of knowing what it was or even what it was
| called (it's sometimes impossible to track down an
| identical/replacement source). I've lost many valuable references
| that way.
|
| The Wayback Machine solves this for webpages, but nothing I'm
| aware of (short of youtube-dl-ing the video yourself and storing
| it somewhere openly, probably at risk of various infringements)
| solves it for video. Quite a lot of hassle for something rather
| simple.
|
| It would be great to be able to immortalise them on a per-video
| basis, so if it's important enough, we can be _sure_ that
| references made to the content will still be there in the future
| when needed.
| svnpenn wrote:
| For any video you care about, you need to make it your own
| responsibility to back up the metadata and/or streams. If you're
| lucky, you can search the web for the video ID to recover the
| metadata, even after deletion.
| 0cf8612b2e1e wrote:
| I do not see how that becomes possible without the Internet
| archive effectively mirroring a large percentage of YouTube. I
| recall at one point, IA wanted to archive just the video
| metadata and realized even that would be technically
| challenging.
| swyx wrote:
| does anyone have recs on how to run this on a continuous basis in
| the cloud? this obviously will take a lot more storage than like
| a normal heroku setup (not that I would use heroku). should i use
| Railway or Render or is that overkill compared to something else?
|
| _gasp_ can i run it as a github action???
| iforgotpassword wrote:
| Did you write your own YouTube scraper, which would be quite a
| task, or is this based on something like ytdl? Might be worth
| mentioning in the readme.
| amarshall wrote:
| Indeed it appears to use yt-dlp
| https://github.com/Owez/yark/blob/676074ee3d9e379d15e52ffe2e...
| HEHENE wrote:
| I'm a big fan of the historical information that Yark shows.
|
| Arrimus 3D recently replacing a large chunk of their 3D modeling
| tutorials with religious content was a pretty big lightbulb
| moment for me that so much of the content I rely on - not just
| for the initial learning of a new skill, but as a continual
| reference when I forget something - is so fragile.
|
| I immediately bought a NAS and began backing up everything that I
| glean even the tiniest bit of learning from, using a similar
| project, TubeArchivist[0]. Projects like this are really
| important for maintaining all of the great knowledge on the web.
|
| [0] https://github.com/tubearchivist/tubearchivist
| 2kwatts wrote:
| This is really slick. I was figuring it was just a simple wrapper
| for yt-dlp which scraped some additional things (comments, views,
| etc) but you went above and beyond with the web interface. Nice
| job!
| wahnfrieden wrote:
| Anyone know if Apple bans this kind of lib from use in the app
| store?
| Owez wrote:
| I've been working on polishing my YouTube archiver project for
| a while now, and I've finally released a solid version of it. It
| has an offline web viewer/visualiser for archived channels and
| is managed using a CLI. Most importantly, it's easy to use :)
| bityard wrote:
| I have a cron job that uses yt-dlp to download just the audio
| tracks of any videos that I have saved to a public playlist,
| can this be a replacement for that?
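|
| For reference, a sketch of that kind of setup, assuming yt-dlp is
| installed (the playlist URL and paths below are placeholders):

```shell
# crontab entry: every night at 03:00, pull new audio from a playlist.
# --download-archive records fetched IDs so repeat runs skip them;
# -x extracts the audio track.
0 3 * * * yt-dlp -x --audio-format mp3 --download-archive ~/music/archive.txt -o "~/music/%(title)s.%(ext)s" "https://www.youtube.com/playlist?list=PLACEHOLDER"
```

| Yark archives whole channels, so for playlist-only audio a plain
| yt-dlp cron job like the above may still be the simpler tool.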
| theandrewbailey wrote:
| Can you edit your submission to add a "Show HN:" before the
| title? Like these: https://news.ycombinator.com/show
| prometheus76 wrote:
| yt-dlp and a batch file that runs via Task Scheduler has been
| doing this for me for a couple of years now. I also grab the
| captions and throw that into a database so that I can search
| transcripts for a clip that I can remember but can't remember
| which video it's in. It was a fun weekend project.
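|
| A minimal sketch of the caption half of that workflow, assuming
| yt-dlp is installed; instead of a database, this just greps the
| downloaded subtitle files (the channel URL and search phrase are
| placeholders):

```shell
# Fetch only the auto-generated subtitles, not the videos themselves.
yt-dlp --write-auto-subs --skip-download \
    -o "%(upload_date)s - %(title)s.%(ext)s" \
    "https://www.youtube.com/@somechannel/videos"

# Find which transcripts mention a half-remembered phrase
# (-i = case-insensitive, -l = print matching filenames only).
grep -il "that one clip" ./*.vtt
```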
| NegativeLatency wrote:
| I run mine with cron and it puts files in a special folder for
| plex:
| https://github.com/nburns/utilities/blob/master/youtube.fish
|
| Pulls from my watch later playlist which is quite handy
| 2OEH8eoCRo0 wrote:
| How do you deal with file numbering?
|
| I prefer files prefixed with a number that indicates "air
| date", 01 being the first uploaded video. By default, numbering
| is by playlist index, so the top of the channel or playlist,
| i.e. the most recent video, is number 01.
| prometheus76 wrote:
| I just use the publish date in the format of YYYY-MM-DD at
| the beginning of the filename so that they sort properly.
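|
| yt-dlp's output template can do this directly; note that its
| upload_date field comes out as YYYYMMDD rather than YYYY-MM-DD,
| which still sorts lexicographically (channel URL is a placeholder):

```shell
# Prefix each file with its publish date so a plain filename sort
# is chronological, e.g. "20230105 - Some Title [abc123].mp4".
yt-dlp -o "%(upload_date)s - %(title)s [%(id)s].%(ext)s" \
    "https://www.youtube.com/@somechannel/videos"
```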
| salutonmundo wrote:
| "Downloading things from YouTube" "Stable"
|
| Ha
|
| Hahahaha
|
| Hahahahahahahahahahaha
| pablo24602 wrote:
| I'd add some more flags/options for downloading specific videos
| or updating the library of downloaded content. I don't really
| want to download all the videos from one specific channel;
| instead, I want to download only the last 10 videos, for example.
| causality0 wrote:
| Does this have the ability to be set to "grab highest available
| resolution" instead of specifying one? A lot of what I'd like to
| archive is from both well before and well after YouTube started
| supporting HD resolutions.
| 2OEH8eoCRo0 wrote:
| What if the highest available resolution does not have an audio
| stream?
| NegativeLatency wrote:
| grab the audio stream from something else and stitch them
| together with ffmpeg (like youtube-dl and others do)
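|
| A sketch of that stitching step with ffmpeg (filenames are
| placeholders); yt-dlp's `-f bestvideo+bestaudio` format selection
| performs this merge for you automatically:

```shell
# Mux the video-only and audio-only streams into one file without
# re-encoding: -c copy remuxes, -map picks one stream from each input.
ffmpeg -i video_only.mp4 -i audio_only.m4a \
    -c copy -map 0:v:0 -map 1:a:0 merged.mp4
```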
| rollcat wrote:
| Normally in modern adaptive streaming, every video variant is
| muxed into a separate stream without audio, and different
| audio variants are muxed into their own individual streams.
| TonyTrapp wrote:
| Only some formats, which I guess are meant for older browsers,
| contain both video and audio. In general, these days
| video and audio are delivered through separate streams on
| YouTube.
| svnpenn wrote:
| It's just a wrapper for YT-DLP
| eats_indigo wrote:
| Bit of a noob question here. What's an archiver for?
|
| Is it a library for things you've watched and want to store
| outside of youtube? Or is this for storing content you've created
| / managing your own portfolio of content?
| pessimizer wrote:
| There are coded hints in the link, like:
|
| > Yark lets you continuously archive all videos and metadata
| for YouTube channels. You can also view your archive as a
| seamless offline website
| eats_indigo wrote:
| Snarky and not answering the question, well done.
| iforgotpassword wrote:
| Personally, for me it's archiving, in case I want to go back to
| it. Videos just keep disappearing from YouTube: channels get
| deleted by YouTube or by their owners, videos get
| copyright-struck, geoblocked, made private, and so on. As I'm
| lazy as f### I didn't create anything as sophisticated as OP;
| I have a simple 10-line PHP script on my home server that just
| pretends to be Kodi well enough to fool Yatse (an Android remote
| for Kodi). So every time I watch a video on YouTube (on my phone)
| that I want to keep, I tap "share" and then "play on Kodi"; my
| PHP script gets the video URL from the POST data and launches
| youtube-dl. It sucks because I never get feedback on whether it
| worked or when it's finished, but I log all the URLs, and at some
| point in the future I'll add a cron job that checks the list and
| sends reports and whatnot. Some day.
| eats_indigo wrote:
| Great, thanks for the insight!
| cocacola1 wrote:
| I think it's the latter. I've no issue with most things being
| one-and-done. But some channels have phenomenal content that
| I'd like to keep for the long term. Something might happen to
| their channel that makes it difficult to get, so I'll regularly
| update my downloads with new videos, pictures, etc.
|
| This applies to ripping, too. Funimation removed _Drifters_
| years ago, but I'll always have a copy of it because I ripped
| it. Of course, I need to store it so it still costs money. But
| I can be content that I have the content.
| j1elo wrote:
| After reading the description, this project seems to be solely
| focused on downloading all of a specific channel's videos.
|
| I've been taking my first steps at having a home server, and one
| of the things I'd love to do with it is having an archive of the
| videos that I have saved in my private playlists on YouTube. In
| my mind, the service would periodically check all my playlists,
| compare with what exists locally, and download any missing video.
| Maybe even with a nice web UI so it's easier to visually
| configure and use.
|
| Does such a service already exist so I can self-host it?
| aquova wrote:
| I haven't used it personally, but Tube Archivist might be what
| you're looking for.
|
| https://www.tubearchivist.com/
| erinnh wrote:
| Look at tubearchivist. It can do what you want.
|
| You can subscribe to playlists, as well as automatically update
| and download videos.
| Fang_ wrote:
| Nice web UI aside, if I'm not mistaken youtube-dl already
| supports this kind of usage. You can run `youtube-dl
| --download-archive archive.txt https://youtu.be/your-playlist`
| and it'll keep track in archive.txt of everything it's already
| downloaded. Supplement with authentication options as necessary,
| set up a cron job, done.
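|
| As a cron job, that might look like the following (the schedule,
| paths, and playlist URL are placeholders):

```shell
# crontab entry: check the playlist hourly; --download-archive makes
# repeat runs cheap because already-fetched video IDs are skipped.
0 * * * * youtube-dl --download-archive ~/yt/archive.txt -o "~/yt/%(title)s.%(ext)s" "https://youtu.be/your-playlist"
```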
| _0ffh wrote:
| I think youtube-dl and yt-dlp can both download playlists. You
| can create a script to download all your playlists and make it a
| cron job. Videos that already exist in the target folder will be
| skipped automatically.
| mozman wrote:
| I was doing this for a while but it became expensive - tens of
| TBs on an expensive NAS just to hoard data.
| romwell wrote:
| FSM almighty, how much video are you watching to have that
| many _favorite_ ones?
| amelius wrote:
| How cool would it be if everyone had IPFS running in their
| browser, and everyone dedicated some time to filling it with a
| backup of the internet, including YouTube.
| KMnO4 wrote:
| I did some napkin math. If 1 billion people each backed up
| 10 GB, we'd almost have enough to store a copy of YouTube with
| _zero_ data redundancy.
|
| Google is massive.
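|
| Spelling the napkin math out (decimal units; YouTube's true size
| is not public, so the comparison is only a rough one):

```shell
# 1 billion people × 10 GB each = 10 billion GB = 10 EB.
users=1000000000
gb_each=10
total_gb=$((users * gb_each))
total_eb=$((total_gb / 1000000000))
echo "${total_eb} EB of donated storage"
```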
| CamelCaseName wrote:
| It's unbelievable that YouTube was as free as it was for as
| long as it was.
|
| We got too great a deal for so long that many people can't see
| things any other way.
| rchaud wrote:
| It's unbelievable that Wikipedia is free and survives on
| donations.
|
| Youtube sells ads and is subsidized by one of the biggest
| ad companies in the world that happens to have a lot of
| cheap cloud storage available.
| yesco wrote:
| Was probably easier when the videos had a time limit and
| didn't support 4K (or 1080p even).
| judge2020 wrote:
| For reference, 1080p was in 2009:
| https://blog.youtube/news-and-events/1080p-hd-comes-to-
| youtu... while ads came out much before then:
| https://blog.youtube/news-and-events/partner-program-
| expands...
| wintermutestwin wrote:
| It is not "free" at all if you are paying with your data.
| My data privacy is worth way more than the cost of
| streaming some video with crap discovery.
| swyx wrote:
| wait, is the total number of videos on YouTube a known
| number? thats fascinating, i'd love to see your assumptions
| for napkin math
| okasaki wrote:
| But most people have more than 10 GB of free disk space, and most
| people also have more than 1 device. I have 2 phones, 3
| laptops, a desktop and a NAS, with some 40TB between them.
|
| My workplace has a private cloud with some 70PB of storage,
| plus tape, and tons of desktops and laptops.
| masukomi wrote:
| a) i seriously doubt your "most" claim about free disk
| space. Maybe "most privileged white folks in rich
| countries" but not "most people"
|
| b) just because i HAVE 10 GB of free disk space doesn't mean
| i'm going to offer it up for archiving of random internet
| crap
|
| c) if it was on my phone it'd cost money and now we're very
| actively ignoring just how expensive it is to be online in
| 3rd world countries if you're not a rich expat.
|
| > My workplace has a private cloud with some 70PB of
| storage, plus tape, and tons of desktops and laptops.
|
| sure. How much of that do you think they'd be willing to
| contribute, for free, to backing up random crap from people
| on the internet that may or may not be legal and could open
| them up to litigation because they're not an ISP / platform
| and thus not protected by shield laws?
| amelius wrote:
| We're currently at around $0.02 per GB, so that would be
| $0.20 per person. A bargain.
|
| (From a random source on the internet, [1])
|
| [1] https://www.petercai.com/storage-prices-have-sort-of-
| stopped...
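|
| Checking that figure: at $0.02/GB, the hypothetical 10 GB per
| person works out to:

```shell
# 10 GB × $0.02/GB per person.
cost=$(awk 'BEGIN { printf "%.2f", 10 * 0.02 }')
echo "\$${cost} per person"
```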
___________________________________________________________________
(page generated 2023-01-05 23:00 UTC)