[HN Gopher] Make Your Own Internet Archive with Archive Box
       ___________________________________________________________________
        
       Author : adamhearn
       Score  : 45 points
       Date   : 2021-01-19 17:48 UTC (5 hours ago)
        
 (HTM) web link (nixintel.info)
 (TXT) w3m dump (nixintel.info)
        
       | blastro wrote:
       | i use this every single day and think very highly of it. thanks
       | for reminding me - i'm going to sponsor this developer on
       | github...
        
       | mikece wrote:
       | This would be a nice thing to be able to run on a Synology NAS or
       | other kind of device that typically has terabytes of storage.
        
       | remirk wrote:
       | This article is blogspam.
       | 
       | The repository has enough information on its own:
       | https://github.com/ArchiveBox/ArchiveBox
        
         | Robotbeat wrote:
         | Disagree. I find links to repositories to be less accessible
         | than blog posts.
        
           | klelatti wrote:
           | It has its own website too.
           | 
           | https://archivebox.io/
        
       | greypowerOz wrote:
       | so.. you CAN have a box that is "the internet"....
        
       | jedimastert wrote:
       | Is there a list of web page archive formats I could look at?
       | There are a few things I'd love to do where it would be very
       | handy to have one file per page.
        
       | evc wrote:
       | You will need a lot of disk storage, right?
        
         | flas9sd wrote:
         | it doesn't show in the screenshot in the article, but
         | ArchiveBox in Aug 2020 implemented the "readability article
         | text extractor", see description in the release notes:
         | https://github.com/pirate/ArchiveBox/releases/tag/v0.4.14 and
         | the module that does the work
         | https://github.com/pirate/readability-extractor
         | 
         | By extracting only the text and article images you could go
         | deep into an archive. If you skip images, even more so.
        
         | Ace_Archer wrote:
         | That probably depends on the scope of what you're looking to
         | archive. If you're looking to make a local backup of your
         | bookmarks folder (as one of the intentions seems to be),
         | probably not an unreasonable amount of storage. Maybe a few GB
         | at most (if you have a moderate to large bookmarks folder),
         | depending on how many sites you save and how heavy they are?
        
       ___________________________________________________________________
       (page generated 2021-01-19 23:00 UTC)