hngopher.com

       [HN Gopher] HTTrack Website Copier
       ___________________________________________________________________
        
       HTTrack Website Copier
        
       Author : iscream26
       Score  : 36 points
       Date   : 2024-10-03 18:53 UTC (4 hours ago)
        
 (HTM) web link (github.com)
        
       | xnx wrote:
       | Great tool. Does it still work for the "modern" web (i.e. now
       | that even simple/content websites have become "apps")?
        
         | alganet wrote:
         | Nope. It is for the classic web (the only websites worth saving
         | anyway).
        
           | freedomben wrote:
           | Even for the classic web, if it's behind Cloudflare, then HTTrack
           | no longer works.
           | 
           | It's a sad point to be at. Fortunately, the single file
           | extension still works really well for single pages, even when
           | they are built dynamically by JavaScript on the client side.
           | There isn't a solution for cloning an entire site though, at
           | least not that I know of.
        
       | dark-star wrote:
       | Oh wow, that brings back memories. I used HTTrack in the late
       | 90s and early 2000s to mirror interesting websites from the
       | early internet, over a modem connection (and early DSL).
       | 
       | Good to know they're still around. However, now that the web is
       | much more dynamic, I guess it's not as useful as it was back
       | then.
        
       | Alifatisk wrote:
       | Good ol' days
        
       | corinroyal wrote:
       | One time I was trying to create an offline backup of a botanical
       | medicine site for my studies. Somehow I turned off depth of link
       | checking and made it follow offsite links. I forgot about it. A
       | few days later the machine crashed due to a full disk from trying
       | to cram as much of the WWW as it could on there.
        
       | Felk wrote:
       | Funny seeing this here now, as I _just_ finished archiving an old
       | MyBB PHP forum. I used `wget`, though, and it took 2 weeks and
       | 260GB of uncompressed disk space (12GB compressed with zstd). The
       | process was not interruptible, and I had to start over each time
       | my hard drive got full. Maybe I should have given HTTrack a shot
       | to see how it compares.
       | 
       | If anyone wants to know the specifics of how I used wget, I wrote
       | them down here:
       | https://github.com/SpeedcubeDE/speedcube.de-forum-archive
       | 
       | Also, if anyone has experience archiving similar websites with
       | HTTrack and knows how it compares to wget for my use case,
       | I'd love to hear about it!
        
       ___________________________________________________________________
       (page generated 2024-10-03 23:00 UTC)