[HN Gopher] HTTrack Website Copier
___________________________________________________________________
HTTrack Website Copier
Author : iscream26
Score : 36 points
Date : 2024-10-03 18:53 UTC (4 hours ago)
(HTM) web link (github.com)
| xnx wrote:
| Great tool. Does it still work for the "modern" web (i.e. now
| that even simple/content websites have become "apps")?
| alganet wrote:
| Nope. It is for the classic web (the only websites worth saving
| anyway).
| freedomben wrote:
| Even for the classic web, if a site is behind Cloudflare, then
| HTTrack no longer works.
|
| It's a sad point to be at. Fortunately, the SingleFile
| extension still works really well for single pages, even when
| they are built dynamically by JavaScript on the client side.
| There isn't a solution for cloning an entire site though, at
| least that I know of.
| dark-star wrote:
| Oh wow, that brings back memories. I used HTTrack in the late
| 90s and early 2000s to mirror interesting websites from the
| early internet, over a modem connection (and early DSL).
|
| Good to know they're still around. However, now that the web is
| much more dynamic, I guess it's not as useful as it was back
| then.
| Alifatisk wrote:
| Good ol' days
| corinroyal wrote:
| One time I was trying to create an offline backup of a botanical
| medicine site for my studies. Somehow I turned off the link depth
| limit and made it follow offsite links. I forgot about it. A
| few days later the machine crashed due to a full disk from trying
| to cram as much of the WWW as it could on there.
| Felk wrote:
| Funny seeing this here now, as I _just_ finished archiving an old
| MyBB PHP forum. I used `wget`, though, and it took 2 weeks and
| 260GB of uncompressed disk space (12GB compressed with zstd). The
| process was not interruptible, so I had to start over each time
| my hard drive got full. Maybe I should have given HTTrack a shot
| to see how it compares.
|
| If anyone wants to know the specifics of how I used wget, I wrote
| them down here:
| https://github.com/SpeedcubeDE/speedcube.de-forum-archive
|
| Also, if anyone has experience archiving similar websites with
| HTTrack and knows how it compares to wget for my use case,
| I'd love to hear about it!
___________________________________________________________________
(page generated 2024-10-03 23:00 UTC)