Post A0eGUJFt1SE0EdcrmC by elr@raru.re
 (DIR) Post #A0e8x3pKfF6x9LP6yu by fluffy@social.handholding.io
       2020-10-29T07:11:31.182152Z
       
       5 likes, 5 repeats
       
       Hello fedi. I would like to find a way I can have a cold copy of articles that I view, something like a self-hosted archive.is … but one that does it automatically for every site I visit. Please help me find out how or where!
       
 (DIR) Post #A0e95hle3AdxL7njeK by Gamercat@zefirchik.xyz
       2020-10-29T07:13:04.776372Z
       
       1 likes, 0 repeats
       
       @fluffy https://github.com/pirate/ArchiveBox
       
 (DIR) Post #A0e9Cwa9MtEjJUsOZM by icedquinn@blob.cat
       2020-10-29T07:14:23.199486Z
       
       0 likes, 0 repeats
       
       @Gamercat @fluffy pywb also kind-of does this.
       
 (DIR) Post #A0e9IoxobcRtQIkQzY by Gamercat@zefirchik.xyz
       2020-10-29T07:15:25.473440Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy pywb?
       
 (DIR) Post #A0e9OoCANNN0mFJdMu by lanodan@queer.hacktivis.me
       2020-10-29T07:16:32.154454Z
       
       0 likes, 0 repeats
       
       @fluffy AFAIK wallabag has this feature
       
 (DIR) Post #A0e9QjfiAIi9KL3PRQ by icedquinn@blob.cat
       2020-10-29T07:16:52.720915Z
       
       0 likes, 0 repeats
       
       @lanodan @fluffy don't think wallabag is automatic :blobcatthink:
       
 (DIR) Post #A0e9V5UMzY4I9LpAFU by lanodan@queer.hacktivis.me
       2020-10-29T07:17:40.233294Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy automatism just means hooking a program to me.
       
 (DIR) Post #A0e9XvsZB201DMUyOm by jojo@jojo.singleuser.club
       2020-10-29T07:18:11.410018Z
       
       1 likes, 0 repeats
       
       @fluffy @p
       
 (DIR) Post #A0e9YQ0PAS5weMOemW by fluffy@social.handholding.io
       2020-10-29T07:18:16.032793Z
       
       0 likes, 0 repeats
       
       @icedquinn @Gamercat is there some way to have archivebox auto archive all pages you visit? The author says he doesn’t want everything saved but I don’t know now what won’t be available later
       
 (DIR) Post #A0e9aLgiBvdzq26ckK by fluffy@social.handholding.io
       2020-10-29T07:18:37.754631Z
       
       2 likes, 0 repeats
       
       @jojo @p lol GAY
       
 (DIR) Post #A0e9c7Sg3VmIIMNy8u by icedquinn@blob.cat
       2020-10-29T07:18:56.027148Z
       
       1 likes, 0 repeats
       
       @lanodan @fluffy pywb you run as a proxy and it does a local wayback clone while you browse. archivebox you run cron jobs and it scrapes your history and archives from it. wallabag you have to click a send to wallabag button. fluffy doesn't want to have to push a button.
       
 (DIR) Post #A0e9gef8Ybi4QNNO4m by icedquinn@blob.cat
       2020-10-29T07:19:46.354171Z
       
       1 likes, 0 repeats
       
       @fluffy @Gamercat there used to be a way to have it read your firefox history.
       
 (DIR) Post #A0e9h5VUkZ2XgI4ZIO by fluffy@social.handholding.io
       2020-10-29T07:19:50.292559Z
       
       0 likes, 0 repeats
       
       @icedquinn @lanodan i will forget to press it uwu
       
 (DIR) Post #A0e9inaSLZzYgND1kG by jojo@jojo.singleuser.club
       2020-10-29T07:20:09.670090Z
       
       2 likes, 1 repeats
       
       @fluffy @p Here's a treat
       
 (DIR) Post #A0e9n5ggzX25ykAlOa by icedquinn@blob.cat
       2020-10-29T07:20:55.526845Z
       
       3 likes, 0 repeats
       
       @fluffy @lanodan a lot of news sites also go :blobcatgoogly: towards wallabag.
       
 (DIR) Post #A0e9nelsYEygmxrtq4 by Gamercat@zefirchik.xyz
       2020-10-29T07:21:00.555251Z
       
       1 likes, 0 repeats
       
       @fluffy @icedquinn Previously there was an auto-update of archives.
       
 (DIR) Post #A0eA6PAWmj8PtuKzaK by lanodan@queer.hacktivis.me
       2020-10-29T07:24:24.153477Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy Yeah but writing that feature (ie. browser extension automatically saving non-saved articles) shouldn't be that hard. But if ArchiveBox fits that's nice.
       
 (DIR) Post #A0eABy9O9SDTqHSSCu by icedquinn@blob.cat
       2020-10-29T07:25:25.608410Z
       
       1 likes, 0 repeats
       
       @Gamercat @fluffy https://github.com/pirate/ArchiveBox/blob/ec4db1f75e09b43f2d1d3acc4f6b0d563fb66a3f/bin/export_browser_history.sh :blobcatread:
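
       A rough sketch of the idea behind a history export, not a copy of the linked script: pull visited URLs out of a copy of Firefox's history database so they can be handed to an archiver. It assumes Firefox's places.sqlite with its moz_places table (url, plus last_visit_date in microseconds since the epoch). The function name read_history and the since_us parameter are made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def read_history(db_path, since_us=0):
    """Return (url, visited_at) pairs from a COPY of places.sqlite.

    Firefox keeps the live database locked, so copy the file first.
    last_visit_date is stored as microseconds since the Unix epoch.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT url, last_visit_date FROM moz_places "
            "WHERE last_visit_date IS NOT NULL AND last_visit_date > ? "
            "ORDER BY last_visit_date",
            (since_us,),
        ).fetchall()
    finally:
        conn.close()
    # Convert microsecond timestamps to timezone-aware datetimes.
    return [
        (url, datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc))
        for url, ts in rows
    ]
```

       A cron job could call this with the timestamp of the previous run and write the new URLs one per line for whatever archiver is in use.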
       
 (DIR) Post #A0eADURNaCScHIb9Ie by fluffy@social.handholding.io
       2020-10-29T07:25:41.612032Z
       
       1 likes, 0 repeats
       
       @icedquinn @lanodan i was thinking that this stuff would be executed in a web browser plugin and then the document would be saved and rsync’d or something. everything seems to have overlays, lazy-load, etc. even zotero fails to archive loads of things
       
 (DIR) Post #A0eAFiKktp9vwiYo8O by fluffy@social.handholding.io
       2020-10-29T07:26:06.256609Z
       
       1 likes, 0 repeats
       
       @icedquinn @Gamercat :senkonom:
       
 (DIR) Post #A0eANHolMwpm1bCtWK by lanodan@queer.hacktivis.me
       2020-10-29T07:27:27.793424Z
       
       0 likes, 0 repeats
       
       @fluffy @icedquinn Otherwise install something like Netscape, caching documents for offline reading (offline mode comes from there) and being able to extract them was built-in at the time.
       
 (DIR) Post #A0eAS14r2MnjYLzNmS by icedquinn@blob.cat
       2020-10-29T07:28:18.816782Z
       
       2 likes, 1 repeats
       
       @fluffy @Gamercat i haven't used archivebox in quite a while. there are ways to get uhh. forget what its called. puppeteer? there's hacked up versions of chrome that run headless, which allows scripts to load pages, wait for renders and then dump pdfs of the content.
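
       A minimal sketch of the headless approach, using Chrome/Chromium's own --headless and --print-to-pdf flags rather than Puppeteer. The binary name "chromium" is an assumption (it may be chrome, google-chrome or chromium-browser on a given system), and the helper names are hypothetical. Unlike a Puppeteer script, this does not wait for lazy-loaded content.

```python
import subprocess


def pdf_command(url, out_path, browser="chromium"):
    """Build the argv for a headless Chrome/Chromium PDF dump."""
    return [
        browser,            # assumed binary name, adjust for your system
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={out_path}",
        url,
    ]


def save_pdf(url, out_path, browser="chromium", timeout=60):
    """Run the browser; raises if it exits non-zero or exceeds the timeout."""
    subprocess.run(pdf_command(url, out_path, browser), check=True, timeout=timeout)
```

       The timeout is there because of the failure mode mentioned further down the thread, where a capture of an awkward site just hangs.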
       
 (DIR) Post #A0eAemaQE13CNJqGIq by fluffy@social.handholding.io
       2020-10-29T07:30:36.824020Z
       
       0 likes, 0 repeats
       
       @lanodan @icedquinn I also want to post these as links to people, if that’s possible
       >Netscape
       I’m moving away from Firefox (slowly) and towards ungoogled-chromium because I expect ff to become deprecated in a year or two. Netscape is in the opposite direction, but I appreciate the sentiment. When I was a kid, full text search of browser history was standard…
       
 (DIR) Post #A0eAjCjQxX46ccAKOG by fluffy@social.handholding.io
       2020-10-29T07:31:25.667665Z
       
       0 likes, 0 repeats
       
       @icedquinn @Gamercat that sounds cool :blobastolfo3c: Did you use archivebox before? Why, and why did you stop?
       
 (DIR) Post #A0eAu8dAXyuT5WRtku by icedquinn@blob.cat
       2020-10-29T07:33:23.732108Z
       
       2 likes, 0 repeats
       
       @fluffy @Gamercat it kept breaking on some links. i'd have it record some comp-sci paper off a weird uni site and it would just die and time out. but there was no *easy* way to work around it, just edit some big yaml file or something. i just keep a zettelkasten now. but even then, it's pretty rare to need to produce an old deleted source. usually people just go "no obviously you're a nazi, quinn" :cirno_shrug:
       
 (DIR) Post #A0eBAQ8xCerg2I51sm by Gamercat@zefirchik.xyz
       2020-10-29T07:36:20.267145Z
       
       0 likes, 0 repeats
       
       @fluffy @icedquinn i used archivebox via termux. it weighed a lot because it depends on google chrome, and chrome pulled in the graphical shell, so in the end everything came to about a gigabyte.
       
 (DIR) Post #A0eBNg1bT3P33WxlY0 by Gamercat@zefirchik.xyz
       2020-10-29T07:38:44.153660Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy And my question is, can I make archives through the cli browsers?
       
 (DIR) Post #A0eBPGuxmnuplbcMRU by icedquinn@blob.cat
       2020-10-29T07:39:02.422562Z
       
       0 likes, 0 repeats
       
       @Gamercat @fluffy lynx?
       
 (DIR) Post #A0eBVvEMogEwX5rft2 by Gamercat@zefirchik.xyz
       2020-10-29T07:40:13.900021Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy yep, can i make archives via lynx?
       
 (DIR) Post #A0eBYYWPhiWkyGYt96 by icedquinn@blob.cat
       2020-10-29T07:40:42.272583Z
       
       0 likes, 0 repeats
       
       @Gamercat @fluffy :blobcatghostreach: it is a mystery
       
 (DIR) Post #A0eBcO0OkIKjVtPgHo by oldcoder@mastodon.oldcoder.org
       2020-10-29T07:41:19Z
       
       3 likes, 1 repeats
       
       @fluffy 1) Yes, you can do this up to a point. However, manual capture works better than automatic.
       2) One approach is Pale Moon plus ScrapBook X. This approach is simple, but it only works for some sites.
       3) For near-perfect captures, learn to use WARC toolsets. I've tested openwayback and pywb and suggest starting with those two. This isn't plug and play but I've gotten it to work pretty well.
       4) Links:
       https://github.com/webrecorder/pywb
       https://github.com/iipc/openwayback/wiki
       https://www.loc.gov/preservation/digital/formats/fdd/fdd000236.shtml
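
       To give a feel for what a WARC file holds, here is a small sketch that serializes a single WARC/1.0 "resource" record by hand. It only illustrates the container layout (version line, named header fields, blank line, payload, two trailing CRLFs). Captures made by tools such as pywb store full HTTP request and response records, so real work should go through those tools. The function name is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(uri, payload, content_type="text/html"):
    """Serialize one WARC/1.0 'resource' record. payload must be bytes."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    # Version line, header fields, then an empty line before the block.
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Every record is terminated by two CRLFs after the block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"
```

       Appending the returned bytes for each captured page to one file gives a (non-compressed) multi-record archive in the same spirit.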
       
 (DIR) Post #A0eBhP5o8AS2KjKtOa by Gamercat@zefirchik.xyz
       2020-10-29T07:42:17.873551Z
       
       0 likes, 0 repeats
       
       @icedquinn @fluffy Therefore, I want to centralize my knowledge from articles and sites in one place, yes via termux
       
 (DIR) Post #A0eBsCHe5wQdnB1IEC by fluffy@social.handholding.io
       2020-10-29T07:44:15.540030Z
       
       0 likes, 0 repeats
       
       @oldcoder Thanks for the detailed explanation. It looks like I’ll have to do some building to get exactly what I want!
       
 (DIR) Post #A0eCdz6mNpIcpVix9s by Gamercat@zefirchik.xyz
       2020-10-29T07:52:53.224920Z
       
       0 likes, 0 repeats
       
       @oldcoder @fluffy Can I configure openwayback via lynx?
       
 (DIR) Post #A0eDITEhmv2NqwjizI by fluffy@social.handholding.io
       2020-10-29T08:00:11.835064Z
       
       1 likes, 1 repeats
       
       @oldcoder This looks promising:
       https://github.com/rhizome-conifer/conifer
       https://conifer.rhizome.org/
       
 (DIR) Post #A0eDNjshKUjOvNxGaG by oldcoder@mastodon.oldcoder.org
       2020-10-29T07:59:44Z
       
       1 likes, 0 repeats
       
       @fluffy 1) These tools do work though in different ways. One of the WARC tools saves captures in multiple formats. You'll find that only the WARC format and multimedia files work well.
       2) I started to do this in 2006 using pre-Quantum Firefox and the original ScrapBook extension. It's nice to still be able to read websites that are long gone.
       3) There's one more approach that's tricky but automatic as you wished. One sets up a copy of Squid to decode https and cache content forever.
       
 (DIR) Post #A0eDXnKvL0U8XKx6Zc by fluffy@social.handholding.io
       2020-10-29T08:02:57.906020Z
       
       0 likes, 0 repeats
       
       @oldcoder My main grievance is that I on occasion want to refer to an article I read, or to find it by searching my browser history, but if I store a link in my database years later it is gone, and browser history is no longer full text searchable, let alone far in the past. Is there some reason you don’t automatically archive all of the web pages you visit?
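
       For the full-text search half of that grievance, a toy sketch of the underlying idea: keep the text of each saved page in an inverted index keyed by word, then intersect the URL sets for the query words. The class and method names are invented for illustration. A real setup would more likely lean on an existing search engine or a database with full-text support.

```python
import re
from collections import defaultdict


class PageIndex:
    """Tiny in-memory inverted index: word -> set of URLs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, url, text):
        """Index the plain text of one saved page."""
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(url)

    def search(self, query):
        """Return URLs whose text contains every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

       Usage would be one add() call per archived page, then search("squid https") to get back the URLs of pages that mention both words.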
       
 (DIR) Post #A0eDr57pQKeVD41XCy by oldcoder@mastodon.oldcoder.org
       2020-10-29T08:04:20Z
       
       1 likes, 0 repeats
       
       @fluffy I noticed Conifer but thought that it was a commercial service. However, I see that you've found a FOSS core. I'll give it a try later this Fall.
       
 (DIR) Post #A0eDr5MMYIHJw8p8rY by fluffy@social.handholding.io
       2020-10-29T08:06:27.707303Z
       
       1 likes, 0 repeats
       
       @oldcoder I look forward to reading about it. Do you have a newsletter or an RSS feed?
       
 (DIR) Post #A0eFQ63C4jx8AX7GZk by oldcoder@mastodon.oldcoder.org
       2020-10-29T08:19:23Z
       
       1 likes, 0 repeats
       
       @fluffy 1) Regarding the grievance: Yes, I started to write my own tools in 1995 to capture websites for this and other reasons.
       2) Q. Why not capture everything? A. In the past, disks cost a lot more and complete and automated captures, especially for https sites, were more difficult.
       But 1 TB disks, even SSDs, are cheap now and the Squid approach is both fast and automatic. So I'll probably capture more pages automatically in the future.
       
 (DIR) Post #A0eGUJFt1SE0EdcrmC by elr@raru.re
       2020-10-29T08:31:16Z
       
       2 likes, 0 repeats
       
       @fluffy @oldcoder mmm thank you for sharing the links in public. interesting thing.
       
 (DIR) Post #A0eHmcH796dqOHUeRs by oldcoder@mastodon.oldcoder.org
       2020-10-29T08:47:59Z
       
       1 likes, 1 repeats
       
       @Gamercat @fluffy I think that openwayback per se is designed more for interaction with full web browsers but there are parts of these toolsets that you can run in CLI. Lynx could be used to trigger captures by way of CLI scripts.
       
 (DIR) Post #A0eHtRZ9MPPLCiBWmu by Gamercat@zefirchik.xyz
       2020-10-29T08:51:42.684538Z
       
       0 likes, 0 repeats
       
       @oldcoder @fluffy I think I have some very funny tests on termux
       
 (DIR) Post #A0eTVP6A59L23Dvfua by p@freespeechextremist.com
       2020-10-29T11:01:50.861378Z
       
       0 likes, 0 repeats
       
       @jojo @fluffy I've sketched out a system for this. I didn't get the contract, it went to the owner's nephew.
       
 (DIR) Post #A0eTqWCoqtRsAw6yWW by fluffy@social.handholding.io
       2020-10-29T11:05:38.258586Z
       
       1 likes, 0 repeats
       
       @p @jojo so other people want it too...? Why did this guy want it?
       
 (DIR) Post #A0eWlri5ltrFHiDCgi by p@freespeechextremist.com
       2020-10-29T11:38:26.190704Z
       
       2 likes, 0 repeats
       
       @fluffy @jojo This guy wanted it for some government contract to monitor terrorist activity.