Posts by edsu@social.coop
(DIR) Post #AYHawAOGfT5UqN8IT2 by edsu@social.coop
2023-08-01T06:51:50Z
0 likes, 1 repeats
I guess I'm lacking in imagination, but it wasn't until I read https://marcwatkins.org/2023/07/30/will-2024-look-like-1984/ that the purpose for Google's clumsy proposal for Web Environment Integrity started to slip into place. They are panicking because software quickly needs to be able to determine if it is interacting with a real person and not an AI (aka an LLM-fueled bot). https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md
(DIR) Post #AYNrh9VlIGk0jBBsIK by edsu@social.coop
2023-08-04T10:14:02Z
0 likes, 0 repeats
@strypey do you think focusing on how governance *actually* operates in a project (both implicitly and explicitly) can be a helpful way of working with those differences? Or is that a killjoy too?
(DIR) Post #AYOmb7Aoldu37veiXY by edsu@social.coop
2023-08-03T20:12:25Z
0 likes, 0 repeats
@colby I think they excluded all bots, so I don't think it's personal? https://deno.land/robots.txt
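For anyone wanting to check this kind of thing themselves, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body, the bot name and the URL below are all hypothetical stand-ins for a site that excludes every crawler; they are not copied from the real deno.land file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body that excludes all crawlers.
# This is an assumption for illustration, not the actual deno.land file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URL are made-up names.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/manual"))
```

With a blanket rule like this every user agent gets the same answer, which is the sense in which the exclusion isn't personal.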
(DIR) Post #AZKCrU4rkvnSkqAcBE by edsu@social.coop
2023-08-26T21:33:02Z
1 likes, 0 repeats
let's gooooooo!
(DIR) Post #AZKCrZNq1owZD85Kwi by edsu@social.coop
2023-08-27T14:04:00Z
0 likes, 0 repeats
Update: configured wifi, added 5TB external storage, and moved to an old desk in the basement (under a rubber duck)
(DIR) Post #AapzmQs7oVHJIGwd0q by edsu@social.coop
2023-10-16T16:29:16Z
0 likes, 0 repeats
I was working on a research tool to create a SQLite database from WARC data https://github.com/edsu/warcdb and only learned when I was going to publish it on PyPI that someone else had the same idea, to use the same tools (Python, sqlite-utils, warcio, click, etc.) and did a better job of it! https://github.com/florents-Tselai/warcdb Now I'm feeling this weird mixture of gratitude and envy. Is there a word for that?
(DIR) Post #Ab7cBEphfqB2CH942i by edsu@social.coop
2021-09-20T10:36:17Z
0 likes, 0 repeats
@pukkamustard @clacke I don't know what the original service was written in but I think a rewrite started in 2007: https://web.archive.org/web/20070811100100/http://www.oclc.org/news/releases/200669.htm The federated approach and/or the software failed and it moved to the Internet Archive in 2016? https://blog.archive.org/2016/09/27/persistent-url-service-purl-org-now-run-by-the-internet-archive/ I'm not sure if they are running the same service or if it is something new. I think some people got frustrated and created https://w3id.org/ which is basically some Apache rewrite rules maintained with Git.
(DIR) Post #Ab7cFmKb4Or1FSlCi0 by edsu@social.coop
2021-09-20T10:38:08Z
0 likes, 0 repeats
@pukkamustard @clacke I dropped a question in the Internet Archive Slack to see if anyone knows what software is running it now.
(DIR) Post #Ab7cFotPXvtHC142Fc by edsu@social.coop
2021-09-20T14:58:20Z
1 likes, 0 repeats
@pukkamustard @clacke @how yes, I just heard the same from Mark Graham in IA Slack. He said that it's a Python app (with some cache layers) that sits on top of data in IA storage. It turns out you can browse the PURL data in this IA collection: https://archive.org/details/purl_collection Each namespace is an item that has a JSON file with the mappings in it. So DublinCore is: https://archive.org/details/purl_dc and it has a JSON file: https://ia601205.us.archive.org/29/items/purl_dc/purl_dc_purl.json Kinda cool :-)
(DIR) Post #AbN67QSIrkIYIX3hPE by edsu@social.coop
2023-11-01T14:50:52Z
0 likes, 0 repeats
I accidentally did: $ vi URL and it worked!? I initially thought it might be a neovim thing (I alias vi to nvim), but it works in vim too? You learn something new every day I guess.
(DIR) Post #AbPlaONMA7gagMazOy by edsu@social.coop
2023-11-02T17:28:24Z
2 likes, 4 repeats
TIL that there is a URI scheme for dictionaries, and curl supports it: curl dict://dict.org/define:mastodon
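To see roughly what a client does with a URL like that, here is a small sketch based on my reading of RFC 2229 (the DICT protocol): the path is split on colons into an action, a word and an optional database, with "!" as the default database and 2628 as the default port. The function name dict_url_to_command is made up for illustration, and this only builds the command string; it does not open a connection or claim to mirror curl's internals.

```python
from urllib.parse import urlsplit, unquote

DEFAULT_DICT_PORT = 2628  # default port for DICT per RFC 2229


def dict_url_to_command(url):
    """Turn a dict:// definition URL into (host, port, command).

    Hypothetical helper: only handles definition lookups, not matches.
    """
    parts = urlsplit(url)
    if parts.scheme != "dict":
        raise ValueError("not a dict:// URL")

    # Path looks like /d:<word>:<database>; curl also accepts the
    # longer aliases "define" and "lookup" for "d".
    fields = unquote(parts.path.lstrip("/")).split(":")
    action = fields[0]
    if action not in ("d", "define", "lookup"):
        raise ValueError("only definition lookups are handled here")
    if len(fields) < 2 or not fields[1]:
        raise ValueError("no word given")

    word = fields[1]
    # "!" asks the server to stop at the first database with a match.
    database = fields[2] if len(fields) > 2 and fields[2] else "!"

    port = parts.port or DEFAULT_DICT_PORT
    return parts.hostname, port, f"DEFINE {database} {word}"


print(dict_url_to_command("dict://dict.org/define:mastodon"))
```

The command string is what would be sent over a plain TCP connection to the server, followed by QUIT once the definition comes back.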
(DIR) Post #AdPgWqfGBWZzv0e42K by edsu@social.coop
2024-01-01T19:31:46Z
0 likes, 0 repeats
A patch of moss on our delightfully dilapidated driveway. I hope the moss wins. #mosstodon
(DIR) Post #Ai1wjki8vDFiLBtHKC by edsu@social.coop
2024-05-18T23:03:17Z
0 likes, 0 repeats
@internetarchive it would've been interesting if they also looked at the missing, to see what was available in a web archive.
(DIR) Post #AlsDI0MgDwZkpplsS8 by edsu@social.coop
2024-09-06T12:42:23Z
0 likes, 1 repeats
This is an important vote for @SocialCoop about whether bridged Bluesky posts should circulate more visibly in our little corner of the fediverse. Many thanks to @flancian for bringing it up for discussion, and articulating a proposal. https://www.loomio.com/p/izOUTs1l/proposal-remove-the-instance-wide-limit-on-bridgy-fed-bluesky-bridge-and-snarfed-org Protip: once the page loads scroll up to the top to see the proposal. I don't know what the Loomio designers were thinking when they implemented that "feature".
(DIR) Post #AlsDI2UuIKLRRvdTYO by edsu@social.coop
2024-09-06T12:54:18Z
0 likes, 0 repeats
To read more of the (fascinating) discussion see https://www.loomio.com/d/W6tL5cvp/the-bluesky-bridge/ A core theme I saw articulated is what sort of vision the co-op has for an Open Web that is suffused with surveillance capitalism. It seems to me that this is not an easy problem to tackle in the abstract, and that it is best addressed in particular decisions like this. But perhaps there is a place to write down shared general principles, so we don't need to constantly re-litigate decisions? I would not be opposed!
(DIR) Post #Aqi6pvEytTureDsd8K by edsu@social.coop
2025-02-02T12:49:53Z
0 likes, 2 repeats
It looks like the CDC uploaded a 98 GB snapshot of their datasets as of January 28, 2025 to @internetarchive https://archive.org/details/20250128-cdc-datasets
(DIR) Post #Aqi6q3OqYgYqx8Ycca by edsu@social.coop
2025-02-02T13:20:51Z
0 likes, 0 repeats
... & why it is needed https://www.404media.co/the-cdcs-website-is-being-actively-purged-to-comply-with-trump-dei-order/
(DIR) Post #AsvKe3EBVJ6RrUK8Z6 by edsu@social.coop
2025-04-09T08:39:54Z
0 likes, 1 repeats
Archiving everything on the web isn't an option. So we need better tools and methods for making decisions about what is in need of #WebArchiving. Here is a short case study in evaluating how much the US Census FTP site has been archived by the @internetarchive and the End of Term Web Archive--both are critical pieces of memory infrastructure. https://inkdroid.org/2025/04/08/census/ Thanks to @andrewjbtw for doing the heavy lift of collecting the data. All potential errors are my own.
(DIR) Post #AuMGdeiWVEwSUhwt4C by edsu@social.coop
2025-05-20T12:51:33Z
0 likes, 0 repeats
@asrg @jsbarretto what effect do you think this technique has on less extractive automated agents that are trying to index and archive parts of the web? maybe the robots.txt should encourage them not to get lost in the maze?
(DIR) Post #AuUkCicrgAMEmriEDo by edsu@social.coop
2025-05-26T16:05:18Z
1 likes, 0 repeats
> One thing that should be clear is that there’s incredible symmetry between what people like and dislike about standard libraries. https://alexgaynor.net/2025/may/19/standard-libraries/ There is the seed of a software studies dissertation in here methinks...