Post AYJM7v72BdYJwG8dXc by bonifartius@freespeechextremist.com
 (DIR) Post #AYJ9Frs0pIyqnNZUDg by r000t@ligma.pro
       2023-08-02T03:37:17Z
       
       4 likes, 0 repeats
       
       The guy I worked for between 2015-2018 wants to do Mastodon hosting and I'm like "you can't put my name aaaaaanywhere near it lmao"
       "Oh, it'll be fine. Nobody will harass you for selling out"
       "No you don't get it, they'll harass *you* and instruct everyone to block your whole AS"
       
 (DIR) Post #AYJ9QiXNrjMf2I610C by Hoss@shitpost.cloud
       2023-08-02T03:39:23.663223Z
       
       2 likes, 1 repeats
       
       Sometimes I forget that normies lack any real frame of reference for how deranged people on the Internet can be.
       
 (DIR) Post #AYJDuAg0ivOOWSQngW by r000t@ligma.pro
       2023-08-02T04:29:24Z
       
       1 likes, 1 repeats
       
       Anyway, one of the big problems I want to solve before doing Mastodon hosting is... how can we take advantage of the fact that we're hosting dozens or hundreds of instances to
       - Save on disk/DB space by deduplicating statuses and especially attachments
       - Save on bandwidth by ensuring traffic between our customers doesn't leave the internal network
       - Maybe even group multiple customers together with custom daemon software that knows what the different domains are
       - Use the resulting economies of scale to provide affordable multi-homing, high-availability, and failover, so we don't become, and aren't subject to, single points of failure (how much of the network would disappear during an OVH/Hetzner outage?)
       
 (DIR) Post #AYJM7v72BdYJwG8dXc by bonifartius@freespeechextremist.com
       2023-08-02T06:01:40.397095Z
       
       1 likes, 0 repeats
       
       @r000t
       > - Save on disk/DB space by deduplicating statuses and especially attachments
       maybe @p's webvac helps, at least for the same status across instances. clients posting the same attachment likely results in different files because everyone reencodes or does other shenanigans now :clownworld:
       
 (DIR) Post #AYJN9Uh7Tl5gXQKbR2 by r000t@ligma.pro
       2023-08-02T06:13:08Z
       
       2 likes, 0 repeats
       
       @bonifartius @p It definitely still helps for *remote media*.
       If we have 1,000 customers, and even 20% of them have at least one user that, for example, follows shitpostbot, that's 200 users that only need to store the image one time among them. If the daemon is aware of this, and not just the S3 backend (like Jortage), then we could even skip *fetching* the file.
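
The "daemon is aware of this" idea above can be sketched roughly as follows. This is only an illustration, not how Jortage or any existing daemon works: `SharedMediaCache`, the injected `fetch` callable, and the choice of SHA-256 are all assumptions made for the example. The point it shows is that when the lookup is keyed on the remote URL before any download happens, the second and later customers skip both the storage and the fetch.

```python
import hashlib
from typing import Callable, Dict, Set


class SharedMediaCache:
    """Hypothetical cross-customer cache for remote media."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                        # injected downloader (assumption)
        self._url_to_digest: Dict[str, str] = {}   # remote URL -> content hash
        self._blobs: Dict[str, bytes] = {}         # content hash -> stored bytes
        self._refs: Dict[str, Set[str]] = {}       # content hash -> customers using it
        self.fetch_count = 0                       # how many real downloads happened

    def get(self, customer: str, url: str) -> bytes:
        digest = self._url_to_digest.get(url)
        if digest is None:
            # First customer to ask for this URL: download and store it once.
            data = self._fetch(url)
            self.fetch_count += 1
            digest = hashlib.sha256(data).hexdigest()
            self._url_to_digest[url] = digest
            # Identical bytes reached via a different URL share one blob.
            self._blobs.setdefault(digest, data)
        # Everyone else only gets a reference; no download, no second copy.
        self._refs.setdefault(digest, set()).add(customer)
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())
```

With 200 customers asking for the same URL, `fetch_count` stays at 1 and `stored_bytes()` equals the size of a single copy. A dedup layer that lives only in the storage backend would still see 200 downloads, because it cannot know the bytes match until they have arrived.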
       
 (DIR) Post #AYKY1jDWtRMLutwykC by p@freespeechextremist.com
       2023-08-02T19:49:43.548765Z
       
       1 likes, 0 repeats
       
       @bonifartius @r000t webvac just does this with media; the hefty bits of storage for statuses are actually in the indexes of the metadata.
       Revolver does deduplicate metadata (where possible; for example, to/cc fields are sorted and then stored independently, so, for example, '["https://ligma.pro/users/r000t","https://www.w3.org/ns/activitystreams#Public","https://freespeechextremist.com/users/p"]' (122 bytes) turns into a 64-byte hash) and the post is compressed most of the time (when creating a block, after truncating zeroes, it tries to compress the data, and stores compressed data when it's smaller, which is almost always the case with textual data, like JSON).
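
A minimal sketch of the two tricks described above, under stated assumptions: this is not Revolver's code, and the post does not say which hash or compressor it uses. SHA-256 (whose hex digest happens to be 64 characters) and zlib are stand-ins, and `recipient_sets`, `intern_recipients`, `pack_block`, and `unpack_block` are hypothetical names.

```python
import hashlib
import json
import zlib
from typing import Dict, List, Tuple

# Hypothetical side table: hash -> canonical recipient list, stored once.
recipient_sets: Dict[str, bytes] = {}


def intern_recipients(recipients: List[str]) -> str:
    """Sort a to/cc list, store it once, and return its hash as the reference."""
    canonical = json.dumps(sorted(recipients), separators=(",", ":")).encode("utf-8")
    key = hashlib.sha256(canonical).hexdigest()  # 64 hex characters
    recipient_sets.setdefault(key, canonical)
    return key


def pack_block(data: bytes) -> Tuple[bool, bytes]:
    """Drop trailing zero bytes, then keep whichever form is smaller."""
    trimmed = data.rstrip(b"\x00")
    compressed = zlib.compress(trimmed)
    if len(compressed) < len(trimmed):
        return True, compressed
    return False, trimmed


def unpack_block(is_compressed: bool, payload: bytes, size: int) -> bytes:
    """Reverse pack_block, restoring the zero padding up to the original size."""
    raw = zlib.decompress(payload) if is_compressed else payload
    return raw.ljust(size, b"\x00")
```

Sorting before hashing is what makes the dedup work: the same three recipients in any order map to one stored entry. The "store compressed only when smaller" check matters for very short or already-compressed blocks, where the compressor's overhead would otherwise make them grow.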
       
 (DIR) Post #AYKYZHtyxPDX2toFgu by p@freespeechextremist.com
       2023-08-02T19:55:47.456200Z
       
       0 likes, 0 repeats
       
       @r000t @bonifartius
       > that's 200 users that only need to store the image one time among them.
       Well, redundancy. But breaking it into blocks (like venti does, so webvac does that for free; Revolver does this because it shamelessly steals from venti) means you can fetch just the top-level block and if you have the children, nothing further needs fetching from the network.
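
The block idea above can be sketched like this. It is a simplified illustration in the spirit of venti, not venti's, webvac's, or Revolver's actual format: it uses a single level of pointer block instead of a deeper tree, a tiny `BLOCK_SIZE` so the behaviour is easy to see, and SHA-256 as the block address. `put_file`, `fetch_file`, and `fetch_remote` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def score(data: bytes) -> str:
    """Address of a block: the hash of its contents."""
    return hashlib.sha256(data).hexdigest()


def put_file(store: Dict[str, bytes], data: bytes) -> str:
    """Split data into blocks, store each by hash, and return the root address."""
    children: List[str] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]
        key = score(chunk)
        store.setdefault(key, chunk)  # identical blocks are stored once
        children.append(key)
    pointer = "\n".join(children).encode("ascii")  # top-level block lists the children
    root = score(pointer)
    store[root] = pointer
    return root


def fetch_file(root: str, local: Dict[str, bytes],
               fetch_remote: Callable[[str], bytes]) -> bytes:
    """Rebuild a file, going to the network only for blocks missing locally."""
    if root not in local:
        local[root] = fetch_remote(root)
    parts: List[bytes] = []
    for key in local[root].decode("ascii").splitlines():
        if key not in local:
            block = fetch_remote(key)
            if score(block) != key:
                raise ValueError("fetched block does not match its address")
            local[key] = block
        parts.append(local[key])
    return b"".join(parts)
```

If the local store already holds some of the children (for example because another file shared those blocks), `fetch_file` downloads only the top-level block plus the missing children, and a repeat call downloads nothing at all.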