Post APWaLxKbyUI1QVCxQu by chopsstephens@mastodon.nzoss.nz
 (DIR) Post #APWaLu38B1WZFQ0SGm by chopsstephens@mastodon.nzoss.nz
       2022-11-11T02:37:09Z
       
       0 likes, 1 repeats
       
       I'm curious if anyone has done any modelling of the scaling characteristics of the entire Mastodon network as a distributed system. I'm also curious whether there are scaling tipping points to watch out for: total number of Mastodon users; total number of Mastodon instances; toot volume per user; interconnectedness of servers (as users' connections are spread across more instances). #mastodon #scaling
       
 (DIR) Post #APWaLv4wLmZ8RJrPjE by rook@hulvr.com
       2022-11-11T03:35:48Z
       
       0 likes, 0 repeats
       
       @chopsstephens I think the main numbers are going to be per-instance user count and global (federated) message volume (as seen by that instance) against the available resources, on an instance-by-instance basis, plus some concern about things like media caches. That said, gargamel is apparently running three separate instances, so the limits have probably been explored. Nb. Twitter was launched on a Ruby codebase and eventually had to move off it.
       
 (DIR) Post #APWaLw3CjiltSE3Xf6 by chopsstephens@mastodon.nzoss.nz
       2022-11-11T03:42:42Z
       
       0 likes, 0 repeats
       
       @rook I was wondering if there's going to be anything subtle, like the interconnectedness of instances. Maybe as Mastodon spreads across more instances, and on average people are following more instances to follow the same number of people, there's some implication there. I'm still reading documentation and code to try and understand exactly how instances communicate with each other.
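
The interconnectedness worry can be made concrete with a standard occupancy calculation. This is a minimal sketch under a deliberately crude, hypothetical assumption: every followed account lives on one of `M` equally sized instances, chosen uniformly and independently. Under that assumption the expected number of distinct instances behind `k` follows is `M * (1 - (1 - 1/M)**k)`, which climbs towards `k` as the same follows are spread over more instances. The function name is illustrative, not part of Mastodon.

```python
def expected_distinct_instances(follows: int, instances: int) -> float:
    """Expected number of distinct instances hosting `follows` followed accounts,
    assuming each account sits on one of `instances` equally likely instances,
    chosen independently (a deliberately crude, hypothetical assumption)."""
    if instances < 1 or follows < 0:
        raise ValueError("need instances >= 1 and follows >= 0")
    # Probability that one particular instance hosts none of the followed accounts.
    p_missing = (1 - 1 / instances) ** follows
    return instances * (1 - p_missing)


if __name__ == "__main__":
    # The same 200 follows, spread over a growing number of instances.
    for m in (10, 100, 1_000, 10_000):
        print(m, round(expected_distinct_instances(200, m), 1))
```

Skewed instance sizes or clustered follows would both reduce the count, so treat this as showing the direction of the effect rather than as a prediction.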
       
 (DIR) Post #APWaLwAeI2j5pJXUGW by chopsstephens@mastodon.nzoss.nz
       2022-11-11T02:38:23Z
       
       0 likes, 0 repeats
       
       I did some searching. https://docs.joinmastodon.org/admin/scaling/ talks about scaling individual instances. I saw a few Reddit posts, but no concrete analysis.
       
 (DIR) Post #APWaLx03CvqKOjaXNw by rook@hulvr.com
       2022-11-11T03:48:28Z
       
       0 likes, 0 repeats
       
       @chopsstephens There could be, but that would probably involve significant overhead of managing federation vs. dealing with user-driven resource usage. I think the user count is usually going to be high enough that it will be in the noise, but you might notice a significant effect on small instances. Fwiw, some single-user admins have complained of the resource usage, but I'm not sure if that's from the process of initial federation or from steady-state stuff.
       
 (DIR) Post #APWaLxKbyUI1QVCxQu by chopsstephens@mastodon.nzoss.nz
       2022-11-11T03:54:55Z
       
       0 likes, 0 repeats
       
       @rook this is why I'm interested. I'm considering trying to build a back-of-the-envelope model to get a feel for some of these things. A big ask right now, considering how little I understand 😁 The small-instance comment is interesting. If there's a certain point where you need significant hardware to get started as a new instance, that could halt the growth of the Mastodon federation [*].
       [*] Does anyone call it the Mastodon federation? Sounds very Star Trek lol.
       
 (DIR) Post #APWaLxnKFjG4rYdtBo by strypey@mastodon.nzoss.nz
       2022-11-12T10:23:17Z
       
       0 likes, 0 repeats
       
       @chopsstephens You're asking some great questions. One place folks gather around the watering hole to explore such questions is: http://socialhub.activitypub.rocks/
       > Does anyone call it the Mastodon federation?
       No. We call it the fediverse. Mainly because, as @lightweight mentioned briefly in his replies, Mastodon is far from the only software in use in the overall network, see: https://fediverse.party
       ActivityPub isn't even the only protocol, although it is by far the most commonly used. @rook
       
 (DIR) Post #APWaLyKIH9d6Vo4DZo by chopsstephens@mastodon.nzoss.nz
       2022-11-11T03:48:36Z
       
       0 likes, 0 repeats
       
       @rook As far as use of Ruby on Rails goes, I think federation may save Mastodon here, in that each instance remains small enough that RoR will be sufficiently performant. I'd be more worried about Postgres: is there a point where a relational database cannot scale to meet demand, or where the cost per user becomes prohibitive as the network grows?
       
 (DIR) Post #APWaM1eFvONcoaQhbk by chopsstephens@mastodon.nzoss.nz
       2022-11-11T02:47:06Z
       
       0 likes, 1 repeats
       
       My thinking here is that whilst it's inevitable that complex distributed systems will exhibit surprising behaviour as they scale, it would be worthwhile to have considered what might break down if Mastodon reaches 10 million or 100 million users. Are there any intrinsic limits in the current architecture?
       
 (DIR) Post #APWdZmLLvND3GRpz72 by rook@hulvr.com
       2022-11-11T03:57:51Z
       
       0 likes, 0 repeats
       
       @chopsstephens Not sure why you'd call out Postgres; it is competitive, at least when set up correctly. I haven't looked at the schema or queries, so I couldn't promise they are reasonable, though. But Ruby might create disproportionately more overhead for the small instances. I don't know what their budget is, but anything done with Rails tends to have a certain up-front cost. Maybe steady state is okay if the JIT is any good these days.
       
 (DIR) Post #APWdZnGmTr9A8Yhqcq by chopsstephens@mastodon.nzoss.nz
       2022-11-11T04:09:53Z
       
       0 likes, 0 repeats
       
       @rook don't get me wrong, Postgres is a great RDBMS. I'm calling out the general usage of an RDBMS at all. This is a hangover from my AWS time; I've seen personally how RDBMSes have non-linear scaling characteristics, so when you hit limits they don't slowly brown out; you rapidly go from performing well to completely stuffed. It's one of the arguments for using non-relational data stores for highly scalable services: predictable scaling characteristics and horizontal scaling.
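
The "performing well, then suddenly stuffed" behaviour is what basic queueing theory predicts for any shared resource, not only an RDBMS. As an illustration only, this sketch swaps in the textbook M/M/1 queue (random arrivals, a single server), where the mean time in the system is `1 / (mu - lambda)`: latency doubles going from 50% to 75% utilisation, but rises tenfold between 90% and 99%. The capacity figure is hypothetical, and a real database is not an M/M/1 queue.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue (waiting plus service).
    Returns infinity once arrivals meet or exceed capacity."""
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    capacity = 1000.0  # hypothetical: requests per second the server can handle
    for utilisation in (0.5, 0.75, 0.9, 0.95, 0.99, 1.0):
        latency_ms = mm1_mean_latency(utilisation * capacity, capacity) * 1000
        print(f"{utilisation:.0%} busy -> {latency_ms:.1f} ms")
```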
       
 (DIR) Post #APWdZnoSSe5Lp0Sk7M by rook@hulvr.com
       2022-11-11T04:30:09Z
       
       0 likes, 0 repeats
       
       @chopsstephens Ah, good point. In that kind of scenario I think the message volume is going to be the key quantity, and if we're lucky it will only knock out enough of the network to bring the volume down enough to keep the problem from going completely global. Heterogeneity and all that... probably medium-large instances survive, then smaller ones come back online. I agree, it's not likely to happen any time soon, but it could be an eventuality. Not even a million people here yet.
       
 (DIR) Post #APWdZoCZ11Mr1ljzgu by chopsstephens@mastodon.nzoss.nz
       2022-11-11T04:38:07Z
       
       0 likes, 0 repeats
       
       @rook another possibility is that the network cannot keep up with the volumes flowing through it, either due to API rate limits or simple hardware capacity, so that at busy periods toots are delayed, or maybe something else, like media. I think the disaster scenario here is that it gets to a point where it's not just peak volumes that overload the system, it's the average volume. At this point, each instance just falls steadily behind, and the network becomes unusable.
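
The difference between "peaks cause delays" and "average volume exceeds capacity" shows up in a simple fluid-queue sketch. This is a hypothetical model, not how Mastodon's job queues actually work: each tick an instance receives some number of activities and can process a fixed number, and whatever it can't process carries over to the next tick. If only the peaks exceed capacity the backlog shrinks again afterwards; if the average does, the backlog grows every tick.

```python
from typing import Iterable, List


def simulate_backlog(arrivals: Iterable[int], capacity_per_tick: int) -> List[int]:
    """Queue length after each tick, given incoming activities per tick and a
    fixed number of activities the instance can process per tick."""
    backlog = 0
    history = []
    for incoming in arrivals:
        # Carried-over work plus new arrivals, minus what gets processed this tick.
        backlog = max(0, backlog + incoming - capacity_per_tick)
        history.append(backlog)
    return history


if __name__ == "__main__":
    capacity = 100
    # Only a short peak exceeds capacity: the backlog shrinks once it passes.
    peaks = [80, 80, 150, 150, 80, 80, 80, 80]
    # Average load above capacity: the backlog never stops growing.
    saturated = [110] * 8
    print(simulate_backlog(peaks, capacity))
    print(simulate_backlog(saturated, capacity))
```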
       
 (DIR) Post #APWdZodrNXCaOQVnEm by chopsstephens@mastodon.nzoss.nz
       2022-11-11T04:40:32Z
       
       0 likes, 0 repeats
       
       @rook hopefully these issues would be discovered earlier, and once things start falling behind during busy periods (or earlier, if there are signs of issues to be observed before users are impacted), whatever architectural limitation is being hit can be addressed before steady state becomes problematic.
       
 (DIR) Post #APWdZopChMHAxboqv2 by chopsstephens@mastodon.nzoss.nz
       2022-11-11T04:10:22Z
       
       0 likes, 0 repeats
       
       @rook It's hard to imagine a Mastodon instance reaching the point where this is more than academic 😁, with mastodon.social coping fine with 173K users.
       
 (DIR) Post #APWdZp8hWrs7w4wQJE by rook@hulvr.com
       2022-11-11T04:49:51Z
       
       0 likes, 0 repeats
       
       @chopsstephens I'm of two minds. I guess it's really two failure modes: degradation vs. just falling over. For example, if you have an extreme non-linear scaling issue, at some point the demands on resources may go sufficiently vertical to abruptly end instances. That's what I was thinking about. The other version is a self-moderating problem, where even if scaling is significantly non-linear, the problem still grows slowly enough that traffic drops off, and so the network just limps along.
       
 (DIR) Post #APWdZpYDzyHxDEso5o by chopsstephens@mastodon.nzoss.nz
       2022-11-11T05:22:30Z
       
       0 likes, 0 repeats
       
       @rook I mean, all of these problems are self-moderating. If instances are overloaded to the point of crashing, people will leave until load drops to a non-crashing point 😁 I guess the question is whether this form of self-moderation is desirable. As a thought experiment, let's say there's an architectural issue that hits at around 20 million users. Twitter actually collapses, and suddenly an influx takes Mastodon to 50 million users.
       
 (DIR) Post #APWdZq6xuo4swz8YF6 by rook@hulvr.com
       2022-11-11T05:45:56Z
       
       0 likes, 0 repeats
       
       @chopsstephens I was thinking of things merely under-performing badly vs. falling over completely in a Morris-worm-type scenario, depending on just how much is in flight on the whole network. A total Twitter collapse might create something that looks like early-days Twitter. They had bad outages prior to the rewrite. At 50M users... two orders of magnitude above current levels, I'd expect something to give. Bluesky might be the thing that takes over in that case. Worth thinking about.
       
 (DIR) Post #APWdZqXuIdd2IXk4Ei by chopsstephens@mastodon.nzoss.nz
       2022-11-11T05:54:59Z
       
       0 likes, 0 repeats
       
       @rook my suspicion is that architected right, the federated model can scale all the way up. Done well, federation is effectively a cellular architecture. Each server has a limited number of users who follow a limited number of users. I think it's possible to architect so there's hardly any increased load from an order of magnitude increase in the overall network if each instance's connectedness to other instances remains the same.
       
 (DIR) Post #APWdZr0yYYsfkhLHXs by rook@hulvr.com
       2022-11-11T06:09:48Z
       
       0 likes, 0 repeats
       
       @chopsstephens I agree, except a possible issue is when a significant portion of users just leave the federated TL open in a tab all day. Now you're delivering most messages to most users (modulo instance blocks etc.). But I guess we'd need statistics to figure that out. Not too hard to address: you could abridge the federated TL as needed, so non-follow traffic starts dropping out when things start to saturate.
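
The idea of abridging the federated timeline can be sketched as load shedding with two priority classes. This is a hypothetical illustration, not something Mastodon is known to do; the `Activity` type and `admit_for_processing` function are invented for the example. Given a per-batch capacity, activities from accounts a local user follows are admitted first, and non-follow traffic only fills whatever room is left, so it is the first to drop out as things saturate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activity:
    id: str
    from_followed_account: bool  # True if at least one local user follows the author


def admit_for_processing(batch: List[Activity], capacity: int) -> List[Activity]:
    """Keep at most `capacity` activities: followed-account traffic first,
    then non-follow (federated-timeline-only) traffic if room remains."""
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    followed = [a for a in batch if a.from_followed_account]
    other = [a for a in batch if not a.from_followed_account]
    admitted = followed[:capacity]
    remaining = capacity - len(admitted)  # never negative here
    admitted.extend(other[:remaining])
    return admitted


if __name__ == "__main__":
    batch = [Activity("a", False), Activity("b", True),
             Activity("c", False), Activity("d", True)]
    print([a.id for a in admit_for_processing(batch, 3)])
```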
       
 (DIR) Post #APWdZr4WLNiTvh074S by chopsstephens@mastodon.nzoss.nz
       2022-11-11T05:22:46Z
       
       0 likes, 0 repeats
       
       @rook These users have a terrible experience, and Mastodon gets badmouthed in the press. These users move to commercial/centralised alternatives. Mastodon and the fediverse never really recover from the reputational hit, and commercial social media maintains dominance.
       
 (DIR) Post #APWdZra4S4xBVXlJFQ by chopsstephens@mastodon.nzoss.nz
       2022-11-11T08:00:55Z
       
       0 likes, 0 repeats
       
       @rook oh I hadn't thought about the federated feed. Shows how new I am to all this, I guess. And I think you're right, the federated feed just loses fidelity as load increases.
       
 (DIR) Post #APWdZruHEx7IWDDRk8 by rook@hulvr.com
       2022-11-11T18:48:29Z
       
       0 likes, 0 repeats
       
       @chopsstephens so, one of the instances documented recent scaling issues: https://nora.codes/post/scaling-mastodon-in-the-face-of-an-exodus/ Seems like they did find a tipping point; otoh, the db was (among other things) poorly configured. Strong implication here that incoming federated traffic is to blame, and that stuff people are following isn't distinguished from the rest of it.
       
 (DIR) Post #APWdZsCi8PrVRNqATY by chopsstephens@mastodon.nzoss.nz
       2022-11-11T06:01:46Z
       
       0 likes, 0 repeats
       
       @rook the trick is knowing the leading indicators of failure so they can be addressed before they have a serious impact. Which, coming full circle, is why I'm interested in a model of #Mastodon #Performance: something where you can tweak different variables (number of users/servers, user connection patterns, media volume, etc.) and see the impact. The model will be wrong, but done well it could give some idea of the metrics to watch and where the tipping points might be.
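
As a possible starting point for that kind of model, here is a minimal back-of-the-envelope sketch. Every parameter and formula is an assumption, not a description of Mastodon: users are spread evenly across equally sized instances, follows are chosen uniformly at random, each post is delivered once per receiving instance (as ActivityPub's shared inbox allows), and a user has on average as many followers as follows. `NetworkParams` and `per_instance_load` are hypothetical names, and the example numbers are placeholders, not measurements.

```python
from dataclasses import dataclass, replace


@dataclass
class NetworkParams:
    users: int                     # total accounts in the network
    instances: int                 # number of servers, assumed equally sized
    follows_per_user: int          # accounts each user follows, chosen uniformly
    posts_per_user_per_day: float
    media_fraction: float          # share of posts carrying media
    media_mb_per_post: float       # average size of attached media


def expected_distinct(draws: float, pool: int) -> float:
    """Expected number of distinct items when drawing `draws` times uniformly from `pool`."""
    return pool * (1 - (1 - 1 / pool) ** draws)


def per_instance_load(p: NetworkParams) -> dict:
    local_users = p.users / p.instances
    # Distinct accounts followed by anyone on this instance (crudely including
    # local ones). Each is received once per instance, however many follow it.
    followed_accounts = expected_distinct(local_users * p.follows_per_user, p.users)
    inbound_posts = followed_accounts * p.posts_per_user_per_day
    # Each local post goes to every distinct instance hosting one of its author's followers.
    follower_instances = expected_distinct(p.follows_per_user, p.instances)
    outbound = local_users * p.posts_per_user_per_day * follower_instances
    return {
        "local_users": local_users,
        "inbound_posts_per_day": inbound_posts,
        "inbound_media_mb_per_day": inbound_posts * p.media_fraction * p.media_mb_per_post,
        "outbound_deliveries_per_day": outbound,
    }


if __name__ == "__main__":
    base = NetworkParams(users=1_000_000, instances=5_000, follows_per_user=200,
                         posts_per_user_per_day=2.0, media_fraction=0.2,
                         media_mb_per_post=1.5)
    # Grow users and instances together, keeping per-user behaviour fixed.
    for scale in (1, 10, 100):
        p = replace(base, users=base.users * scale, instances=base.instances * scale)
        print(scale, {k: round(v, 1) for k, v in per_instance_load(p).items()})
```

Swapping the uniform follow assumption for something clustered, or adding boosts and relays, would be the obvious next refinements.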
       
 (DIR) Post #APWdZsQXJ0vA8GJD1c by strypey@mastodon.nzoss.nz
       2022-11-12T10:59:23Z
       
       0 likes, 0 repeats
       
       @rook There is no ...
       > rest of it.
       Instances only import posts from accounts one of their users is following. Period. This has some serious downsides in the UX, like fragmented threads when a lot of users have participated in them, and no in-app way of seeing the posts of an account nobody on your instance is following yet (except for ones boosted by someone being followed). But it does keep both storage and bandwidth loads well below what they could be. @chopsstephens
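
Setting aside mentions, boosts and relays, the rule described here is the receiving side of how delivery works: a post is pushed to the instances where its author has followers. A minimal sketch, assuming handles of the form `user@domain` and one delivery per remote instance (ActivityPub's `sharedInbox` makes this possible; whether a given server uses it is an implementation detail). `domain_of` and `delivery_targets` are hypothetical helpers, not Mastodon's code.

```python
from typing import Iterable, Set


def domain_of(handle: str) -> str:
    """'alice@example.social' -> 'example.social' (assumes a user@domain handle)."""
    if "@" not in handle:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    return handle.rsplit("@", 1)[1].lower()


def delivery_targets(author_domain: str, followers: Iterable[str]) -> Set[str]:
    """Remote instances that receive a copy of a post: one per distinct
    follower instance, excluding the author's own instance."""
    return {domain_of(f) for f in followers} - {author_domain.lower()}


if __name__ == "__main__":
    followers = ["a@x.org", "b@x.org", "c@y.net", "d@home.social"]
    print(sorted(delivery_targets("home.social", followers)))
```

An instance that hosts none of the author's followers simply never receives the post, which is why storage and bandwidth track local follows rather than total network size.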
       
 (DIR) Post #APWdZw061G9FDPy9Mu by chopsstephens@mastodon.nzoss.nz
       2022-11-11T05:22:56Z
       
       0 likes, 0 repeats
       
       @rook If we're serious about the #fediverse, if we believe the general populace is better off on social media outside of corporate control, we should be thinking about how the next three 10x increases happen: 1 million to 10 million to 100 million to 1 billion. The network doesn't need to scale anywhere near that today, but we should be thinking in those terms.
       
 (DIR) Post #APWda021243niWtjuq by chopsstephens@mastodon.nzoss.nz
       2022-11-11T05:23:53Z
       
       0 likes, 0 repeats
       
       @rook on a personal note, I just wanted to say thanks for engaging in this discussion with me. I'm finding it invigorating and fun; I hope you are too.
       
 (DIR) Post #APWeHZoHssYn8PFveK by chopsstephens@mastodon.nzoss.nz
       2022-11-12T11:07:23Z
       
       0 likes, 0 repeats
       
       @strypey @rook I think scaling Mastodon would be impossible without this. As it stands, Mastodon can scale as large as it likes, and (hopefully) as long as the level of connectedness of each server to each other server doesn't change (user interaction patterns don't change), each server's load remains constant. Otherwise, every server would have to scale as the Fediverse scales.
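
The contrast drawn here can be put in numbers. A minimal sketch with hypothetical names and placeholder figures: under the follow-only rule, an instance's inbound volume is bounded by how many distinct accounts its own users follow, so it stays flat as the rest of the network grows; under a hypothetical "every instance receives everything" design, it grows linearly with total users.

```python
def inbound_posts_per_day(total_users: int, local_users: int, follows_per_user: int,
                          posts_per_user_per_day: float, follow_only: bool) -> float:
    """Posts one instance must ingest per day under two delivery rules."""
    if follow_only:
        # Upper bound: assume every local follow points at a different account
        # (no overlap), but never more accounts than exist in the network.
        followed_accounts = min(local_users * follows_per_user, total_users)
        return followed_accounts * posts_per_user_per_day
    # Hypothetical firehose: every post in the network reaches every instance.
    return total_users * posts_per_user_per_day


if __name__ == "__main__":
    for total in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
        follow = inbound_posts_per_day(total, 1_000, 200, 2.0, follow_only=True)
        firehose = inbound_posts_per_day(total, 1_000, 200, 2.0, follow_only=False)
        print(f"{total:>13,} users: follow-only {follow:,.0f}/day, firehose {firehose:,.0f}/day")
```

The flat follow-only figure holds only while `follows_per_user` (the instance's connectedness) stays fixed, which is exactly the caveat in the post.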
       
 (DIR) Post #APWo5KNCS3HdKBNrMW by rook@hulvr.com
       2022-11-12T12:57:09Z
       
       0 likes, 0 repeats
       
       @strypey @chopsstephens There's a lot resting on that period, and I don't think it's right. For example: https://github.com/mastodon/mastodon/issues/7399#issuecomment-387528168 I am fairly sure my small instance is not running follow-bots, and yet we see an enormous federated TL. I haven't kept up on how the new searches work either, but I had the impression they were meant to be better than what you could already see in your TL, which is why they needed improvement in the first place.
       
 (DIR) Post #APYxiE4b62YyyzE2nw by strypey@mastodon.nzoss.nz
       2022-11-13T13:54:30Z
       
       0 likes, 0 repeats
       
       @chopsstephens
       > the level of connectedness of each server to each other server
       Not sure what you mean by this. How could this change in a way that might negatively impact the ability of the fediverse to grow? @rook
       
 (DIR) Post #APYxv5Nyhud5ntQHAG by strypey@mastodon.nzoss.nz
       2022-11-13T13:56:50Z
       
       0 likes, 0 repeats
       
       Me:
       > This has some serious downsides in the UX
       Which I'm guessing could be fixed. It's all basically web stuff, so it would be possible to, for example, have a fedi app load the last X posts for an account when someone looks at its profile page, with a "load more" button that pulls in the next X posts. They could be deleted from the database if the account still isn't followed after X hours/days. Full thread reconstruction could be handled in a similar manner. @rook @chopsstephens
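
This on-demand idea can be sketched as a small cache. Everything here is hypothetical: `fetch_page` stands in for whatever mechanism would retrieve an account's posts from its home instance (it is not a real Mastodon API), and the class name, page size and `ttl_seconds` expiry are illustrative choices. Posts are fetched a page at a time when a profile is opened or "load more" is pressed, and a periodic purge drops accounts that are still unfollowed once the TTL has passed since their last fetch.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical fetcher signature: (account, offset, limit) -> list of posts.
FetchPage = Callable[[str, int, int], List[str]]


@dataclass
class RemotePostCache:
    """On-demand cache for posts by accounts nobody on this instance follows."""
    fetch_page: FetchPage
    ttl_seconds: float
    page_size: int = 20
    clock: Callable[[], float] = time.monotonic
    _posts: Dict[str, List[str]] = field(default_factory=dict, init=False)
    _fetched_at: Dict[str, float] = field(default_factory=dict, init=False)

    def view_profile(self, account: str) -> List[str]:
        """Opening a profile fetches the latest page once, then reuses it."""
        if account not in self._posts:
            self._posts[account] = self.fetch_page(account, 0, self.page_size)
            self._fetched_at[account] = self.clock()
        return self._posts[account]

    def load_more(self, account: str) -> List[str]:
        """The 'load more' button: fetch the next page and append it."""
        posts = self._posts.setdefault(account, [])
        posts.extend(self.fetch_page(account, len(posts), self.page_size))
        self._fetched_at[account] = self.clock()
        return posts

    def purge(self, followed: Set[str]) -> List[str]:
        """Drop accounts still unfollowed once the TTL has passed since the last fetch."""
        now = self.clock()
        expired = [a for a, t in self._fetched_at.items()
                   if a not in followed and now - t > self.ttl_seconds]
        for account in expired:
            del self._posts[account]
            del self._fetched_at[account]
        return expired
```

Injecting `clock` keeps the expiry logic testable without waiting hours in real time.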
       
 (DIR) Post #APZ6x5uOdKZralcQ2S by rook@hulvr.com
       2022-11-13T15:37:56Z
       
       0 likes, 0 repeats
       
       @strypey @chopsstephens There are a few things involved here. We are departing from the incident (which is fine), which presumably had little to do with what was already in the db. That is just to say, if this process were in place, purging wouldn't have helped anything in that case. Though db size is a scaling concern. The other is privacy. There is some historical sensitivity about actively fetching posts from other instances. I don't think it has been treated rigorously, though.