[HN Gopher] We reduced the cost of building Mastodon at Twitter-...
___________________________________________________________________
We reduced the cost of building Mastodon at Twitter-scale by 100x
Author : tekacs
Score : 589 points
Date : 2023-08-15 17:54 UTC (5 hours ago)
(HTM) web link (blog.redplanetlabs.com)
(TXT) w3m dump (blog.redplanetlabs.com)
| MisterBastahrd wrote:
| What one finds useful from a web application and what the web
| application actually is are usually two entirely different
| things.
|
| I work in marketing automation, and I guess I have in one way or
| another my entire career. The clients who need to use the
| platform to communicate with their own clients over social
| networking may never touch our print delivery system, but that
| doesn't mean that print delivery doesn't exist or isn't
| important.
|
| If you are unwilling to recreate the totality of the application
| in terms of functionality, then you are lying if you say that you
| have recreated it.
| nathanmarz wrote:
| Not sure what you're talking about as we implemented the
| entirety of Mastodon from scratch.
| lionkor wrote:
| Very interesting, looking forward to reading the docs once they
| come out.
|
| Why Java?
| nathanmarz wrote:
| It's one of the most widely used/known programming languages in
| the world.
|
| It's a Java API so any JVM language can be used (Clojure,
| Scala, etc.).
| rubiquity wrote:
| Noticeably missing are any details about concurrency control and
| replication or recovery protocols. A Twitter clone is one thing
| but any sort of application needing ACID Transactions is a whole
| other beast.
| nathanmarz wrote:
| All data on Rama is replicated automatically with a
| configurable "replication factor". Data written to Rama is not
| made visible until it's successfully replicated. The
| documentation we're releasing next week includes a page going
| into detail in how this works.
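|
| A minimal sketch of that visibility rule, not Rama's actual
| protocol (which this thread does not describe). ReplicatedLog and
| its "unavailable" parameter are hypothetical names, and requiring
| every replica to accept a write is an assumption made here for
| simplicity:

```python
class ReplicatedLog:
    """Toy log that exposes a record only after all replicas hold it."""

    def __init__(self, replication_factor):
        self.replicas = [[] for _ in range(replication_factor)]
        self.visible_count = 0  # how many records readers may see

    def append(self, record, unavailable=()):
        # Assumption: a write needs every replica. If any replica index is
        # listed as unavailable, reject the write and expose nothing.
        if any(i in unavailable for i in range(len(self.replicas))):
            return False
        for replica in self.replicas:
            replica.append(record)
        self.visible_count += 1
        return True

    def read(self):
        # Readers only ever see records that were fully replicated.
        return self.replicas[0][: self.visible_count]


log = ReplicatedLog(replication_factor=3)
ok = log.append("post-1")
rejected = log.append("post-2", unavailable={2})
```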
| Pxtl wrote:
| I've seen many people describe frameworks like this - you know,
| first you have the slow back-end event-driven master database
| that you don't query live against, then you've got eventual-
| consistency flows against the various data-warehouses and data-
| stores and partitioned sharded databases in useful query-friendly
| layouts that you actually read live from... and I never see it
| clearly explained: how do you read a change back to the user
| literally just after they made the change? How do you say "other
| views eventual-consistency is fine but for this view of this bit
| of info we need it updated _now_".
|
| This write-up is very detailed but I couldn't find that
| explanation.
| lossolo wrote:
| You have the option to track the latest update time and, during
| the minute immediately following this update, direct all reads
| to come from the leader. Additionally, you could oversee the
| replication lag among followers and block queries on any
| follower that lags more than a minute behind the leader.
|
| For the client, it's feasible to retain the timestamp of its
| most recent write. In this way, the system can ensure that the
| replica responsible for any reads related to that user
| incorporates updates at minimum up to that recorded timestamp.
| If a replica isn't adequately current, the read can either be
| managed by another replica or the query can wait until the
| replica catches up. The timestamp might take the form of a
| logical timestamp, signifying the order of writes (e.g., log
| sequence number), or it could be based on the actual system
| clock, where synchronized clocks become vital.
|
| When your replicas are spread across multiple datacenters--
| whether for user proximity or enhanced availability--there's an
| added layer of complexity. Requests requiring the leader's
| involvement must be directed to the datacenter housing the
| leader.
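|
| The logical-timestamp variant described above can be sketched as
| follows. This is an illustration under assumptions, not any
| particular database's API: Replica, applied_lsn and pick_replica
| are hypothetical names, and the client is assumed to remember the
| log sequence number (LSN) returned by its last write.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_lsn: int  # highest log sequence number this replica has applied
    is_leader: bool = False


def pick_replica(replicas, min_lsn):
    """Return a replica that has applied at least min_lsn.

    Followers are preferred so the leader is only used as a fallback.
    """
    candidates = [r for r in replicas if r.applied_lsn >= min_lsn]
    if not candidates:
        raise LookupError("no replica has caught up yet; retry or wait")
    # False sorts before True, so followers win over the leader.
    return min(candidates, key=lambda r: r.is_leader)


replicas = [
    Replica("leader", applied_lsn=120, is_leader=True),
    Replica("follower-a", applied_lsn=118),
    Replica("follower-b", applied_lsn=101),
]

# The client remembers the LSN returned by its most recent write.
last_write_lsn = 118
chosen = pick_replica(replicas, last_write_lsn)
```

| Here follower-b is skipped because it has not yet applied the
| client's write; a read needing LSN 120 would fall back to the
| leader.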
| chubot wrote:
| Yeah definitely, these ideas always sound very appealing to me,
| in theory -- I almost wonder why nobody has built it before
|
| e.g. they mention "event sourcing" and "materialized views" in
| the post -- sounds good
|
| But I thought I heard from a few people who were like "we
| ripped event sourcing" out of our codebase and so forth
|
| And yeah your question is an obvious good one, and the Reddit
| answer of "write through cache" ... is less than satisfying to
| me
|
| I FREQUENTLY have the problem where I reload the page and
| Reddit shows me stale data. It's SUPER buggy.
|
| ---
|
| Anyway I definitely look forward to hearing people try this and
| what their longer term impressions are !
|
| I basically want to know what the tradeoffs are -- it sounds
| good, but there are always tradeoffs
|
| So is the tradeoff "eventual consistency" ? What are the other
| tradeoffs?
| chubot wrote:
| Hilariously, I went to edit the above comment, and HN was
| overloaded. Then it served me three or four 500's, AND it
| served me stale data in between
|
| I was pissed off that I would have to type my comment again,
| but actually it did save it, and refreshing worked.
|
| From what I understand Hacker News is architected more in-
| memory, on one big box ... Perhaps similar to the event
| sourcing model
|
| (not knocking hacker news -- it's generally a very fast site,
| MUCH better than Reddit. Just that scaling beyond a single
| machine is difficult and full of gotchas )
| nathanmarz wrote:
| When you step back and consider the incredible amount of
| manpower and resources that have been put into these
| applications, it's amazing how buggy these applications are.
| To put it simply, they're buggy because the underlying
| infrastructure and techniques used to build them are so
| complex that the implementation is beyond the realm of human
| understanding.
|
| The way applications are built, and have been built since
| before I was born, is by combining potentially dozens of
| narrow tools: databases, computation
| systems, caches, monitoring tools, etc. There has never been
| a cohesive model capable of expressing arbitrary backends
| end-to-end, and every application built has to be twisted to
| fit onto the existing narrow pieces.
|
| Rama is a lot more than just "event sourcing" and
| "materialized views". Those are two concepts at its
| foundation, but the real breakthrough is being that cohesive
| model capable of expressing diverse backends in their
| entirety. It took me more than five years of dedicated
| research to discover this model, and it was extremely
| difficult.
| chubot wrote:
| Yes, I 100% agree with you. I would like something like
| this to succeed, and agree the problem is real.
|
| But what are the tradeoffs? There's nothing that comes with
| 100x benefit with no tradeoffs
|
| (side note: I worked on Google Code for a short while in
| 2008, concurrent with Github's founding ... I think Github
| moved a lot faster in a large part because they weren't
| dealing with distributed systems at first -- they had a
| Rails app, a database, and RAID disks, and grew it from
| there. We had BigTable and perf reviews :-P )
|
| Eventual consistency is probably one?
|
| Can I specify that comment editing is correct and ACID,
| while likes/upvotes are eventually consistent? (No is a
| fine answer, these problems are hard)
|
| I read through much of the doc, and don't see a mention of
| the word "consistency" at all, which seems like an
| oversight for something that is unifying what would be in a
| database with computation.
| nathanmarz wrote:
| Rama is a much broader platform than a database, so the
| consistency semantics you get depend on how you use it.
| When using Rama, you're not mutating indexes directly
| like you do with a database, but adding source data that
| then gets materialized into any number of indexes.
|
| You get read-after-write consistency for any PStates in a
| streaming ETL colocated with the depot you appended to.
| This is if you do the depot append with "full acking",
| which coordinates its response with the completion of
| colocated streaming ETLs. If you append at a lower level
| of acking, then you get eventual consistency on those
| PStates at the benefit of lower latency appends.
|
| Microbatching is always eventually consistent as
| processing is asynchronous and uncoordinated to depot
| appends. Microbatching is higher throughput than
| streaming and has simpler fault-tolerance semantics.
|
| You'll be able to read a lot more about this when we
| release the docs next week.
| jokethrowaway wrote:
| You can hack it and optimistically render the data you know
| about because your client created it - on the frontend, at no
| additional cost.
| hot_gril wrote:
| This is usually what I do. Don't even want to wait for an
| HTTP roundtrip for some of these, e.g. "liking" a post should
| fill in the heart icon or whatever instantly.
|
| One famous example of this going too far: Mac Mail app used to
| play a whoosh sound when your email is actually sent. They
| changed it to whoosh instantly no matter what. Given how
| often an email might fail to send or get delayed, this meant
| an actually useful indication of "great, your thing was sent,
| you can close your laptop now" was rendered useless.
| ceejayoz wrote:
| > Don't even want to wait for an HTTP roundtrip for some of
| these, e.g. "liking" a post should fill in the heart icon
| or whatever instantly.
|
| HN does this, and on slow days, about half of my upvotes
| don't go through.
| sitzkrieg wrote:
| yea but does hn have any client side js?
| [deleted]
| codetrotter wrote:
| Yeah, a very small amount so that clicking the upvote
| button does not need to reload the whole page
| hot_gril wrote:
| Can confirm, HN relies on client-side JS for voting and
| collapsing, but view/post/edit/delete don't need it.
| hot_gril wrote:
| Messaging apps often have a checkmark to indicate the
| message actually went to the server, and maybe another
| checkmark to indicate it was received on the other end.
| Maybe HN needs an icon indicating that your vote went
| through.
| newaccount74 wrote:
| Make the arrows grey to indicate the click registered,
| make them disappear to indicate the server successfully
| registered the vote?
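|
| The two-step indication suggested above amounts to a small state
| machine. A sketch, with hypothetical names throughout; the FAILED
| state (restore the arrow so the user can retry) is an assumption
| that goes beyond what the comment proposes:

```python
from enum import Enum


class VoteState(Enum):
    IDLE = "idle"            # arrow shown normally
    PENDING = "pending"      # click registered locally, arrow shown grey
    CONFIRMED = "confirmed"  # server acknowledged the vote, arrow hidden
    FAILED = "failed"        # request failed, arrow restored for a retry


TRANSITIONS = {
    (VoteState.IDLE, "click"): VoteState.PENDING,
    (VoteState.PENDING, "server_ok"): VoteState.CONFIRMED,
    (VoteState.PENDING, "server_error"): VoteState.FAILED,
    (VoteState.FAILED, "click"): VoteState.PENDING,
}


def next_state(state, event):
    """Return the new UI state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = VoteState.IDLE
for event in ["click", "server_error", "click", "server_ok"]:
    state = next_state(state, event)
```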
| hot_gril wrote:
| Yeah, it's easy enough that I was able to do it in the
| web inspector in a minute (artificial 1s network delay
| added): https://s11.gifyu.com/images/ScPMI.gif
| wizofaus wrote:
| You actually check your list of upvoted comments?
| ceejayoz wrote:
| No, I just notice it when I come back to the thread later
| in the day and a bunch of comments I know I upvoted are
| back to normal.
| [deleted]
| sixo wrote:
| I imagine you get some UUID back from your write, and
| effectively "block" until you see it committed to the event
| stream. The intent of such a system is certainly for the read-
| after-write latency to be not much longer than a traditional
| RDBMS. (This is roughly what the RDBMS is doing under the hood
| anyway.) Probably you can isolate latency-critical paths so
| they don't get stuck behind big stream processing jobs.
|
| The advantage of the overall architecture is that nearly all
| application functionality (for something like a social network)
| can tolerate much higher latency than an RDBMS, so you really
| want to have architectural building blocks that let you
| actually _use_ this headroom.
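|
| A rough sketch of the "get an ID back, then block until it is
| committed" idea, using only Python's standard library. The
| CommitTracker class is hypothetical, and a timer stands in for
| the stream processor:

```python
import threading
import uuid


class CommitTracker:
    """Tracks which write IDs the stream processor has committed."""

    def __init__(self):
        self._committed = set()
        self._cond = threading.Condition()

    def mark_committed(self, write_id):
        with self._cond:
            self._committed.add(write_id)
            self._cond.notify_all()

    def wait_for(self, write_id, timeout=1.0):
        """Block until write_id is committed; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: write_id in self._committed, timeout=timeout
            )


tracker = CommitTracker()
write_id = str(uuid.uuid4())  # ID handed back to the client by the write

# Simulate the stream processor committing the write shortly afterwards.
threading.Timer(0.05, tracker.mark_committed, args=[write_id]).start()

visible = tracker.wait_for(write_id)
```

| The timeout matters: a read that waits on a write that never
| commits should fail visibly rather than hang.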
| jedberg wrote:
| The short answer is write-through cache.
|
| You write the update directly to the cache closest to the user
| and into the eventually consistent queue.
|
| We did this at reddit. When you make a comment the HTML is
| rendered and put straight into the cache, and the raw text is
| put into the queue to go into the database. Same with votes. I
| suspect they do this client side now, which is now the closest
| cache to the user, but back then it was the server cache.
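|
| A toy version of the flow described above, as a sketch only: the
| class and method names are hypothetical, a dict stands in for the
| cache, and a deque stands in for the eventually consistent queue.

```python
import html
from collections import deque


class WriteThroughComments:
    def __init__(self):
        self.cache = {}       # comment_id -> rendered HTML, served to readers
        self.queue = deque()  # raw text waiting to be persisted
        self.database = {}    # durable store, updated asynchronously

    def post(self, comment_id, raw_text):
        # Render once and put the result straight into the cache...
        self.cache[comment_id] = f"<p>{html.escape(raw_text)}</p>"
        # ...and enqueue the raw text for the slower durable write.
        self.queue.append((comment_id, raw_text))

    def read(self, comment_id):
        return self.cache.get(comment_id)

    def drain(self):
        """Apply queued writes to the database (a background worker's job)."""
        while self.queue:
            comment_id, raw_text = self.queue.popleft()
            self.database[comment_id] = raw_text


store = WriteThroughComments()
store.post("c1", "hi <b>")
seen_before_persist = store.read("c1")
persisted_before_drain = "c1" in store.database
store.drain()
```

| The author sees the comment immediately even though the database
| has not been written yet; the staleness complaints elsewhere in
| the thread are what happens when the cache and the store diverge.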
| jitl wrote:
| Rama should bundle a write-through cache! Another in-memory
| JVM cluster thingamabob (Apache Ignite) used to propose
| write-through caching as its primary selling point:
| https://ignite.apache.org/use-cases/in-memory-
| cache.html#:~:....
|
| Or, maybe their pitch is that the streaming bits are so fast,
| you can just await the downstream commit of some write to a
| depot and it'll be as fast as a normal SQL UPDATE.
| nathanmarz wrote:
| Rama is extremely fast, as you can see for yourself by
| playing with our Mastodon instance.
| jedberg wrote:
| It's fast until it's not. Making a post and then hitting
| reload and not seeing it can be very jarring for the
| user. Definitely something to think about.
| nathanmarz wrote:
| What do you mean? Every post I do shows up instantly.
|
| Reloading the page from scratch can be slow due to
| Soapbox doing a lot of stuff asynchronously from scratch
| (Soapbox is the open-source Mastodon interface that we're
| using to serve the frontend). https://soapbox.pub/
| squeaky-clean wrote:
| I think the concern is will this still be true if
| Mastodon reaches Twitter scale?
| nathanmarz wrote:
| Rama is scalable. So as your usage grows, you add
| resources to keep up. Scaling a Rama module is a trivial
| one-line command at the terminal.
|
| Rama's built-in telemetry provides the information you
| need to know when it's time to scale.
| jitl wrote:
| is there a way to guarantee reading your own writes from
| a client perspective?
| nathanmarz wrote:
| Yes. Depot appends by default don't return success until
| colocated streaming topologies have completed processing
| the data. So this is one way to coordinate the frontend
| with changes on the backend.
|
| Within an ETL, when the computations you do on PStates
| are colocated with them, you always read your own writes.
| teacpde wrote:
| It makes sense, but wouldn't the write be slow?
| Especially when you have many streaming pipelines.
| nathanmarz wrote:
| That's part of designing Rama applications. Acking is
| only coordinated with colocated stream topologies -
| stream topologies consuming that depot from another
| module don't add any latency.
|
| Internally Rama does a lot of dynamic auto-batching for
| both depot appends and stream ETLs to amortize the cost
| of things like replication. So additional colocated
| stream topologies don't necessarily add much cost (though
| that depends on how complex the topology is, of course).
| reilly3000 wrote:
| DynamoDB's DAX cache espouses the same approach.
|
| I have to say in my ~12 years as an active Redditor I can't
| recall a time where I saw any real state issues, even with
| rapidly changing votes, etc. Bravo!? Now that we're beyond
| the days of molten servers, I have to say its overall
| reliability in the face of massive spiky traffic is quite a
| feat.
| endisneigh wrote:
| Really? I see this all the time even now.
| squeaky-clean wrote:
| In Nathan Marz's (the article author) book, Big Data, he
| describes this and calls it the Speed Layer. I haven't fully
| finished the article yet, but the components it's describing
| seem to be equivalent to what he calls the Batch Layer and
| the Serving Layer in his book.
|
| But I'm kind of getting the impression this works without any
| speed layer and is expected to be fast enough as-is.
| nathanmarz wrote:
| Rama codifies and integrates the concepts I described in my
| book, with the high level model being: indexes =
| function(data) and query = function(indexes). These
| correspond to "depots" (data) , "ETLs" (functions),
| "PStates" (indexes), and "queries" (functions).
|
| Rama is not batch-based. That is, PStates are not
| materialized by recomputing from scratch. They're
| incrementally updated either with stream or microbatch
| processing. But PStates can be recomputed from the source
| data on depots if needed.
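|
| The model "indexes = function(data)" and "query =
| function(indexes)" can be illustrated in plain Python. This is
| not Rama code: the list stands in for a depot, the dict for a
| PState, and every function name here is hypothetical.

```python
from collections import defaultdict


def apply_event(index, event):
    """Incrementally fold one event into the follower-count index."""
    kind, _follower, followee = event
    if kind == "follow":
        index[followee] += 1
    elif kind == "unfollow":
        index[followee] -= 1
    return index


def recompute(log):
    """Rebuild the index from scratch by replaying the whole log."""
    index = defaultdict(int)
    for event in log:
        apply_event(index, event)
    return index


def follower_count(index, account):
    """query = function(indexes)"""
    return index.get(account, 0)


log = []                       # source data ("depot")
live_index = defaultdict(int)  # materialized view ("PState")


def append(event):
    log.append(event)
    apply_event(live_index, event)  # incremental update, no full recompute


append(("follow", "bob", "alice"))
append(("follow", "carol", "alice"))
append(("unfollow", "bob", "alice"))
```

| Because the log is kept, recompute(log) yields the same index as
| the incremental path, which is what makes rebuilding an index
| from source data possible.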
| kulahan wrote:
| This explains so many bugs I came across on Reddit. I guess
| it works, but man I dislike this implementation.
| j45 wrote:
| Just on system design alone this was enjoyable to read.
|
| Clever architecture can help as much if not more than clever
| coding especially when keeping it simple but scalable is needed.
| [deleted]
| afro88 wrote:
| Looks amazing and incredibly smart. But I found the LOC and
| implementation time comparisons to Twitter and Threads very
| disingenuous. It makes me wonder what other wool will be pulled
| over our eyes with Rama in future (or important real world
| details missed / future footguns).
|
| Still super impressive. Reminds me of when I discovered Elixir
| while building a social-ish music discovery app. Switching the
| backend from Rails to Elixir felt like putting on clothes that
| actually fit after wearing old sweats. Rama looks like a similar
| jump, but another layer up, encompassing system architecture.
| softwaredoug wrote:
| It's hard to construct a true randomized control trial for
| software engineering methods. People make many claims about
| programming paradigms or tools that are hard to validate.
|
| It's also unclear what we would compare a tool like this to. I
| doubt you could just say "compare it to Rails" given how
| frameworks like rails are bound to specific data models, and
| most realistic applications. You'd have to compare it to some
| other opinion about how to wire together different data
| structures.
| StephenAmar wrote:
| +1 the comparisons are not great. How many engineer-hours did
| it take to build Rama itself?
|
| The numbers they got for Twitter likely include the time it
| took to build their infrastructure, common libraries (like
| finagle,...)
| paxys wrote:
| Don't forget their entire ads system, data
| processing/analytics, monitoring, customer support, payments,
| internationalization. They have replicated _at most_ a tiny
| bit of Twitter's core infra for sending Tweets. The company
| itself does a lot more than that.
| jacoblambda wrote:
| Honestly I'm willing to accept the number they gave since the
| author (Nathan Marz) was one of the lead/founding devs for
| twitter's streaming compute backend in the past.
| jauntywundrkind wrote:
| > _The instance has 100M bots posting 3,500 times per second at
| 403 average fanout to demonstrate its scale._
|
| Mastodon has to send messages to each instance with a recipient.
| That server can then fan out to all its subscribers. The way
| this point is worded makes me think all the bits are on just a
| single instance, meaning all the fan out can be dealt with
| internally without having to do any server-to-server at all.
|
| That is a fair comparison to Twitter, which is single instance.
| But it sounds like a much reduced ambition versus the task
| Mastodon has to do.
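|
| For a sense of scale, the quoted figures can be multiplied out.
| This is back-of-the-envelope arithmetic that assumes each
| delivered copy of a post counts as one timeline write:

```python
posts_per_second = 3_500  # from the quoted sentence
average_fanout = 403      # followers reached per post, on average

# Each post is delivered to every follower, so writes multiply.
timeline_writes_per_second = posts_per_second * average_fanout
timeline_writes_per_day = timeline_writes_per_second * 24 * 60 * 60

print(f"{timeline_writes_per_second:,} timeline writes per second")
```

| That is about 1.4 million timeline writes per second, before any
| server-to-server delivery is counted.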
| nathanmarz wrote:
| We implemented federation fully exactly as you described.
| nicechianti wrote:
| [dead]
| stuaxo wrote:
| Lovely, I could see this paradigm spreading to other languages,
| something was definitely needed.
| kyle-rb wrote:
| Kinda disappointed by the simulation, where are all the viral
| posts?
|
| I've been digging around for a while and haven't found any posts
| with more than 20 faves. The accounts I've found with ~1 million
| followers have little to no engagement. I want to see how a post
| with a million faves holds up to the promises of "fast constant
| time".
|
| I'm especially curious about these queries -- fave-count and has-
| user-faved -- since a couple years ago Twitter stopped checking
| has-user-faved when rendering posts more than a month or so old,
| so I imagine it was expensive at scale.
| nathanmarz wrote:
| The load generator generates boosts/favorites for a subset of
| posts that are randomly picked to be "popular". However, since
| the rate of posts is so high even individual posts picked to be
| "popular" are only getting ~70 reactions.
|
| Tracking reactions is considerably easier than timeline fanout
| though, as a favorite does a small handful of things (updates
| set of users favoriting a status and sending a notification),
| while fanout has to do an operation on every follower (403
| operations on average, sometimes up to 22M).
|
| The code getting the favorite count for a status looks like:
| .localSelect("$$statusIdToFavoriters",
| Path.key("*statusId").view(Ops.SIZE)).out("*numFavorites")
|
| Because the nested set is subindexed, that's an extremely fast
| operation (looking at our telemetry, about 0.05ms).
|
| Determining "has-user-faved" looks like:
| .localSelect("$$statusIdToFavoriters",
| Path.key("*statusId").view(Ops.CONTAINS,
| "*accountId")).out("*hasFavorited")
|
| The API server doesn't do these queries individually, which
| would be two roundtrips. It does them together in a query
| topology along with fetching other needed information (like
| number of boosts, number of replies, "has boosted", "has muted
| this status", etc.).
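|
| As a plain-Python analogy for the two lookups above (not Rama
| code, and saying nothing about how subindexing is implemented), a
| map from status ID to a set of account IDs answers both questions
| in one pass. All names below are hypothetical:

```python
from collections import defaultdict

# status_id -> set of account_ids that favorited it
status_id_to_favoriters = defaultdict(set)


def favorite(status_id, account_id):
    status_id_to_favoriters[status_id].add(account_id)


def status_summary(status_id, account_id):
    """Fetch the count and the per-user flag together, in one call."""
    favoriters = status_id_to_favoriters.get(status_id, set())
    return {
        "num_favorites": len(favoriters),
        "has_favorited": account_id in favoriters,
    }


favorite(42, "alice")
favorite(42, "bob")
favorite(42, "alice")  # favoriting twice is idempotent with a set

summary = status_summary(42, "alice")
```

| Bundling both answers into one function mirrors the point about
| avoiding two roundtrips from the API server.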
| kyle-rb wrote:
| Thanks for the response, I'm still curious about the details
| of the subindexing and how that scales. I'll be keeping an
| eye out for the release!
| mping wrote:
| Congrats on the (kinda) launch. I was curious to see what you
| guys were up to. The blog post is pretty detailed, and with good
| insights. Reducing modern app development complexity to mixing
| data structures sounds like a good abstraction. I'm sure you
| thought really hard about the building blocks of Rama and you
| know your problems better than most of the hn crowd.
|
| Now, the really hard part becomes selling. If companies start
| using your product to get ahead, that will be the real proof,
| otherwise it's "just" tech that is good on paper.
|
| On a side note, did you guys get any inspiration from clojure? I
| see lots of interesting projects popping up from clojure
| people...
|
| Best of luck!
| nathanmarz wrote:
| Rama is written in Clojure :)
| mark_l_watson wrote:
| Nathan, we guessed that it was written in Clojure :-)
|
| Interesting to finally see the announcement.
| cutler wrote:
| "Rama is programmed entirely with a Java API".
| nathanmarz wrote:
| The customer API is in Java, and the implementation of that
| API is in Clojure.
| fiddlerwoaroof wrote:
| Will there be a first-party Clojure wrapper? Or, would
| the expectation be that users would use Java interop?
| nathanmarz wrote:
| We're not releasing one as we don't have the bandwidth
| right now to maintain and document another API. That
| said, making a Clojure wrapper around the Java API should
| be pretty easy.
| Huhuhn wrote:
| Would this framework also be useful for building a Lemmy
| instance?
| endisneigh wrote:
| I don't really see the point of the comparison. They should show
| something you could only make with Rama or show how much faster
| it is to iterate with Rama.
|
| Saying this is 100 or even a million times cheaper is like saying
| taking a picture of Sistine chapel and printing out copies is a
| trillion times cheaper than making it originally.
|
| Many of us on this site could make a number of products very
| efficiently and cheaply given a static and fixed set of
| requirements as well as an existing implementation for reference.
|
| That being said it was a very detailed post, so kudos for that,
| but it's far too vague to be actionable. Why not just release the
| code and post simultaneously instead of just bragging about how
| little code was required?
| miki123211 wrote:
| Is this just me, or does the code in the post feel like they've
| implemented what should have been a new programming language on
| top of Java?
|
| Their "variables" have names that you have to keep as Java
| strings and pass to random functions. If you want composable
| code, you don't declare a function, you call .macro(). For
| control flow and loops, you don't use if and for, but a weird
| abstraction of theirs.
|
| I feel like this code could have been a lot simpler if it was
| written in a specialized language (or a mainstream language with
| a specialized transpiler and/or Macro capabilities.)
|
| I'd quote the old adage about every big program containing a slow
| and buggy implementation of Common Lisp, but considering that
| this thing is written in Clojure, the authors have probably heard
| it before.
| phillipcarter wrote:
| In the year of our lord 2023, people are still launching immature
| products with "we built a clone of a tiny subset of Twitter" as
| their use case? Come on. Twitter is huge because they have to
| support a huge number of use cases. Using this proprietary
| framework won't magically make complex use cases go away.
| DoesntMatter22 wrote:
| "We recreated a service from 2007 and it's so much faster!"
| nathanmarz wrote:
| As we mentioned in the article, Instagram just spent ~25
| person-years building Threads which is a barebones clone of
| Twitter. Not only did we build our instance 30x faster than
| that, we have way more features like federation, hashtag
| follows, polls, DMs, global timelines, and more. And
| Instagram didn't start from scratch as Instagram/Meta already
| had infrastructure powering similar products.
|
| https://www.washingtonpost.com/technology/2023/07/29/meta-
| th...
| DoesntMatter22 wrote:
| You said this in another post.
|
| > That's why we're comparing it to the cost of Twitter's
| original consumer product.
|
| Plus, you cloned a preexisting architecture. FB wrote
| theirs from scratch. Not apples to apples. This is much
| easier.
| [deleted]
| sixo wrote:
| The comparisons to Twitter are completely goofy, but the
| architecture is nothing short of enlightened. Nice work.
| ltr_ wrote:
| i always had this question: how realistic is it to have a
| standard spec and interoperable protocols for the toxic apps of
| big international tech companies that provide """services""", so
| that instances of implementations can be maintained by
| municipalities or local tech businesses and talent with 100x
| fewer employees and less money? what policies should be in place
| to achieve that? what would be the challenges? would it be
| better/healthier? is someone researching such things, like a
| transition to sustainable digital services? (sustainable in terms
| of local labor, privacy, economy, accountability, etc...)
|
| i mean if you think about this as public services not as a
| business, profit is secondary, and first is just to make the
| thing better and better for the users, no need for spying , no
| advertisement, no need for a rich piece of shit somewhere getting
| a piece of the money paid in your city for every taxi drive, food
| delivery or to give up privacy to a soulless/faceless entity just
| because you want to say something publicly or keep in touch with
| people. there is no disruption from their part, it's just an old
| thing put on the internet, they are just in the middle of
| everyone's life, just sucking everything they can. is the actual
| state of affairs "efficient"?
|
| there must be fed up engineers and tech people everywhere with
| the sad state of IT industry.
| 5636588 wrote:
| The EU already has regulations in place regarding open banking
| with its Payment Services Directive. I'd imagine a similar
| framework could be applied to big social tech.
| NoahTheDuke wrote:
| Congrats! This looks super cool.
|
| Are there any plans for exposing a Clojure API? Given that it's
| implemented in Clojure, seems like it would be a natural fit.
| Interop with Java is nice but can be cumbersome compared to the
| more natural calling conventions and idioms (threading macros
| instead of `..` builder patterns, etc).
| nathanmarz wrote:
| Answered this in another comment:
| https://news.ycombinator.com/item?id=37138526
| NoahTheDuke wrote:
| Thanks.
| NoraCodes wrote:
| I would argue that this is not "a Mastodon instance", since it is
| not running Mastodon - other than that, very very neat work! I'm
| excited for that "Source Code" link to be live :)
| frandroid wrote:
| Mastodon-compatible or Mastodon clone... If it quacks and walks
| like a duck, surely it is part of the aviary family of
| microblogging services...
| mikae1 wrote:
| It was a good blog post title choice for making it to the front
| page of HN.
| nathanmarz wrote:
| We call it a "Mastodon instance" because we implemented the
| entire Mastodon API (https://docs.joinmastodon.org/api/). This
| is in addition to also implementing the ActivityPub API which
| Mastodon also implements for federation.
| CharlesW wrote:
| > _We call it a "Mastodon instance" because we implemented
| the entire Mastodon API..._
|
| Except "Mastodon instance" means an instance of Mastodon,
| which is open source. Whether or not it was intended to be
| deceptive (I'd think a group of smart people would know
| better), this personally left a bad taste in my mouth.
| mattl wrote:
| Mastodon-compatible would be better.
|
| Mastodon is the name of a piece of software as well as an
| API, a website, etc.
|
| Naming this stuff is hard but calling it a Mastodon instance
| would be more confusing.
| [deleted]
| rvnx wrote:
| I think it's smart from a legal perspective, because the team
| members seem to partially be coming from companies acquired by
| Twitter.
|
| So I guess, if you say "it's a Mastodon-clone", you cannot be
| accused of taking proprietary ideas from Twitter (this is just
| a guess, they know better).
|
| But technically very interesting and refreshing to see. I
| really like their approach. It feels they are innovative.
| ollien wrote:
| Yeah, I think this is just an ActivityPub server that supports
| the Mastodon extensions, right? I think we should embrace the
| fact that the federated world can be diverse, rather than just
| call everything "Mastodon"
| jauntywundrkind wrote:
| Mastodon has its own API. It basically offers a very limited
| ActivityPub API too, but its own API is very different.
|
| And it's a very slim ActivityPub implementation. For example,
| I don't think you can do basic things like get an individual
| post in ActivityPub. This should be easy simple json-ld to
| get but it's just 404.
| https://www.w3.org/TR/activitypub/#retrieving-objects
| colatkinson wrote:
| Mastodon for sure supports fetching individual posts over
| ActivityPub. For example:
|
|   curl -L -H 'accept: application/activity+json' \
|     'https://mastodon.social/users/Gargron/statuses/18614983'
|
| It does have a bunch of stuff that isn't federated though,
| such as Like counts/collections. And of course it only
| implements the server-to-server (S2S) part of AP, not the
| client-to-server (C2S) part.
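|
| The same request can be built with Python's standard library. A
| sketch only: activitypub_request is a hypothetical helper, and
| the request is constructed but not sent, so it runs without
| network access.

```python
import urllib.request


def activitypub_request(url):
    """Build a GET request asking for the ActivityPub JSON form of url."""
    return urllib.request.Request(
        url, headers={"Accept": "application/activity+json"}
    )


request = activitypub_request(
    "https://mastodon.social/users/Gargron/statuses/18614983"
)

# To send it (network access required):
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```

| urllib.request.urlopen follows redirects by default, which
| matches curl's -L flag.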
| pmlnr wrote:
| Mastodon. o. not a.
| Pxtl wrote:
| Yes, a "mastodon-like Fediverse instance running on our
| proprietary new application/data framework" sounds like a
| better description.
|
| And either way, I think the source code to their Mastodonlike
| will not be usable since it will be running on their Rama
| server framework.
| doublepg23 wrote:
| HN seems to be putting you through the wringer, I for one am
| excited you guys made this and plan to open source it- it looks
| like a fantastic project.
| [deleted]
| softwaredoug wrote:
| It's a massive ask, even if the platform was 100x better, for all
| developers to give up every programming language and database
| they've ever used to depend on a startup's closed source platform
| for all functionality.
|
| It's hard enough trusting Google or Amazon's cloud offerings won't
| change.
|
| It seems that's what they're proposing right? What am I missing?
| nathanmarz wrote:
| We're actually not asking anyone to give up anything. First
| off, it has a simple integration API (which you'll be able to
| see the details of next week) that allows it to seamlessly
| integrate with any other backend tool (databases, monitoring
| systems, queues, etc.). So Rama can be incrementally introduced
| into any existing architecture.
|
| Second, Rama has a pure Java API and is not a bespoke language.
| So no new language needs to be learned.
| yazzku wrote:
| What is the licensing of Rama? Is it libre/open?
| masklinn wrote:
| > We're keeping it closed-source for now.
| mikae1 wrote:
| _> Second, Rama has a pure Java API and is not a bespoke
| language. So no new language needs to be learned._
|
| Isn't Mastodon a Ruby On Rails application?
| theogravity wrote:
| The article says they re-wrote Mastodon from scratch
| (probably the backend piece). I'm guessing in Java.
| nathanmarz wrote:
| Yes, it's 100% written in Java.
| kpw94 wrote:
| > Rama can be incrementally introduced into any existing
| architecture
|
| Big if true (and if the opposite, of incrementally _removing_
| it also works). There have been similar platform efforts in
| the past, such as https://news.ycombinator.com/item?id=20985429 .
| For that one, the "massive ask to give up every programming
| language and database they've ever used to depend on a
| startup's closed-source platform" seems like the biggest
| hindrance to adoption.
| softwaredoug wrote:
| I can imagine this being really useful from the ground up.
| Because it looks like it wants to be the source of truth,
| with different views on the data.
|
| It's hard to imagine it for a complex legacy application
| without having lots of added complexity. It wants to be the
| unifying programming model for the application. It would seem
| like running with two RDBMS sources of truth simultaneously.
|
| It's like the xkcd "there are 12 ways of doing X, let's
| create a standard to unify them" now there are 13 ways
| fragmede wrote:
| That's xkcd 927.
|
| 9, which is 3^2, and 27, which is 3^3. Or 900 is Yoda's
| age, and 27 which is the 27 club of musicians who committed
| suicide.
| mkl wrote:
| Most members of the 27 Club didn't die by suicide:
| https://en.wikipedia.org/wiki/27_Club
| erlend_sh wrote:
| So Rama-powered apps need to be written in Java? Or will any
| JVM language work?
|
| And the Rama core will remain closed-source? That part seems
| like the toughest sell of all, at a time when the vast
| majority of developer tooling and backends are open source or
| at the very least source-available.
| roguas wrote:
| Since JVM languages usually have an "ffi" to
| Java APIs/Java libs, I would say yes.
| nathanmarz wrote:
| Any JVM language should work. We've built modules with
| Clojure.
|
| We're keeping it closed-source for now.
| vanviegen wrote:
| > We're keeping it closed-source for now.
|
| Rama sounds interesting to me for my 'next big project',
| but I'd not even consider building it on top of a closed
| core. I think this is a pretty common sentiment in these
| circles.
|
| I understand building an OSS business is not easy either.
| But perhaps there is some middle of the road that you can
| walk?
|
| - A contractual obligation to open source all (now
| current) code a couple of years in the future?
|
| - Or an almost-OSS license that makes life difficult for
| competing cloud providers, like
| https://www.mongodb.com/licensing/server-side-public-
| license... ?
| clusterhacks wrote:
| I'm excited to see the docs for Rama. But I am also a little
| scared of the comment " I came to suspect a new programming
| paradigm was needed" from Nathan.
|
| It's not so much that I think the comment is wrong or anything,
| but rather that it seems so similar to what I have heard in the
| past from power-lisp (or Clojure in this case) super-smart
| engineers.
|
| I feel like we have reached a point in software development where
| "better" paradigms don't necessarily gain much adoption. But if
| Rama wins in the marketplace, that will be interesting. And I am
| quite excited to see what a smart tech leader and good team have
| been able to grind out given a years-long timeframe in this
| programming platform space . . .
| nathanmarz wrote:
| This is why we exposed Rama as a Java API rather than Clojure
| or our internal language (which is defined with Clojure macros,
| so it's technically also Clojure). Rama's Java dataflow API is
| effectively a subset of our internal language, with operations
| like "partitioners" being implemented using continuations.
| cutler wrote:
| Just curious, what advantage over Clojure did reverting to
| "pure Java" give you? Perf or something else?
| mwcampbell wrote:
| Presumably approachability for programmers that would be
| scared away by Clojure. Smart marketing move.
| teakie wrote:
| [dead]
| riffic wrote:
| the group involved here may want to be mindful of the Mastodon
| gGmbH trademarks. Using the Mastodon logo on redplanetlabs.com to
| pitch a reimplementation of ActivityPub might be seen as
| infringing.
|
| https://joinmastodon.org/trademark
|
| removed part about the mastodon subreddit since this is clearly
| not about the Mastodon software per se.
| hedora wrote:
| Any trademark case is going to have to prove that a reasonable
| person would think this article is from Mastodon gGmbH, or is
| talking about their product "Mastodon".
|
| The top of the page reads _"Red Planet Labs"_, the title of
| the article is _"How we reduced the cost of building Twitter
| at Twitter-scale by 100x"_ and the first line of the article is
| _"We built a Twitter-scale Mastodon instance from scratch in
| only 10k lines of code."_
|
| No reasonable person is going to think that this article has
| anything to do with the official Mastodon software, so there's
| no trademark issue here.
| throwaway7382 wrote:
| Their big reveal after 10 years is "keep waiting".
|
| Move along, nothing to see here.
| frandroid wrote:
| If you're going to bother creating a throwaway, you should make
| a more impactful/meaningful statement...
| sandGorgon wrote:
| nice! is this a Cloudflare worker & block storage built in Java?
| dataangel wrote:
| I do C++ backend work in a non-web industry and this entire post
| is Greek to me. Even though this is targeted at developers, you
| need a better pitch. I get "we did this 100x faster" but the
| obvious followup question is "how" but then the answer seems to
| be a ton of flow diagrams with way too many nodes that tell me
| approximately nothing and some handwaving about something called
| P-States that are basically defined to be entirely nebulous
| because they are any kind of data structure.
|
| I'm not saying there's nothing here, but I am adjacent to your
| core audience and I have no idea whether there is after reading
| your post. I think you are strongly assuming a shared basis where
| everybody has worked on the same kind of large scale web app
| before; I would find it much more useful to have an overview of,
| "This is what you would usually do, here are the problems with it,
| here is what we do instead" with side by side code comparison of
| Rama vs what a newbie is likely to hack together with single
| instance postgres.
| ldayley wrote:
| Nathan Marz created Apache Storm, coauthored the book "Big
| Data", and founded an early real-time infrastructure team at
| Twitter. It's likely the 'curse of knowledge' of working on
| this specific problem for so long is responsible for the unique
| and/or unfamiliar style of communication here.
|
| EDIT: Specifics
| HaZeust wrote:
| ... Maybe the post isn't targeted to your audience at all? How
| is "C++" and "non-web work" adjacent to web work with web
| language audiences?
| rollcat wrote:
| OP did not specify what their industry actually is. I've been
| doing "web work" for 17 years and I'm sharing their concern:
| where's the TL;DR for this? If this somehow can make me 100x
| as productive, how about starting with a "hello world"
| example that shows me how is it different from pip install
| django, etc?
| slim wrote:
| he's a developer and curious about the subject. Since it's a
| blog post, not a scientific paper, the fact that he did not
| understand could be a communication failure. I think he's
| being helpful
| sdwr wrote:
| In a typical architecture, the DB stores data, and the backend
| calls the DB to make updates and compile views.
|
| Here, the "views" are defined formally (the P-states), and
| incrementally, automatically updated when the underlying data
| changes.
|
| Example problem:
|
| Get a list of accounts that follow account 1306
|
| "Classic architecture":
|
| - Naive approach. Search through all accounts' follow lists for
| "1306". Super slow, scales terribly with # of accounts.
|
| - Normal approach. Create a "followed by" table, update it
| whenever an account follows / unfollows / is deleted / is
| blocked.
|
| Normal sounds good, but add 10x features, or 1000x users, and
| it gets trickier. You need to make a new table for each
| feature, and add conditions to the update calls, and they start
| overlapping... Or you have to split the database up so it
| scales, but then you have to pay attention to consistency, and
| watch which order stuff gets updated in.
|
| Their solution is separating the "true" data tables from the
| "view" tables, formally defining the relationship between the
| two, and creating the "view" tables magically behind the
| scenes.
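|
| A minimal sketch of the example problem above, written in plain
| Python rather than Rama's Java API (which is not shown in this
| thread). The event tuples, `naive_followers`, and `FollowedByView`
| are hypothetical names for illustration. It contrasts scanning
| every follow list with keeping a derived "followed by" view that
| is updated once per incoming event.

```python
from collections import defaultdict

# Hypothetical event log: (action, follower_id, followee_id).
events = [
    ("follow", 17, 1306),
    ("follow", 42, 1306),
    ("follow", 42, 99),
    ("unfollow", 17, 1306),
]


def naive_followers(events, account):
    """Naive approach: rebuild every follow list, then scan all of them."""
    follows = defaultdict(set)  # follower -> accounts they follow
    for action, follower, followee in events:
        if action == "follow":
            follows[follower].add(followee)
        else:
            follows[follower].discard(followee)
    return {f for f, targets in follows.items() if account in targets}


class FollowedByView:
    """Derived view: followee -> set of followers, updated per event."""

    def __init__(self):
        self.followed_by = defaultdict(set)

    def apply(self, event):
        action, follower, followee = event
        if action == "follow":
            self.followed_by[followee].add(follower)
        elif action == "unfollow":
            self.followed_by[followee].discard(follower)

    def followers(self, account):
        # Copy so callers cannot mutate the view by accident.
        return set(self.followed_by.get(account, set()))


view = FollowedByView()
for event in events:
    view.apply(event)

print(naive_followers(events, 1306))
print(view.followers(1306))
```

| Both calls give the same answer, but the view answers with one
| lookup, while the naive version has to touch every account.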
| ethbr1 wrote:
| So... at a high level, early React for data? In other words,
| letting a framework manage update dependency graph tracking,
| and then cascading updates through its graph in an optimized
| manner to enhance performance?
|
| Obviously, with tons of implementation difficulties and
| details, and not actual graph structures, but as a top level
| analogy.
| endisneigh wrote:
| I read their post and honestly it's not really that much
| different than just materialized views in a regular database
| plus async jobs to do the long running tasks.
|
| It's a ridiculous amount of fluff to describe that. Not to
| mention it's proprietary and only supports the JVM and
| doesn't integrate with the tons of tooling designed around
| RDBMS unless you stream everything to them, defeating the
| purpose.
|
| What really irks me is that they go on and on bragging about
| the low LoC count and literally show nothing complete. They
| should've held off on this post and released it simultaneously
| with the code.
| sdwr wrote:
| This is all armchair for me, but I think they have
| containers and sharding built in as well, which is the
| other half of the puzzle when it comes to scaling.
| endisneigh wrote:
| Yes, but there are plenty of NewSQL databases that support views
| and offer all of that too. Yugabyte, Cockroach, TiDB and
| that's just off the top of my head and open source. If we
| count proprietary then you have Fauna, Cloud Spanner and
| more I'm sure.
| sixo wrote:
| The difference is that the materialized-view logic lives
| naturally in the application code; there's no step where
| they go out of the DB to do computations and then reinsert.
|
| Once SQL materialized views aren't enough, you might do
| this by replicating your database into Kafka, implementing
| logic in Flink or something, and reinserting into the same
| DB/Elasticsearch/etc. Very common architecture. (Writ
| small, could also use a queue processor like RabbitMQ.)
|
| Their approach is to instead--apparently--make all of these
| first-class elements of the same ecosystem, not by "putting
| it all in the database", but by putting the database into
| the application code. Which seems wild, but colocates data,
| transformation, and view.
|
| Seems like it would open up a lot of cans of worms, but if
| you solve those, sounds great.
| leonidasv wrote:
| IIUC, the most significant difference from a materialized
| view is that the Rama infrastructure recomputes only the
| changed data by checking the relationship between fields,
| while a traditional materialized view recomputes the whole
| table?
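|
| To make the distinction in the question concrete, here is a small
| Python sketch (not Rama code, and not a claim about how Rama or
| any particular database behaves). `full_refresh` and `apply_delta`
| are hypothetical helpers: the first rebuilds a hashtag-count view
| from the whole base table, the second touches only the entry that
| a new row affects.

```python
from collections import Counter

# Hypothetical base table of (author, hashtag) rows.
posts = [("alice", "#rama"), ("bob", "#java"), ("carol", "#rama")]


def full_refresh(rows):
    """Recompute the whole view by scanning every row."""
    return Counter(tag for _, tag in rows)


def apply_delta(view, new_row):
    """Update only the view entry affected by the new row."""
    _, tag = new_row
    view[tag] += 1
    return view


view = full_refresh(posts)

new_post = ("dave", "#java")
posts.append(new_post)
apply_delta(view, new_post)

# The incrementally maintained view matches a full recomputation.
print(view == full_refresh(posts))
```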
| dustingetz wrote:
| Summarizing, now edited down with some editorializing for
| clarity:
|
| What is it? build web-scale reactive backends with an expressive
| java dataflow API. Instead of a database you develop your own
| custom app-specific indexes which are reactive, distributed and
| durable. It's like event sourcing and materialized views but
| integrated in a linearly scalable way.
|
| > _I cannot emphasize enough how much interacting with indexes as
| regular data structures instead of magical "data models"
| liberates backend programming_
|
| > _It allows for true incremental reactivity from the backend up
| through the frontend. ... enable UI frameworks to be fully
| incremental instead of doing expensive diffs to find out what
| changed._
|
| Ok, so in my mind I am positioning this against Materialized /
| differential dataflow, whose key primitive is an efficient
| streaming incremental join that works across very large
| relational tables. Materialized makes SQL reactive, Rama gives
| you a java dataflow DSL for developing purpose-built reactive
| database indexes.
|
| How it works? 4 concepts: Depot, ETLs, PState, Query
|
| Depots: "distributed, durable, and replicated logs of data."
| [Event streams?] "like Kafka except integrated" "All data coming
| into Rama comes in through depot appends."
|
| ETLs: data arrives via depots, and is ETLed to PStates via "a
| Java dataflow API for coding topologies that is extremely
| expressive". "Most of the time spent programming Rama is spent
| making ETLs."
|
| PStates seem like reactive data structures that are also
| durable/replicated, these are meant to supersede your database
| and indexes, letting you build custom purpose-built indexes that
| contain 100M elements:
|
| > _"partitioned states" are how data is indexed in Rama ...
| Unlike existing databases, which have rigid indexing models (e.g.
| "key-value", "relational", "column-oriented", "document",
| "graph", etc.), PStates have a flexible indexing model. In fact,
| they have an indexing model already familiar to every programmer:
| data structures. A PState is an arbitrary combination of data
| structures. ... nested data structures can efficiently contain
| hundreds of millions of elements. For example, a "map of maps" is
| equivalent to a "document database", and a "map of subindexed
| sorted maps" is equivalent to a "column-oriented database". Any
| [composition] is valid - e.g. you can have a "map of lists of
| subindexed maps of lists of subindexed sets"._
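|
| As a rough illustration of "indexes as data structures", the
| Python sketch below builds a "map of maps" and a "map of sorted
| maps" by hand. It only mirrors the shapes named in the quote;
| `SortedMap`, `profiles`, `timelines`, and `add_status` are
| hypothetical, and nothing here is durable, partitioned, or taken
| from the Rama API.

```python
import bisect

# "Map of maps": account id -> profile fields, shaped like a document store.
profiles = {
    1306: {"name": "alice", "bio": "hello"},
    42: {"name": "bob", "bio": ""},
}


class SortedMap:
    """Tiny sorted map: keys are kept in order so range reads are simple."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]


# "Map of sorted maps": account id -> (timestamp -> status text).
timelines = {}


def add_status(account, timestamp, text):
    timelines.setdefault(account, SortedMap()).put(timestamp, text)


add_status(1306, 30, "third")
add_status(1306, 10, "first")
add_status(1306, 20, "second")

print(profiles[1306]["name"])
print(timelines[1306].range(10, 30))
```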
|
| Query: once you develop PStates to aggregate relevant data into a
| custom index of the right ... shape?, query seems sorta like
| GraphQL selectors over your custom index:
|
| > _Queries in Rama take advantage of the data structure
| orientation of PStates with a "path-based" API that allows you to
| concisely fetch and aggregate data from a single partition_
|
| > _"query topologies" ... real-time distributed querying and
| aggregation over an arbitrary collection of PStates. These are
| the analogue of "predefined queries" in traditional databases,
| except programmed via the same Java API as used to program ETLs
| and far more capable._
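|
| One way to picture the two kinds of query described above is the
| Python sketch below. It is an assumption-laden toy, not the Rama
| API: `partitions`, `partition_for`, `point_query`, and
| `top_accounts` are hypothetical names. A point query is routed to
| a single partition, while the "topology"-style query fans out to
| every partition and merges the partial results.

```python
NUM_PARTITIONS = 4

# One dict per partition: account id -> follower count (made-up data).
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def partition_for(account_id):
    """Route a key to a partition; modulo stands in for a real hash."""
    return account_id % NUM_PARTITIONS


def put(account_id, follower_count):
    partitions[partition_for(account_id)][account_id] = follower_count


def point_query(account_id):
    """Single-partition lookup: route by key, read one structure."""
    return partitions[partition_for(account_id)].get(account_id)


def top_accounts(n):
    """Fan out to every partition, take each local top-n, then merge."""
    local_tops = []
    for part in partitions:
        ranked = sorted(part.items(), key=lambda kv: kv[1], reverse=True)
        local_tops.extend(ranked[:n])
    return sorted(local_tops, key=lambda kv: kv[1], reverse=True)[:n]


for account, count in [(1, 50), (2, 900), (3, 10), (4, 300), (5, 700)]:
    put(account, count)

print(point_query(2))
print(top_accounts(2))
```

| Taking a local top-n per partition before merging keeps the
| amount of data moved between partitions small.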
| yayitswei wrote:
| For context, nathanmarz created what is now Apache Storm, which
| is used for stream processing at some of the world's largest
| companies, so he knows a thing or two about scale.
| primitivesuave wrote:
| The "N bots posting X times/second" isn't a very meaningful
| statistic. A system's reliability is mostly characterized by its
| performance under stress.
| ceejayoz wrote:
| Headline: "building Twitter at Twitter-scale"
|
| Article: "building Mastodon at sub-Twitter-scale"
| hinkley wrote:
| We have twitter at home!
| dang wrote:
| Thanks - I've changed the title to be consistent with what the
| article says.
| messe wrote:
| Minor gripe, but there's a misspelling in the title: it
| should read Mastodon not Mastadon.
| dang wrote:
| Oops! Fixed now. Thanks to you, Fabricio20, and riffic.
| Fabricio20 wrote:
| Hey dang, I think you have a typo in the title.. says
| Mastadon instead of Mastodon!
| riffic wrote:
| masto not masta
| Kiro wrote:
| That's not the title of the article and also not what the
| article says. I would be really pissed if you editorialized
| the title of my article like that.
| dang wrote:
| I'm happy to correct it if anyone suggests a better one.
| The intention is to find a neutral title that accurately
| reflects what the article itself is saying.
|
| We've learned that when an article's original title
| generates complaints like
| https://news.ycombinator.com/item?id=37137317, the thread
| is likely to get derailed by shallow arguing about the
| title. It's in both the author's interest and the
| community's for us to nip that in the bud by (1) putting an
| accurate and neutral title at the top (preferably using
| representative language from the article itself), and (2)
| marking the title complaint offtopic since it no longer
| applies. These steps nudge the thread toward discussing the
| article's content rather than merely its title.
| Kiro wrote:
| Alright, sounds reasonable. I think the problem here is
| that the author specifically says (in a sibling comment)
| that the point is not Mastodon and now it's in the title.
| Maybe they're fine with it though.
| dang wrote:
| I'm no expert and definitely get things wrong - we only
| skim things and make a first crack at an attempt, and
| then rely on other people to refine it. If Nathan or
| someone else wants to suggest a more accurate and neutral
| title, we can do that - the goal is simply to clear the
| discussion space for something more interesting than
| title fever (https://hn.algolia.com/?dateRange=all&page=0
| &prefix=true&que...). But now I'm repeating myself!
| nathanmarz wrote:
| Actually if you read the article you can see we tested way
| above Twitter-scale. We can easily run this instance at full
| Twitter-scale by just paying for more servers.
|
| The point isn't the Mastodon instance, but rather that Rama
| enabled us to build it at scale in a tiny amount of code
| and time.
| ceejayoz wrote:
| Mastodon and Twitter don't do the same amount of work per
| post. Mastodon doesn't have a recommendation engine, they
| don't have an advertising engine, they don't scan every post
| for CSAM, there's no global search, etc. (Some of these
| things are good not to have, but they still _drastically_
| change the scope.)
|
| Claiming to have enabled significant scaling of a
| Mastodon/ActivityPub-compatible instance is fine. Claiming to
| have replicated Twitter on the cheap is, from the post, not
| accurate.
| nathanmarz wrote:
| That's why we're comparing it to the cost of Twitter's
| original consumer product. As a demonstration of Rama, we
| scoped this project to the entirety of Mastodon which is
| roughly equivalent to Twitter's original consumer product
| (actually, it's probably greater in scope with additional
| features like hashtag follows and more complex filter/mute
| capabilities).
|
| All those use cases you listed absolutely can be
| implemented with Rama, and Rama's extreme cost benefits
| would apply to those as well.
| [deleted]
| failuser wrote:
| Is there a breakdown of effort Twitter spent doing the mastodon-
| level service (serving a feed of the accounts you are subscribed
| to) vs everything else like ads, algorithmic feed, moderation,
| fighting spam, copyright claims, localization, GR, PR, safety,
| etc?
| sharms wrote:
| The performance on the example Mastodon instance is very
| responsive - almost anywhere I clicked loaded nearly instantly. I
| created an account and the only thing I found missing was it
| doesn't implement full text search unless my user was tagged, but
| that might be a Mastodon specific item.
|
| I think they have thought a lot about typical hard problems, such
| as having the timeline processing happen alongside the pipeline,
| taking network / storage etc out of the picture. Nice work!
| nathanmarz wrote:
| That is indeed an intentional part of Mastodon's design, which
| we tried to be faithful to as much as possible. We originally
| implemented search across all statuses and had to reimplement
| it when we realized Mastodon is a little different.
| sitzkrieg wrote:
| did you ever consider starting from something already
| technically performant like pleroma or misskey?
| nathanmarz wrote:
| Well, we didn't start from anything as we implemented this
| completely from scratch. I believe Mastodon is much more
| widely used than those so it seemed like a better target
| for this.
| [deleted]
| sitzkrieg wrote:
| yea i misspoke, good distinction lol. certainly makes
| sense, thanks
| DigitalSea wrote:
| One of these posts. Dig into the numbers and claims, and you'll
| see that they're not building something anywhere near Twitter
| scale.
| whateverman23 wrote:
| ctrl+f "ads"
|
| ctrl+f "monetization"
|
| ctrl+f "moderation"
|
| ctrl+f "existing infrastructure"
|
| ctrl+f "personalization"
|
| etc etc
|
| Yeah about what I expect from a "we rebuilt twitter for cheap"
| post. There's no point to the comparisons with the Twitter
| codebase size/cost. It completely distracts from what is probably
| a perfectly fine project.
| jscottmiller wrote:
| That's a fair criticism - this isn't an apples-to-apples
| comparison. What I find interesting about this is the cost of
| running the service. Being able to run a twitter-like thing on
| a hundred or so large aws instances is neat and I'm sure that
| many folks here dream of that kind of efficiency at their day
| jobs, but I'm more excited about how this scales down. Can you
| run a community of a thousand or so posters on a micro or nano
| instance for a few bucks a month or less? At that scale and
| cost, donations should easily be able to cover hosting fees and
| you would surely be able to deputize enough mods to keep things
| civil (for whatever definition of civil your instance lands
| on). Ads, monetization, personalization are non-issues (well,
| not major issues) at that scale.
| 10000truths wrote:
| The point is that much of that should be unnecessary to sustain
| the service because hosting costs are significantly (presumably
| 100x) cheaper.
| raverbashing wrote:
| They deserve congrats for that since they built the load test to
| prove this
|
| Of course, for actual production use, there's probably a lot of
| things still, but this is very nice work nonetheless
| nathanmarz wrote:
| I wouldn't call our instance a load test, as it's a legitimate
| instance available for anyone to use. It's very much
| production-grade.
| raverbashing wrote:
| This is what I'm calling load test:
|
| > The instance has 100M bots posting 3,500 times per second
| at 403 average fanout to demonstrate its scale.
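|
| For a sense of what those quoted figures imply, here is the
| back-of-the-envelope arithmetic in Python. It assumes "fanout"
| means the average number of follower timelines each post is
| delivered to; that reading is an assumption, not something stated
| in this thread.

```python
posts_per_second = 3_500
average_fanout = 403  # assumed: timelines each post is delivered to

# Timeline deliveries implied per second and per day.
deliveries_per_second = posts_per_second * average_fanout
deliveries_per_day = deliveries_per_second * 86_400  # seconds in a day

print(deliveries_per_second)  # 1410500
print(deliveries_per_day)     # 121867200000
```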
| mdaniel wrote:
| I would not want to speak for raverbashing but I feel the
| same way: I actually can't tell if the bug is with soapbox or
| with your instance but clicking on the first link from your
| post practically locks up my browser due to _every single
| Toot_ getting swapped out "at twitter scale"
|
| If one clicks quick enough to jump to an actual post, it
| seems relatively static so it's hard to tell if the bots are
| deleting and recreating their posts or what. In true Xitter
| clone fashion, trying to view the Posts & replies from any
| one user is "sign in
|
| Anyway, all of this is not to detract from your framework
| announcement as much as to have you consider that it's
| perfectly fine to label that instance as a load test, that's
| a fine thing, but calling it a legitimate instance seems to
| be a potential source of confusion
| nathanmarz wrote:
| We did notice on less powerful machines the browser
| getting overwhelmed with the rate of new content (even
| though we're only streaming 10/s instead of the full 3.5k/s
| actually happening on the backend). I don't know if the
| poor performance in this context is due to Soapbox, the
| browser, or just the hardware.
|
| To get a better feeling of Rama's performance on your
| hardware, I suggest registering an account which will allow
| you to poke around the whole platform. It takes just a
| couple seconds to register and we don't send any emails.
| FridgeSeal wrote:
| Semi-related: Their homepage (https://redplanetlabs.com/) has to
| be one of the best looking websites I've seen in a while, buttery
| smooth as well. I love it.
| alberth wrote:
| If you like the pretty static background look with light text,
| checkout https://carrd.co/
|
| It's a website builder with lots of themes similar in design.
| [deleted]
| ThinkBeat wrote:
| I am confused.
|
| This is meant to be hyped to sell your Rama
| platform/product/framework? That you have spent 10 years building
| in secret? During that time you have built a datastore and a
| Kafka competitor and ?
|
| Should not those 10 years be factored into the time it took to
| develop this technical demo?
|
| Is it 100x less code including every LOC in all of Rama?
|
| I mean I am sure you picked a use case that is well suited to
| creating a Twitterish architecture implementation.
|
| If I went off and wrote a ThinkBeat platform for creating
| Twitterish systems and then created a Twitterish implementation
| on top of it, it's really easy to reach low LOCs.
| say_it_as_it_is wrote:
| need to port this to Go...
| itissid wrote:
| TL;DR: ChatGPT summary of 5 "pages" of the thing:
| https://chat.openai.com/share/bd6eac38-5bac-4c6f-b405-7ca7d8...
| skybrian wrote:
| It sounds like interesting technology for someone, but I wonder
| more about scaling down. What does a developer instance running
| on a laptop look like?
| nathanmarz wrote:
| Great question. There's actually two ways to look at this: what
| does it look like to run Rama in a unit test environment, and
| what does it look like to run a small-scale single-node Rama
| application in production?
|
| For the former, Rama has a class called "InProcessCluster" that
| works identically to a real cluster. It enables Rama
| applications to be tested and experimented with end-to-end.
| There's an example of this in the post and this is what we're
| releasing next week.
|
| For the latter, Rama can be run on a single node with each
| daemon and module being a separate process. We made it really
| easy to launch single-node Rama instances with just a couple
| commands with the "rama" script that comes with the release.
| That said, we haven't spent much time yet optimizing small-
| scale Rama deployments and there's likely things we can do to
| make it more efficient (e.g. combine the Conductor and
| Supervisor daemons into a single process).
| dunk010 wrote:
| You're killing it with the replies here +++
| skybrian wrote:
| Interesting. How about running in the cloud? I'm thinking of the
| many ways someone who wants to start a blog could install
| Ghost. [1]
|
| [1] https://ghost.org/docs/install/
| joelthelion wrote:
| Follow up question : do you see Rama as being a good fit for
| applications that /don't/ need Twitter scale? These have
| simpler requirements, but I feel the integration you propose
| could still have value there.
| nathanmarz wrote:
| Yes, it's a better model for developing backends in
| general. Our comparison against Mastodon's official
| implementation demonstrates this, being at least 44% less
| code.
|
| It's the ability to avoid the impedance mismatches which
| dominate existing tooling that makes such a difference.
| With existing databases, including RDBMS's, you have to
| twist your application to fit their data models. The
| existence of things like ORMs help, but they add their own
| layers of complexity.
|
| With Rama, you mold your indexes to exactly match your
| application's needs. And you're always just working with
| objects represented however you want, whether appending
| data to depots, processing data in ETLs, or storing data in
| PStates.
|
| That computation and storage are integrated and colocated
| is another way that Rama simplifies application development
| and deployment.
| sourcecodeplz wrote:
| Who cares. Mastodon was/is destined to fail. Trigger happy mods
| ban you from a server, then you're banned from a bunch.
| kaimac wrote:
| skill issue
| chiefalchemist wrote:
| "We stood on the shoulders of giants..."
|
| X years from now "We reduced the cost of building _____ at
| Mastodon-scale by 1000x".
|
| It's certainly interesting, certainly an accomplishment, but it's
| also the nature of the game. The present eating the past, to be
| eaten by the future. Rinse. Repeat.
| LeifCarrotson wrote:
| > How is it possible that we've reduced the cost of building
| scalable applications by multiple orders of magnitude?
|
| > You can begin to understand this by starting with a simple
| observation: you can describe Mastodon (or Twitter, Reddit,
| Slack, Gmail, Uber, etc.) in total detail in a matter of hours.
| It has profiles, follows, timelines, statuses, replies, boosts,
| hashtags, search, follow suggestions, and so on. It doesn't take
| that long to describe all the actions you can take on Mastodon
| and what those actions do. So the real question you should be
| asking is: given that software is entirely abstraction and
| automation, why does it take so long to build something you can
| describe in hours?
|
| > At its core Rama is a coherent set of abstractions...
|
| This conclusion is alarming to read from a company that's trying
| to sell a new platform. The vast majority of the work in building
| Twitter or Reddit is not about building a coherent set of
| abstractions, it's working with an often incoherent reality,
| dealing with a myriad of laws that describe, as if your web app
| were a human clerk at a post office, how to handle PII and credit
| cards and CSAM filters and audits and copyright claims and on and
| on...
|
| I'm honestly shocked that the technical implementation of a
| simplified, coherent platform took a full 9 person-months. That
| shouldn't be the hard part. What I'd want to know as a
| prospective customer is how you handle exceptions to your
| beautiful, idealized architecture, when some foreign country
| requires that you only store comments posted by their citizens
| within their borders or something like that.
| amendegree wrote:
| ~~full text search doesn't appear to work... so it's possible
| they punted on one of the harder parts, which is fast efficient
| accurate fuzzy search, which moderation and a lot of those
| other harder things rely on.~~
|
| eta: they say they had it but removed it because apparently
| it's not something mastodon supports. so I guess it is a pretty
| good high level implementation.
| ciconia wrote:
| > I'm honestly shocked that the technical implementation of a
| simplified, coherent platform took a full 9 person-months.
|
| To be fair they developed this whole new platform to build this
| app with. I guess that's where the effort went.
| titanomachy wrote:
| Not exactly:
|
| > Our implementation is built on top of a new platform called
| Rama that we at Red Planet Labs have developed over the past
| 10 years.
| newZWhoDis wrote:
| So the things that make it difficult are all things you
| shouldn't be doing in the first place? Well that certainly
| helps.
|
| You shouldn't be handling PII/raw CC's anyways (assuming
| FinTech is not your core business)
|
| Secretly scanning your customers private messages against an
| illegal and immoral hash table from a pseudo-government entity?
| Are you law enforcement? No? Then fucking stop.
|
| Copyright claims? Fuck 'em. Only do what you are absolutely,
| positively, no way-out legally bound to do. No more no less.
| Require formal, written requests and comply in the maximum
| amount of time allowed.
|
| Audits? What kind of audit? If they're non-financial you're
| probably doing something wrong.
|
| Corporate squares have ruined the tech scene, and it's time to
| resist.
| nathanmarz wrote:
| Building Twitter/Mastodon *not at scale* isn't that hard and
| certainly doesn't take 200 person-years. Building it *at scale*
| is a completely different story. Remember the fail-whale? That
| was years of Twitter struggling to scale their product.
|
| That said, as we described in the post our implementation of
| Mastodon is less code than Mastodon's official implementation.
| So not only is Rama orders of magnitude more efficient for
| building applications at scale, it's also much faster for
| building first versions of an application.
| roguas wrote:
| Well since you use clojure, you probably know that to have
| small codebase, people often pick clojure. Going from point A
| to point Z quickly is rarely a goal for startups, going
| through A.. B... C... quickly, is the goal. I am still
| looking through all this, but a thought of having to bet on
| some java api + hope and pray it will jump over all unknown
| hoops, hm.
|
| Comparisons to Twitter are unfair; Twitter is not really a
| technical gem, or is it? It's pretty impressive to build it
| with 3 ppl in 3 months, but hmm also seems feasible using
| other tech, given all blueprints are out there.
| nathanmarz wrote:
| Well, as mentioned in the post Instagram literally just
| built and released their own barebones Twitter clone this
| year, and it took them 25 person years. They were also able
| to leverage all their existing infrastructure powering
| similar products.
|
| So I would not say it's remotely feasible to do this in
| less than one person-year with any other technology.
|
| https://www.washingtonpost.com/technology/2023/07/29/meta-
| th...
| _dwt wrote:
| Hmmm, "Rama is programmed entirely with a Java API - no custom
| languages or DSLs" according to the landing page, but this sure
| looks like an embedded DSL for dataflow graphs to me - Expr and
| Ops everywhere. Odd angle to take.
| [deleted]
| nathanmarz wrote:
| I consider "DSL" as something that's its own language with its
| own lexer and parser, like SQL. The Rama API is just Java -
| classes, interfaces, methods, etc. Everything you do in Rama,
| from defining indexes, performing queries, or writing dataflow
| ETLs, is done in Java.
| chc4 wrote:
| This is usually referred to as an "embedded DSL" - you have a
| DSL embedded in a normal programming language using its first
| class constructs.
| gfodor wrote:
| Yep the original term DSL was for custom languages, the
| eventual introduction of using it for these kinds of
| literate APIs was done later. Using it in the original way
| unqualified is fine imo.
| goostavos wrote:
| Odd thing to split hairs over.
| gkfuhff wrote:
| When someone makes a distinction that you don't immediately
| appreciate, maybe don't just dismiss it as splitting hairs,
| as if the world was a simple place.
| dcre wrote:
| It's not a small detail. It's one of the headline claims!
| gfodor wrote:
| Something I'm immediately thinking about with this is change
| management and inertia at the early stages of a new, underdefined
| project. Less code is great, the big question is how such a
| system compares to the usual hack-and-slash method of getting a
| v1 up and running as you search for PMF from the perspectives of
| ops, cost, data migrations, rapid deployments, and so on.
| Presumably, the idea here is to start from the beginning with
| Rama, skipping over the usual "monolith fetches from RDBMS" happy
| paths, even for your basic prototype, this way you don't slip
| into a situation like Twitter did where that grew slowly into an
| unscalable monstrosity requiring a rewrite. So an article focused
| on the "easy" part that's required in the beginning of rapid
| change, as much as it's not as important as the "simple" part
| that shines later at scale, seems useful.
| nathanmarz wrote:
| Thanks, this is a good idea for another post.
|
| The basic operation Rama provides for evolving an application
| over time is "module update". This lets you update the code for
| an existing module, including adding new depots, PStates, and
| topologies.
| trollied wrote:
| > We spent nine person-months building our scalable Mastodon
| instance.
|
| + the time spent creating Rama, the platform that enables it.
|
| Very dishonest leaving that out.
| nathanmarz wrote:
| You're missing the point. Rama is a generic platform that
| provides a new baseline for how expensive it is to build
| applications at scale. There's nothing about Rama specific to
| social networks. What we're showing is that Rama creates a new
| era in software engineering where the cost of building
| applications at scale is radically reduced. With Rama, anyone
| embarking on a new application today has a radically different
| economic outlook for the end-to-end cost of developing that
| application from prototype through large scale.
| 3cats-in-a-coat wrote:
| If I grasp the essence of Rama:
|
| - "Depots" are event streams (for event sourced data
| repositories)
|
| - ETLs read one or more streams and project them to indexable
| read models...
|
| - These read models are called "PStates" and represent nested
| combinations of indices like hashtables, b-trees, linked
| lists and so on. The point is that they hold the data in a
| fast-to-query form.
|
| - And you have a query engine which splits a query into 1+
| index sub-queries and then aggregates.
|
| Am I missing something? This seems a relatively standard
| event-sourced / CQRS-like architecture, but streamlined to avoid
| redundancy and reimplementation of common abstractions.
|
| It would've helped if the terms were less obscure than
| "depots" and "PStates".
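|
| The model summarized above can be sketched in a few lines of
| plain Python. This is only an illustration of the event-sourcing
| shape being described, not Rama's API: the names depot, append,
| run_etl and follower_count are hypothetical, the "depot" is just
| a list, and the "PState" is an in-memory dict with no
| partitioning, durability or replication.

```python
from collections import defaultdict

# Hypothetical stand-ins, not Rama's API: the "depot" is modelled as an
# append-only list of events and the "PState" as a nested dict index.
depot = []


def append(event):
    """Record an immutable event at the end of the log."""
    depot.append(event)


def run_etl(events):
    """Project the event stream into a read model: followee -> set of followers."""
    followers = defaultdict(set)
    for kind, follower, followee in events:
        if kind == "follow":
            followers[followee].add(follower)
        elif kind == "unfollow":
            followers[followee].discard(follower)
    return followers


def follower_count(pstate, account):
    """Answer a query from the index instead of rescanning the log."""
    return len(pstate[account]) if account in pstate else 0


append(("follow", "alice", "carol"))
append(("follow", "bob", "carol"))
append(("unfollow", "alice", "carol"))
followers_pstate = run_etl(depot)
```

| Here the log stays the source of truth, and the index can be
| rebuilt from it at any time by rerunning run_etl.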
| nathanmarz wrote:
| From the post:
|
| _Individually, none of these concepts are new. I'm sure
| you've seen them all before. You may be tempted to dismiss
| Rama's programming model as just a combination of event
| sourcing and materialized views. But what Rama does is
| integrate and generalize these concepts to such an extent
| that you can build entire backends end-to-end without any
| of the impedance mismatches or complexity that characterize
| and overwhelm existing systems._
|
| You have the general model correct, but here are a few
| clarifications:
|
| - PStates are partitioned, durable, replicated indexes that
| are represented as arbitrary combinations of data
| structures. A PState can be as simple as an integer per
| partition, or it can be complex like a map of lists of maps
| of sets. PStates allow you to shape your indexes to
| perfectly match your application's use cases.
|
| - I wouldn't call Rama queries an "engine", as it's
| considerably more straightforward in how it works than
| something like SQL. The base query API is called "paths",
| which are an imperative way to concisely reach into one
| partition of one PState to fetch or aggregate values.
| There's also "query topologies" which are predefined, on-
| demand distributed computations that can fetch and
| aggregate data from many partitions of many PStates.
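|
| As a rough picture of "a map of lists of maps of sets" split
| across partitions, with a path that reaches into one partition,
| here is a minimal Python sketch. It is not Rama's path API:
| NUM_PARTITIONS, partition_for, add_post and select are all
| assumed names for illustration, and the CRC32 routing is an
| arbitrary choice.

```python
import zlib

NUM_PARTITIONS = 4  # assumed value for the illustration

# One nested structure per partition: account -> list of posts, each post
# a map holding a set of hashtags (a "map of lists of maps of sets").
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def partition_for(key):
    """Deterministically route a key to a partition (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def add_post(account, text, tags):
    """Append a post to the list kept for this account in its partition."""
    index = partitions[partition_for(account)]
    index.setdefault(account, []).append({"text": text, "tags": set(tags)})


def select(account, *path):
    """Walk a path of keys/indexes into the one partition owning `account`."""
    node = partitions[partition_for(account)][account]
    for step in path:
        node = node[step]
    return node


add_post("carol", "hello", ["intro"])
add_post("carol", "rama looks neat", ["rama", "jvm"])
```

| A call such as select("carol", 1, "tags") touches a single
| partition and drills straight down to the tag set of the second
| post.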
| 3cats-in-a-coat wrote:
| Thanks, I will read more soon! I'm curious... how do you
| resolve the "impedance mismatch" between some "canonical"
| models that business decisions are based upon,
| which need to be synchronous with the depots (and
| mutually synchronous with other models sharing fragments
| of the same data), and the eventually consistent read
| models, which have a more lax constraint on how up to
| date they are?
|
| How do you ensure consistency here? How do you organize
| it in the data flow?
|
| Say I update a user, because that user seems to still be
| there in the query result/indexes, but actually an event
| for this user being deleted has happened some time ago?
|
| This can also happen I suppose if the depots run queries
| themselves on PState in order to determine if a certain
| event is valid at all or not, and how exactly to carry it
| out.
| nathanmarz wrote:
| The impedance mismatches you're used to from using
| databases are gone because:
|
| - You can finely tune your indexes to be exactly the
| optimal shape for your application (data structure). You
| can see this in our Mastodon implementation with the big
| variety of data structures we used for all the use cases.
| - You're generally just using regular Java objects
| everywhere: appending to depots, during ETL processing,
| and stored in indexes.
|
| How you coordinate data creation with view updates is a
| deeper topic, so I'll just summarize one of the basic
| mechanisms Rama provides for coordinating this. Depot
| appends can have an "ack level" that determines the
| conditions before Rama tells you that depot append has
| completed. The default level is "full ack" which includes
| all streaming topologies colocated with that depot fully
| processing that record. With this level, when the depot
| append completes you know that all associated indexes
| (PStates) have been updated.
|
| There's also "append ack", which only waits for the depot
| append to be replicated on the depot, and "no ack", which
| is fire and forget. These all have their uses depending on
| the specific needs of an application.
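|
| The three ack levels described above can be modelled with a toy,
| single-process depot. This is a sketch under stated assumptions,
| not Rama's implementation or API: ToyDepot, Ack and
| background_tick are hypothetical names, "replication" is just a
| move between two lists, and the index stands in for a PState.

```python
from enum import Enum


class Ack(Enum):
    FULL = "full"      # replicated and processed by colocated topologies
    APPEND = "append"  # replicated on the depot only
    NONE = "none"      # fire and forget


class ToyDepot:
    """Single-process model; real replication and streaming are asynchronous."""

    def __init__(self):
        self.pending = []   # accepted but not yet replicated
        self.log = []       # replicated records
        self.processed = 0  # how many log records have been indexed
        self.index = {}     # stand-in for a PState: record -> count

    def _replicate(self):
        self.log.extend(self.pending)
        self.pending.clear()

    def _process(self):
        for record in self.log[self.processed:]:
            self.index[record] = self.index.get(record, 0) + 1
        self.processed = len(self.log)

    def append(self, record, ack=Ack.FULL):
        """Return only once the guarantees of the chosen ack level hold."""
        self.pending.append(record)
        if ack in (Ack.APPEND, Ack.FULL):
            self._replicate()  # wait for replication
        if ack is Ack.FULL:
            self._process()    # also wait for index updates

    def background_tick(self):
        """Work that eventually happens regardless of the ack level."""
        self._replicate()
        self._process()
```

| With Ack.FULL the caller can read the index right after append
| returns; with Ack.APPEND or Ack.NONE the index only catches up
| once the background work has run.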
| 3cats-in-a-coat wrote:
| Thanks! So we can see these ACKs as "wait and
| synchronize" signals I suppose? However how can we ensure
| an "all or nothing" between all parties trying to ACK
| conditions they're mutually dependent on? I.e.
| transactionality or atomicity?
| dustingetz wrote:
| you're missing automatic/free linear scaling
| 3cats-in-a-coat wrote:
| Systems that promise "free linear scaling" without
| qualifiers either withhold or have not analyzed/realized
| their bottlenecks yet. Say if there is eventual
| consistency maybe the "eventuality" becomes so long that
| the service fails at its purpose. Or the communication
| link bandwidth is exhausted between key business logic
| (mutation event generating) services, and so on.
|
| The only systems that scale linearly are stateless
| systems. Mastodon is not stateless. And even stateless
| systems hit some bottlenecks eventually, as they exist
| and run in a scale-variant Universe.
|
| So this claim by itself doesn't immediately impress me,
| just turns my red lights on, awaiting further
| investigation. But we can of course discuss why this
| claim is made and how it is supported. The article is
| long so I've not had the chance to read it entirely yet.
|
| But we have X number of event streams mapped through Y
| number of ETLs to produce Z number of read model indices,
| in a shape that seems to form a highly interlinked DAG,
| which eventually loops back on itself in terms of message
| flow. Just the increased cross-chatter here as we
| introduce more features suggests non-linear scaling.
| yid wrote:
| > What we're showing is that Rama creates a new era in
| software engineering where the cost of building applications
| at scale is radically reduced.
|
| Bold of you to come to HN with the breathless hyperbolic
| marketing fluff that may work on Twitter...
| nathanmarz wrote:
| I think we provided a ton of substance backing up that
| claim, and we will provide even more next week when we
| release the build of Rama that anyone can use and its
| corresponding 100k words of documentation.
| Zak wrote:
| Not from the perspective of this being a demo application to
| sell Rama. The pitch is that if you use Rama, you can achieve
| similar results.
| boredumb wrote:
| neat read but I was expecting to read about twitter migrating and
| literally 100x savings being had.
| dunk010 wrote:
| You never know.
| buro9 wrote:
| Measuring "Twitter Scale" by tweets per second seems to be not
| how I would measure it.
|
| Updates per second to end users who follow the 7K tweets per
| second seems more realistic, it's the timelines and notifications
| that hurt, not the top of ingest tweets per second prior to the
| fan out... and then of course it's whether you can do that
| continuously so as not to back up on it.
| nathanmarz wrote:
| That's why we're saying "at 403 fanout". The bottleneck of
| Mastodon/Twitter is timeline writes, which is posts/second
| multiplied by the average number of followers per post. So our
| instance is doing 1.4M timeline writes / second.
|
| Another important metric is "time to deliver to follower
| timelines", which is tricky due to how much variance there can
| be every second due to the extremely unbalanced social graph.
| When someone with 20M followers posts, that multiplies the
| number of needed timeline writes by 15x. We went into depth in
| our post on how we handled that to provide fairness by
| preventing these big users from hogging all the resources all
| at once.
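|
| The arithmetic behind these figures can be checked directly. The
| inputs are the ones quoted in this thread (3,500 posts per
| second, 403 average fanout, a 20M-follower account); reading the
| "15x" as the load during the one second in which the big post
| lands on top of the normal load is an assumption.

```python
posts_per_second = 3_500  # quoted in the thread: bots post 3,500 times/second
average_fanout = 403      # "at 403 fanout"

# Timeline writes are posts per second times average followers per post.
timeline_writes_per_second = posts_per_second * average_fanout

big_account_followers = 20_000_000

# Assumed reading of "15x": normal load plus one 20M-follower post,
# relative to the normal load for that second.
burst_multiple = (
    timeline_writes_per_second + big_account_followers
) / timeline_writes_per_second
```

| That gives 1,410,500 timeline writes per second, and a burst of
| a little over 15 times the steady-state load for one such post.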
| faitswulff wrote:
| I heard somewhere that one of the particular challenges of
| Twitter's scale is not the average fanout, but the outliers
| where millions or tens of millions of users follow a single
| account. Does your simulation take that into account?
| nathanmarz wrote:
| Yes, we discussed this at length in the post.
| duped wrote:
| This is what they've been hyping on Twitter for a week?
|
| FWIW, why hype at all? Why "We'll share more in a week. Then more in
| two weeks." Show the code today!
| newaccount74 wrote:
| Considering the length and amount of detail in this blog post,
| I understand why they would need another week to get the code
| ready (assuming there will be more docs)
| nathanmarz wrote:
| We're releasing 100k words of high-quality documentation next
| week.
| jitl wrote:
| This architecture seems very similar to existing offerings in the
| "in-memory data grid" category, like Apache Ignite and Hazelcast.
| I'm more familiar with Ignite (I built a toy Notion backend with
| it over a few afternoons in 2020).
|
| The way Ignite works overall is similar. You make a cluster of
| JVM processes, your data is partitioned and replicated across the
| cluster, and you upload some JARs of business logic to the
| cluster to do things. Your business logic can specify locality so
| it runs on the same nodes as the relevant data, which ideally
| makes things a lot faster compared to systems where you need to
| pull all your data across the wire from a DB. Like Rama, Ignite
| uses a Java API for everything, including serializing and storing
| plain ol' Java objects.
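|
| A minimal Python sketch of that locality idea: route both the
| write and the computation by the same key, so the logic runs on
| the node that already holds the data and only the result
| travels. This is not Ignite's (or Rama's) API; Node, Cluster and
| the CRC32 routing are assumptions made for illustration, and
| replication is left out.

```python
import zlib


class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # this node's share of the data

    def run(self, task, key):
        """Execute `task` next to the data; only the result leaves the node."""
        return task(self.store[key])


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def owner(self, key):
        # The same routing for writes and compute keeps logic and data together.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).store[key] = value

    def compute(self, key, task):
        return self.owner(key).run(task, key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("page:42", ["block"] * 10_000)
block_count = cluster.compute("page:42", len)
```

| The 10,000-element list never leaves its owning node here; the
| caller only receives the integer count.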
|
| Ignite's architecture isn't focused on "ETL" into "PStates".
| Instead it's more about distributed "caches" of data. It does
| have streaming for ingestion
| (https://ignite.apache.org/docs/latest/data-streaming), but you
| can transactionally update the datastore directly
| (https://ignite.apache.org/docs/latest/key-value-
| api/transact...). It also has a "continuous query" feature for
| those reactive queries to retrieve data
| (https://ignite.apache.org/docs/latest/key-value-
| api/continuo...).
|
| Rama's data-structure oriented PState index seems easier to work
| with than building indexes yourself on top of Ignite's KV cache,
| but Ignite also offers an SQL language, so you can insert your
| data into the KV cache however, add some custom SQL functions,
| and then accept more flexible SQL querying of your data compared
| to the very purpose-built PState things, but still be able to do
| lower-level or more performance-oriented logic with data
| locality.
|
| Anyways, if you like some of this stuff but want to use an
| existing, already battle-tested open source project, you can look
| for these "in-memory data grid", "distributed cache", kind of
| projects. There's a few more out there that have similar JVM
| cluster computing models.
| theptip wrote:
| Hazelcast has been on my list to explore for a while. Anyone
| have pointers to a good sample project / deep-dive in the same
| sort of spirit as the OP here?
|
| Also would love to hear folks' thoughts on the sort of usecase
| where this data grid excels.
| 2Gkashmiri wrote:
| What are the server specs this demo is running on?
|
| is it baremetal?
|
| vps?
|
| how about doing a comparison on consumer grade vps like 1
| vcpu/4GB ram setup comparison between your product and mastodon
| or pleroma for example?
|
| i mean sure you can build a twitter scale product but federation
| means people can do that on their own and with your tech, they
| don't have to worry about scaling issues.
| polishdude20 wrote:
| "We spent nine person-months building our scalable Mastodon
| instance. "
|
| Nono, you can't say that when later on you say it's built on top
| of Rama. You literally spent 10 years building the framework to
| even make this.
|
| And yes, you built this in 10k lines of code but how many lines
| of code is Rama? This seems disingenuous.
| beders wrote:
| No it is not disingenuous. They didn't build Rama to build a
| twitter clone.
|
| And you can't take the "twitter engine" out of twitter and
| build other apps with it. A lot of it is custom built to fit
| the twitter data model.
|
| Unlike - it seems - Rama.
| scratcheee wrote:
| Actually took them longer than that even.
|
| They had to invent the computer first, and before that they had
| to create a universe capable of sustaining both life and
| computers.
| xmonkee wrote:
| Going by their claims, they are showing off their generalizable
| platform, Rama, by building an application on top of it. The
| application is an example, not the product. For example,
| someone implements a Todo app on their hot new javascript
| framework in 10 mins, your objection would be, "But it took you
| 2 years to make the framework, so actually it took you 2 years
| plus 10 mins". Why stop there? It also took many years to build
| the underlying language, networking layers, infrastructure,
| processors, materials etc etc. You have to draw the line at the
| point where the application specific code starts and the
| generalizable platform ends, no?
| bo1024 wrote:
| Their point is to show off the power of Rama, I.e. it is
| possible to build such applications on top with little work.
| dunk010 wrote:
| Exactly, why are so many people missing this point? It's not
| "we built a narrow, tedious framework for knocking off
| Twitter clones", it's "We built a platform that turns data
| processing on its head and look in a couple of months you can
| clone Twitter just imagine what YOU can do with this."
|
| I see parallels though to Datomic, where they turned the
| database inside out, co-located the app logic and data and
| indexes, etc. There are a bunch of great videos on YT about
| Datomic by Rich Hickey & co, worth a watch and I think shine
| a light on the approach here, too.
| bo1024 wrote:
| I think they didn't do a good job making the point clear
| for people who just clicked the link without context. It
| starts off talking a lot about the Mastodon clone and then
| gradually starts talking about Rama as it goes.
| dunk010 wrote:
| People should probably close the TikTok and pick up a
| book instead to increase their attention spans then :-D
| colonwqbang wrote:
| The JVM took years to write. It took decades to develop the
| technology necessary to build a modern microcomputer. Before
| that, millennia to invent written language. And now that those
| platforms (including Rama) all exist, one can deliver a
| Mastodon server on top of them in about 9 man-months.
| [deleted]
| RomanPushkin wrote:
| > ...10k lines of code. This is 100x less code than the ~1M lines
| Twitter
|
| I wish I didn't see this comparison, which is not fair at all.
| Everyone in their right mind understands that the number of
| features is much less, that's why you have 10k lines.
|
| Add large-scale distributed live video support at the top of
| that, and you won't get any close to 10k lines. It's only one of
| many many examples. I really wish you would compare Mastodon to
| Twitter 0.1 and not do false advertising.
|
| > 100M bots posting 3,500 times per second... to demonstrate its
| scale
|
| I'm wondering why 100M bots post only 3500 times per second? Is
| it 3500 per second for each bot? Seems like it's not, since https
| termination will consume most of the resources in this case. So
| I'm afraid it's just not enough.
|
| When I worked in Statuspage, we had support of 50-100k requests
| per second, because this is how it works - you have spikes, and
| traffic which is not evenly distributed. TBH, if it's only 3500
| per second total, then I have to admit it is not enough.
| MikePlacid wrote:
| So
|
| >> 100M bots posting 3,500 times per second...
|
| and
|
| > We used the OpenAI API to generate 50,000 statuses for the
| bots to choose from at random.
|
| I wonder: 100M OpenAI bots talking to each other continuously
| and with much vigor - how is this affecting OpenAI's uhm...
| intellect?
| sdwr wrote:
| They generated 50,000 statuses once, put them in a text file,
| and pick between them randomly. So not at all.
| nathanmarz wrote:
| We're comparing just to the original consumer product, which is
| about the same as Mastodon is today. That's why we said
| "original consumer product" and not "Twitter's current consumer
| product".
|
| Mastodon actually has more features than the original Twitter
| consumer product like hashtag follows, global timelines, and
| more sophisticated filtering/muting capabilities.
|
| Some people argue it's not so expensive to build a scalable
| Twitter with modern tools, which is why we also included the
| comparison against Threads. That's a very recent data point
| showing how ridiculously expensive it is to build applications
| like this, and they didn't even start from scratch as
| Instagram/Meta already had infrastructure powering similar
| products.
| doctorpangloss wrote:
| I work in gaming, so I cannot speak to your specific
| experiences. Entity Component Systems are extremely
| performant, really good science, and shipping in middlewares
| like Unity. However, in order to ship an ECS game, in my
| experience, you have to have already made your whole game
| first in a normal approach, in order to have everything be
| fully specified sufficiently that you can correctly create an
| ECS implementation. In practice, this means ECS is used to
| make first person shooters, which have decades of well
| specified traditions and behavior, and V2 of simulators, like
| for Cities Skylines 2 and Two Point Campus.
|
| So this is not meant to diminish the achievements of what you
| have built at all, it is more intellectually honest to say
| that "any high performance framework is most suitable for
| projects that are exact clones of pre-existing, mature things
| with battle-hardened specifications and end user behavior."
| While this might cover some greenfield projects, including
| the best capitalized ones that may matter to you, it does
| diminish the appeal of a framework for the vast majority of
| success stories from small & poorly capitalized teams. Those
| small & poor teams are very innovation and serendipity driven
| and hence rarely copying a pre-existing thing. And even if
| they try to become well-capitalized, they are almost always
| doing so by having worked on the thing they are copying
| already (i.e., already shipping version 1.0 for years).
| deltree7 wrote:
| [dead]
| ehutch79 wrote:
| Are you sure about that?
|
| With things like twitter, the ui is not the hard part. Things
| like moderation are the secret sauce. All the corner cases
| and support for devopsy stuff likely account for a lot.
| Routing to specific instances for celebrities and such.
| justrealist wrote:
| Nathan worked at Twitter so while he might be wrong, I
| don't think it's reasonable to assume he's just naive
| http://nathanmarz.com/blog/leaving-twitter.html.
| littlestymaar wrote:
| > Add large-scale distributed live video support at the top of
| that, and you won't get any close to 10k lines.
|
| But Twitter isn't, and was never, about live video support:
| this is pure feature creep and that's how you get headcount
| inflation and a company that can be run for 17 years without
| making profit (AKA terrible business).
|
| > When I worked in Statuspage, we had support of 50-100k
| requests per second
|
| Having served 150kqps in the past as part of a very small team
| (3 back-end eng.), this isn't necessarily as big of a deal as
| you make it sound: it mostly depends on your workload and
| whether or not you need consistency (or even persistence at
| all) in your data.
|
| In practice, building scalable systems is hard mostly because
| it's hard to get management to forget their vanity ideas that
| go against your (their, actually) system's scalability.
| WheelsAtLarge wrote:
| We see this type of post regularly. Something like, "How I
| built a better <pick your app> clone by myself in a month."
| Well, no, usually it's just a bare skeleton with the least
| amount of functionality. Not only that, the software is the
| least of the functionality. The organizational structure around
| the app is what matters most to keep it going. It's an
| attention seeking ploy and the whole thing usually disappears
| real quick.
| hosh wrote:
| How much of Twitter's code base is dedicated to things like
| security, compliance, and moderation?
|
| Granted, a decentralized platform would eliminate some of
| those, just by being decentralized
| adventured wrote:
| None of those things get eliminated by decentralization, they
| get distributed to whatever the point of control / ownership
| is.
|
| Mastodon still requires security, compliance and moderation.
| And those requirements are going to keep getting more
| challenging by the year. It'll end up being another reason
| nobody will want to host content in a decentralized manner,
| the burden will become obnoxious.
| Fomite wrote:
| During the first wave of Twitter exodus, several people in
| my professional circle asked if they should be hosting
| professional, field-specific Mastodon servers, etc.
|
| My answer, born of moderating a modestly sized forum, was
| "Absolutely not under any circumstances."
| hosh wrote:
| An organization trying to maintain an ISO certification
| will have drastically different policies and controls than
| a small shop, or even a hobbyist group.
|
| Everyone (theoretically) would be complying with statutes in
| the broadest sense, but jurisdiction, regulations, industry
| best practices, reporting requirements, and appetite for
| risk are going to be different from organization to
| organization. It's not one-size-fits-all.
|
| So some of these things are eliminated because the people
| hosting those are not put under the same kind of scrutiny
| as say, a Twitter.
___________________________________________________________________
(page generated 2023-08-15 23:00 UTC)