[HN Gopher] We reduced the cost of building Mastodon at Twitter-...
       ___________________________________________________________________
        
       We reduced the cost of building Mastodon at Twitter-scale by 100x
        
       Author : tekacs
       Score  : 589 points
       Date   : 2023-08-15 17:54 UTC (5 hours ago)
        
 (HTM) web link (blog.redplanetlabs.com)
 (TXT) w3m dump (blog.redplanetlabs.com)
        
       | MisterBastahrd wrote:
       | What one finds useful from a web application and what the web
       | application actually is are usually two entirely different
       | things.
       | 
       | I work in marketing automation, and I guess I have in one way or
       | another my entire career. The clients who need to use the
       | platform to communicate with their own clients over social
       | networking may never touch our print delivery system, but that
       | doesn't mean that print delivery doesn't exist or isn't
       | important.
       | 
       | If you are unwilling to recreate the totality of the application
       | in terms of functionality, then you are lying if you say that you
       | have recreated it.
        
         | nathanmarz wrote:
         | Not sure what you're talking about as we implemented the
         | entirety of Mastodon from scratch.
        
       | lionkor wrote:
       | Very interesting, looking forward to reading the docs once they
       | come out.
       | 
       | Why Java?
        
         | nathanmarz wrote:
         | It's one of the most widely used/known programming languages in
         | the world.
         | 
         | It's a Java API so any JVM language can be used (Clojure,
         | Scala, etc.).
        
       | rubiquity wrote:
       | Noticeably missing are any details about concurrency control and
       | replication or recovery protocols. A Twitter clone is one thing
       | but any sort of application needing ACID Transactions is a whole
       | other beast.
        
         | nathanmarz wrote:
         | All data on Rama is replicated automatically with a
         | configurable "replication factor". Data written to Rama is not
         | made visible until it's successfully replicated. The
         | documentation we're releasing next week includes a page going
         | into detail on how this works.
        
       | Pxtl wrote:
       | I've seen many people describe frameworks like this - you know,
       | first you have the slow back-end event-driven master database
       | that you don't query live against, then you've got eventual-
       | consistency flows against the various data-warehouses and data-
       | stores and partitioned sharded databases in useful query-friendly
       | layouts that you actually read live from... and I never see it
       | clearly explained: how do you read a change back to the user
       | literally just after they made the change? How do you say "other
       | views eventual-consistency is fine but for this view of this bit
       | of info we need it updated _now_".
       | 
       | This write-up is very detailed but I couldn't find that
       | explanation.
        
         | lossolo wrote:
         | You have the option to track the latest update time and, during
         | the minute immediately following this update, direct all reads
         | to come from the leader. Additionally, you could oversee the
         | replication lag among followers and block queries on any
         | follower that lags more than a minute behind the leader.
         | 
         | For the client, it's feasible to retain the timestamp of its
         | most recent write. In this way, the system can ensure that the
         | replica responsible for any reads related to that user
         | incorporates updates at minimum up to that recorded timestamp.
         | If a replica isn't adequately current, the read can either be
         | managed by another replica or the query can wait until the
         | replica catches up. The timestamp might take the form of a
         | logical timestamp, signifying the order of writes (e.g., log
         | sequence number), or it could be based on the actual system
         | clock, where synchronized clocks become vital.
         | 
         | When your replicas are spread across multiple datacenters--
         | whether for user proximity or enhanced availability--there's an
         | added layer of complexity. Requests requiring the leader's
         | involvement must be directed to the datacenter housing the
         | leader.
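
The timestamp approach above can be sketched roughly as follows. This is a minimal illustration, not any particular database's API: `Replica`, `ReadRouter`, and `choose` are hypothetical names, and the timestamp is assumed to be a logical one (a log sequence number) rather than a wall-clock value.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical replica descriptor: it has applied every write up to appliedLsn. */
record Replica(String name, long appliedLsn, boolean leader) {}

final class ReadRouter {
    /**
     * Pick a replica that already reflects the client's most recent write.
     * Followers that have caught up are preferred; otherwise fall back to the leader.
     */
    static Optional<Replica> choose(List<Replica> replicas, long clientLastWriteLsn) {
        Optional<Replica> follower = replicas.stream()
            .filter(r -> !r.leader() && r.appliedLsn() >= clientLastWriteLsn)
            .findFirst();
        if (follower.isPresent()) {
            return follower;
        }
        // No follower is current enough, so the read goes to the leader.
        return replicas.stream().filter(Replica::leader).findFirst();
    }
}
```

A real system could also wait for a lagging follower instead of falling back to the leader; the sketch only shows the routing decision.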
        
         | chubot wrote:
         | Yeah definitely, these ideas always sound very appealing to me,
         | in theory -- I almost wonder why nobody has built it before
         | 
         | e.g. they mention "event sourcing" and "materialized views" in
         | the post -- sounds good
         | 
         | But I thought I heard from a few people who were like "we
         | ripped event sourcing out of our codebase" and so forth
         | 
         | And yeah your question is an obvious good one, and the Reddit
         | answer of "write through cache" ... is less than satisfying to
         | me
         | 
         | I FREQUENTLY have the problem where I reload the page and
         | Reddit shows me stale data. It's SUPER buggy.
         | 
         | ---
         | 
         | Anyway I definitely look forward to hearing people try this and
         | what their longer term impressions are !
         | 
         | I basically want to know what the tradeoffs are -- it sounds
         | good, but there are always tradeoffs
         | 
         | So is the tradeoff "eventual consistency" ? What are the other
         | tradeoffs?
        
           | chubot wrote:
           | Hilariously, I went to edit the above comment, and HN was
           | overloaded. Then it served me three or four 500's, AND it
           | served me stale data in between
           | 
           | I was pissed off that I would have to type my comment again,
           | but actually it did save it, and refreshing worked.
           | 
           | From what I understand Hacker News is architected more in-
           | memory, on one big box ... Perhaps similar to the event
           | sourcing model
           | 
           | (not knocking hacker news -- it's generally a very fast site,
           | MUCH better than Reddit. Just that scaling beyond a single
           | machine is difficult and full of gotchas )
        
           | nathanmarz wrote:
           | When you step back and consider the incredible amount of
           | manpower and resources that have been put into these
           | applications, it's amazing how buggy these applications are.
           | To put it simply, they're buggy because the underlying
           | infrastructure and techniques used to build them are so
           | complex that the implementation is beyond the realm of human
           | understanding.
           | 
           | The way applications are built, and have been built since
           | before I was born, is by combining together potentially
           | dozens of narrow tools: databases, computation
           | systems, caches, monitoring tools, etc. There has never been
           | a cohesive model capable of expressing arbitrary backends
           | end-to-end, and every application built has to be twisted to
           | fit onto the existing narrow pieces.
           | 
           | Rama is a lot more than just "event sourcing" and
           | "materialized views". Those are two concepts at its
           | foundation, but the real breakthrough is being that cohesive
           | model capable of expressing diverse backends in their
           | entirety. It took me more than five years of dedicated
           | research to discover this model, and it was extremely
           | difficult.
        
             | chubot wrote:
             | Yes, I 100% agree with you. I would like something like
             | this to succeed, and agree the problem is real.
             | 
             | But what are the tradeoffs? There's nothing that comes with
             | 100x benefit with no tradeoffs
             | 
             | (side note: I worked on Google Code for a short while in
             | 2008, concurrent with Github's founding ... I think Github
             | moved a lot faster in a large part because they weren't
             | dealing with distributed systems at first -- they had a
             | Rails app, a database, and RAID disks, and grew it from
             | there. We had BigTable and perf reviews :-P )
             | 
             | Eventual consistency is probably one?
             | 
             | Can I specify that comment editing is correct and ACID,
             | while likes/upvotes are eventually consistent? (No is a
             | fine answer, these problems are hard)
             | 
             | I read through much of the doc, and don't see a mention of
             | the word "consistency" at all, which seems like an
             | oversight for something that is unifying what would be in a
             | database with computation.
        
               | nathanmarz wrote:
               | Rama is a much broader platform than a database, so the
               | consistency semantics you get depend on how you use it.
               | When using Rama, you're not mutating indexes directly
               | like you do with a database, but adding source data that
               | then gets materialized into any number of indexes.
               | 
               | You get read-after-write consistency for any PStates in a
               | streaming ETL colocated with the depot you appended to.
               | This is if you do the depot append with "full acking",
               | which coordinates its response with the completion of
               | colocated streaming ETLs. If you append at a lower level
               | of acking, then you get eventual consistency on those
               | PStates with the benefit of lower-latency appends.
               | 
               | Microbatching is always eventually consistent as
               | processing is asynchronous and uncoordinated with depot
               | appends. Microbatching is higher throughput than
               | streaming and has simpler fault-tolerance semantics.
               | 
               | You'll be able to read a lot more about this when we
               | release the docs next week.
        
         | jokethrowaway wrote:
         | You can hack it and optimistically render the data you know
         | about because your client created it - on the frontend, at no
         | additional cost.
        
           | hot_gril wrote:
           | This is usually what I do. Don't even want to wait for an
           | HTTP roundtrip for some of these, e.g. "liking" a post should
           | fill in the heart icon or whatever instantly.
           | 
           | One famous example of this going too far: Mac Mail app used to
           | play a whoosh sound when your email is actually sent. They
           | changed it to whoosh instantly no matter what. Given how
           | often an email might fail to send or get delayed, this meant
           | an actually useful indication of "great, your thing was sent,
           | you can close your laptop now" was rendered useless.
        
             | ceejayoz wrote:
             | > Don't even want to wait for an HTTP roundtrip for some of
             | these, e.g. "liking" a post should fill in the heart icon
             | or whatever instantly.
             | 
             | HN does this, and on slow days, about half of my upvotes
             | don't go through.
        
               | sitzkrieg wrote:
               | yea but does hn have any client side js?
        
               | [deleted]
        
               | codetrotter wrote:
               | Yeah, a very small amount so that clicking the upvote
               | button does not need to reload the whole page
        
               | hot_gril wrote:
               | Can confirm, HN relies on client-side JS for voting and
               | collapsing, but view/post/edit/delete don't need it.
        
               | hot_gril wrote:
               | Messaging apps often have a checkmark to indicate the
               | message actually went to the server, and maybe another
               | checkmark to indicate it was received on the other end.
               | Maybe HN needs an icon indicating that your vote went
               | through.
        
               | newaccount74 wrote:
               | Make the arrows grey to indicate the click registered,
               | make them disappear to indicate the server successfully
               | registered the vote?
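
The grey-then-disappear idea amounts to a small client-side state machine. The sketch below is only an illustration of that idea, not how HN works; `VoteButton` and its states are hypothetical names.

```java
/** Hypothetical client-side model of one vote arrow. */
final class VoteButton {
    enum State { IDLE, PENDING, CONFIRMED }

    private State state = State.IDLE;

    /** User clicked: update the UI at once (grey arrow) before the server answers. */
    void click() {
        if (state == State.IDLE) {
            state = State.PENDING;
        }
    }

    /** Server answered: confirm on success (hide arrow), roll back on failure. */
    void serverReplied(boolean success) {
        if (state == State.PENDING) {
            state = success ? State.CONFIRMED : State.IDLE;
        }
    }

    State state() {
        return state;
    }
}
```

Keeping PENDING distinct from CONFIRMED is what preserves the useful signal: the user can tell an optimistic update from one the server has acknowledged.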
        
               | hot_gril wrote:
               | Yeah, it's easy enough that I was able to do it in the
               | web inspector in a minute (artificial 1s network delay
               | added): https://s11.gifyu.com/images/ScPMI.gif
        
               | wizofaus wrote:
               | You actually check your list of upvoted comments?
        
               | ceejayoz wrote:
               | No, I just notice it when I come back to the thread later
               | in the day and a bunch of comments I know I upvoted are
               | back to normal.
        
         | [deleted]
        
         | sixo wrote:
         | I imagine you get some UUID back from your write, and
         | effectively "block" until you see it committed to the event
         | stream. The intent of such a system is certainly for the read-
         | after-write latency to be not much longer than a traditional
         | RDBMS. (This is roughly what the RDBMS is doing under the hood
         | anyway.) Probably you can isolate latency-critical paths so
         | they don't get stuck behind big stream processing jobs.
         | 
         | The advantage of the overall architecture is that nearly all
         | application functionality (for something like a social network)
         | can tolerate much higher latency than an RDBMS, so you really
         | want to have architectural building blocks that let you
         | actually _use_ this headroom.
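
One way to picture "get an ID back and block until it is committed" is a map of futures keyed by write ID. This is a guess at the shape of such a mechanism, not Rama's implementation; `CommitTracker`, `awaitCommit`, and `onCommitted` are hypothetical names.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class CommitTracker {
    // One future per write ID. Entries are never evicted here; a real system would expire them.
    private final Map<UUID, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Called by the writer: the returned future completes once the write is committed. */
    CompletableFuture<Void> awaitCommit(UUID writeId) {
        return pending.computeIfAbsent(writeId, id -> new CompletableFuture<>());
    }

    /** Called by whatever consumes the event stream when it sees writeId committed. */
    void onCommitted(UUID writeId) {
        // computeIfAbsent also covers the case where the commit is seen before anyone waits.
        pending.computeIfAbsent(writeId, id -> new CompletableFuture<>()).complete(null);
    }
}
```

A caller would typically bound the wait, for example with `awaitCommit(id).get(timeout, unit)`, so a stalled stream does not block the request forever.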
        
         | jedberg wrote:
         | The short answer is write-through cache.
         | 
         | You write the update directly to the cache closest to the user
         | and into the eventually consistent queue.
         | 
         | We did this at reddit. When you make a comment the HTML is
         | rendered and put straight into the cache, and the raw text is
         | put into the queue to go into the database. Same with votes. I
         | suspect they do this client side now, which is now the closest
         | cache to the user, but back then it was the server cache.
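
A minimal sketch of the pattern described above, assuming a single process where plain collections stand in for the cache, the queue, and the database. `WriteThroughComments` and its methods are hypothetical names; this is not reddit's code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class WriteThroughComments {
    private final Map<String, String> cache = new HashMap<>();     // comment id -> rendered HTML
    private final Queue<String[]> queue = new ArrayDeque<>();      // pending {id, rawText} pairs
    private final Map<String, String> database = new HashMap<>();  // comment id -> raw text

    /** Render straight into the cache and enqueue the raw text for later persistence. */
    void post(String id, String rawText) {
        cache.put(id, "<p>" + rawText + "</p>"); // no HTML escaping; illustration only
        queue.add(new String[] {id, rawText});
    }

    /** Reads are served from the cache, so the author sees the comment immediately. */
    String read(String id) {
        return cache.get(id);
    }

    /** Stand-in for the asynchronous consumer that writes queued items to the database. */
    void drainQueue() {
        String[] item;
        while ((item = queue.poll()) != null) {
            database.put(item[0], item[1]);
        }
    }

    boolean isPersisted(String id) {
        return database.containsKey(id);
    }
}
```

The gap between `post` and `drainQueue` is the eventually consistent window: the cache already has the comment while the database does not.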
        
           | jitl wrote:
           | Rama should bundle a write-through cache! Another in-memory
           | JVM cluster thingamabob (Apache Ignite) used to propose
           | write-through caching as its primary selling point:
           | https://ignite.apache.org/use-cases/in-memory-
           | cache.html#:~:....
           | 
           | Or, maybe their pitch is that the streaming bits are so fast,
           | you can just await the downstream commit of some write to a
           | depot and it'll be as fast as a normal SQL UPDATE.
        
             | nathanmarz wrote:
             | Rama is extremely fast, as you can see for yourself by
             | playing with our Mastodon instance.
        
               | jedberg wrote:
               | It's fast until it's not. Making a post and then hitting
               | reload and not seeing it can be very jarring for the
               | user. Definitely something to think about.
        
               | nathanmarz wrote:
               | What do you mean? Every post I do shows up instantly.
               | 
               | Reloading the page from scratch can be slow due to
               | Soapbox doing a lot of stuff asynchronously from scratch
               | (Soapbox is the open-source Mastodon interface that we're
               | using to serve the frontend). https://soapbox.pub/
        
               | squeaky-clean wrote:
               | I think the concern is will this still be true if
               | Mastodon reaches Twitter scale?
        
               | nathanmarz wrote:
               | Rama is scalable. So as your usage grows, you add
               | resources to keep up. Scaling a Rama module is a trivial
               | one-line command at the terminal.
               | 
               | Rama's built-in telemetry provides the information you
               | need to know when it's time to scale.
        
               | jitl wrote:
               | is there a way to guarantee reading your own writes from
               | a client perspective?
        
               | nathanmarz wrote:
               | Yes. Depot appends by default don't return success until
               | colocated streaming topologies have completed processing
               | the data. So this is one way to coordinate the frontend
               | with changes on the backend.
               | 
               | Within an ETL, when the computations you do on PStates
               | are colocated with them, you always read your own writes.
        
               | teacpde wrote:
               | It makes sense, but wouldn't the write be slow?
               | Especially when you have many streaming pipelines.
        
               | nathanmarz wrote:
               | That's part of designing Rama applications. Acking is
               | only coordinated with colocated stream topologies -
               | stream topologies consuming that depot from another
               | module don't add any latency.
               | 
               | Internally Rama does a lot of dynamic auto-batching for
               | both depot appends and stream ETLs to amortize the cost
               | of things like replication. So additional colocated
               | stream topologies don't necessarily add much cost (though
               | that depends on how complex the topology is, of course).
        
           | reilly3000 wrote:
           | DynamoDB's DAX cache espouses the same approach.
           | 
           | I have to say in my ~12 years as an active Redditor I can't
           | recall a time where I saw any real state issues, even with
           | rapidly changing votes, etc. Bravo!? Now that we're beyond
           | the days of molten servers, I have to say its overall
           | reliability in the face of massive spiky traffic is quite a
           | feat.
        
             | endisneigh wrote:
             | Really? I see this all the time even now.
        
           | squeaky-clean wrote:
           | In Nathan Marz's (the article author) book, Big Data, he
           | describes this and calls it the Speed Layer. I haven't fully
           | finished the article yet, but the components it's describing
           | seem to be equivalent to what he calls the Batch Layer and
           | the Serving Layer in his book.
           | 
           | But I'm kind of getting the impression this works without any
           | speed layer and is expected to be fast enough as-is.
        
             | nathanmarz wrote:
             | Rama codifies and integrates the concepts I described in my
             | book, with the high level model being: indexes =
             | function(data) and query = function(indexes). These
             | correspond to "depots" (data) , "ETLs" (functions),
             | "PStates" (indexes), and "queries" (functions).
             | 
             | Rama is not batch-based. That is, PStates are not
             | materialized by recomputing from scratch. They're
             | incrementally updated either with stream or microbatch
             | processing. But PStates can be recomputed from the source
             | data on depots if needed.
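
The "indexes = function(data), query = function(indexes)" model can be illustrated in plain Java. This deliberately does not use Rama's API; `FollowerCounts` is a hypothetical class in which a list plays the role of a depot and a map plays the role of a PState, assuming each follow event appears once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FollowerCounts {
    private final List<String[]> depot = new ArrayList<>();      // append-only log of {follower, followee}
    private final Map<String, Integer> index = new HashMap<>();  // followee -> follower count

    /** Append source data, then update the index incrementally (the "ETL" step). */
    void append(String follower, String followee) {
        depot.add(new String[] {follower, followee});
        apply(index, followee);
    }

    private static void apply(Map<String, Integer> idx, String followee) {
        idx.merge(followee, 1, Integer::sum);
    }

    /** query = function(indexes) */
    int followerCount(String followee) {
        return index.getOrDefault(followee, 0);
    }

    /** indexes = function(data): rebuild the index from the log if it is ever needed. */
    Map<String, Integer> recompute() {
        Map<String, Integer> fresh = new HashMap<>();
        for (String[] event : depot) {
            apply(fresh, event[1]);
        }
        return fresh;
    }
}
```

Because the same `apply` step drives both paths, the incrementally maintained index and a from-scratch recomputation agree.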
        
           | kulahan wrote:
           | This explains so many bugs I came across on Reddit. I guess
           | it works, but man I dislike this implementation.
        
       | j45 wrote:
       | Just on system design alone this was enjoyable to read.
       | 
       | Clever architecture can help as much if not more than clever
       | coding especially when keeping it simple but scalable is needed.
        
       | [deleted]
        
       | afro88 wrote:
       | Looks amazing and incredibly smart. But I found the LOC and
       | implementation time comparisons to Twitter and Threads very
       | disingenuous. It makes me wonder what other wool will be pulled
       | over our eyes with Rama in future (or important real world
       | details missed / future footguns).
       | 
       | Still super impressive. Reminds me of when I discovered Elixir
       | while building a social-ish music discovery app. Switching the
       | backend from Rails to Elixir felt like putting on clothes that
       | actually fit after wearing old sweats. Rama looks like a similar
       | jump, but another layer up, encompassing system architecture.
        
         | softwaredoug wrote:
         | It's hard to construct a true randomized control trial for
         | software engineering methods. People make many claims about
         | programming paradigms or tools that are hard to validate.
         | 
         | It's also unclear what we would compare a tool like this to. I
         | doubt you could just say "compare it to Rails" given how
         | frameworks like rails are bound to specific data models, and
         | most realistic applications. You'd have to compare it to some
         | other opinion about how to wire together different data
         | structures.
        
         | StephenAmar wrote:
         | +1 the comparisons are not great. How many engineer-hours did
         | it take to build Rama itself?
         | 
         | The numbers they got for Twitter likely include the time it
         | took to build their infrastructure, common libraries (like
         | finagle,...)
        
           | paxys wrote:
           | Don't forget their entire ads system, data
           | processing/analytics, monitoring, customer support, payments,
           | internationalization. They have replicated _at most_ a tiny
           | bit of Twitter's core infra for sending Tweets. The company
           | itself does a lot more than that.
        
           | jacoblambda wrote:
           | Honestly I'm willing to accept the number they gave since the
           | author (Nathan Marz) was one of the lead/founding devs for
           | twitter's streaming compute backend in the past.
        
       | jauntywundrkind wrote:
       | > _The instance has 100M bots posting 3,500 times per second at
       | 403 average fanout to demonstrate its scale._
       | 
       | Mastodon has to send messages to each instance with a recipient.
       | That server can then fan out to all its subscribers. The way
       | this point is worded makes me think all the bits are on just a
       | single instance, meaning all the fan out can be dealt with
       | internally without having to do any server-to-server at all.
       | 
       | That is a fair comparison to Twitter, which is single instance.
       | But it sounds like a much reduced ambition versus the task
       | Mastodon has to do.
        
         | nathanmarz wrote:
         | We implemented federation fully exactly as you described.
        
       | nicechianti wrote:
       | [dead]
        
       | stuaxo wrote:
       | Lovely, I could see this paradigm spreading to other languages,
       | something was definitely needed.
        
       | kyle-rb wrote:
       | Kinda disappointed by the simulation, where are all the viral
       | posts?
       | 
       | I've been digging around for a while and haven't found any posts
       | with more than 20 faves. The accounts I've found with ~1 million
       | followers have little to no engagement. I want to see how a post
       | with a million faves holds up to the promises of "fast constant
       | time".
       | 
       | I'm especially curious about these queries -- fave-count and has-
       | user-faved -- since a couple years ago Twitter stopped checking
       | has-user-faved when rendering posts more than a month or so old,
       | so I imagine it was expensive at scale.
        
         | nathanmarz wrote:
         | The load generator generates boosts/favorites for a subset of
         | posts that are randomly picked to be "popular". However, since
         | the rate of posts is so high even individual posts picked to be
         | "popular" are only getting ~70 reactions.
         | 
         | Tracking reactions is considerably easier than timeline fanout
         | though, as a favorite does a small handful of things (updating
         | the set of users favoriting a status and sending a notification),
         | while fanout has to do an operation on every follower (403
         | operations on average, sometimes up to 22M).
         | 
         | The code getting the favorite count for a status looks like:
         | .localSelect("$$statusIdToFavoriters",
         | Path.key("*statusId").view(Ops.SIZE)).out("*numFavorites")
         | 
         | Because the nested set is subindexed, that's an extremely fast
         | operation (looking at our telemetry, about 0.05ms).
         | 
         | Determining "has-user-faved" looks like:
         | .localSelect("$$statusIdToFavoriters",
         | Path.key("*statusId").view(Ops.CONTAINS,
         | "*accountId")).out("*hasFavorited")
         | 
         | The API server doesn't do these queries individually, which
         | would be two roundtrips. It does them together in a query
         | topology along with fetching other needed information (like
         | number of boosts, number of replies, "has boosted", "has muted
         | this status", etc.).
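
As a rough in-memory analogy for the two lookups above (not Rama's API, and not its subindexing), a map from status ID to a hash set of account IDs answers both questions without scanning the set. `Favorites` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class Favorites {
    // Plays the role of the statusId -> favoriters index from the snippet above.
    private final Map<Long, Set<Long>> statusIdToFavoriters = new HashMap<>();

    void favorite(long statusId, long accountId) {
        statusIdToFavoriters.computeIfAbsent(statusId, id -> new HashSet<>()).add(accountId);
    }

    /** Favorite count: the size of the nested set, no iteration over its members. */
    int numFavorites(long statusId) {
        return statusIdToFavoriters.getOrDefault(statusId, Set.of()).size();
    }

    /** has-user-faved: a membership test on the nested set. */
    boolean hasFavorited(long statusId, long accountId) {
        return statusIdToFavoriters.getOrDefault(statusId, Set.of()).contains(accountId);
    }
}
```

Neither call depends on how many accounts favorited the status, which is the property the comment attributes to the subindexed nested set.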
        
           | kyle-rb wrote:
           | Thanks for the response, I'm still curious about the details
           | of the subindexing and how that scales. I'll be keeping an
           | eye out for the release!
        
       | mping wrote:
       | Congrats on the (kinda) launch. I was curious to see what you
       | guys were up to. The blog post is pretty detailed, and with good
       | insights. Reducing modern app development complexity to mixing
       | data structures sounds like a good abstraction. I'm sure you
       | thought really hard about the building blocks of Rama and you
       | know your problems better than most of the hn crowd.
       | 
       | Now, the really hard part becomes selling. If companies start
       | using your product to get ahead, that will be the real proof,
       | otherwise it's "just" tech that is good on paper.
       | 
       | On a side note, did you guys get any inspiration from clojure? I
       | see lots of interesting projects popping up from clojure
       | people...
       | 
       | Best of luck!
        
         | nathanmarz wrote:
         | Rama is written in Clojure :)
        
           | mark_l_watson wrote:
           | Nathan, we guessed that it was written in Clojure :-)
           | 
           | Interesting to finally see the announcement.
        
           | cutler wrote:
           | "Rama is programmed entirely with a Java API".
        
             | nathanmarz wrote:
             | The customer API is in Java, and the implementation of that
             | API is in Clojure.
        
               | fiddlerwoaroof wrote:
               | Will there be a first-party Clojure wrapper? Or, would
               | the expectation be that users would use Java interop?
        
               | nathanmarz wrote:
               | We're not releasing one as we don't have the bandwidth
               | right now to maintain and document another API. That
               | said, making a Clojure wrapper around the Java API should
               | be pretty easy.
        
       | Huhuhn wrote:
       | Would this framework also be useful for building a Lemmy
       | instance?
        
       | endisneigh wrote:
       | I don't really see the point of the comparison. They should show
       | something you could only make with Rama or show how much faster
       | it is to iterate with Rama.
       | 
       | Saying this is 100 or even a million times cheaper is like saying
       | taking a picture of the Sistine Chapel and printing out copies is a
       | trillion times cheaper than making it originally.
       | 
       | Many of us on this site could make a number of products very
       | efficiently and cheaply given a static and fixed set of
       | requirements as well as an existing implementation for reference.
       | 
       | That being said it was a very detailed post, so kudos for that,
       | but it's far too vague to be actionable. Why not just release the
       | code and post simultaneously instead of just bragging about how
       | little code was required?
        
       | miki123211 wrote:
       | Is this just me, or does the code in the post feel like they've
       | implemented what should have been a new programming language on
       | top of Java?
       | 
       | Their "variables" have names that you have to keep as Java
       | strings and pass to random functions. If you want composable
       | code, you don't declare a function, you call .macro(). For
       | control flow and loops, you don't use if and for, but a weird
       | abstraction of theirs.
       | 
       | I feel like this code could have been a lot simpler if it was
       | written in a specialized language (or a mainstream language with
       | a specialized transpiler and/or Macro capabilities.)
       | 
       | I'd quote the old adage about every big program containing a slow
       | and buggy implementation of Common Lisp, but considering that
       | this thing is written in Clojure, the authors have probably heard
       | it before.
        
       | phillipcarter wrote:
       | In the year of our lord 2023, people are still launching immature
       | products with "we built a clone of a tiny subset of Twitter" as
       | their use case? Come on. Twitter is huge because they have to
       | support a huge number of use cases. Using this proprietary
       | framework won't magically make complex use cases go away.
        
         | DoesntMatter22 wrote:
         | "We recreated a service from 2007 and it's so much faster!"
        
           | nathanmarz wrote:
           | As we mentioned in the article, Instagram just spent ~25
           | person-years building Threads which is a barebones clone of
           | Twitter. Not only did we build our instance 30x faster than
           | that, we have way more features like federation, hashtag
           | follows, polls, DMs, global timelines, and more. And
           | Instagram didn't start from scratch as Instagram/Meta already
           | had infrastructure powering similar products.
           | 
           | https://www.washingtonpost.com/technology/2023/07/29/meta-
           | th...
        
             | DoesntMatter22 wrote:
             | You said this in another post.
             | 
             | > That's why we're comparing it to the cost of Twitter's
             | original consumer product.
             | 
             | Plus, you cloned a preexisting architecture. FB wrote
             | theirs from scratch. Not apples to apples. This is much
             | easier.
        
         | [deleted]
        
       | sixo wrote:
       | The comparisons to Twitter are completely goofy, but the
       | architecture is nothing short of enlightened. Nice work.
        
       | ltr_ wrote:
       | i always had this question: how realistic is it to define a
       | standard spec and interoperable protocols for the toxic apps of
       | big international tech companies that provide """services""", so
       | that instances of implementations can be maintained by
       | municipalities or local tech businesses and talent with 100x
       | fewer employees and less money? what policies should be in place
       | to achieve that? what would be the challenges? would it be
       | better/healthier? is someone researching things like a
       | transition to sustainable digital services? (sustainable in terms
       | of local labor, privacy, economy, accountability, etc...)
       | 
       | i mean if you think about this as public services not as a
       | business, profit is secondary, and first is just to make the
       | thing better and better for the users, no need for spying, no
       | advertisement, no need for a rich piece of shit somewhere getting
       | a piece of the money paid in your city for every taxi drive, food
       | delivery or to give up privacy to a soulless/faceless entity just
       | because you want to say something publicly or keep in touch with
       | people. there is no disruption on their part, it's just an old
       | thing put on the internet, they are just in the middle of
       | everyone's life, just sucking everything they can. is the actual
       | state of affairs "efficient"?
       | 
       | there must be fed up engineers and tech people everywhere with
       | the sad state of IT industry.
        
         | 5636588 wrote:
         | The EU already has regulations in place regarding open banking
         | with its Payment Services Directive. I'd imagine a similar
         | framework could be applied to big social tech.
        
       | NoahTheDuke wrote:
       | Congrats! This looks super cool.
       | 
       | Are there any plans for exposing a Clojure API? Given that it's
       | implemented in Clojure, seems like it would be a natural fit.
       | Interop with Java is nice but can be cumbersome compared to the
       | more natural calling conventions and idioms (threading macros
       | instead of `..` builder patterns, etc).
        
         | nathanmarz wrote:
         | Answered this in another comment:
         | https://news.ycombinator.com/item?id=37138526
        
           | NoahTheDuke wrote:
           | Thanks.
        
       | NoraCodes wrote:
       | I would argue that this is not "a Mastodon instance", since it is
       | not running Mastodon - other than that, very very neat work! I'm
       | excited for that "Source Code" link to be live :)
        
         | frandroid wrote:
         | Mastodon-compatible or Mastodon clone... If it quacks and walks
         | like a duck, surely it is part of the avian family of
         | microblogging services...
        
         | mikae1 wrote:
         | It was a good blog post title choice for making it to the front
         | page of HN.
        
         | nathanmarz wrote:
         | We call it a "Mastodon instance" because we implemented the
         | entire Mastodon API (https://docs.joinmastodon.org/api/). This
         | is in addition to also implementing the ActivityPub API which
         | Mastodon also implements for federation.
        
           | CharlesW wrote:
           | > _We call it a "Mastodon instance" because we implemented
           | the entire Mastodon API..._
           | 
           | Except "Mastodon instance" means an instance of Mastodon,
           | which is open source. Whether or not it was intended to be
           | deceptive (I'd think a group of smart people would know
           | better), this personally left a bad taste in my mouth.
        
           | mattl wrote:
           | Mastodon-compatible would be better.
           | 
           | Mastodon is the name of a piece of software as well as an
           | API, a website, etc.
           | 
           | Naming this stuff is hard but calling it a Mastodon instance
           | would be more confusing.
        
         | [deleted]
        
         | rvnx wrote:
         | I think it's smart from a legal perspective, because the team
         | members seem to partially be coming from companies acquired by
         | Twitter.
         | 
         | So I guess, if you say "it's a Mastodon-clone", you cannot be
         | accused of taking proprietary ideas from Twitter (this is just
         | a guess, they know better).
         | 
         | But technically very interesting and refreshing to see. I
         | really like their approach. It feels they are innovative.
        
         | ollien wrote:
         | Yeah, I think this is just an ActivityPub server that supports
         | the Mastodon extensions, right? I think we should embrace the
         | fact that the federated world can be diverse, rather than just
         | call everything "Mastodon"
        
           | jauntywundrkind wrote:
           | Mastodon has its own API. It basically offers a very limited
           | ActivityPub API too, but its own API is very different.
           | 
           | And it's a very slim ActivityPub implementation. For example,
           | I don't think you can do basic things like get an individual
           | post in ActivityPub. This should be simple JSON-LD that's easy
           | to get, but it's just a 404.
           | https://www.w3.org/TR/activitypub/#retrieving-objects
        
             | colatkinson wrote:
             | Mastodon for sure supports fetching individual posts over
             | ActivityPub. For example:
             | 
             |     curl -L -H 'accept: application/activity+json' \
             |       'https://mastodon.social/users/Gargron/statuses/18614983'
             | 
             | It does have a bunch of stuff that isn't federated though,
             | such as Like counts/collections. And of course it only
             | implements the server-to-server (S2S) part of AP, not the
             | client-to-server (C2S) part.
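             | 
             | The same request can be sketched in Python with only the
             | standard library. This is a minimal sketch: the helper names
             | (activitypub_request, fetch_object) are made up here, and the
             | actual fetch needs network access, so it is defined but not
             | called:

```python
import json
import urllib.request


def activitypub_request(url):
    """Build a GET request that asks for the ActivityPub JSON representation."""
    return urllib.request.Request(
        url, headers={"Accept": "application/activity+json"}
    )


def fetch_object(url, timeout=10):
    """Fetch and decode the object; needs network access, so not called here."""
    with urllib.request.urlopen(activitypub_request(url), timeout=timeout) as resp:
        return json.load(resp)


# Same status URL as in the curl example above.
req = activitypub_request(
    "https://mastodon.social/users/Gargron/statuses/18614983"
)
```

             | The Accept header is what selects the ActivityPub
             | representation instead of the HTML page; urlopen follows
             | redirects by default, much like curl -L.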
        
             | pmlnr wrote:
             | Mastodon. o. not a.
        
         | Pxtl wrote:
         | Yes, a "mastodon-like Fediverse instance running on our
         | proprietary new application/data framework" sounds like a
         | better description.
         | 
         | And either way, I think the source code to their Mastodonlike
         | will not be usable since it will be running on their Rama
         | server framework.
        
       | doublepg23 wrote:
       | HN seems to be putting you through the wringer. I for one am
       | excited you guys made this and plan to open source it - it looks
       | like a fantastic project.
        
       | [deleted]
        
       | softwaredoug wrote:
       | It's a massive ask, even if the platform was 100x better, for all
       | developers to give up every programming language and database
       | they've ever used to depend on a startup's closed-source platform
       | for all functionality.
       | 
       | It's hard enough trusting Google's or Amazon's cloud offerings
       | won't change.
       | 
       | It seems that's what they're proposing right? What am I missing?
        
         | nathanmarz wrote:
         | We're actually not asking anyone to give up anything. First
         | off, it has a simple integration API (which you'll be able to
         | see the details of next week) that allows it to seamlessly
         | integrate with any other backend tool (databases, monitoring
         | systems, queues, etc.). So Rama can be incrementally introduced
         | into any existing architecture.
         | 
         | Second, Rama has a pure Java API and is not a bespoke language.
         | So no new language needs to be learned.
        
           | yazzku wrote:
           | What is the licensing of Rama? Is it libre/open?
        
             | masklinn wrote:
             | > We're keeping it closed-source for now.
        
           | mikae1 wrote:
           | _> Second, Rama has a pure Java API and is not a bespoke
           | language. So no new language needs to be learned._
           | 
           | Isn't Mastodon a Ruby On Rails application?
        
             | theogravity wrote:
             | The article says they re-wrote Mastodon from scratch
             | (probably the backend piece). I'm guessing in Java.
        
               | nathanmarz wrote:
               | Yes, it's 100% written in Java.
        
           | kpw94 wrote:
           | > Rama can be incrementally introduced into any existing
           | architecture
           | 
           | Big if true (and if the opposite, of incrementally _removing_
           | it also works). There have been similar platform efforts in
           | past, such as https://news.ycombinator.com/item?id=20985429 .
           | For that one, the "massive ask to give up every programming
           | language and database they've ever used to depend on a
           | startups closed source platform" seems like the biggest
           | hindrance to adoption.
        
           | softwaredoug wrote:
           | I can imagine this being really useful from the ground up.
           | Because it looks like it wants to be the source of truth,
           | with different views on the data.
           | 
           | It's hard to imagine it for a complex legacy application
           | without having lots of added complexity. It wants to be the
           | unifying programming model for the application. It would seem
           | like running with two RDBMS sources of truth simultaneously.
           | 
           | It's like the xkcd "there are 12 ways of doing X, let's
           | create a standard to unify them" now there are 13 ways
        
             | fragmede wrote:
             | That's xkcd 927.
             | 
             | 9, which is 3^2, and 27, which is 3^3. Or 900 is Yoda's
             | age, and 27 which is the 27 club of musicians who committed
             | suicide.
        
               | mkl wrote:
               | Most members of the 27 Club didn't die by suicide:
               | https://en.wikipedia.org/wiki/27_Club
        
           | erlend_sh wrote:
           | So Rama-powered apps need to be written in Java? Or will any
           | JVM language work?
           | 
           | And the Rama core will remain closed-source? That part seems
           | like the toughest sell of all, at a time when the vast
           | majority of developer tooling and backends are open source or
           | at the very least source-available.
        
             | roguas wrote:
             | Since all JVM languages usually have "FFI" to Java
             | APIs/libs, I would say yes.
        
             | nathanmarz wrote:
             | Any JVM language should work. We've built modules with
             | Clojure.
             | 
             | We're keeping it closed-source for now.
        
               | vanviegen wrote:
               | > We're keeping it closed-source for now.
               | 
               | Rama sounds interesting to me for my 'next big project',
               | but I'd not even consider building it on top of a closed
               | core. I think this is a pretty common sentiment in these
               | circles.
               | 
               | I understand building an OSS business is not easy either.
               | But perhaps there is some middle of the road that you can
               | walk?
               | 
               | - A contractual obligation to open source all (now
               | current) code a couple of years in the future?
               | 
               | - Or an almost-OSS license that makes life difficult for
               | competing cloud providers, like
               | https://www.mongodb.com/licensing/server-side-public-
               | license... ?
        
       | clusterhacks wrote:
       | I'm excited to see the docs for Rama. But I am also a little
       | scared of the comment "I came to suspect a new programming
       | paradigm was needed" from Nathan.
       | 
       | It's not so much that I think the comment is wrong or anything,
       | but rather that it seems so similar to what I have heard in the
       | past from power-lisp (or Clojure in this case) super-smart
       | engineers.
       | 
       | I feel like we have reached a point in software development where
       | "better" paradigms don't necessarily gain much adoption. But if
       | Rama wins in the marketplace, that will be interesting. And I am
       | quite excited to see what a smart tech leader and good team have
       | been able to grind out given a years-long timeframe in this
       | programming platform space . . .
        
         | nathanmarz wrote:
         | This is why we exposed Rama as a Java API rather than Clojure
         | or our internal language (which is defined with Clojure macros,
         | so it's technically also Clojure). Rama's Java dataflow API is
         | effectively a subset of our internal language, with operations
         | like "partitioners" being implemented using continuations.
        
           | cutler wrote:
           | Just curious, what advantage over Clojure did reverting to
           | "pure Java" give you? Perf or something else?
        
             | mwcampbell wrote:
             | Presumably approachability for programmers that would be
             | scared away by Clojure. Smart marketing move.
        
       | teakie wrote:
       | [dead]
        
       | riffic wrote:
       | the group involved here may want to be mindful of the Mastodon
       | gGmbH trademarks. Using the Mastodon logo on redplanetlabs.com to
       | pitch a reimplementation of ActivityPub might be seen as
       | infringing.
       | 
       | https://joinmastodon.org/trademark
       | 
       | removed part about the mastodon subreddit since this is clearly
       | not about the Mastodon software per se.
        
         | hedora wrote:
         | Any trademark case is going to have to prove that a reasonable
         | person would think this article is from Mastodon gGmbH, or is
         | talking about their product "Mastodon".
         | 
         | The top of the page reads _"Red Planet Labs"_, the title of
         | the article is _"How we reduced the cost of building Twitter
         | at Twitter-scale by 100x"_ and the first line of the article is
         | _"We built a Twitter-scale Mastodon instance from scratch in
         | only 10k lines of code."_
         | 
         | No reasonable person is going to think that this article has
         | anything to do with the official Mastodon software, so there's
         | no trademark issue here.
        
       | throwaway7382 wrote:
       | Their big reveal after 10 years is "keep waiting".
       | 
       | Move along, nothing to see here.
        
         | frandroid wrote:
         | If you're going to bother creating a throwaway, you should make
         | a more impactful/meaningful statement...
        
       | sandGorgon wrote:
       | nice! is this a Cloudflare Worker & block storage built in Java?
        
       | dataangel wrote:
       | I do C++ backend work in a non-web industry and this entire post
       | is Greek to me. Even though this is targeted at developers, you
       | need a better pitch. I get "we did this 100x faster" but the
       | obvious followup question is "how" but then the answer seems to
       | be a ton of flow diagrams with way too many nodes that tell me
       | approximately nothing and some handwaving about something called
       | P-States that are basically defined to be entirely nebulous
       | because they are any kind of data structure.
       | 
       | I'm not saying there's nothing here, but I am adjacent to your
       | core audience and I have no idea whether there is after reading
       | your post. I think you are strongly assuming a shared basis where
       | everybody has worked on the same kind of large scale web app
       | before; I would find it much more useful to have an overview of,
       | "This is what you would usually do, here are the problems with it,
       | here is what we do instead" with side by side code comparison of
       | Rama vs what a newbie is likely to hack together with single
       | instance postgres.
        
         | ldayley wrote:
         | Nathan Marz created Apache Storm, coauthored the book "Big
         | Data", and founded an early real-time infrastructure team at
         | Twitter. It's likely the 'curse of knowledge' of working on
         | this specific problem for so long is responsible for the unique
         | and/or unfamiliar style of communication here.
         | 
         | EDIT: Specifics
        
         | HaZeust wrote:
         | ... Maybe the post isn't targeted to your audience at all? How
         | is "C++" and "non-web work" adjacent to web work with web
         | language audiences?
        
           | rollcat wrote:
           | OP did not specify what their industry actually is. I've been
           | doing "web work" for 17 years and I'm sharing their concern:
           | where's the TL;DR for this? If this somehow can make me 100x
           | as productive, how about starting with a "hello world"
           | example that shows me how it is different from pip install
           | django, etc?
        
           | slim wrote:
           | he's a developer and curious about the subject. Since it's a
           | blog post, not a scientific paper, the fact that he did not
           | understand could be a communication failure. I think he's
           | being helpful
        
         | sdwr wrote:
         | In a typical architecture, the DB stores data, and the backend
         | calls the DB to make updates and compile views.
         | 
         | Here, the "views" are defined formally (the P-states), and
         | incrementally, automatically updated when the underlying data
         | changes.
         | 
         | Example problem:
         | 
         | Get a list of accounts that follow account 1306
         | 
         | "Classic architecture":
         | 
         | - Naive approach. Search through all accounts' follow lists for
         | "1306". Super slow, scales terribly with # of accounts.
         | 
         | - Normal approach. Create a "followed by" table, update it
         | whenever an account follows / unfollows / is deleted / is
         | blocked.
         | 
         | Normal sounds good, but add 10x features, or 1000x users, and
         | it gets trickier. You need to make a new table for each
         | feature, and add conditions to the update calls, and they start
         | overlapping... Or you have to split the database up so it
         | scales, but then you have to pay attention to consistency, and
         | watch which order stuff gets updated in.
         | 
         | Their solution is separating the "true" data tables from the
         | "view" tables, formally defining the relationship between the
         | two, and creating the "view" tables magically behind the
         | scenes.
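         | 
         | A minimal Python sketch of the "followed by" idea above, assuming
         | a toy in-memory event log. The names (events, FollowViews,
         | naive_followers) are made up for illustration and are not Rama's
         | API. Both views are derived from the same events, so they cannot
         | drift apart:

```python
from collections import defaultdict

# Hypothetical event log: (action, follower, followee) tuples.
events = [
    ("follow", 7, 1306),
    ("follow", 9, 1306),
    ("follow", 7, 42),
    ("unfollow", 9, 1306),
    ("follow", 11, 1306),
]


def naive_followers(follow_lists, target):
    """Naive approach: scan every account's follow list for the target."""
    return {acct for acct, follows in follow_lists.items() if target in follows}


class FollowViews:
    """Keeps the 'follows' and 'followed by' views in sync from one event stream."""

    def __init__(self):
        self.follows = defaultdict(set)      # account -> accounts it follows
        self.followed_by = defaultdict(set)  # account -> accounts following it

    def apply(self, event):
        action, follower, followee = event
        if action == "follow":
            self.follows[follower].add(followee)
            self.followed_by[followee].add(follower)
        elif action == "unfollow":
            self.follows[follower].discard(followee)
            self.followed_by[followee].discard(follower)


views = FollowViews()
for e in events:
    views.apply(e)

followers_fast = set(views.followed_by[1306])          # one dictionary read
followers_slow = naive_followers(views.follows, 1306)  # scans every account
```

         | The reverse view answers the query with a single lookup, while
         | the naive scan touches every account. The hard part the comment
         | points at is keeping many such views consistent once the data is
         | split across machines, which this single-process toy ignores.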
        
           | ethbr1 wrote:
           | So... at a high level, early React for data? In other words,
           | letting a framework manage update dependency graph tracking,
           | and then cascading updates through its graph in an optimized
           | manner to enhance performance?
           | 
           | Obviously, with tons of implementation difficulties and
           | details, and not actual graph structures, but as a top level
           | analogy.
        
           | endisneigh wrote:
           | I read their post and honestly it's not really that much
           | different than just materialized views in a regular database
           | plus async jobs to do the long running tasks.
           | 
           | It's a ridiculous amount of fluff to describe that. Not to
           | mention it's proprietary and only supports the JVM and
           | doesn't integrate with the tons of tooling designed around
           | RDBMS unless you stream everything to them, defeating the
           | purpose.
           | 
           | What really irks me is that they go on and on bragging about
           | the low LoC count and literally show nothing complete. They
           | should've held on this post and released it simultaneously
           | with the code.
        
             | sdwr wrote:
             | This is all armchair for me, but I think they have
             | containers and sharding built in as well, which is the
             | other half of the puzzle when it comes to scaling.
        
               | endisneigh wrote:
               | Yes, but there are plenty of NewSQL databases that support views
               | and offer all of that too. Yugabyte, Cockroach, TiDB and
               | that's just off the top of my head and open source. If we
               | count proprietary then you have Fauna, Cloud Spanner and
               | more I'm sure.
        
             | sixo wrote:
             | The difference is that the materialized-view logic lives
             | naturally in the application code; there's no step where
             | they go out of the DB to do computations and then reinsert.
             | 
             | Once SQL materialized views aren't enough, you might do
             | this by replicating your database into Kafka, implementing
             | logic in Flink or something, and reinserting into the same
             | DB/Elasticsearch/etc. Very common architecture. (Writ
             | small, could also use a queue processor like RabbitMQ.)
             | 
             | Their approach is to instead--apparently--make all of these
             | first-class elements of the same ecosystem, not by "putting
             | it all in the database", but by putting the database into
             | the application code. Which seems wild, but colocates data,
             | transformation, and view.
             | 
             | Seems like it would open up a lot of cans of worms, but if
             | you solve those, sounds great.
        
             | leonidasv wrote:
             | IIUC, the most significant difference from a materialized
             | view is that the Rama infrastructure recomputes only the
             | changed data by checking the relationship between fields,
             | while a traditional materialized view recomputes the whole
             | table?
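             | 
             | To make the distinction in this question concrete, here is a
             | small Python sketch of a "posts per author" view maintained
             | both ways. It assumes nothing about how Rama or any specific
             | database actually refreshes views; the function names and
             | sample rows are invented for illustration:

```python
from collections import Counter


def full_recompute(rows):
    """Rebuild the whole view from scratch: work grows with the total row count."""
    return Counter(author for author, _ in rows)


def incremental_update(view, new_rows):
    """Touch only the keys affected by the new rows: work grows with len(new_rows)."""
    for author, _ in new_rows:
        view[author] += 1
    return view


base = [("alice", "post1"), ("bob", "post2"), ("alice", "post3")]
delta = [("bob", "post4"), ("carol", "post5")]

view = full_recompute(base)      # initial build
incremental_update(view, delta)  # later changes only touch "bob" and "carol"
```

             | Both paths end in the same view; the difference is only how
             | much data has to be read when a small change arrives.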
        
       | dustingetz wrote:
       | Summarizing, now edited down with some editorializing for
       | clarity:
       | 
       | What is it? build web-scale reactive backends with an expressive
       | java dataflow API. Instead of a database you develop your own
       | custom app-specific indexes which are reactive, distributed and
       | durable. It's like event sourcing and materialized views but
       | integrated in a linearly scalable way.
       | 
       | > _I cannot emphasize enough how much interacting with indexes as
       | regular data structures instead of magical "data models"
       | liberates backend programming_
       | 
       | > _It allows for true incremental reactivity from the backend up
       | through the frontend. ... enable UI frameworks to be fully
       | incremental instead of doing expensive diffs to find out what
       | changed._
       | 
       | Ok, so in my mind I am positioning this against Materialize /
       | differential dataflow, whose key primitive is an efficient
       | streaming incremental join that works across very large
       | relational tables. Materialize makes SQL reactive, Rama gives
       | you a java dataflow DSL for developing purpose-built reactive
       | database indexes.
       | 
       | How it works? 4 concepts: Depot, ETLs, PState, Query
       | 
       | Depots: "distributed, durable, and replicated logs of data."
       | [Event streams?] "like Kafka except integrated" "All data coming
       | into Rama comes in through depot appends."
       | 
       | ETLs: data arrives via depots, and is ETLed to PStates via "a
       | Java dataflow API for coding topologies that is extremely
       | expressive". "Most of the time spent programming Rama is spent
       | making ETLs."
       | 
       | PStates seem like reactive data structures that are also
       | durable/replicated, these are meant to supersede your database
       | and indexes, letting you build custom purpose-built indexes that
       | contain 100M elements:
       | 
       | > _"partitioned states" are how data is indexed in Rama ...
       | Unlike existing databases, which have rigid indexing models (e.g.
       | "key-value", "relational", "column-oriented", "document",
       | "graph", etc.), PStates have a flexible indexing model. In fact,
       | they have an indexing model already familiar to every programmer:
       | data structures. A PState is an arbitrary combination of data
       | structures. ... nested data structures can efficiently contain
       | hundreds of millions of elements. For example, a "map of maps" is
       | equivalent to a "document database", and a "map of subindexed
       | sorted maps" is equivalent to a "column-oriented database". Any
       | [composition] is valid - e.g. you can have a "map of lists of
       | subindexed maps of lists of subindexed sets"._
       | 
       | Query: once you develop PStates to aggregate relevant data into a
       | custom index of the right ... shape?, query seems sorta like
       | GraphQL selectors over your custom index:
       | 
       | > _Queries in Rama take advantage of the data structure
       | orientation of PStates with a "path-based" API that allows you to
       | concisely fetch and aggregate data from a single partition_
       | 
       | > _"query topologies" ... real-time distributed querying and
       | aggregation over an arbitrary collection of PStates. These are
       | the analogue of "predefined queries" in traditional databases,
       | except programmed via the same Java API as used to program ETLs
       | and far more capable._
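       | 
       | A toy Python sketch of how the four concepts might fit together,
       | based only on the quotes above. Everything here (depot_append,
       | etl, pstate, select) is a hypothetical in-memory stand-in, not
       | Rama's Java API, and it ignores partitioning, durability and
       | replication entirely:

```python
# Toy, in-memory stand-ins; none of these names are Rama's real API.
depot = []   # append-only log of incoming events
pstate = {}  # "map of maps": account -> {status_id -> text}


def etl(event):
    """Index one event into the nested-dict view."""
    pstate.setdefault(event["account"], {})[event["id"]] = event["text"]


def depot_append(event):
    """All data enters through the log; the ETL runs synchronously in this toy."""
    depot.append(event)
    etl(event)


def select(state, *path):
    """Path-based lookup: walk nested dicts key by key; None if a key is missing."""
    node = state
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


depot_append({"account": "alice", "id": 1, "text": "hello"})
depot_append({"account": "alice", "id": 2, "text": "world"})
depot_append({"account": "bob", "id": 3, "text": "hi"})

second_status = select(pstate, "alice", 2)
```

       | The "map of maps" here plays the role the quote assigns to a
       | document database: the index is an ordinary nested data
       | structure, and a query is just a path into it.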
        
       | yayitswei wrote:
       | For context, nathanmarz created what is now Apache Storm, which
       | is used for stream processing at some of the world's largest
       | companies, so he knows a thing or two about scale.
        
       | primitivesuave wrote:
       | The "N bots posting X times/second" isn't a very meaningful
       | statistic. A system's reliability is mostly characterized by its
       | performance under stress.
        
       | ceejayoz wrote:
       | Headline: "building Twitter at Twitter-scale"
       | 
       | Article: "building Mastodon at sub-Twitter-scale"
        
         | hinkley wrote:
         | We have twitter at home!
        
         | dang wrote:
         | Thanks - I've changed the title to be consistent with what the
         | article says.
        
           | messe wrote:
           | Minor gripe, but there's a misspelling in the title: it
           | should read Mastodon not Mastadon.
        
             | dang wrote:
             | Oops! Fixed now. Thanks to you, Fabricio20, and riffic.
        
           | Fabricio20 wrote:
           | Hey dang, I think you have a typo in the title.. says
           | Mastadon instead of Mastodon!
        
           | riffic wrote:
           | masto not masta
        
           | Kiro wrote:
           | That's not the title of the article and also not what the
           | article says. I would be really pissed if you editorialized
           | the title of my article like that.
        
             | dang wrote:
             | I'm happy to correct it if anyone suggests a better one.
             | The intention is to find a neutral title that accurately
             | reflects what the article itself is saying.
             | 
             | We've learned that when an article's original title
             | generates complaints like
             | https://news.ycombinator.com/item?id=37137317, the thread
             | is likely to get derailed by shallow arguing about the
             | title. It's in both the author's interest and the
             | community's for us to nip that in the bud by (1) putting an
             | accurate and neutral title at the top (preferably using
             | representative language from the article itself), and (2)
             | marking the title complaint offtopic since it no longer
             | applies. These steps nudge the thread toward discussing the
             | article's content rather than merely its title.
        
               | Kiro wrote:
               | Alright, sounds reasonable. I think the problem here is
               | that the author specifically says (in a sibling comment)
               | that the point is not Mastodon and now it's in the title.
               | Maybe they're fine with it though.
        
               | dang wrote:
               | I'm no expert and definitely get things wrong - we only
               | skim things and make a first crack at an attempt, and
               | then rely on other people to refine it. If Nathan or
               | someone else wants to suggest a more accurate and neutral
               | title, we can do that - the goal is simply to clear the
               | discussion space for something more interesting than
               | title fever (https://hn.algolia.com/?dateRange=all&page=0
               | &prefix=true&que...). But now I'm repeating myself!
        
         | nathanmarz wrote:
         | Actually if you read the article you can see we tested way
         | above Twitter-scale. We can easily run this instance at full
         | Twitter-scale by just paying for more servers.
         | 
         | The point isn't the Mastodon instance, but rather that Rama
         | enabled us to build it at scale in a tiny amount of code
         | and time.
        
           | ceejayoz wrote:
           | Mastodon and Twitter don't do the same amount of work per
           | post. Mastodon doesn't have a recommendation engine, they
           | don't have an advertising engine, they don't scan every post
           | for CSAM, there's no global search, etc. (Some of these
           | things are good not to have, but they still _drastically_
           | change the scope.)
           | 
           | Claiming to have enabled significant scaling of a
           | Mastodon/ActivityPub-compatible instance is fine. Claiming to
           | have replicated Twitter on the cheap is, from the post, not
           | accurate.
        
             | nathanmarz wrote:
             | That's why we're comparing it to the cost of Twitter's
             | original consumer product. As a demonstration of Rama, we
             | scoped this project to the entirety of Mastodon which is
             | roughly equivalent to Twitter's original consumer product
             | (actually, it's probably greater in scope with additional
             | features like hashtag follows and more complex filter/mute
             | capabilities).
             | 
             | All those use cases you listed absolutely can be
             | implemented with Rama, and Rama's extreme cost benefits
             | would apply to those as well.
        
               | [deleted]
        
       | failuser wrote:
       | Is there a breakdown of effort Twitter spent doing the mastodon-
       | level service (serving a feed of the accounts you are subscribed
       | to) vs everything else like ads, algorithmic feed, moderation,
       | fighting spam, copyright claims, localization, GR, PR, safety,
       | etc?
        
       | sharms wrote:
       | The performance on the example Mastodon instance is very
       | responsive - almost anywhere I clicked loaded nearly instantly. I
       | created an account and the only thing I found missing was it
       | doesn't implement full text search unless my user was tagged, but
       | that might be a Mastodon specific item.
       | 
       | I think they have thought a lot about typical hard problems, such
       | as having the timeline processing happen alongside the pipeline,
       | taking network / storage etc out of the picture. Nice work!
        
         | nathanmarz wrote:
         | That is indeed an intentional part of Mastodon's design, which
         | we tried to be faithful to as much as possible. We originally
         | implemented search across all statuses and had to reimplement
         | it when we realized Mastodon is a little different.
        
           | sitzkrieg wrote:
           | did you ever consider starting from something already
           | technically performant like pleroma or misskey?
        
             | nathanmarz wrote:
             | Well, we didn't start from anything as we implemented this
             | completely from scratch. I believe Mastodon is much more
             | widely used than those so it seemed like a better target
             | for this.
        
               | [deleted]
        
               | sitzkrieg wrote:
               | yea i misspoke, good distinction lol. certainly makes
               | sense, thanks
        
       | DigitalSea wrote:
       | One of these posts. Dig into the numbers and claims, and you'll
       | see that they're not building something anywhere near Twitter
       | scale.
        
       | whateverman23 wrote:
       | ctrl+f "ads"
       | 
       | ctrl+f "monetization"
       | 
       | ctrl+f "moderation"
       | 
       | ctrl+f "existing infrastructure"
       | 
       | ctrl+f "personalization"
       | 
       | etc etc
       | 
       | Yeah about what I expect from a "we rebuilt twitter for cheap"
       | post. There's no point to the comparisons with the Twitter
       | codebase size/cost. It completely distracts from what is probably
       | a perfectly fine project.
        
         | jscottmiller wrote:
         | That's a fair criticism - this isn't an apples-to-apples
         | comparison. What I find interesting about this is the cost of
         | running the service. Being able to run a twitter-like thing on
         | a hundred or so large aws instances is neat and I'm sure that
         | many folks here dream of that kind of efficiency at their day
         | jobs, but I'm more excited about how this scales down. Can you
         | run a community of a thousand or so posters on a micro or nano
         | instance for a few bucks a month or less? At that scale and
         | cost, donations should easily be able to cover hosting fees and
         | you would surely be able to deputize enough mods to keep things
         | civil (for whatever definition of civil your instance lands
         | on). Ads, monetization, personalization are non-issues (well,
         | not major issues) at that scale.
        
         | 10000truths wrote:
         | The point is that much of that should be unnecessary to sustain
         | the service because hosting costs are significantly (presumably
         | 100x) cheaper.
        
       | raverbashing wrote:
       | They deserve congrats for that since they built the load test to
       | prove this
       | 
       | Of course, for actual production use, there's probably a lot of
       | things still, but this is very nice work nonetheless
        
         | nathanmarz wrote:
         | I wouldn't call our instance a load test, as it's a legitimate
         | instance available for anyone to use. It's very much
         | production-grade.
        
           | raverbashing wrote:
           | This is what I'm calling load test:
           | 
           | > The instance has 100M bots posting 3,500 times per second
           | at 403 average fanout to demonstrate its scale.
        
           | mdaniel wrote:
           | I would not want to speak for raverbashing but I feel the
           | same way: I actually can't tell if the bug is with soapbox or
           | with your instance but clicking on the first link from your
           | post practically locks up my browser due to _every single
           | Toot_ getting swapped out "at twitter scale"
           | 
           | If one clicks quick enough to jump to an actual post, it
           | seems relatively static so it's hard to tell if the bots are
           | deleting and recreating their posts or what. In true Xitter
           | clone fashion, trying to view the Posts & replies from any
           | one user is "sign in
           | 
           | Anyway, all of this is not to detract from your framework
           | announcement as much as to have you consider that it's
           | perfectly fine to label that instance as a load test, that's
           | a fine thing, but calling it a legitimate instance seems to
           | be a potential source of confusion
        
             | nathanmarz wrote:
             | We did notice on less powerful machines the browser
             | getting overwhelmed with the rate of new content (even
             | though we're only streaming 10/s instead of the full 3.5k/s
             | actually happening on the backend). I don't know if the
             | poor performance in this context is due to Soapbox, the
             | browser, or just the hardware.
             | 
             | To get a better feeling of Rama's performance on your
             | hardware, I suggest registering an account which will allow
             | you to poke around the whole platform. It takes just a
             | couple seconds to register and we don't send any emails.
        
       | FridgeSeal wrote:
       | Semi-related: Their homepage (https://redplanetlabs.com/) has to
       | be one of the best looking websites I've seen in a while, buttery
       | smooth as well. I love it.
        
         | alberth wrote:
         | If you like the pretty static background look with light text,
         | checkout https://carrd.co/
         | 
         | It's a website builder with lots of themes similar in design.
        
       | [deleted]
        
       | ThinkBeat wrote:
       | I am confused.
       | 
       | This is meant to be hyped to sell your Rama
       | platform/product/framework? That you have spent 10 years building
       | in secret? During that time you have built a datastore and a
       | Kafka competitor and ?
       | 
       | Should not those 10 years be factored into the time it took to
       | develop this technical demo?
       | 
       | Is it 100x less code including every LOC in all of Rama?
       | 
       | I mean I am sure you picked a use case that is well suited to
       | creating a Twitterish architecture implementation.
       | 
       | If I went off and wrote a ThinkBeat platform for creating
       | Twitterish systems and then created a Twitterish implementation
       | on top of it, it's real easy to reach low LOCs.
        
       | say_it_as_it_is wrote:
       | need to port this to Go...
        
       | itissid wrote:
       | TL;DR: Chat GPT summary of 5 "pages" of the thing:
       | https://chat.openai.com/share/bd6eac38-5bac-4c6f-b405-7ca7d8...
        
       | skybrian wrote:
       | It sounds like interesting technology for someone, but I wonder
       | more about scaling down. What does a developer instance running
       | on a laptop look like?
        
         | nathanmarz wrote:
         | Great question. There's actually two ways to look at this: what
         | does it look like to run Rama in a unit test environment, and
         | what does it look like to run a small-scale single-node Rama
         | application in production?
         | 
         | For the former, Rama has a class called "InProcessCluster" that
         | works identically to a real cluster. It enables Rama
         | applications to be tested and experimented with end-to-end.
         | There's an example of this in the post and this is what we're
         | releasing next week.
         | 
         | For the latter, Rama can be run on a single node with each
         | daemon and module being a separate process. We made it really
         | easy to launch single-node Rama instances with just a couple
         | commands with the "rama" script that comes with the release.
         | That said, we haven't spent much time yet optimizing small-
         | scale Rama deployments and there's likely things we can do to
         | make it more efficient (e.g. combine the Conductor and
         | Supervisor daemons into a single process).
        
           | dunk010 wrote:
           | You're killing it with the replies here +++
        
           | skybrian wrote:
           | Interesting. How about running in the cloud? I'm thinking of the
           | many ways someone who wants to start a blog could install
           | Ghost. [1]
           | 
           | [1] https://ghost.org/docs/install/
        
           | joelthelion wrote:
           | Follow up question : do you see Rama as being a good fit for
           | applications that /don't/ need Twitter scale? These have
           | simpler requirements, but I feel the integration you propose
           | could still have value there.
        
             | nathanmarz wrote:
             | Yes, it's a better model for developing backends in
             | general. Our comparison against Mastodon's official
             | implementation demonstrates this, being at least 44% less
             | code.
             | 
             | It's the ability to avoid the impedance mismatches which
             | dominate existing tooling that makes such a difference.
             | With existing databases, including RDBMS's, you have to
             | twist your application to fit their data models. The
             | existence of things like ORMs help, but they add their own
             | layers of complexity.
             | 
             | With Rama, you mold your indexes to exactly match your
             | application's needs. And you're always just working with
             | objects represented however you want, whether appending
             | data to depots, processing data in ETLs, or storing data in
             | PStates.
             | 
             | That computation and storage are integrated and colocated
             | is another way that Rama simplifies application development
             | and deployment.
        
       | sourcecodeplz wrote:
       | Who cares. Mastodon was/is destined to fail. Trigger happy mods
       | ban you from a server, then you're banned from a bunch.
        
         | kaimac wrote:
         | skill issue
        
       | chiefalchemist wrote:
       | "We stood on the shoulders of giants..."
       | 
       | X years from now "We reduced the cost of building _____ at
       | Mastodon-scale by 1000x".
       | 
       | It's certainly interesting, certainly an accomplishment, but it's
       | also the nature of the game. The present eating the past, to be
       | eaten by the future. Rinse. Repeat.
        
       | LeifCarrotson wrote:
       | > How is it possible that we've reduced the cost of building
       | scalable applications by multiple orders of magnitude?
       | 
       | > You can begin to understand this by starting with a simple
       | observation: you can describe Mastodon (or Twitter, Reddit,
       | Slack, Gmail, Uber, etc.) in total detail in a matter of hours.
       | It has profiles, follows, timelines, statuses, replies, boosts,
       | hashtags, search, follow suggestions, and so on. It doesn't take
       | that long to describe all the actions you can take on Mastodon
       | and what those actions do. So the real question you should be
       | asking is: given that software is entirely abstraction and
       | automation, why does it take so long to build something you can
       | describe in hours?
       | 
       | > At its core Rama is a coherent set of abstractions...
       | 
       | This conclusion is alarming to read from a company that's trying
       | to sell a new platform. The vast majority of the work in building
       | Twitter or Reddit is not about building a coherent set of
       | abstractions, it's working with an often incoherent reality,
       | dealing with a myriad of laws that describe, as if your web app
       | were a human clerk at a post office, how to handle PII and credit
       | cards and CSAM filters and audits and copyright claims and on and
       | on...
       | 
       | I'm honestly shocked that the technical implementation of a
       | simplified, coherent platform took a full 9 person-months. That
       | shouldn't be the hard part. What I'd want to know as a
       | prospective customer is how you handle exceptions to your
       | beautiful, idealized architecture, when some foreign country
       | requires that you only store comments posted by their citizens
       | within their borders or something like that.
        
         | amendegree wrote:
         | ~~full text search doesn't appear to work... so it's possible
         | they punted on one of the harder parts, which is fast efficient
         | accurate fuzzy search, which moderation and a lot of those
         | other harder things rely on.~~
         | 
         | eta: they say that had it but removed it because apparently
         | it's not something mastodon supports. so I guess it is a pretty
         | good high level implementation.
        
         | ciconia wrote:
         | > I'm honestly shocked that the technical implementation of a
         | simplified, coherent platform took a full 9 person-months.
         | 
         | To be fair they developed this whole new platform to build this
         | app with. I guess that's where the effort went.
        
           | titanomachy wrote:
           | Not exactly:
           | 
           | > Our implementation is built on top of a new platform called
           | Rama that we at Red Planet Labs have developed over the past
           | 10 years.
        
         | newZWhoDis wrote:
         | So the things that make it difficult are all things you
         | shouldn't be doing in the first place? Well that certainly
         | helps.
         | 
         | You shouldn't be handling PII/raw CC's anyways (assuming
         | FinTech is not your core business)
         | 
         | Secretly scanning your customers private messages against an
         | illegal and immoral hash table from a pseudo-government entity?
         | Are you law enforcement? No? Then fucking stop.
         | 
         | Copyright claims? Fuck 'em. Only do what you are absolutely,
         | positively, no way-out legally bound to do. No more no less.
         | Require formal, written requests and comply in the maximum
         | amount of time allowed.
         | 
         | Audits? What kind of audit? If they're non-financial you're
         | probably doing something wrong.
         | 
         | Corporate squares have ruined the tech scene, and it's time to
         | resist.
        
         | nathanmarz wrote:
         | Building Twitter/Mastodon *not at scale* isn't that hard and
         | certainly doesn't take 200 person-years. Building it *at scale*
         | is a completely different story. Remember the fail-whale? That
         | was years of Twitter struggling to scale their product.
         | 
         | That said, as we described in the post our implementation of
         | Mastodon is less code than Mastodon's official implementation.
         | So not only is Rama orders of magnitude more efficient for
         | building applications at scale, it's also much faster for
         | building first versions of an application.
        
           | roguas wrote:
           | Well since you use clojure, you probably know that to have
           | small codebase, people often pick clojure. Going from point A
           | to point Z quickly is rarely a goal for startups, going
           | through A.. B... C... quickly, is the goal. I am still
           | looking through all this, but a thought of having to bet on
           | some java api + hope and pray it will jump over all unknown
           | hoops, hm.
           | 
           | Comparisons to twitter are unfair, twitter is not really a
           | technical gem, or is it? It's pretty impressive to build it
           | with 3 ppl in 3 months, but hmm also seems feasible using
           | other tech, given all blueprints are out there.
        
             | nathanmarz wrote:
             | Well, as mentioned in the post Instagram literally just
             | built and released their own barebones Twitter clone this
             | year, and it took them 25 person years. They were also able
             | to leverage all their existing infrastructure powering
             | similar products.
             | 
             | So I would not say it's remotely feasible to do this in
             | less than one person-year with any other technology.
             | 
             | https://www.washingtonpost.com/technology/2023/07/29/meta-
             | th...
        
       | _dwt wrote:
       | Hmmm, "Rama is programmed entirely with a Java API - no custom
       | languages or DSLs" according to the landing page, but this sure
       | looks like an embedded DSL for dataflow graphs to me - Expr and
       | Ops everywhere. Odd angle to take.
        
         | [deleted]
        
         | nathanmarz wrote:
         | I consider "DSL" as something that's its own language with its
         | own lexer and parser, like SQL. The Rama API is just Java -
         | classes, interfaces, methods, etc. Everything you do in Rama,
         | from defining indexes, performing queries, or writing dataflow
         | ETLs, is done in Java.
        
           | chc4 wrote:
           | This is usually referred to as an "embedded DSL" - you have a
           | DSL embedded in a normal programming language using its first
           | class constructs.
        
             | gfodor wrote:
             | Yep the original term DSL was for custom languages, the
             | eventual introduction of using it for these kinds of
             | literate APIs was done later. Using it in the original way
             | unqualified is fine imo.
        
         | goostavos wrote:
         | Odd thing to split hairs over.
        
           | gkfuhff wrote:
           | When someone makes a distinction that you don't immediately
           | appreciate, maybe don't just dismiss it as splitting hairs,
           | as if the world was a simple place.
        
           | dcre wrote:
           | It's not a small detail. It's one of the headline claims!
        
       | gfodor wrote:
       | Something I'm immediately thinking about with this is change
       | management and inertia at the early stages of a new, underdefined
       | project. Less code is great, the big question is how such a
       | system compares to the usual hack-and-slash method of getting a
       | v1 up and running as you search for PMF from the perspectives of
       | ops, cost, data migrations, rapid deployments, and so on.
       | Presumably, the idea here is to start from the beginning with
       | Rama, skipping over the usual "monolith fetches from RDBMS" happy
       | paths, even for your basic prototype, this way you don't slip
       | into a situation like Twitter did where that grew slowly into an
       | unscalable monstrosity requiring a rewrite. So an article focused
       | on the "easy" part that's required in the beginning of rapid
       | change, as much as it's not as important as the "simple" part
       | that shines later at scale, seems useful.
        
         | nathanmarz wrote:
         | Thanks, this is a good idea for another post.
         | 
         | The basic operation Rama provides for evolving an application
         | over time is "module update". This lets you update the code for
         | an existing module, including adding new depots, PStates, and
         | topologies.
        
       | trollied wrote:
       | > We spent nine person-months building our scalable Mastodon
       | instance.
       | 
       | + the time spent creating Rama, the platform that enables it.
       | 
       | Very dishonest leaving that out.
        
         | nathanmarz wrote:
         | You're missing the point. Rama is a generic platform that
         | provides a new baseline for how expensive it is to build
         | applications at scale. There's nothing about Rama specific to
         | social networks. What we're showing is that Rama creates a new
         | era in software engineering where the cost of building
         | applications at scale is radically reduced. With Rama, anyone
         | embarking on a new application today has a radically different
         | economic outlook for the end-to-end cost of developing that
         | application from prototype through large scale.
        
           | 3cats-in-a-coat wrote:
           | If I grasp the essence of Rama:
           | 
           | - "Depots" are event streams (for event sourced data
           | repositories)
           | 
           | - ETL read one or more streams and project them to indexable
           | read models...
           | 
           | - Which read models are called "PStates" and represent nested
           | combinations of indices like hashtables, b-trees, linked
           | lists and so on. The point of those being they have the data
           | in a fast-to-query form.
           | 
           | - And you have query engine which splits a query into 1+
           | index sub-queries and then aggregates.
           | 
           | Am I missing something, this seems relatively standard event-
           | sourced / CQRS-like architecture, but streamlined to avoid
           | redundancy and reimplementation of common abstractions.
           | 
           | It would've helped if the terms were less obscure than
           | "depots" and "PStates".
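
The model summarized in the comment above (append-only event streams projected into queryable read models) can be sketched in a few lines. This is a toy, in-memory illustration only: `append`, `etl`, `replay`, and the event tuples are hypothetical names chosen for this sketch, not Rama's API.

```python
from collections import defaultdict

# Toy model of "depot -> ETL -> PState". All names here are illustrative
# assumptions; none of them come from Rama.
depot = []  # append-only event log (the "depot")


def etl(model, event):
    """Project a single event into a read model (account -> follower set)."""
    kind, follower, target = event
    if kind == "follow":
        model[target].add(follower)
    elif kind == "unfollow":
        model[target].discard(follower)


def replay(events):
    """Rebuild a read model from scratch by replaying the whole log."""
    model = defaultdict(set)
    for event in events:
        etl(model, event)
    return model


followers = defaultdict(set)  # the read model (the "PState")


def append(event):
    """Record the event, then update the read model from it."""
    depot.append(event)
    etl(followers, event)


append(("follow", "alice", "bob"))
append(("follow", "carol", "bob"))
append(("unfollow", "alice", "bob"))

# Queries are served from the read model, never by scanning the log.
follower_count = len(followers["bob"])

# Because the log is the source of truth, the read model can be rebuilt.
rebuilt = replay(depot)
```

The design point the sketch tries to show is that the read model is derived data: it can be thrown away and recomputed from the log, which is what lets its shape be chosen per use case.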
        
             | nathanmarz wrote:
             | From the post:
             | 
             |  _Individually, none of these concepts are new. I'm sure
             | you've seen them all before. You may be tempted to dismiss
             | Rama's programming model as just a combination of event
             | sourcing and materialized views. But what Rama does is
             | integrate and generalize these concepts to such an extent
             | that you can build entire backends end-to-end without any
             | of the impedance mismatches or complexity that characterize
             | and overwhelm existing systems._
             | 
             | You have the general model correct, but here are a few
             | clarifications:
             | 
             | - PStates are partitioned, durable, replicated indexes that
             | are represented as arbitrary combinations of data
             | structures. A PState can be as simple as an integer per
             | partition, or it can be complex like a map of lists of maps
             | of sets. PStates allow you to shape your indexes to
             | perfectly match your application's use cases.
             | 
             | - I wouldn't call Rama queries an "engine", as it's
             | considerably more straightforward in how it works than
             | something like SQL. The base query API is called "paths",
             | which are an imperative way to concisely reach into one
             | partition of one PState to fetch or aggregate values.
             | There's also "query topologies" which are predefined, on-
             | demand distributed computations that can fetch and
             | aggregate data from many partitions of many PStates.
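
To make "a map of lists of maps of sets" concrete, here is a minimal sketch using plain Python containers, with a small helper that walks a sequence of keys into the structure. The `select` helper and the sample data are assumptions made for illustration; they are only loosely analogous to the "paths" described above and do not reproduce Rama's path API, partitioning, durability, or replication.

```python
# Hypothetical index shape: user -> list of statuses -> field -> set of values,
# i.e. "a map of lists of maps of sets". Sample data is invented.
index = {
    "alice": [
        {"hashtags": {"rama", "java"}, "mentions": {"bob"}},
        {"hashtags": {"clojure"}, "mentions": set()},
    ],
}


def select(structure, path):
    """Walk `path` (dict keys or list indexes) into a nested structure."""
    current = structure
    for step in path:
        current = current[step]
    return current


# Fetch one value deep inside the structure.
tags = select(index, ["alice", 0, "hashtags"])

# Aggregate over part of the structure: how many statuses alice has.
status_count = len(select(index, ["alice"]))
```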
        
               | 3cats-in-a-coat wrote:
               | Thanks, I will read more soon! I'm curious... how do you
               | resolve the "impedance mismatch" between some "canonical"
               | models that business decisions are based upon,
               | which need to be synchronous with the depots (and
               | mutually synchronous with other models sharing fragments
               | of the same data), and the eventually consistent read
               | models, which have a more lax constraint on how up to
               | date they are?
               | 
               | How do you ensure consistency here? How do you organize
               | it in the data flow?
               | 
               | Say I update a user, because that user seems to still be
               | there in the query result/indexes, but actually an event
               | for this user being deleted has happened some time ago?
               | 
               | This can also happen I suppose if the depots run queries
               | themselves on PState in order to determine if a certain
               | event is valid at all or not, and how exactly to carry it
               | out.
        
               | nathanmarz wrote:
               | The impedance mismatches you're used to from using
               | databases are gone because:
               | 
               | - You can finely tune your indexes to be exactly the
               | optimal shape for your application (data structure). You
               | can see this in our Mastodon implementation with the big
               | variety of data structures we used for all the use cases.
               | 
               | - You're generally just using regular Java objects
               | everywhere: appending to depots, during ETL processing,
               | and stored in indexes.
               | 
               | How you coordinate data creation with view updates is a
               | deeper topic, so I'll just summarize one of the basic
               | mechanisms Rama provides for coordinating this. Depot
               | appends can have an "ack level" that determines the
               | conditions before Rama tells you that depot append has
               | completed. The default level is "full ack" which includes
               | all streaming topologies colocated with that depot fully
               | processing that record. With this level, when the depot
               | append completes you know that all associated indexes
               | (PStates) have been updated.
               | 
               | There's also "append ack", which only waits for the depot
               | append to be replicated on the depot, and "no ack", which
               | is fire and forget. These all have their uses depending on
               | the specific needs of an application.
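
The practical difference between the ack levels described above is what a caller may assume once an append returns. The sketch below is a single-process toy under stated assumptions: `ToyDepot`, `Ack`, and `drain` are hypothetical names, not Rama's API, and since there is no replication here the toy cannot distinguish "append ack" from "no ack" (both simply return before the index is updated).

```python
from collections import deque
from enum import Enum


class Ack(Enum):
    FULL = "full"      # return only after the index reflects the record
    APPEND = "append"  # return once the append itself is durable
    NONE = "none"      # fire and forget


class ToyDepot:
    """Illustrative stand-in for a depot plus one colocated index."""

    def __init__(self):
        self.log = []           # appended records, in order
        self.pending = deque()  # appended but not yet processed
        self.index = {}         # read model: key -> latest value

    def drain(self):
        """Process all pending records into the index, in append order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.index[key] = value

    def append(self, record, ack=Ack.FULL):
        self.log.append(record)
        self.pending.append(record)
        if ack is Ack.FULL:
            # Caller gets control back only after processing has completed.
            self.drain()


depot = ToyDepot()

depot.append(("user:1", "active"), ack=Ack.FULL)
seen_after_full = depot.index.get("user:1")

depot.append(("user:1", "deleted"), ack=Ack.APPEND)
seen_after_append = depot.index.get("user:1")  # index may still be stale

depot.drain()  # stands in for background processing catching up
seen_after_drain = depot.index.get("user:1")
```

In this toy, a reader that needs to see its own write uses the full ack, while a writer that only cares about throughput can accept the stale window.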
        
               | 3cats-in-a-coat wrote:
               | Thanks! So we can see these ACKs as "wait and
               | synchronize" signals I suppose? However how can we ensure
               | an "all or nothing" between all parties trying to ACK a
               | condition they're mutually dependent on? I.e.
               | transactionality or atomicity?
        
             | dustingetz wrote:
             | you're missing automatic/free linear scaling
        
               | 3cats-in-a-coat wrote:
               | Systems that promise "free linear scaling" without
               | qualifiers either withhold or have not analyzed/realized
               | their bottlenecks yet. Say if there is eventual
               | consistency maybe the "eventuality" becomes so long that
               | the service fails at its purpose. Or the communication
               | link bandwidth is exhausted between key business logic
               | (mutation event generating) services, and so on.
               | 
               | The only systems that scale linearly are stateless
               | systems. Mastodon is not stateless. And even stateless
               | systems hit some bottlenecks eventually, as they exist
               | and run in a scale-variant Universe.
               | 
               | So this claim by itself doesn't immediately impress me,
               | just turns my red lights on, awaiting further
               | investigation. But we can of course discuss why this
               | claim is made and how is it supported. The article is
               | long so I've not had the chance to read it entirely yet.
               | 
               | But we have X number of event streams mapped through Y
               | number of ETLs to produce Z number of read model indices,
               | in a shape that seems to form a highly interlinked DAG,
               | which eventually loops back on itself in terms of message
               | flow. Just the increased cross-chatter here as we
               | introduce more features suggests non-linear scaling.
        
           | yid wrote:
           | > What we're showing is that Rama creates a new era in
           | software engineering where the cost of building applications
           | at scale is radically reduced.
           | 
           | Bold of you to come to HN with the breathless hyperbolic
           | marketing fluff that may work on Twitter...
        
             | nathanmarz wrote:
             | I think we provided a ton of substance backing up that
             | claim, and we will provide even more next week when we
             | release the build of Rama that anyone can use and its
             | corresponding 100k words of documentation.
        
         | Zak wrote:
         | Not from the perspective of this being a demo application to
         | sell Rama. The pitch is that if you use Rama, you can achieve
         | similar results.
        
       | boredumb wrote:
       | neat read but I was expecting to read about twitter migrating and
       | literally 100x savings being had.
        
         | dunk010 wrote:
         | You never know.
        
       | buro9 wrote:
       | Measuring "Twitter Scale" by tweets per second seems to be not
       | how I would measure it.
       | 
       | Updates per second to end users who follow the 7K tweets per
       | second seems more realistic, it's the timelines and notifications
       | that hurt, not the top of ingest tweets per second prior to the
       | fan out... and then of course it's whether you can do that
       | continuously so as not to back up on it.
        
         | nathanmarz wrote:
         | That's why we're saying "at 403 fanout". The bottleneck of
         | Mastodon/Twitter is timeline writes, which is posts/second
         | multiplied by the average number of followers per post. So our
         | instance is doing 1.4M timeline writes / second.
         | 
         | Another important metric is "time to deliver to follower
         | timelines", which is tricky due to how much variance there can
         | be every second due to the extremely unbalanced social graph.
         | When someone with 20M followers posts, that multiplies the
         | number of needed timeline writes by 15x. We went into depth in
         | our post on how we handled that to provide fairness by
         | preventing these big users from hogging all the resources all
         | at once.
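
The figures in the comment above can be checked with a few lines of arithmetic. The inputs (3,500 posts per second, 403 average fanout, a 20M-follower account) come from the thread; the assumption that the large account's deliveries land on top of the normal load within a single second is made here only to reproduce the "15x" figure.

```python
posts_per_second = 3_500
average_fanout = 403

# Timeline writes are posts per second times average followers per post.
baseline_writes = posts_per_second * average_fanout

# One post from a very large account, added on top of the normal load
# (assumption: all of its deliveries are counted in the same second).
big_account_followers = 20_000_000
spike_writes = baseline_writes + big_account_followers

# How many times the usual per-second write volume that represents.
multiplier = spike_writes / baseline_writes
```

With these inputs the baseline is about 1.4M writes per second and the multiplier comes out just over 15, which matches the numbers quoted in the comment.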
        
           | faitswulff wrote:
           | I heard somewhere that one of the particular challenges of
           | Twitter's scale is not the average fanout, but the outliers
           | where millions or tens of millions of users follow a single
           | account. Does your simulation take that into account?
        
             | nathanmarz wrote:
             | Yes, we discussed this at length in the post.
        
       | duped wrote:
       | This is what they've been hyping on Twitter for a week?
       | 
       | FWIW, why hype at all? Why "We'll share more in a week. Then more in
       | two weeks." Show the code today!
        
         | newaccount74 wrote:
         | Considering the length and amount of detail in this blog post,
         | I understand why they would need another week to get the code
         | ready (assuming there will be more docs)
        
           | nathanmarz wrote:
           | We're releasing 100k words of high-quality documentation next
           | week.
        
       | jitl wrote:
       | This architecture seems very similar to existing offerings in the
       | "in-memory data grid" category, like Apache Ignite and Hazelcast.
       | I'm more familiar with Ignite (I built a toy Notion backend with
       | it over a few afternoons in 2020).
       | 
       | The way Ignite works overall is similar. You make a cluster of
       | JVM processes, your data is partitioned and replicated across the
       | cluster, and you upload some JARs of business logic to the
       | cluster to do things. Your business logic can specify locality so
       | it runs on the same nodes as the relevant data, which ideally
       | makes things a lot faster compared to systems where you need to
       | pull all your data across the wire from a DB. Like Rama, Ignite
       | uses a Java API for everything, including serializing and storing
       | plain ol' Java objects.
       | 
       | Ignite's architecture isn't focused on "ETL" into "PStates".
       | Instead it's more about distributed "caches" of data. It does
       | have streaming for ingestion
       | (https://ignite.apache.org/docs/latest/data-streaming), but you
       | can transactionally update the datastore directly
       | (https://ignite.apache.org/docs/latest/key-value-
       | api/transact...). It also has a "continuous query" feature for
       | those reactive queries to retrieve data
       | (https://ignite.apache.org/docs/latest/key-value-
       | api/continuo...).
       | 
       | Rama's data-structure oriented PState index seems easier to work
       | with than building indexes yourself on top of Ignite's KV cache,
       | but Ignite also offers an SQL language, so you can insert your
       | data into the KV cache however, add some custom SQL functions,
       | and then accept more flexible SQL querying of your data compared
       | to the very purpose-built PState things, but still be able to do
       | lower-level or more performance-oriented logic with data
       | locality.
       | 
       | Anyways, if you like some of this stuff but want to use an
       | existing, already battle-tested open source project, you can look
       | for these "in-memory data grid", "distributed cache", kind of
       | projects. There's a few more out there that have similar JVM
       | cluster computing models.
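
The data-locality idea described in the comment above (ship the logic to the node that owns the data, rather than pulling the data across the wire) can be sketched in a few lines. This is a single-process toy under stated assumptions: `ToyGrid`, `Partition`, and `run_colocated` are hypothetical names, and nothing here is the Ignite, Hazelcast, or Rama API.

```python
import zlib


class Partition:
    """Stands in for one node's local slice of the data."""

    def __init__(self):
        self.data = {}


class ToyGrid:
    """Toy hash-partitioned key-value store with colocated execution."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def _owner(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._owner(key)].data[key] = value

    def run_colocated(self, key, fn):
        # Route the function to the partition that owns `key`; only the
        # (small) result would cross the network in a real cluster.
        partition = self.partitions[self._owner(key)]
        return fn(partition.data, key)


grid = ToyGrid(num_partitions=4)
grid.put("alice", [3, 4, 5])
print(grid.run_colocated("alice", lambda local, k: sum(local[k])))  # 12
```

A real data grid adds replication, rebalancing, and serialization of the submitted logic, none of which this sketch attempts.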
        
         | theptip wrote:
         | Hazelcast has been on my list to explore for a while. Anyone
         | have pointers to a good sample project / deep-dive in the same
         | sort of spirit as the OP here?
         | 
         | Also would love to hear folks' thoughts on the sort of usecase
         | where this data grid excels.
        
       | 2Gkashmiri wrote:
       | what are the server specs this demo is running on?
       | 
       | is it baremetal?
       | 
       | vps?
       | 
       | how about doing a comparison on consumer grade vps like 1
       | vcpu/4GB ram setup comparison between your product and mastodon
       | or pleroma for example?
       | 
       | i mean sure you can build a twitter scale product but federation
       | means people can do that on their own and with your tech, they
       | dont have to worry about scaling issues.
        
       | polishdude20 wrote:
       | "We spent nine person-months building our scalable Mastodon
       | instance. "
       | 
       | Nono, you can't say that when later on you say it's built on top
       | of Rama. You literally spent 10 years building the framework to
       | even make this.
       | 
       | And yes, you built this in 10k lines of code but how many lines
       | of code is Rama? This seems disingenuous.
        
         | beders wrote:
         | No it is not disingenuous. They didn't build Rama to build a
         | twitter clone.
         | 
         | And you can't take the "twitter engine" out of twitter and
         | build other apps with it. A lot of it is custom built to fit
         | the twitter data model.
         | 
         | Unlike - it seems - Rama.
        
         | scratcheee wrote:
         | Actually took them longer than that even.
         | 
         | They had to invent the computer first, and before that they had
         | to create a universe capable of sustaining both life and
         | computers.
        
         | xmonkee wrote:
         | Going by their claims, they are showing off their generalizable
         | platform, Rama, by building an application on top of it. The
         | application is an example, not the product. For example,
         | someone implements a Todo app on their hot new javascript
         | framework in 10 mins, your objection would be, "But it took you
         | 2 years to make the framework, so actually it took you 2 years
         | plus 10 mins". Why stop there? It also took many years to build
         | the underlying language, networking layers, infrastructure,
         | processors, materials etc etc. You have to draw the line at the
         | point where the application specific code starts and the
         | generalizable platform ends, no?
        
         | bo1024 wrote:
         | Their point is to show off the power of Rama, i.e. it is
         | possible to build such applications on top with little work.
        
           | dunk010 wrote:
           | Exactly, why are so many people missing this point? It's not
           | "we built a narrow, tedious framework for knocking off
           | Twitter clones", it's "We built a platform that turns data
           | processing on its head and look in a couple of months you can
           | clone Twitter just imagine what YOU can do with this."
           | 
           | I see parallels though to Datomic, where they turned the
           | database inside out, co-located the app logic and data and
           | indexes, etc. There are a bunch of great videos on YT about
           | Datomic by Rich Hickey & co, worth a watch and I think shine
           | a light on the approach here, too.
        
             | bo1024 wrote:
             | I think they didn't do a good job making the point clear
             | for people who just clicked the link without context. It
             | starts off talking a lot about the Mastodon clone and then
             | gradually starts talking about Rama as it goes.
        
               | dunk010 wrote:
               | People should probably close the TikTok and pick up a
               | book instead to increase their attention spans then :-D
        
         | colonwqbang wrote:
         | The JVM took years to write. It took decades to develop the
         | technology necessary to build a modern microcomputer. Before
         | that, millennia to invent written language. And now that those
         | platforms (including Rama) all exist, one can deliver a
         | Mastodon server on top of them in about 9 man-months.
        
         | [deleted]
        
       | RomanPushkin wrote:
       | > ...10k lines of code. This is 100x less code than the ~1M lines
       | Twitter
       | 
       | I wish I didn't see this comparison, which is not fair at all.
       | Everyone in their right mind understands that the number of
       | features is much less, that's why you have 10k lines.
       | 
       | Add large-scale distributed live video support on top of
       | that, and you won't get anywhere close to 10k lines. It's only one
       | of many, many examples. I really wish you would compare Mastodon to
       | Twitter 0.1 and not do false advertising.
       | 
       | > 100M bots posting 3,500 times per second... to demonstrate its
       | scale
       | 
       | I'm wondering why 100M bots post only 3500 times per second? Is
       | it 3500 per second for each bot? Seems like it's not, since HTTPS
       | termination will consume most of the resources in this case. So
       | I'm afraid it's just not enough.
       | 
       | When I worked at Statuspage, we supported 50-100k requests
       | per second, because this is how it works - you have spikes, and
       | traffic which is not evenly distributed. TBH, if it's only 3500
       | per second total, then I have to admit it is not enough.
        
         | MikePlacid wrote:
         | So
         | 
         | >> 100M bots posting 3,500 times per second...
         | 
         | and
         | 
         | > We used the OpenAI API to generate 50,000 statuses for the
         | bots to choose from at random.
         | 
         | I wonder: 100M OpenAI bots talking to each other continuously
         | and with much vigor - how is this affecting OpenAI's uhm...
         | intellect?
        
           | sdwr wrote:
           | They generated 50,000 statuses once, put them in a text file,
           | and pick between them randomly. So not at all.
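
As a rough sketch of what this exchange describes: the bots draw from a fixed, pre-generated pool, and the 3,500 posts/second figure is a total across all bots rather than a per-bot rate. The function names and the tiny stand-in pool below are hypothetical; only the 100M, 3,500, and 50,000 figures come from the thread.

```python
import random

SECONDS_PER_DAY = 86_400


def per_bot_posts_per_day(total_posts_per_second, num_bots):
    """Average posts per bot per day when the total rate is shared evenly."""
    return total_posts_per_second * SECONDS_PER_DAY / num_bots


def pick_status(statuses, rng):
    """Choose one pre-generated status uniformly at random."""
    return rng.choice(statuses)


# 3,500 posts/second across 100M bots is about three posts per bot per day.
print(per_bot_posts_per_day(3500, 100_000_000))

# Stand-in for the 50,000 statuses generated once and stored in a file.
pool = [f"status {i}" for i in range(50_000)]
rng = random.Random(0)  # seeded only to make the sketch repeatable
print(pick_status(pool, rng))
```

Because the pool is generated once up front, the load test makes no further calls to the text generator while it runs.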
        
         | nathanmarz wrote:
         | We're comparing just to the original consumer product, which is
         | about the same as Mastodon is today. That's why we said
         | "original consumer product" and not "Twitter's current consumer
         | product".
         | 
         | Mastodon actually has more features than the original Twitter
         | consumer product like hashtag follows, global timelines, and
         | more sophisticated filtering/muting capabilities.
         | 
         | Some people argue it's not so expensive to build a scalable
         | Twitter with modern tools, which is why we also included the
         | comparison against Threads. That's a very recent data point
         | showing how ridiculously expensive it is to build applications
         | like this, and they didn't even start from scratch as
         | Instagram/Meta already had infrastructure powering similar
         | products.
        
           | doctorpangloss wrote:
           | I work in gaming, so I cannot speak to your specific
           | experiences. Entity Component Systems are extremely
           | performant, really good science, and shipping in middlewares
           | like Unity. However, in order to ship an ECS game, in my
           | experience, you have to have already made your whole game
           | first in a normal approach, in order to have everything be
           | fully specified sufficiently that you can correctly create an
           | ECS implementation. In practice, this means ECS is used to
           | make first person shooters, which have decades of well
           | specified traditions and behavior, and V2 of simulators, like
           | for Cities Skylines 2 and Two Point Campus.
           | 
           | So this is not meant to diminish the achievements of what you
           | have built at all, it is more intellectually honest to say
           | that "any high performance framework is most suitable for
           | projects that are exact clones of pre-existing, mature things
           | with battle-hardened specifications and end user behavior."
           | While this might cover some greenfield projects, including
           | the best capitalized ones that may matter to you, it does
           | diminish the appeal of a framework for the vast majority of
           | success stories from small & poorly capitalized teams. Those
           | small & poor teams are very innovation and serendipity driven
           | and hence rarely copying a pre-existing thing. And even if
           | they try to become well-capitalized, they are almost always
           | doing so by having worked on the thing they are copying
           | already (i.e., already shipping version 1.0 for years).
        
           | deltree7 wrote:
           | [dead]
        
           | ehutch79 wrote:
           | Are you sure about that?
           | 
           | With things like twitter, the ui is not the hard part. Things
           | like moderation are the secret sauce. All the corner cases
           | and support for devopsy stuff likely account for a lot.
           | Routing to specific instances for celebrities and such.
        
             | justrealist wrote:
             | Nathan worked at Twitter so while he might be wrong, I
             | don't think it's reasonable to assume he's just naive
             | http://nathanmarz.com/blog/leaving-twitter.html.
        
         | littlestymaar wrote:
         | > Add large-scale distributed live video support at the top of
         | that, and you won't get any close to 10k lines.
         | 
         | But Twitter isn't, and was never, about live video support:
         | this is pure feature creep and that's how you get headcount
         | inflation and a company that can be run for 17 years without
         | making a profit (AKA a terrible business).
         | 
         | > When I worked in Statuspage, we had support of 50-100k
         | requests per second
         | 
         | Having served 150kqps in the past as part of a very small team
         | (3 back-end eng.), this isn't necessarily as big of a deal as
         | you make it sound: it mostly depends on your workload and
         | whether or not you need consistency (or even persistence at
         | all) in your data.
         | 
         | In practice, building a scalable system is hard mostly because
         | it's hard to get management to forget their vanity ideas that
         | go against your (their, actually) system's scalability.
        
         | WheelsAtLarge wrote:
         | We see this type of post regularly. Something like, "How I
         | built a better <pick your app> clone by myself in a month."
         | Well, no, usually it's just a bare skeleton with the least
         | amount of functionality. Not only that, the software is the
         | least of the functionality. The organizational structure around
         | the app is what matters most to keep it going. It's an
         | attention seeking ploy and the whole thing usually disappears
         | real quick.
        
         | hosh wrote:
         | How much of Twitter's code base is dedicated to things like
         | security, compliance, and moderation?
         | 
         | Granted, a decentralized platform would eliminate some of
         | those, just by being decentralized
        
           | adventured wrote:
           | None of those things get eliminated by decentralization, they
           | get distributed to whatever the point of control / ownership
           | is.
           | 
           | Mastodon still requires security, compliance and moderation.
           | And those requirements are going to keep getting more
           | challenging by the year. It'll end up being another reason
           | nobody will want to host content in a decentralized manner,
           | the burden will become obnoxious.
        
             | Fomite wrote:
             | During the first wave of Twitter exodus, several people in
             | my professional circle asked if they should be hosting
             | professional, field-specific Mastodon servers, etc.
             | 
             | My answer, born of moderating a modestly sized forum, was
             | "Absolutely not under any circumstances."
        
             | hosh wrote:
             | An organization trying to maintain an ISO certification
             | will have drastically different policies and controls than
             | a small shop, or even a hobbyist group.
             | 
             | Everyone (theoretically) would be complying with statutes in
             | the broadest sense, but jurisdiction, regulations, industry
             | best practices, reporting requirements, and appetite for
             | risk are going to be different from organization to
             | organization. It's not one-size-fits-all.
             | 
             | So some of these things are eliminated because the people
             | hosting those are not put under the same kind of scrutiny
             | as say, a Twitter.
        
       ___________________________________________________________________
       (page generated 2023-08-15 23:00 UTC)