[HN Gopher] When imperfect systems are good: Bluesky's lossy tim...
       ___________________________________________________________________
        
       When imperfect systems are good: Bluesky's lossy timelines
        
       Author : cyndunlop
       Score  : 347 points
       Date   : 2025-02-19 17:48 UTC (5 hours ago)
        
 (HTM) web link (jazco.dev)
 (TXT) w3m dump (jazco.dev)
        
       | nightpool wrote:
       | Note that all of this reflects design decisions on Bluesky's
       | closed-source "AppView" server--any federated servers interacting
       | with Bluesky would need to construct their own timelines, and do
       | not get the benefit of the work described here.
        
         | xrisk wrote:
         | What reason does Bluesky give for not opening up their AppView
         | code?
         | 
         | Another notable component that is closed source is the
         | discovery feed generator, where at least there is _some_
         | reason.
        
           | muscomposter wrote:
           | what else? profit by means of doing work that benefits first
           | and foremost the private proprietors of the closed source
           | 
           | if they gave it away (which used to be unfeasible until the
           | digital era) they feel they're losing their valuable effort
           | which they're wont to concentrate, not dilute.
        
           | verdverm wrote:
           | The App View frontend is open source:
           | https://github.com/bluesky-social/social-app
           | 
           | Much of the backend is open source as well:
           | https://github.com/bluesky-social/atproto/tree/main/packages
           | 
           | What is not are the extra services they run to provide a
           | better and faster UX. Even if it was open source, it likely
           | costs 10s of thousands to run per month (they have moved
           | largely to "onprem" hardware instead of the cloud aiui)
        
             | nightpool wrote:
             | That's the frontend code, it doesn't include the backend
             | API services, which are closed source.
        
               | verdverm wrote:
               | Which is what I said in the second sentence
        
               | nightpool wrote:
               | AppView is a specific term of art within the Bluesky
               | federation architecture:
               | https://atproto.com/guides/glossary#app-view, you were
               | incorrect in identifying the public frontend repo as the
               | AppView.
        
               | verdverm wrote:
               | A frontend is (can be) part of an App View. It is quite
               | literally the app you view the network through. There can
               | also be headless app views and app views which have no
               | backend
        
               | half-kh-hacker wrote:
               | this is not correct
        
               | half-kh-hacker wrote:
               | the backend (the AppView) can be found here:
               | 
               | https://github.com/bluesky-social/atproto/tree/main/packages...
               | 
               | there are various supporting services written in Go as
               | well
               | 
               | https://github.com/bluesky-social/indigo
        
             | half-kh-hacker wrote:
             | that's not the appview, that's the client
        
               | verdverm wrote:
               | App View is a bit of a fuzzy term. To me it seems like a
               | combination of frontend, backend, custom lexicon, and
               | supporting services. There isn't really another place in
               | the spec or design where clients or browsers fit in,
               | which do in fact provide a view of the network via an
               | app.
        
           | dingnuts wrote:
           | when I read the spec it seemed like the operator of an
           | AppView & Relay would be most in need of compensation for
           | their hosting costs due to the amount of demand on those
           | components so I believe the spec allows an operator to
           | implement their own AppView & monetize it as that operator
           | sees fit, so that they can afford to operate the service and
           | maybe even make money off of it so that they can make it
           | their full time jobs.
        
             | verdverm wrote:
             | It seems this way to me as well. ATProto fundamentally
             | changes how monetization works in social media by removing
             | lockin. It's going to be interesting to see what emerges
             | from this design decision.
             | 
             | Another interesting way to view ATProto is that it could be
             | a collection of headless features and network browsers that
             | leverage those feature providers.
        
           | iameli wrote:
           | I asked this and got
           | 
           | > We did a backend rewrite from postgres to scylla and it has
           | a bunch of deployment specific stuff, but is functionally
           | identical to the open source postgres version. It's not really
           | a "v2" in terms of new features, we just made it make use of
           | our hardware really well[1]
           | 
           | [1]: https://bsky.app/profile/iame.li/post/3l7e3jfqit22s
        
             | nightpool wrote:
             | Thanks, so are both the Postgres and Scylla versions
             | maintained in terms of new features?
             | 
             | I wasn't aware that AppView v1 was open source, and the
             | most recent info I'm aware of on the topic is
             | https://alice.bsky.sh/post/3laega7icmi2q,
             | https://github.com/bluesky-social/atproto/discussions/2961
             | and
             | https://docs.bsky.app/docs/advanced-guides/federation-archit...,
             | and everything I've heard about Bluesky was that the open
             | source appview is "still coming".
        
               | psionides wrote:
               | It's not coming, it never went away... As I understand
               | it, the "business layer" with all the logic is above the
               | data layer, shared by the Postgres and Scylla versions,
               | and the data layer just makes queries to the database. I
               | think they are using the Postgres version locally for
               | development.
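
The split described above (one business layer, interchangeable data layers) can be sketched roughly as follows. This is only an illustration: `DataPlane`, `ListBackedPlane`, `PartitionedPlane`, and `author_feed` are hypothetical names, not the actual atproto code, and both backends are in-memory stand-ins rather than real Postgres or Scylla clients.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Post:
    author: str
    text: str


class DataPlane(Protocol):
    """Hypothetical storage interface; not the real atproto API."""

    def posts_by(self, author: str) -> List[Post]:
        ...


class ListBackedPlane:
    """Stand-in for one backend: a single flat list that is scanned."""

    def __init__(self, posts: List[Post]) -> None:
        self._posts = list(posts)

    def posts_by(self, author: str) -> List[Post]:
        return [p for p in self._posts if p.author == author]


class PartitionedPlane:
    """Stand-in for another backend: posts partitioned by author."""

    def __init__(self, posts: List[Post]) -> None:
        self._by_author: Dict[str, List[Post]] = {}
        for p in posts:
            self._by_author.setdefault(p.author, []).append(p)

    def posts_by(self, author: str) -> List[Post]:
        return list(self._by_author.get(author, []))


def author_feed(plane: DataPlane, author: str, limit: int = 50) -> List[str]:
    """Business-layer logic: newest-first texts, same code for any data plane."""
    posts = plane.posts_by(author)
    recent = posts[max(len(posts) - limit, 0):]
    return [p.text for p in reversed(recent)]
```

Either backend can be handed to `author_feed` unchanged, which is the property that would let a Postgres version serve for local development while a different store runs in production.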
        
         | haileyok wrote:
         | This is not true. Third party PDSes are fully supported by our
         | app view, and our app view generates timelines for all the
         | users on those PDSes.
        
           | nightpool wrote:
           | What does this have to do with third party app views?
        
             | psionides wrote:
             | You didn't specify what kind of federated servers you were
             | thinking about
        
             | madeofpalk wrote:
             | The statement "any federated servers interacting with
             | Bluesky" is ambiguous, because Bluesky's federated model
             | means there's many different types of servers, and one
             | user's view of what a "federated server" could be vastly
             | different from another.
             | 
             | Federated PDS-s (which is probably the closest to what
             | people mean when they say they want to federate on bluesky)
             | would not need to reconstruct timelines if their users use
             | the bsky.app appview.
        
               | nightpool wrote:
               | Thanks, that's a fair point that I was overlooking. When
               | I say a "federated server", I don't just mean a self-
               | hosted PDS, I mean a third party app that potentially has
               | its own lexicon and design decisions. Creating a robust
               | third-party app that can meaningfully interact with the
               | Bluesky network is still a very difficult engineering
               | challenge, which I think this article does a good job
               | demonstrating--that was the tension I was trying to
               | underscore in my comment. Bluesky may be solving those
               | engineering challenges for those clients who are
               | satisfied with Bluesky's frontend and AppView, but every
               | single other app built on top of ATProto will have to
               | resolve those same challenges. This is directly
               | downstream from Bluesky's "global firehose" topology and
               | various design decisions that stem from that.
        
         | pfraze wrote:
         | As others have noted, the appview is open source. The dataplane
         | has two implementations, one in postgres and another in scylla.
         | The scylla dataplane is closed, the postgres one is open.
         | 
         | The interesting next stage for the postgres implementation is
         | to create a sync engine for partial syncs of the network, so
         | that an appview can run affordably. We ran some benches on the
         | current state of the postgres implementation and found we could
         | index 300k users on a $100/mo vps. I think with a couple of
         | weeks of optimization that could reach 1mm users.
        
           | nightpool wrote:
           | This is great to hear--my current understanding of the most
           | recent state of the art on the topic is
           | https://alice.bsky.sh/post/3laega7icmi2q which mentions that
           | the self-hosted appview is not yet open source. So I'm glad
           | to hear the situation has changed in the past 3 months.
        
             | psionides wrote:
             | It was open source (except the Scylla database layer) from
             | the beginning, AFAIK - that blog post just says that they
             | haven't set it up yet, because that's the hardest part to
             | run
        
         | evbogue wrote:
         | My thinking has evolved on this topic significantly as of late.
         | My current thinking is we should create a secure gossip network
         | on top of the Bluesky API, and forget about all the DAG-CBOR
         | stuff that gets stripped from the Jetstream. Hash the posts on
         | the gossip layer and if posts change then diff them. This is
         | all prep for when X billionaire buys out Bluesky then we just
         | pop some signing key crypto on top of this gossip layer and
         | wow! It's distributed!
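
A minimal sketch of the "hash the posts, diff them if they change" idea, assuming each post is treated as plain text. The function names are made up for illustration; a real gossip layer would presumably hash a canonical encoding of the whole record, and the signing step mentioned above is not shown.

```python
import difflib
import hashlib
from typing import List


def post_hash(text: str) -> str:
    """SHA-256 hex digest of the post text (hypothetical content ID)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_if_changed(seen_text: str, current_text: str) -> List[str]:
    """Return a unified diff if the hashes differ, else an empty list."""
    if post_hash(seen_text) == post_hash(current_text):
        return []
    return list(
        difflib.unified_diff(
            seen_text.splitlines(),
            current_text.splitlines(),
            fromfile="seen",
            tofile="current",
            lineterm="",
        )
    )
```

Peers would only need to exchange the hashes to notice that a post changed; the diff is computed locally by whoever holds both versions.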
        
           | pfraze wrote:
           | isn't that ssb?
        
             | evbogue wrote:
             | reverse-ssb
        
       | dang wrote:
       | [stub for offtopicness]
        
         | amazingamazing wrote:
         | I don't understand the infatuation with Bluesky. The minute
         | they need money it'll go the way of Reddit and Twitter.
        
           | xrisk wrote:
           | People want the old Twitter, and Bluesky is close to that. It
           | also cosplays being decentralized to people who don't look
           | too closely.
        
             | dom96 wrote:
             | What makes it not decentralised?
        
               | xrisk wrote:
               | The fact that you have to be on "the" relay to
               | meaningfully participate on the network.
               | 
               | If you instead claim that users can always choose to use
               | other 3P relays, then you immediately lose all the nice
               | things that Bluesky is able to do well today (search,
               | discoverability, a "discover" algorithm). Indeed, you
               | fall back to the same old problems that every other
               | decentralized social network has.
               | 
               | Bluesky is just a shittier version of Nostr, except that
               | the people over at Nostr don't pretend.
        
               | immibis wrote:
               | The approximately a million dollars a year that it costs
               | to run another copy.
        
               | BizarroLand wrote:
               | https://dustycloud.org/blog/how-decentralized-is-bluesky/
        
           | Larrikin wrote:
           | If everything good is assumed to eventually become bad, why
           | not use things while they are good and then immediately move
           | on when it becomes bad?
        
             | treyd wrote:
             | Not everything good becomes bad. That premise is wrong.
             | 
             | Bluesky accepted VC money. For a social platform that means
             | its death certificate has already been signed.
             | 
             | What you're ignoring with that framing is that we can use
             | social media that operates outside the VC startup pipeline
             | and doesn't have enshittification baked in from the start.
        
             | sodality2 wrote:
             | The consequences of your actions are not limited to your own
             | benefit, as they would be with a product - with social
             | media, you also strengthen the network effect of the
             | soon-to-be-bad platform. (Nothing against Bluesky, I don't
             | know or think it will do so)
        
           | VectorLock wrote:
           | People seem to lark on and on about how it has better
           | "default moderation" than Mastodon.
        
             | verdverm wrote:
             | It's not that it is "better" but that the choice is
             | individual, not up to the mastodon server. In Mastodon, you
             | trade Elon for some other group of individuals, so what
             | happens if they make a decision on moderation or content you
             | do not agree with?
             | 
             | ATProto is designed around accounts that are independent of
             | data host, application, and moderation, all in the name of
             | giving users individual control over these things. It's
             | like if every Mastodon user ran their own server, but
             | without the overhead
        
               | VectorLock wrote:
               | >It's like if every Mastodon user ran their own server
               | 
               | No, it's like every Mastodon user used the same server,
               | and all the coordination is done by one server that
               | nobody can replicate.
        
               | verdverm wrote:
               | Every user in ATProto gets their own database that
               | amounts to a tar file (technically sqlite with car export
               | format)
               | 
               | This is nothing like having a single server for every
               | user. Perhaps you are confusing Bluesky (one app) with
               | ATProtocol the shared network? There are already
               | independent servers and apps operating separate from
               | Bluesky
        
               | fc417fc802 wrote:
               | Are you suggesting the "big few" can't largely censor a
               | given account?
               | 
               | I don't see how ATProto is doing noticeably better than
               | the scenario where a large ActivityPub instance blocks
               | your external account.
        
               | verdverm wrote:
               | Generally, yes. Currently, because Bluesky requires the
               | use of their labeler if you use their app, this could
               | happen.
               | 
               | Two points of note
               | 
               | 1. You can participate in Bluesky without the Bluesky
               | app, so you can remove this requirement by using an
               | alternative app
               | 
               | 2. The most blocked account is blocked by around 0.25% of
               | the full network (https://clearsky.app/)
               | 
               | This second point does not account for users banned from
               | Bluesky by Bluesky for breaking the ToS or PDS abuse.
        
               | fc417fc802 wrote:
               | > does not account for users banned from Bluesky by
               | Bluesky for breaking the ToS or PDS abuse.
               | 
               | Then you are missing the point. I am asking how much
               | censorship power the largest node in the network has.
               | 
               | If being blocked by the largest provider means 95% of
               | users can't see me anymore then the situation is
               | _strictly worse_ than Mastodon vs ActivityPub-at-large.
        
               | immibis wrote:
               | You have the opportunity to demonstrate this. I am banned
               | from Bluesky. (They didn't tell me why - just a generic
               | "you violated community guidelines")
               | 
               | Tell me, concretely, how people can choose to continue
               | following me, even though I am banned.
               | 
               | Profile: immibis.bsky.social
        
               | verdverm wrote:
               | Create an account you own instead of having someone else
               | run it. Maybe you can get your data, maybe you can ask
               | Bluesky for a review (there were bugs and scaling issues
               | against bot networks that caused false positives)
               | 
               | I'm not seeing that handle resolve in the normal places.
               | Do you have the DID? You should use a custom domain so
               | that you can control the reference and lookup.
               | 
               | You can run your own PDS and manage complete account
               | lifecycle
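
For context on "resolving a handle": as far as I understand the AT Protocol handle spec (this detail is not from the thread), a handle maps to a DID either through a DNS TXT record at `_atproto.<handle>` containing `did=<DID>`, or through an HTTPS GET of `https://<handle>/.well-known/atproto-did`. The helpers below are hypothetical and do no network I/O; they only show where a resolver would look, which is why controlling your own domain means controlling the lookup.

```python
from typing import Dict, Optional


def handle_lookup_locations(handle: str) -> Dict[str, str]:
    """Where a resolver would look for the DID behind a handle."""
    h = handle.strip().lstrip("@").lower()
    return {
        "dns_txt": f"_atproto.{h}",
        "https": f"https://{h}/.well-known/atproto-did",
    }


def did_from_txt_record(value: str) -> Optional[str]:
    """Extract the DID from a TXT value such as 'did=did:plc:abc123'."""
    value = value.strip().strip('"')
    if not value.startswith("did="):
        return None
    did = value[len("did="):]
    return did if did.startswith("did:") else None
```

With a custom domain, the owner can update either location to point at the same DID after moving to a different PDS.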
        
               | immibis wrote:
               | So after you're banned from Bluesky you create another
               | account on a different server and hope the admins of your
               | original server, which still hosts all the people you
               | want to follow, don't block your new account from
               | interacting with their server?
               | 
               | You said it was different from Mastodon, but how is this
               | different from Mastodon?
        
               | anamexis wrote:
               | Follow the instructions under "Self-hosting PDS" here:
               | https://github.com/bluesky-social/pds
        
         | Boogie_Man wrote:
         | Bluesky is the Conservative Dad Beer of "left" short form
         | social media.
         | 
         | I implore everyone to use something better like Mastodon or
         | maybe minds
        
         | glerk wrote:
         | Bluesky is great technology, but the actual content is just the
         | left-wing version of the truthsocial/gab echo chamber.
        
           | perching_aix wrote:
           | Wow that doesn't sound like a hyperbole at all.
        
           | hooverd wrote:
           | Say what you will about Bluesky, but at least Jay isn't
           | palling around with honest to god neo-nazis.
        
           | timeon wrote:
           | You can add X to the truthsocial/gab group.
        
           | ddejohn wrote:
           | This is such a lazy, uninformed take that people just love to
           | repeat. 1) the left on Bluesky is full of in-fighting because
           | neolib left are convinced that Harris lost because of
           | racism/sexism and the progressive left spend a lot of their
           | time trying to educate (and dunk on) them for their braindead
           | takes, and 2) any social media platform will become an echo
           | chamber if you only choose to follow people that echo your
           | sentiments. As long as Bluesky isn't actively censoring and
           | suspending journalists and other public figures, there is
           | _no_ equivalence to Truthsocial or X and only a
           | clown/shill/psyop would suggest as much.
           | 
           | It's really not that hard to find enriching content from all
           | walks of life on Bluesky -- if somebody can't find it, they
           | just suck at the internet.
           | 
           | To be clear, I _do_ have grievances with Bluesky, and I do
           | not have high hopes for its future -- but that's because I
           | personally believe that social media in general is both
           | fatally flawed from the start and detrimental to society, and
           | will never _not_ devolve into ad-riddled or otherwise
           | enshittified services. I am not a Bluesky shill, I'm just
           | here to call out the silly false equivalence with
           | Truthsocial, etc.
        
             | glerk wrote:
             | > the left on Bluesky is full of in-fighting
             | 
             | yes, the right is full of infighting too as shown by the
             | recent H1B debate, that doesn't contradict my point.
             | 
             | > any social media platform will become an echo chamber if
             | you only choose to follow people that echo your sentiments
             | 
             | bluesky is almost 100% political and almost 100% left-wing.
             | There is literally no one else to follow, at least for now.
             | X still has non-political content, I mainly follow AI,
             | technology and cryptocurrency, and I couldn't find similar
             | content on bluesky.
        
               | fullstop wrote:
               | > bluesky is almost 100% political and almost 100% left-
               | wing. There is literally no one else to follow, at least
               | for now. X still has non-political content, I mainly
               | follow AI, technology and cryptocurrency, and I couldn't
               | find similar content on bluesky.
               | 
               | Not op, but chiming in. There's a lot of content
               | regarding aquatics and home automation (separate topics).
               | I avoid the politics stuff entirely, and much of the
               | crypto stuff on X tends to be promoting scams and rug-
               | pulls.
        
               | GlickWick wrote:
               | I use Bluesky and literally only see Gamedev content.
               | Unlike X or whatever, I control what I see.
        
               | gs17 wrote:
               | > bluesky is almost 100% political and almost 100% left-
               | wing.
               | 
               | A big contributor to this feeling is their default
               | "Discover" feed being very mediocre. "Less of this" and
               | "more of this" do not seem to impact what it gives you,
               | neither do what you like, respond to, follow, or who you
               | block. Some days it's entirely cat pictures, other days
               | it's entirely politics (my suggested accounts to follow
               | are 100% of the time in this category). Finding the good
               | content is very difficult, and the handful of accounts I
               | follow are largely accounts I had to manually search for
               | or was given a direct link to somewhere else, which would
               | never have come up naturally. And to try to fix it, I
               | took the advice to use the block feature, er, liberally,
               | and I think it made the problem worse.
               | 
               | I even wouldn't mind the politics being in the feed if it
               | didn't show me the exact same things repeated again and
               | again. I get that determining if two posts are too
               | similar is difficult, but it could at least not show me
               | the same image again and again and again...
               | 
               | I've found
               | https://bsky.app/profile/skyfeed.xyz/feed/discover to be
               | a slightly better version of the Discover feed, but it's
               | a lot less dynamic.
        
         | zoul wrote:
         | I would be so much more interested in Bluesky if it were
         | technically impossible for a random super rich guy to buy and
         | bend it to his whims.
        
           | culi wrote:
           | Isn't that the whole point of bs? Empowering users to take
           | their data where they want. It's completely open-sourced and
           | well-documented. If someone buys bluesky you can move all
           | your data to a different service that follows the same
           | protocol
        
             | plagiarist wrote:
             | Can I move my followers/following graph as well? Moving the
             | actual content is barely a consolation prize if you lose
             | your entire audience in the process.
        
       | nasso_dev wrote:
       | Interesting! I wonder what value they chose for the
       | `reasonable_limit`.
        
         | Retr0id wrote:
         | ought to be possible to reverse-engineer it by following a
         | large number of active accounts and seeing what percentage of
         | their posts actually hit your feed
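
A rough sketch of that estimate. It assumes the drop rule is the one I recall from the linked article: a reader following `num_follows` accounts keeps each fanned-out post with probability `min(reasonable_limit / num_follows, 1)`. That rule, the function names, and the value 2000 used in the usage note are assumptions for illustration, not Bluesky's actual numbers.

```python
import random
from typing import Optional


def keep_probability(num_follows: int, reasonable_limit: int) -> float:
    """Assumed lossy-timeline rule: chance that one post reaches the feed."""
    if num_follows <= 0:
        return 1.0
    return min(reasonable_limit / num_follows, 1.0)


def simulate_seen(num_posts: int, num_follows: int,
                  reasonable_limit: int, rng: random.Random) -> int:
    """Count how many of num_posts survive the probabilistic drop."""
    p = keep_probability(num_follows, reasonable_limit)
    return sum(1 for _ in range(num_posts) if rng.random() < p)


def estimate_limit(seen: int, total: int, num_follows: int) -> Optional[float]:
    """Invert the rule: limit ~= observed fraction * num_follows.

    Returns None when nothing was dropped, since that only shows the
    limit is at least num_follows.
    """
    if total <= 0 or seen >= total:
        return None
    return seen / total * num_follows
```

For example, an account following 8000 users that sees 2500 of 10000 known posts would suggest a limit of about 2000 under this model. The estimate only works once you follow more accounts than the limit, and more observed posts tighten it.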
        
       | bitmasher9 wrote:
       | It's really impressive how well Bluesky is performing. It really
       | feels like a throwback to older social media platforms with its
       | simplicity and lack of dark-patterns. I'm concerned that all the
       | great work on the platform, protocol, etc won't shine in the long
       | term as they eventually need to find a revenue source.
        
         | autobodie wrote:
         | Absolutely. The profit motive is the root of most evil. It is a
         | shame that so many are trained to believe it is the only motive
         | available.
        
           | gkoberger wrote:
           | I completely agree with this... but without profit, people
           | can't get paid, and they'll stop building. I do hate this
           | incredible need for growth, of course, but financial growth
           | is necessary to pay people and give them raises and allow
           | them to have upward mobility at the company.
           | 
           | I hope Bluesky is able to find a model that works for them
           | AND for consumers. (I do know it's an open protocol, so it'll
           | live on without Bluesky itself! However, as this post shows,
           | it's a lot of work to build on the prototype... so if not
           | them, who? And if someone else, how will they become
           | sustainable?)
        
             | jandrese wrote:
             | At the same time I feel like a lot of companies grow much
             | larger than they need to be simply because of a
             | bigger-is-better mentality. How many of Uber's 30,000ish
             | employees are involved with making sure the app and backend
             | database are working properly? Are they really doing 600
             | times more work than Craigslist at connecting sellers with
             | buyers?
        
               | bitmasher9 wrote:
               | You cannot compare uber to Craigslist.
               | 
               | Uber takes on so much more responsibility of the
               | transaction. Setting price, handling disputes, real time
               | coordination, etc.
        
               | gkoberger wrote:
               | I'm an Uber hater, but... yes.
               | 
               | Like, sure, they don't need every single one of those
               | 30,000... but they have to have ground teams in every
               | city in the world. Connections with every airport.
               | Connections with almost every restaurant in the world.
               | Customer support and safety (okay I know they don't nail
               | this, but still). They need to pay out drivers in each
               | country. The app needs to work in hundreds of countries,
               | all with different laws, currencies, languages and more.
               | Some places let you pick up anywhere, others require
               | specific locations. And that's not even including
               | marketing, partnerships, HR, finance, etc.
               | 
               | I don't think the employees are the problem with Uber,
               | it's the shareholders. They need to make X back, so that
               | delta is where drivers get squeezed.
        
               | redcobra762 wrote:
               | Aren't you actually arguing in _favor_ of profit-driven
               | behavior? You're not disagreeing with profit as a
               | motivator, you're questioning if the 30,000 employees is
               | the maximal way to achieve profit.
        
             | tdb7893 wrote:
             | It's semantics but I like to separate money from profits.
             | You need money to pay people and to survive but you don't
             | need to be raking in endlessly growing piles of it. This is
             | something that was really demoralizing about working for a
             | big company, they could be making like 50000000000 a year
             | in just profits but still be ruthless in getting more. Like
             | I just want to make a product I'm proud of and I'm happy
             | living a simple life, I am happier now making less money
             | but not feeling like I'm endlessly milking customers.
        
             | cyberax wrote:
             | On the other hand, running something like BlueSky is not
             | terribly expensive. A foundation with a reasonable
             | endowment can do that indefinitely.
             | 
             | Initially, it can be funded by selling tools that do
             | analytics or by donations (like Wikipedia).
        
               | bbor wrote:
               | Yes! If the venture capitalists that are already involved
               | stick to their stated principles and don't demand eternal
               | growth (which... fingers crossed?), I think bsky has an
               | extremely feasible, promising future.
               | 
               | They've intentionally kept a low footprint to keep
               | expenses down, and while income via donation is out of
               | the picture (unless AT Proto grows into a full ecosystem,
               | I suppose?), cosmetics are a tried-and-true model for
               | supporting something that most users use for free, but
               | that some power users spend all day on and want shiny
               | stuff for. They'll probably end up exploring Discord-
               | esque paywalled features for power users as well, which
               | isn't necessarily _ideal_ but is leagues better than
               | getting on the currently-dying vicious cycle of Display
               | Ads, IMO.
        
               | jarjoura wrote:
               | If Bluesky ever gets close to becoming a serious threat
               | to Meta's walled garden, the effort to fight back against
               | them will take a lot of capital. Just the legal battles
               | alone will cost a fortune.
               | 
               | Wikipedia isn't a threat to anyone, they just have to
               | generate enough capital to exist.
        
             | impossiblefork wrote:
             | Yes, but there is a path, and it's simplicity.
             | 
             | Lichess, is it bad? It basically solves the whole problem.
             | A well-designed distributed social media site could be
             | something like that. Donations are enough to support one
             | guy at least.
        
             | bbor wrote:
             | I totally get/relate to your perspective, but to be the
             | annoying leftie in your ear:
             | 
             | A) Sustainable revenue is a requirement for any company,
             | yes, but the unlimited (above-inflation) growth demanded by
             | most large corporations is absolutely not. Lots and lots of
             | companies operate for a long time without expecting massive
             | growth, raises n' all. MBAs pejoratively call such
             | companies "lifestyle businesses"--as in "just pays for
             | people to live"--but I'd call them "normal, healthy
             | companies".
             | 
             | B) More fundamentally: the idea that a social media network
             | can only be built by a single corporation owned by
             | investors is an omnipresent, yet extremely toxic,
             | assumption. Mastodon represents another extreme end of the
             | capital<->labor spectrum where anyone can contribute to the
             | network at any time with their own instance, but I think
             | Bluesky is a hint of a less-pure--and therefore more
             | feasible--future.
             | 
             | To use the language of my favorite dream, Chomskian
             | Anarcho-Syndicalism: imagine a social media network
             | organized by a democratic non-profit entity akin to the
             | Python or Linux Foundations, that then contracts out work
             | to a hierarchy of smaller, purpose-built teams
             | ("syndicates"), each of which may in turn contract w/ other
             | teams. Each team would have to attract talent and negotiate
             | enough income to pay them sufficiently still, of course,
             | but there would be no team leader to make a surplus profit
             | from the system -- any "surplus" would stay at the non-
             | profit level, and thus necessarily be reinvested back into
             | the product.
             | 
             | In the current system, the reason Bluesky didn't do this
             | off the bat is obvious: no one would loan them startup
             | funds, as ownership investment is the de facto universal
             | way to start up an unproven venture. But we can dream
             | bigger and better, IMHO; both on a smaller scale by
             | building upon already-proven open protocols like AT Proto,
             | and on a larger scale by structuring the state & economy to
             | support this kind of model equally, if not primarily.
        
               | jarjoura wrote:
               | All of the big tech companies today are the result of
               | 100s of smaller, well intentioned tech companies that got
               | acquired into these behemoths.
               | 
               | I always look at how WhatsApp played out as a company.
               | They were the good guys, and didn't want to get acquired.
               | Zuckerberg almost bankrupted FB at the time giving in to
               | all of the ridiculous demands WhatsApp made. No one at
               | WhatsApp thought it was going to happen, until it did, and
               | it resulted in a once-in-a-lifetime transfer of wealth to
               | several hundred employees.
        
             | autobodie wrote:
             | > _but without profit, people can't get paid, and they'll
             | stop building_
             | 
             | I wholeheartedly disagree. People build things all the
             | time for things other than profit. In fact, most of the
             | greatest things ever built were a loss for those who built
             | them.
             | 
             | Dignity is the best motivator. Profit only supersedes
             | dignity when dignity is not on offer.
        
               | krapp wrote:
               | Profit supersedes dignity when one needs to eat, because
               | one cannot eat dignity.
               | 
               | Being able to spend a significant amount of time and
               | effort on passion projects is a luxury most people can't
               | afford.
        
           | jarjoura wrote:
           | There's no reason Bluesky has to emulate what FB Newsfeed and
           | Twitter/X did to solve engagement by promoting certain items
           | over others.
           | 
           | At the very least, they do have hindsight to learn from.
        
           | pessimizer wrote:
           | Bluesky is a private for-profit company that has taken $37M
           | in venture capital.
           | 
           | https://www.piratewires.com/p/interview-with-jack-dorsey-
           | mik...
           | 
           | > That was the second moment I thought, uh, nope. This is
           | literally repeating all the mistakes we made as a company.
           | This is not a protocol that's truly decentralized. It's
           | another app. It's another app that's just kind of following
           | in Twitter's footsteps, but for a different part of the
           | population.
           | 
           | > Everything we wanted around decentralization, everything we
           | wanted in terms of an open source protocol, suddenly became a
           | company with VCs and a board. That's not what I wanted,
           | that's not what I intended to help create.
        
         | mullingitover wrote:
         | They've done an incredible job running with an extremely low
         | headcount and crazy efficient use of hardware. It would be easy
         | to 10x their expenses if they were blindly following the
         | standard cloud deployment playbook. Hopefully this level of
         | efficiency means they don't have to work as hard and can stay
         | pre-revenue, a pure play, for a very long time.
        
         | culi wrote:
         | I love Mastodon but I have to admit that BlueSky has clearly
         | out-engineered them. Of course they started with much more
         | expertise and resources. I hope ActivityPub compatibility comes
         | soon to unite the two
        
       | knallfrosch wrote:
       | Anyone following hundreds of thousands of users is obviously a
       | bot account scraping content. I'd ban them and call it a day.
       | 
       | However, I do love reading about the technical challenge. I think
       | Twitter has a special architecture for celebrities with millions
       | of followers. Given Bluesky is a quasi-clone, I wonder why they
       | did not follow in these footsteps.
        
         | psionides wrote:
         | You don't need to follow anyone (or even have an account) to
         | scrape content... Someone following a huge number of accounts
         | usually wants to get a lot of followers quickly this way
         | through follow-backs.
        
         | ruined wrote:
         | if you want to scrape all the content, that's what the firehose
         | is for, and it's allowed.
         | 
         | the only reason to mass-follow is for spam purposes.
        
           | Retr0id wrote:
           | This does assume that scrapers are smart, and often they're
           | really not. They have infrastructure for scraping HTML from
           | webpages at scale and that is the hammer they use for all
           | nails. (e.g. Wikipedia has to fight off scraper traffic
           | despite full archives being available as torrents, etc.)
           | 
           | In this case I agree though, they're all spammers and/or
           | "clout farmers", or trying to make an account seem more
           | authentic for future scams. They want to generate follow
           | notifications in the hope that some will follow them back
           | (and if they don't, they unfollow again after some interval).
        
             | sarchertech wrote:
             | 100%. I ran a job board where we provided a nice machine
             | readable XML feed of all of our jobs, but we had bots that
             | insisted on using the standard search box. Searching by
             | city using an alphabetized list.
             | 
             | Geographic search was the most expensive thing they
             | could have done and no matter what we did we couldn't get
             | them to use the XML feed.
             | 
             | I even tried returning a link to the feed when we detected
             | a bot. No dice. They just kept working around the bot
             | detection.
        
         | culi wrote:
         | Maybe not hundreds of thousands but I'd follow anybody that
         | looks remotely interesting and then primarily use customized
         | feeds. E.g. if I wanna hear about union news, my personal irl
         | network, etc I check that feed
        
         | tshaddox wrote:
         | Or just enforce a maximum number of followed accounts.
        
           | ARandumGuy wrote:
           | No matter how high you set a maximum limit for interactions
           | on social media (followers, friends, posts, etc), _someone_
           | will reach the limit and complain about it. I can see why
           | Bluesky would prefer a "soft limit", where going above the
           | limit will degrade the experience. It gives more flexibility
           | to adjust things later, and prevents obnoxious complaints
           | from power users with outsized influence.
        
             | tshaddox wrote:
             | I'm skeptical that the people who would complain about that
             | wouldn't find something else to complain about if you
             | resolved the first complaint. I'd recommend implementing
             | product features that you think are reasonable and
             | accepting the fact that you will get complaints from people
             | who disagree.
        
         | steveklabnik wrote:
         | > Given Bluesky is a quasi-clone, I wonder why they did not
         | follow in these footsteps.
         | 
         | There are only six users with over a million followers, and
         | none with two million yet.
         | 
         | I'm sure they'll get there.
        
       | ChuckMcM wrote:
       | As a systems enthusiast I enjoy articles like this. It is really
       | easy to get into the mindset of "this must be perfect".
       | 
       | In the Blekko search engine back end we built an index that was
       | 'eventually consistent' which allowed updates to the index to be
       | propagated to the user facing index more quickly, at the expense
       | that two users doing the exact same query would get slightly
       | different results. If they kept doing those same queries they
       | would eventually get the exact same results.
       | 
       | Systems like this bring in a lot of control systems theory
       | because they have the potential to oscillate if there is positive
       | feedback (and in search engines that positive feedback comes from
       | the ranker which is looking at which link you clicked and giving
       | it a higher weight) and it is important that they not go crazy.
       | Some of the most interesting, and most subtle, algorithm work was
       | done keeping that system "critically damped" so that it would
       | converge quickly.
       | 
       | Reading this description of how users' timelines are sharded and
       | the same sorts of feedback loops (in this case 'likes' or
       | 'reposts') sounds like a pretty interesting problem space to
       | explore.
        
         | culi wrote:
         | What became of Blekko?
        
           | an_ko wrote:
           | > It was acquired by IBM in March 2015, and the service was
           | discontinued.
           | 
           | -- https://en.wikipedia.org/wiki/Blekko
           | 
           | Perhaps GP has a more interesting answer though.
        
             | ChuckMcM wrote:
             | That's the correct answer, IBM wanted the crawler mostly to
             | feed Watson. Building a full search engine (crawler,
             | indexer, ranker, API, web application) for the English
             | language was a hell of an accomplishment but by the time
             | Blekko was acquired Google was paying out tens of billions
             | of dollars to people to send _them_ and only them their
             | search queries. For a service that nominally has to live on
             | advertising revenue getting humans to use it was the only
             | way to be net profitable, and you can't spend billions
             | buying traffic and hope to make it back on advertising as
             | the #3 search engine in the English speaking markets.
             | 
             | There are other ways to monetize search (look at Kagi for
             | example) than advertising. Blekko missed that window
             | though. (too early, Google needed to get as crappy as it is
             | today to make the value of a spam free search engine
             | desirable)
        
               | chrisweekly wrote:
               | Not my Q but thanks for the interesting history.
               | 
               | Also, (for other readers), I'm a huge fan of Kagi. Highly
               | recommended.
        
         | PaulHoule wrote:
         | An airline reservation system has to be perfect (no slack in
         | today's skies), a hotel reservation can be 98% perfect so long
         | as there is some slack and you don't mind putting somebody up
         | in a better room than they paid for from time to time.
         | 
         | A social media system doesn't need to be perfect at all. It was
         | clear to me from the beginning that Bluesky's feeds aren't very
         | fast, not like they are crazy slow, but if it saves money or
         | effort it's no problem if notifications are delayed 30s.
        
           | singleshot_ wrote:
           | Does the fact that an airline booking system must be perfect
           | explain why so many flights are overbooked or cancelled?
        
             | rconti wrote:
             | No, overbooking is a business decision justified by the
             | fact that, statistically, not all passengers will actually
             | show up for their flight, and lower load factors cost
             | money.
        
               | josefresco wrote:
               | What is the "no show" rate?
        
               | nightpool wrote:
               | A 2019 study of 5 European airports found no-show
               | rates of 14.4%: https://www.ozion-
               | airport.com/product/comparative-analysis-n...
               | 
               | However, my understanding is that airlines have much more
               | sophisticated per-flight and per-passenger models that
               | calculate the predicted no-show factor based on the
               | historical rates for that particular route (e.g. you're
               | more likely to get more no-shows in business class flying
               | from NYC to SF compared to holiday travelers with a
               | reservation on the Florida Keys)
        
               | SteveNuts wrote:
               | That blows my mind, I would expect maybe 1 or 2
               | passengers per plane at most. I'm trying to think of what
               | factors would cause that many no-shows, it has to be
               | mostly missed connections?
               | 
               | I can't imagine spending hundreds of dollars and just not
               | showing up.
        
               | lhoff wrote:
               | A friend of mine works for a Management Consultancy firm
               | and they have full flex tickets: if they miss the 8pm
               | flight home they can take the next one or fly back the
               | next morning. All without additional fees. So I believe
               | business travel is the biggest factor when it comes to
               | missed flights.
               | 
               | Side note: His employer is the biggest client of a major
               | European airline.
        
               | shagie wrote:
               | Delays getting to the airport and missing the plane.
               | Cancelations with full refund. "Hidden city" ticketing.
               | Layover delays. Businesses booking blocks. Flexible
               | flights ( https://www.travelperk.com/guides/flexible-
               | travel/flexible-f... ). Changing / rebooking flights for
               | an earlier or later time.
        
               | packetlost wrote:
               | I'm sure other factors such as sudden illness and
               | migratable tickets make up a sizeable chunk too.
        
               | vidarh wrote:
               | Keep in mind they sell a lot of tickets where one of the
               | features that allows for a premium price is that they
               | allow late cancellations or changes to other flights.
               | Holiday travelers are pretty "reliable", but business
               | travelers might have changed needs at the drop of a hat
               | (say you meet another prospective client on a business
               | trip and decide to stay another day to fit in a face-to-
               | face meeting).
        
               | artee_49 wrote:
               | I think you'll have to pay a team millions to figure that
               | out, it is unlikely to be a static rate but rather
               | decided based on multiple traits like time of year, time
               | of flight, distance of flight, cost of ticket, etc.
        
           | rconti wrote:
           | Especially for a free service!
           | 
           | Think about other ad-supported sites. If you're an engineer
           | working on an ad-supported product, the perfect consistency
           | you strive for in your code is not the product. The product
           | is the sum of all of the content the user sees. And the costs
           | of the tradeoffs you make are paid for by ads.
           | 
           | Am I willing to see 10x more ads for perfect consistency?
           | Definitely not.
        
           | darknavi wrote:
           | It's funny because from my experience airline systems are
           | very imperfect (timing wise).
           | 
           | I (unwisely) tried to purchase an Icelandair ticket via the
           | Chase travel portal. I would get a reservation number, go buy
           | seats on Icelandair's website, and a few days later the
           | entire reservation would vanish into the ether. Rinse and
           | repeat 3x.
           | 
           | I can't remember the exact verbiage, but basically tickets
           | can be "reserved" and "booked". One means the ticket is
           | allocated, and one means the ticket is actually paid for. I
           | eventually sat on the phone with an executive support person
           | as they booked the ticket and got it all the way through. It
           | turns out Chase reserves a ticket on an airline but has an SLA
           | of ~3 days to actually pay for the ticket. Icelandair
           | requires a ticket to be paid within 24 hours, so it was
           | timing out.
        
           | gamedever wrote:
           | Miscommunication leads to bad outcomes. One missed message
           | out of order could easily lead to a fight, a lawsuit, a flash
           | mob, threats of violence - that then need to be taken
           | seriously, swatting, doxxing, etc...
           | 
           | Msg 1: I hate ___insert_controversial_person_category_here___
           | 
           | Msg 2: Is the kind of statement that really sets me off
           | 
           | Msg 1 has a very different meaning if you don't see Msg 2.
        
             | pjc50 wrote:
             | This can already happen without help from the platform.
        
               | gamedever wrote:
               | Sure, but that doesn't mean the platform should make it
               | worse.
               | 
               | Trying to have a conversation on a flaky platform is hell.
        
         | gregw134 wrote:
         | Would you be willing to share more about how you guys did click
         | ranking at Blekko? It's an interesting problem.
        
         | snailmailman wrote:
         | I guess I hadn't considered that search engines could be
         | reranking pages on the fly as I click them. I've been seeing my
         | DuckDuckGo results shuffle around for a while now thinking it's
         | an awful bug.
         | 
         | Like I click one page, don't find what I want, and go back
         | thinking "no, I want that other result that was below" and it's
         | an entirely different page with shuffled results, missing the
         | one that I think might have been good.
        
           | PaulHoule wrote:
           | That's connected with a basic usability complaint about
           | current web interfaces, that ads and recommended content
           | aren't stable. You very well might want to engage with an ad
           | after you are done engaging with what you wanted to engage with
           | but you might never see it again. Similarly, you might see
           | two or three videos that you want to click on on the side of
           | a YouTube video you're watching but you can only click on one
           | (though if you are thinking ahead you can open these in
           | another tab.)
           | 
           | On top of that immediate frustration, the YouTube style
           | interface here
           | 
           | https://marvelpresentssalo.com/wp-
           | content/uploads/2015/09/id...
           | 
           | collects terrible data for recommendations because, even
           | though it gives them information that you liked the thumbnail
           | for a video, they can't come to any conclusion about whether
           | or not you liked any of the other videos. TikTok, by focusing
           | on one video at a time, collects much better information.
        
           | cgriswald wrote:
           | I don't use DDG, but in my (very limited, just now) testing
           | it doesn't seem to shuffle results unless you reload the page
           | in some way. Is it possible your browser is reloading the
           | page when you go back? If so, setting DDG to open links in
           | new tabs might fix this problem.
        
         | dwedge wrote:
         | Similar to how Google images loads lower quality blurred
         | thumbnails towards the bottom of the window at first so that
         | the user thinks they loaded faster
        
       | sphars wrote:
       | When I go directly to a user's profile and see all their posts,
       | sometimes one of their posts isn't in my timeline where it should
       | be. I follow less than 100 users on Bluesky, but I guess this
       | explains why I occasionally don't see a user's post in my
       | timeline.
       | 
       | Lossy indeed.
        
         | Eric_WVGG wrote:
         | Are you using an app, website, or combination?
         | 
         | Various clients (I'm writing one) interpret the timeline
         | differently, as a feed that shows literally everything could
         | include things that most people would find undesirable or
         | irrelevant. (replies to strangers, replies to replies to
         | replies, etc)
        
         | Retr0id wrote:
         | If another user you follow reposted or replied to a post, it
         | can affect its order in your following feed. You shouldn't be
         | seeing any loss as described in the article from following only
         | 100 users.
        
       | trhway wrote:
       | So the system design puts the burden on what seems to be
       | synchronous, not queued, writes to get easy reads. I usually
       | prefer simpler cheaper writes at the cost of more complicated
       | reads as the reads scale and parallelize better.
        
         | pfraze wrote:
         | you're underestimating the read load, by a lot
        
       | skybrian wrote:
       | This design makes sense if you didn't previously have any limit
       | on the number of people an account could follow. But why not have
       | a limit?
        
         | whyrusleeping wrote:
         | people get so up in arms when you suggest there might be a
         | limit on how many people they can follow.
        
       | timewizard wrote:
       | > This process involves looking up all of your followers, then
       | inserting a new row into each of their Timeline tables in reverse
       | chronological order with a reference to your post.
       | 
       | Seriously? Isn't this the nut of your problem right here?
        
         | jsnell wrote:
         | What alternative design did you have in mind, given that a
         | Twitter-like data model of individual follows is likely a
         | strict product requirement?
         | 
         | There are obviously other ways of doing it (doing the timeline
         | propagation in a batch job, fanning out the reads rather than
         | the writes), but they've got their own problems. Probably worse
         | ones.
        
           | pphysch wrote:
           | Wouldn't a hybrid approach make sense?
           | 
           | Periodically classify users as hot/cold based on their
           | activity, build hot-follower timelines on write, and build
           | cold-follower timelines on read.
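
The hybrid idea above can be sketched in a few lines of Python. This is a toy in-memory model, not Bluesky's implementation: the class name `HybridTimelines`, the `hot` set, and every method here are hypothetical names chosen for illustration. Hot readers get their timeline rows written when a post is created; cold readers have their timeline assembled when they read.

```python
from collections import defaultdict


class HybridTimelines:
    """Toy model: fan out on write for hot readers, on read for cold ones."""

    def __init__(self):
        self.follows = defaultdict(set)     # reader -> authors they follow
        self.followers = defaultdict(set)   # author -> readers following them
        self.posts = defaultdict(list)      # author -> [(ts, post_id)]
        self.timelines = defaultdict(list)  # hot reader -> [(ts, post_id)]
        self.hot = set()                    # readers currently classified as hot

    def follow(self, reader, author):
        self.follows[reader].add(author)
        self.followers[author].add(reader)

    def post(self, author, ts, post_id):
        self.posts[author].append((ts, post_id))
        # Write-time fan-out only touches followers classified as hot.
        for reader in self.followers[author] & self.hot:
            self.timelines[reader].append((ts, post_id))

    def read(self, reader, limit=50):
        if reader in self.hot:
            entries = self.timelines[reader]
        else:
            # Cold readers pay at read time: merge posts from everyone they follow.
            entries = [e for a in self.follows[reader] for e in self.posts[a]]
        return [pid for _, pid in sorted(entries, reverse=True)[:limit]]
```

One thing the sketch leaves out is reclassification: a reader promoted from cold to hot would need their timeline backfilled, since earlier posts were never written to it.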
        
             | jsnell wrote:
             | You'd still have exactly the same hot write path, it'd just
             | have maybe 50% of the load. That could be a legit
             | optimization, but not having it hardly warrants an
             | incredulous "seriously?" like the OP's.
             | 
             | (And the same for the inverse hybrid strategy of
             | quarantining the writes of highly followed users and
             | handling their fan-out at read time. A neat optimization,
             | and maybe even absolutely necessary once you have accounts
             | with 100M followers. But the vast majority of posts would
             | still be handled via the original strategy.)
        
       | rakoo wrote:
       | Ok I'm curious: since this strategy sacrifices consistency, has
       | anyone thought about something that is not full fan-out on reads
       | or on writes?
       | 
       | Let's imagine something like this: instead of writing to every
       | user's timeline, it is written once for each shard containing at
       | least one follower. This caps the fan-out at write time to
       | hundreds of shards. At read time, getting the content for a given
       | user reads that hot slice and filters actual followers. It
       | definitely has more load but
       | 
       | - the read is still colocated inside the shard, so latency
       | remains low
       | 
       | - for mega-followers the page will not see older entries anyway
       | 
       | There are of course other considerations, but I'm curious about
       | what the load for something like that would look like (and I
       | don't have the data nor infrastructure to test it)
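
A rough Python sketch of the per-shard idea, to make the write and read costs concrete. Everything here is an assumption for illustration: the modulo shard function, the tiny `NUM_SHARDS`, and the `ShardedFeed` class are hypothetical and say nothing about how Bluesky actually places users on shards.

```python
from collections import defaultdict

NUM_SHARDS = 4  # assumption: tiny shard count for illustration


def shard_of(user_id: int) -> int:
    # Hypothetical placement rule; a real system would use its own mapping.
    return user_id % NUM_SHARDS


class ShardedFeed:
    def __init__(self):
        self.follows = defaultdict(set)     # reader -> authors they follow
        self.followers = defaultdict(set)   # author -> readers following them
        self.shard_log = defaultdict(list)  # shard -> [(ts, author, post_id)]

    def follow(self, reader: int, author: int) -> None:
        self.follows[reader].add(author)
        self.followers[author].add(reader)

    def post(self, author: int, ts: int, post_id: str) -> int:
        # One write per shard that holds at least one follower,
        # instead of one write per follower.
        shards = {shard_of(r) for r in self.followers[author]}
        for s in shards:
            self.shard_log[s].append((ts, author, post_id))
        return len(shards)  # number of writes performed

    def read(self, reader: int, limit: int = 50) -> list:
        # The read stays inside the reader's own shard and filters the
        # shared log down to authors the reader actually follows.
        followed = self.follows[reader]
        hits = [(ts, pid)
                for ts, author, pid in self.shard_log[shard_of(reader)]
                if author in followed]
        return [pid for _, pid in sorted(hits, reverse=True)[:limit]]
```

The write cost is capped by the shard count, but every read now scans entries written for other users on the same shard, which is the extra load the comment mentions.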
        
       | rconti wrote:
       | > Additionally, beyond this point, it is reasonable for us to not
       | necessarily have a perfect chronology of everything posted by the
       | many thousands of users they follow, but provide enough content
       | that the Timeline always has something new.
       | 
       | While I'm fine with the solution, the wording of this sentence
       | led me to believe that the solution was going to be imperfect
       | chronology, not dropped posts in your feed.
        
       | artee_49 wrote:
       | I am a bit perplexed though as to why they have implemented fan-
       | out in a way that each "page" is blocking fetching further pages,
       | they would not have been affected by the high tail latencies if
       | they had not done this,
       | 
       | "In the case of timelines, each "page" of followers is 10,000
       | users large and each "page" must be fanned out before we fetch
       | the next page. This means that our slowest writes will hold up
       | the fetching and Fanout of the next page."
       | 
       | Basically means that they block on each page, process all the
       | items on the page, and then move on to the next page. Why
       | wouldn't you rather decouple page fetcher and the processing of
       | the pages?
       | 
       | A page fetching activity should be able to continuously keep
       | fetching further sets of followers one after another and should
       | not wait for each of the items in the page to be updated to
       | continue.
       | 
       | Something that comes to mind would be to have a fetcher component
       | that fetches pages, stores each page in S3 and publishes the
       | metadata (content) and the S3 location to a queue (SQS) that can
       | be consumed by timeline publishers which can scale independently
       | based on load. You can control the concurrency in this system
       | much better, and you could also partition based on the shards
       | with another system like Kafka by utilizing the shards as keys in
       | the queue to even "slow down" the work without having to
       | effectively drop tweets from timelines (timelines are eventually
       | consistent regardless).
       | 
       | I feel like I'm missing something and there's a valid reason to
       | do it this way.
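
A minimal in-process sketch of the decoupling described above, using a bounded queue and worker threads as stand-ins for the S3/SQS/Kafka pieces. The function names, `PAGE_SIZE`, `NUM_WORKERS`, and the `write_timeline` callback are all hypothetical; the article's pages are 10,000 followers, shrunk here for illustration.

```python
import queue
import threading

PAGE_SIZE = 3    # assumption: tiny pages for illustration
NUM_WORKERS = 2  # assumption: number of timeline writers


def fetch_pages(followers, page_size):
    # Stand-in for paging through the follower list in storage.
    for i in range(0, len(followers), page_size):
        yield followers[i:i + page_size]


def fan_out(post_id, followers, write_timeline):
    # Bounded queue: the fetcher runs ahead of the writers, but only
    # by a few pages, so slow writes apply backpressure.
    pages = queue.Queue(maxsize=4)

    def worker():
        while True:
            page = pages.get()
            if page is None:  # sentinel: no more pages
                return
            for user in page:
                write_timeline(user, post_id)

    workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()

    # The fetcher keeps going without waiting for a page to finish writing.
    for page in fetch_pages(followers, PAGE_SIZE):
        pages.put(page)
    for _ in workers:
        pages.put(None)
    for w in workers:
        w.join()
```

With this shape, one slow write delays only the worker handling that page rather than the fetch of every later page. Whether that beats a single batched write per page is the tradeoff that would have to be benchmarked.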
        
         | abound wrote:
         | I interpreted this as a batch write, e.g. "write these 10k
         | entries and then come back". The benefit of that is way less
         | overhead versus 10k concurrent background routines each writing
         | individual rows to the DB. The downside is, as you've noted,
         | that you can't "stream" new writes in as older ones finish.
         | 
         | There's a tradeoff here between batch size and concurrency, but
         | perhaps they've already benchmarked it and "single-threaded"
         | batches of 10k writes performed best.
        
       | exabrial wrote:
       | I honestly am annoyed to use websites and services like this.
       | Annoys the crap out of me and everyone else, but since it's pretty
       | much forced down their throats, the "eventually" is "eventually
       | everyone stops complaining".
        
       | einpoklum wrote:
       | Centrally-controlled social media platforms are not a good thing,
       | period. Neither Twitter/X, nor BlueSky. Let's not fete them.
        
       | mifydev wrote:
       | "Hot Shards in Your Area" - 10/10 heading
        
       | NoGravitas wrote:
       | The funny thing is that all of the centralization in Bluesky is
       | defended as being necessary to provide things like global search
       | and all replies in a thread, things that Mastodon simply punts on
       | in the name of decentralization. But then ultimately, Bluesky has
       | to relax those goals after all.
        
       | arcastroe wrote:
       | I found it odd to base the loss-factor on the number of people
       | you follow, rather than a truer indication of timeline-update-
       | frequency. What if I follow 4k accounts, but each of those
       | accounts only posts once a decade? My timeline would become
       | unnecessarily lossy.
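
To make the objection concrete, here is a small Python comparison of the two ways of computing a keep probability. Both functions are assumptions for illustration: the follow-count rule is taken to have the form `min(limit / num_follows, 1)`, and the rate-based variant, its `daily_budget` parameter, and all constants are hypothetical.

```python
def keep_prob_by_follows(num_follows: int, follow_limit: int = 2000) -> float:
    """Keep probability driven only by how many accounts a user follows."""
    if num_follows <= 0:
        return 1.0
    return min(follow_limit / num_follows, 1.0)


def keep_prob_by_rate(posts_per_day: float, daily_budget: float = 10000.0) -> float:
    """Keep probability driven by how many posts actually arrive per day."""
    if posts_per_day <= 0:
        return 1.0
    return min(daily_budget / posts_per_day, 1.0)
```

Under these made-up numbers, someone following 4,000 accounts that each post once a decade loses half their posts under the follow-count rule, while the rate-based rule drops nothing because only about one post arrives per day. The catch is that the rate-based rule needs a per-user estimate of incoming post volume, which the follow count gives you for free.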
        
       | crabbone wrote:
       | Anecdotally, I ran into a similar solution "by chance".
       | 
       | Long ago, I worked for a dating site. Our CTO at the time was a
       | "guest of honor" who was brought in by a family friend who was
       | working in marketing at the time. The CTO was a university
       | professor who took on a job as a courtesy (he didn't need the
       | money nor fame, he had enough of both, and actually liked
       | teaching).
       | 
       | But he instituted a lot of experimental practices in the company,
       | such as switching roles every now and then (anyone in the company
       | could apply for a different role except administration and try
       | themselves wearing a different hat), or having company-wide
       | discussions of problems where employees would have to prepare a
       | presentation on their current work (that was very unusual at the
       | time, but the practice became more institutional in larger
       | companies afterwards).
       | 
       | Once he announced a contest for the problem he was trying to
       | solve. Since we were building a dating site, the obvious problem
       | was matching. The problem was that the more properties there were
       | to match on, the longer it would take (besides other problems, that
       | is). So, the program was punishing site users who took time to
       | fill out the questionnaires as well as they could and favored the
       | "slackers".
       | 
       | I didn't have any bright ideas on how to optimize the matching /
       | search for matches. So, ironically, I asked "what if we just
       | threw away properties beyond a certain threshold randomly?" I was
       | surprised that my idea received any traction at all. And the
       | answer was along the lines of "that would definitely work, but I
       | wouldn't know how to explain this behavior to the users". Which,
       | at the time, I took to be yet another eccentricity of the old
       | man... but hey, the idea stuck with me for a long time!
        
       | cavisne wrote:
       | AWS has a cool general approach to this problem (one badly
       | behaving user affecting others on their shard)
       | 
       | https://aws.amazon.com/builders-library/workload-isolation-u...
       | 
       | The basic idea is to assign each user to multiple shards,
       | decreasing the chances of another user sharing _all_ their shards
       | with the badly behaving user.
       | 
       | Fixing this issue as described in the article makes sense, but if
       | they did shuffle sharding in the first place it would cover any
       | new issues without affecting many other users.
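       | 
       | A minimal sketch of the shuffle-sharding idea, assuming shards
       | are numbered 0..num_shards-1 and each user's subset is derived
       | from a hash of their ID. `shuffle_shard` is a hypothetical
       | helper, not an AWS or Bluesky API.

```python
import hashlib
import random
from typing import Tuple


def shuffle_shard(user_id: str, num_shards: int, shards_per_user: int) -> Tuple[int, ...]:
    """Deterministically pick a small subset of shards for one user.

    The subset is seeded from a hash of the user ID, so the same user
    always maps to the same shards without any lookup table.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return tuple(sorted(rng.sample(range(num_shards), shards_per_user)))


# Count how many users share some vs. all shards with one noisy user.
noisy = set(shuffle_shard("noisy-user", 16, 2))
others = [set(shuffle_shard(f"user-{i}", 16, 2)) for i in range(1000)]
share_any = sum(1 for shards in others if shards & noisy)
share_all = sum(1 for shards in others if shards == noisy)
print(share_any, share_all)
```

       | With 16 shards and 2 per user there are 120 possible pairs, so
       | only a small fraction of users lands on exactly the same pair as
       | the noisy one, while many more overlap on a single shard.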
        
         | artee_49 wrote:
         | I think shuffle sharding is beneficial for read-only replica
         | cases, not for writing scenarios like this. You'll have to
         | write to the primary and not to a "virtual node". Right? Or am
          | I understanding it incorrectly? I just read that article now.
        
       | pornel wrote:
       | I wonder why timelines aren't implemented as a hybrid gather-
       | scatter choosing strategy depending on account popularity (a
       | combination of fan-out to followers and a lazy fetch of popular
       | followed accounts when a follower's timeline is served).
       | 
       | When you have a celebrity account, instead of fanning out every
       | message to millions of followers' timelines, it would be cheaper
       | to do nothing when the celebrity posts, and later when serving
       | each follower's timeline, fetch the celebrity's posts and merge
       | them into the timeline. When millions of followers do that, it
       | will be a cheap read-only fetch from a hot cache.
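       | 
       | A toy in-memory sketch of that hybrid, assuming a fixed follower
       | threshold decides who counts as a celebrity. `HybridTimelines`
       | and all of its fields are hypothetical; a real system would back
       | them with a database and a cache.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str, str]  # (timestamp, author, text)


class HybridTimelines:
    """Fan out on write for ordinary accounts, merge on read for celebrities."""

    def __init__(self, celebrity_threshold: int) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.materialized: Dict[str, List[Post]] = defaultdict(list)
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, timestamp: int, author: str, text: str) -> None:
        post = (timestamp, author, text)
        self.posts_by_author[author].append(post)
        if self.is_celebrity(author):
            return  # no fan-out; followers fetch these posts at read time
        for follower in self.followers[author]:
            self.materialized[follower].append(post)

    def read(self, user: str, limit: int = 50) -> List[Post]:
        # Start from the fanned-out timeline, then pull in celebrity posts.
        candidates = set(self.materialized[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.update(self.posts_by_author[author])
        # The set removes duplicates if an author crossed the threshold
        # after some of their posts had already been fanned out.
        return sorted(candidates, reverse=True)[:limit]
```

       | The cost moves from write time to read time, so this only pays
       | off if the celebrity's recent posts stay in a hot cache.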
        
         | ericvolp12 wrote:
         | This is probably what we'll end up with in the long-run. Things
         | have been fast enough without it (aside from this issue) but
         | there's a lot of low-hanging fruit for Timelines architecture
          | updates. We're spread pretty thin from an engineering-hours
         | standpoint atm so there's a lot of intense prioritization going
         | on.
        
         | locusofself wrote:
         | Why do they "insert" even non-celebrity posts into each
         | follower's timeline? That is not intuitive to me.
        
       | JadeNB wrote:
       | I understand that it's a different point, but how can someone
       | write a whole essay called "When imperfect systems are good"
       | without once mentioning Gabriel or
       | https://en.wikipedia.org/wiki/Worse_is_better?
        
       ___________________________________________________________________
       (page generated 2025-02-19 23:00 UTC)