When imperfect systems are good: Bluesky's lossy timelines
Author : cyndunlop
Score : 347 points
Date : 2025-02-19 17:48 UTC (5 hours ago)
(HTM) web link (jazco.dev)
(TXT) w3m dump (jazco.dev)
| nightpool wrote:
| Note that all of this reflects design decisions on Bluesky's
| closed-source "AppView" server--any federated servers interacting
| with Bluesky would need to construct their own timelines, and do
| not get the benefit of the work described here.
| xrisk wrote:
| What reason does Bluesky give for not opening up their AppView
| code?
|
| Another notable component that is closed source is the
| discovery feed generator, where at least there is _some_
| reason.
| muscomposter wrote:
| what else? profit by means of doing work that benefits first
| and foremost the private proprietors of the closed source
|
| if they gave it away (which used to be unfeasible until the
| digital era) they'd feel they were losing their valuable effort,
| which they would rather concentrate than dilute.
| verdverm wrote:
| The App View frontend is open source:
| https://github.com/bluesky-social/social-app
|
| Much of the backend is open source as well:
| https://github.com/bluesky-social/atproto/tree/main/packages
|
| What is not are the extra services they run to provide a
| better and faster UX. Even if it were open source, it would
| likely cost tens of thousands per month to run (they have moved
| largely to "onprem" hardware instead of the cloud aiui)
| nightpool wrote:
| That's the frontend code, it doesn't include the backend
| API services, which are closed source.
| verdverm wrote:
| Which is what I said in the second sentence
| nightpool wrote:
| AppView is a specific term of art within the Bluesky
| federation architecture:
| https://atproto.com/guides/glossary#app-view, you were
| incorrect in identifying the public frontend repo as the
| AppView.
| verdverm wrote:
| A frontend is (can be) part of an App View. It is quite
| literally the app you view the network through. There can
| also be headless app views and app views which have no
| backend
| half-kh-hacker wrote:
| this is not correct
| half-kh-hacker wrote:
| the backend (the AppView) can be found here:
|
| https://github.com/bluesky-social/atproto/tree/main/packages...
|
| there are various supporting services written in Go as
| well
|
| https://github.com/bluesky-social/indigo
| half-kh-hacker wrote:
| that's not the appview, that's the client
| verdverm wrote:
| "App View" is a somewhat fuzzy term. To me it seems like a
| combination of frontend, backend, custom lexicon, and
| supporting services. There isn't really another place in
| the spec or design where clients or browsers fit in, even
| though they do in fact provide a view of the network via an
| app.
| dingnuts wrote:
| When I read the spec, it seemed like the operator of an
| AppView & Relay would be most in need of compensation for
| their hosting costs, due to the amount of demand on those
| components. So I believe the spec allows an operator to
| implement their own AppView and monetize it as they see fit,
| so that they can afford to operate the service and maybe
| even make enough money off of it to make it their full-time
| job.
| verdverm wrote:
| It seems this way to me as well. ATProto fundamentally
| changes how monetization works in social media by removing
| lock-in. It's going to be interesting to see what emerges
| from this design decision.
|
| Another interesting way to view ATProto is that it could be
| a collection of headless features and network browsers that
| leverage those feature providers.
| iameli wrote:
| I asked this and got
|
| > We did a backend rewrite from postgres to scylla and it has
| a bunch of deployment specific stuff, but is functionally
| identical to the open source postgres version. It's not really
| a "v2" in terms of new features, we just made it make use of
| our hardware really well[1]
|
| [1]: https://bsky.app/profile/iame.li/post/3l7e3jfqit22s
| nightpool wrote:
| Thanks, so are both the Postgres and Scylla versions
| maintained in terms of new features?
|
| I wasn't aware that AppView v1 was open source, and the
| most recent info I'm aware of on the topic is
| https://alice.bsky.sh/post/3laega7icmi2q,
| https://github.com/bluesky-social/atproto/discussions/2961
| and https://docs.bsky.app/docs/advanced-guides/federation-archit...,
| and everything I've heard about Bluesky was that
| open source appview is "still coming".
| psionides wrote:
| It's not coming, it never went away... As I understand
| it, the "business layer" with all the logic is above the
| data layer, shared by the Postgres and Scylla versions,
| and the data layer just makes queries to the database. I
| think they are using the Postgres version locally for
| development.
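The layering described above, a shared business layer sitting on top of interchangeable data layers, is an ordinary interface-plus-implementations pattern. A minimal sketch, assuming that split; every name here (`Dataplane`, `get_follows`, `get_author_posts`, `build_timeline`) is hypothetical and not taken from the atproto codebase, and an in-memory store stands in for the Postgres or Scylla backends.

```python
from abc import ABC, abstractmethod


class Dataplane(ABC):
    """Hypothetical data-layer interface: storage queries only, no app logic."""

    @abstractmethod
    def get_follows(self, did: str) -> list[str]:
        """Return the DIDs that `did` follows."""

    @abstractmethod
    def get_author_posts(self, did: str) -> list[tuple[int, str]]:
        """Return (timestamp, post_uri) pairs authored by `did`."""


class InMemoryDataplane(Dataplane):
    """Stand-in for a database-backed dataplane (Postgres, Scylla, ...)."""

    def __init__(self, follows: dict[str, list[str]],
                 posts: dict[str, list[tuple[int, str]]]):
        self._follows = follows
        self._posts = posts

    def get_follows(self, did):
        return self._follows.get(did, [])

    def get_author_posts(self, did):
        return self._posts.get(did, [])


def build_timeline(dp: Dataplane, viewer: str, limit: int = 50) -> list[str]:
    """Business-layer logic: unchanged whichever dataplane is plugged in."""
    items = []
    for author in dp.get_follows(viewer):
        items.extend(dp.get_author_posts(author))
    items.sort(reverse=True)  # newest first (highest timestamp first)
    return [uri for _, uri in items[:limit]]


dp = InMemoryDataplane(
    follows={"did:alice": ["did:bob", "did:carol"]},
    posts={"did:bob": [(1, "bob/1"), (3, "bob/2")], "did:carol": [(2, "carol/1")]},
)
print(build_timeline(dp, "did:alice"))  # ['bob/2', 'carol/1', 'bob/1']
```

Swapping storage engines in this sketch means writing another `Dataplane` subclass; `build_timeline` stays the same, which is the property that would let an open and a closed backend expose identical behavior.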
| haileyok wrote:
| This is not true. Third party PDSes are fully supported by our
| app view, and our app view generates timelines for all the
| users on those PDSes.
| nightpool wrote:
| What does this have to do with third party app views?
| psionides wrote:
| You didn't specify what kind of federated servers you were
| thinking about
| madeofpalk wrote:
| The statement "any federated servers interacting with
| Bluesky" is ambiguous, because Bluesky's federated model
| means there's many different types of servers, and one
| user's idea of what a "federated server" is could be vastly
| different from another's.
|
| Federated PDSes (which are probably the closest to what
| people mean when they say they want to federate on Bluesky)
| would not need to reconstruct timelines if their users use
| the bsky.app appview.
| nightpool wrote:
| Thanks, that's a fair point that I was overlooking. When
| I say a "federated server", I don't just mean a self-
| hosted PDS, I mean a third party app that potentially has
| its own lexicon and design decisions. Creating a robust
| third-party app that can meaningfully interact with the
| Bluesky network is still a very difficult engineering
| challenge, which I think this article does a good job
| demonstrating--that was the tension I was trying to
| underscore in my comment. Bluesky may be solving those
| engineering challenges for those clients who are
| satisfied with Bluesky's frontend and AppView, but every
| single other app built on top of ATProto will have to
| resolve those same challenges. This is directly
| downstream from Bluesky's "global firehose" topology and
| various design decisions that stem from that.
| pfraze wrote:
| As others have noted, the appview is open source. The dataplane
| has two implementations, one in postgres and another in scylla.
| The scylla dataplane is closed, the postgres one is open.
|
| The interesting next stage for the postgres implementation is
| to create a sync engine for partial syncs of the network, so
| that an appview can run affordably. We ran some benches on the
| current state of the postgres implementation and found we could
| index 300k users on a $100/mo VPS. I think with a couple of
| weeks of optimization that could reach 1M users.
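One possible shape for the partial sync mentioned above, assuming the indexer reads a full event stream but stores only records from a chosen set of accounts, so storage and indexing work scale with that set rather than with the whole network. This is a minimal sketch; the `Event` fields and `PartialIndexer` are hypothetical and do not mirror the real firehose schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    repo: str        # DID of the account whose repo changed (assumed field)
    collection: str  # record type, e.g. "app.bsky.feed.post"
    rkey: str        # record key within that collection


@dataclass
class PartialIndexer:
    tracked: set[str]                                   # DIDs this appview indexes
    indexed: list[Event] = field(default_factory=list)

    def handle(self, event: Event) -> bool:
        """Store the event only if it comes from a tracked account."""
        if event.repo not in self.tracked:
            return False
        self.indexed.append(event)
        return True


idx = PartialIndexer(tracked={"did:plc:alice"})
idx.handle(Event("did:plc:alice", "app.bsky.feed.post", "3k1"))
idx.handle(Event("did:plc:mallory", "app.bsky.feed.post", "3k2"))
print(len(idx.indexed))  # 1
```

Filtering at ingest keeps the database small, but under this assumption the indexer still has to receive the whole stream; fetching only the tracked repos directly would be the alternative design.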
| nightpool wrote:
| This is great to hear--my current understanding of the most
| recent state of the art on the topic is
| https://alice.bsky.sh/post/3laega7icmi2q which mentions that
| the self-hosted appview is not yet open source. So I'm glad
| to hear the situation has changed in the past 3 months.
| psionides wrote:
| It was open source (except the Scylla database layer) from
| the beginning, AFAIK - that blog post just says that they
| haven't set it up yet, because that's the hardest part to
| run
| evbogue wrote:
| My thinking has evolved on this topic significantly as of late.
| My current thinking is we should create a secure gossip network
| on top of the Bluesky API, and forget about all the DAG-CBOR
| stuff that gets stripped from the Jetstream. Hash the posts on
| the gossip layer and if posts change then diff them. This is
| all prep for when X billionaire buys out Bluesky then we just
| pop some signing key crypto on top of this gossip layer and
| wow! It's distributed!
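The hash-then-diff step in that proposal can be sketched with Python's standard library. This assumes posts arrive as plain text keyed by some post identifier; the gossip transport and the signing layer are out of scope, and `PostStore` and `post_hash` are hypothetical names.

```python
import difflib
import hashlib


def post_hash(text: str) -> str:
    """Content address for a post: hex SHA-256 of its UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PostStore:
    """Keeps the latest version of each gossiped post, keyed by post id."""

    def __init__(self) -> None:
        self._posts: dict[str, tuple[str, str]] = {}  # post_id -> (hash, text)

    def receive(self, post_id: str, text: str) -> list[str]:
        """Record a post; return a unified diff if its content changed."""
        new_hash = post_hash(text)
        old = self._posts.get(post_id)
        self._posts[post_id] = (new_hash, text)
        if old is None or old[0] == new_hash:
            return []  # first sighting, or identical content: nothing to diff
        old_hash, old_text = old
        return list(difflib.unified_diff(
            old_text.splitlines(), text.splitlines(),
            fromfile=f"{post_id}@{old_hash[:8]}",
            tofile=f"{post_id}@{new_hash[:8]}",
            lineterm="",
        ))


store = PostStore()
store.receive("p1", "hello")
print("\n".join(store.receive("p1", "hello world")))
```

Comparing hashes first means peers only pay for a diff when content actually changed; identical re-gossip is a cheap string comparison.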
| pfraze wrote:
| isn't that SSB?
| evbogue wrote:
| reverse-ssb
| dang wrote:
| [stub for offtopicness]
| amazingamazing wrote:
| I don't understand the infatuation with Bluesky. The minute
| they need money it'll go the way of Reddit and Twitter.
| xrisk wrote:
| People want the old Twitter, and Bluesky is close to that. It
| also cosplays being decentralized to people who don't look
| too closely.
| dom96 wrote:
| What makes it not decentralised?
| xrisk wrote:
| The fact that you have to be on "the" relay to
| meaningfully participate on the network.
|
| If you instead claim that users can always choose to use
| other 3P relays, then you immediately lose all the nice
| things that Bluesky is able to do well today (search,
| discoverability, a "discover" algorithm). Indeed, you
| fall back to the same old problems that every other
| decentralized social network has.
|
| Bluesky is just a shittier version of Nostr, except that
| the people over at Nostr don't pretend.
| immibis wrote:
| The roughly one million dollars a year that it costs
| to run another copy.
| BizarroLand wrote:
| https://dustycloud.org/blog/how-decentralized-is-bluesky/
| Larrikin wrote:
| If everything good is assumed to eventually become bad, why
| not use things while they are good and then immediately move
| on when it becomes bad?
| treyd wrote:
| Not everything good becomes bad. That premise is wrong.
|
| Bluesky accepted VC money. For a social platform that means
| its death certificate has already been signed.
|
| What you're ignoring with that framing is that we can use
| social media that operates outside the VC startup pipeline
| and doesn't have enshittification baked in from the start.
| sodality2 wrote:
| The consequences of your actions are not limited to whether
| you benefit from the thing, as they would be for a product:
| with social media, you also strengthen the network effect of
| the soon-to-be-bad platform. (Nothing against Bluesky; I
| don't know or think it will do so.)
| VectorLock wrote:
| People seem to harp on and on about how it has better
| "default moderation" than Mastodon.
| verdverm wrote:
| It's not that it is "better" but that the choice is
| individual, not up to the mastodon server. In Mastodon, you
| trade Elon for some other group of individuals, so what
| happens if they make decisions on moderation or content you
| do not agree with?
|
| ATProto is designed around accounts that are independent of
| data host, application, and moderation, all in the name of
| giving users individual control over these things. It's
| like if every Mastodon user ran their own server, but
| without the overhead
| VectorLock wrote:
| >It's like if every Mastodon user ran their own server
|
| No, it's like every Mastodon user used the same server,
| and all the coordination is done by one server that
| nobody can replicate.
| verdverm wrote:
| Every user in ATProto gets their own database that
| amounts to a tar file (technically SQLite with CAR export
| format)
|
| This is nothing like having a single server for every
| user. Perhaps you are confusing Bluesky (one app) with
| ATProtocol the shared network? There are already
| independent servers and apps operating separate from
| Bluesky
| fc417fc802 wrote:
| Are you suggesting the "big few" can't largely censor a
| given account?
|
| I don't see how ATProto is doing noticeably better than
| the scenario where a large ActivityPub instance blocks
| your external account.
| verdverm wrote:
| Generally, yes. Currently, because Bluesky requires the
| use of their labeler if you use their app, this could
| happen.
|
| Two points of note
|
| 1. You can participate in Bluesky without the Bluesky
| app, so you can remove this requirement by using an
| alternative app
|
| 2. The most blocked account is blocked by around 0.25% of
| the full network (https://clearsky.app/)
|
| This second point does not account for users banned from
| Bluesky by Bluesky for breaking the ToS or PDS abuse.
| fc417fc802 wrote:
| > does not account for users banned from Bluesky by
| Bluesky for breaking the ToS or PDS abuse.
|
| Then you are missing the point. I am asking how much
| censorship power the largest node in the network has.
|
| If being blocked by the largest provider means 95% of
| users can't see me anymore then the situation is
| _strictly worse_ than Mastodon vs ActivityPub-at-large.
| immibis wrote:
| You have the opportunity to demonstrate this. I am banned
| from Bluesky. (They didn't tell me why - just a generic
| "you violated community guidelines")
|
| Tell me, concretely, how people can choose to continue
| following me, even though I am banned.
|
| Profile: immibis.bsky.social
| verdverm wrote:
| Create an account you own instead of having someone else
| run it. Maybe you can get your data, maybe you can ask
| Bluesky for a review (there were bugs and scaling issues
| against bot networks that caused false positives)
|
| I'm not seeing that handle resolve in the normal places.
| Do you have the DID? You should use a custom domain so
| that you can control the reference and lookup.
|
| You can run your own PDS and manage complete account
| lifecycle
| immibis wrote:
| So after you're banned from Bluesky you create another
| account on a different server and hope the admins of your
| original server, which still hosts all the people you
| want to follow, don't block your new account from
| interacting with their server?
|
| You said it was different from Mastodon, but how is this
| different from Mastodon?
| anamexis wrote:
| Follow the instructions under "Self-hosting PDS" here:
| https://github.com/bluesky-social/pds
| Boogie_Man wrote:
| Bluesky is the Conservative Dad Beer of "left" short form
| social media.
|
| I implore everyone to use something better like Mastodon or
| maybe Minds
| glerk wrote:
| Bluesky is great technology, but the actual content is just the
| left-wing version of the truthsocial/gab echo chamber.
| perching_aix wrote:
| Wow, that doesn't sound like hyperbole at all.
| hooverd wrote:
| Say what you will about Bluesky, but at least Jay isn't
| palling around with honest to god neo-nazis.
| timeon wrote:
| You can add X to the truthsocial/gab group.
| ddejohn wrote:
| This is such a lazy, uninformed take that people just love to
| repeat. 1) the left on Bluesky is full of in-fighting because
| neolib left are convinced that Harris lost because of
| racism/sexism and the progressive left spend a lot of their
| time trying to educate (and dunk on) them for their braindead
| takes, and 2) any social media platform will become an echo
| chamber if you only choose to follow people that echo your
| sentiments. As long as Bluesky isn't actively censoring and
| suspending journalists and other public figures, there is
| _no_ equivalence to Truthsocial or X and only a
| clown/shill/psyop would suggest as much.
|
| It's really not that hard to find enriching content from all
| walks of life on Bluesky -- if somebody can't find it, they
| just suck at the internet.
|
| To be clear, I _do_ have grievances with Bluesky, and I do
| not have high hopes for its future -- but that's because I
| personally believe that social media in general is both
| fatally flawed from the start and detrimental to society, and
| will never _not_ devolve into ad-riddled or otherwise
| enshittified services. I am not a Bluesky shill, I'm just
| here to call out the silly false equivalence with
| Truthsocial, etc.
| glerk wrote:
| > the left on Bluesky is full of in-fighting
|
| yes, the right is full of infighting too, as shown by the
| recent H-1B debate; that doesn't contradict my point.
|
| > any social media platform will become an echo chamber if
| you only choose to follow people that echo your sentiments
|
| bluesky is almost 100% political and almost 100% left-wing.
| There is literally no one else to follow, at least for now.
| X still has non-political content, I mainly follow AI,
| technology and cryptocurrency, and I couldn't find similar
| content on bluesky.
| fullstop wrote:
| > bluesky is almost 100% political and almost 100% left-
| wing. There is literally no one else to follow, at least
| for now. X still has non-political content, I mainly
| follow AI, technology and cryptocurrency, and I couldn't
| find similar content on bluesky.
|
| Not op, but chiming in. There's a lot of content
| regarding aquatics and home automation (separate topics).
| I avoid the politics stuff entirely, and much of the
| crypto stuff on X tends to be promoting scams and rug-
| pulls.
| GlickWick wrote:
| I use Bluesky and literally only see Gamedev content.
| Unlike X or whatever, I control what I see.
| gs17 wrote:
| > bluesky is almost 100% political and almost 100% left-
| wing.
|
| A big contributor to this feeling is their default
| "Discover" feed being very mediocre. "Less of this" and
| "more of this" do not seem to impact what it gives you,
| neither do what you like, respond to, follow, or who you
| block. Some days it's entirely cat pictures, other days
| it's entirely politics (my suggested accounts to follow
| are 100% of the time in this category). Finding the good
| content is very difficult, and the handful of accounts I
| follow are largely accounts I had to manually search for
| or was given a direct link to somewhere else, which would
| never have come up naturally. And to try to fix it, I
| took the advice to use the block feature, er, liberally,
| and I think it made the problem worse.
|
| I wouldn't even mind the politics being in the feed if it
| didn't show me the exact same things repeated again and
| again. I get that determining if two posts are too
| similar is difficult, but it could at least not show me
| the same image again and again and again...
|
| I've found
| https://bsky.app/profile/skyfeed.xyz/feed/discover to be
| a slightly better version of the Discover feed, but it's
| a lot less dynamic.
| zoul wrote:
| I would be so much more interested in Bluesky if it were
| technically impossible for a random super rich guy to buy and
| bend it to his whims.
| culi wrote:
| Isn't that the whole point of bs? Empowering users to take
| their data where they want. It's completely open-sourced and
| well-documented. If someone buys bluesky you can move all
| your data to a different service that follows the same
| protocol
| plagiarist wrote:
| Can I move my followers/following graph as well? Moving the
| actual content is barely a consolation prize if you lose
| your entire audience in the process.
| nasso_dev wrote:
| Interesting! I wonder what value they chose for the
| `reasonable_limit`.
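For context on `reasonable_limit`: the idea is that when a user follows far more accounts than some threshold, each fan-out write into their timeline is kept only with a probability that shrinks as their follow count grows. A minimal sketch, assuming a keep-probability of `reasonable_limit / follow_count` for users over the limit; the 2000 default is a placeholder, not Bluesky's real value, which is exactly the unknown being asked about.

```python
import random


def keep_probability(follow_count: int, reasonable_limit: int) -> float:
    """Chance that one fan-out write reaches this follower's timeline."""
    if follow_count <= reasonable_limit:
        return 1.0  # ordinary users: lossless
    return reasonable_limit / follow_count


def fan_out(followers, follow_counts, reasonable_limit=2000, rng=random):
    """Return the followers whose timelines get a copy of a new post.

    follow_counts maps each follower to how many accounts *they* follow.
    reasonable_limit=2000 is a placeholder, not Bluesky's actual value.
    """
    delivered = []
    for follower in followers:
        p = keep_probability(follow_counts[follower], reasonable_limit)
        if rng.random() < p:  # random() is in [0, 1), so p == 1.0 always passes
            delivered.append(follower)
    return delivered


rng = random.Random(42)
heavy = [f"u{i}" for i in range(10_000)]
counts = {u: 20_000 for u in heavy}  # each follows 10x the placeholder limit
print(len(fan_out(heavy, counts, rng=rng)))  # expected value 1,000 (p = 0.1)
```

Under this rule a user following F accounts receives, in expectation, about as many timeline writes as someone following `reasonable_limit` accounts with similar posting rates, so their write cost stops growing with F; the price is that they miss a random share of posts.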
| Retr0id wrote:
| It ought to be possible to reverse-engineer it by following a
| large number of active accounts and seeing what percentage of
| their posts actually hit your feed
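Under the same assumed rule (keep-probability `reasonable_limit / follow_count` once over the limit), that experiment reduces to arithmetic: the fraction of followed accounts' posts that actually appear, multiplied by the follow count, estimates the limit. A sketch with made-up numbers; `estimate_limit` is a hypothetical helper.

```python
def estimate_limit(follow_count: int, posts_published: int, posts_seen: int) -> float:
    """Estimate reasonable_limit from one account's observed timeline hit rate.

    posts_published: posts made by followed accounts during the test window.
    posts_seen: how many of those posts showed up in the timeline.
    Assumes keep-probability = limit / follow_count once over the limit,
    so limit is approximately (posts_seen / posts_published) * follow_count.
    """
    if posts_published <= 0:
        raise ValueError("need at least one published post in the window")
    if posts_seen >= posts_published:
        raise ValueError("no loss observed; follow more accounts and retry")
    return posts_seen * follow_count / posts_published


# Made-up numbers: following 10,000 accounts, 1,000 of 5,000 posts arrived.
print(estimate_limit(10_000, 5_000, 1_000))  # 2000.0
```

With 5,000 posts and a 20% hit rate, the binomial standard error on the rate is about 0.6 percentage points, so a window of a few thousand posts is enough for a rough figure.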
| bitmasher9 wrote:
| It's really impressive how well Bluesky is performing. It really
| feels like a throwback to older social media platforms with its
| simplicity and lack of dark-patterns. I'm concerned that all the
| great work on the platform, protocol, etc won't shine in the long
| term as they eventually need to find a revenue source.
| autobodie wrote:
| Absolutely. The profit motive is the root of most evil. It is a
| shame that so many are trained to believe it is the only motive
| available.
| gkoberger wrote:
| I completely agree with this... but without profit, people
| can't get paid, and they'll stop building. I do hate this
| incredible need for growth, of course, but financial growth
| is necessary to pay people and give them raises and allow
| them to have upward mobility at the company.
|
| I hope Bluesky is able to find a model that works for them
| AND for consumers. (I do know it's an open protocol, so it'll
| live on without Bluesky itself! However, as this post shows,
| it's a lot of work to build on the protocol... so if not
| them, who? And if someone else, how will they become
| sustainable?)
| jandrese wrote:
| At the same time I feel like a lot of companies grow much
| larger than they need to be simply because of a
| bigger-is-better mentality. How many of Uber's 30,000ish employees
| are involved with making sure the app and backend database
| are working properly? Are they really doing 600 times more
| work than Craigslist at connecting sellers with buyers?
| bitmasher9 wrote:
| You cannot compare uber to Craigslist.
|
| Uber takes on so much more responsibility of the
| transaction. Setting price, handling disputes, real time
| coordination, etc.
| gkoberger wrote:
| I'm an Uber hater, but... yes.
|
| Like, sure, they don't need every single one of those
| 30,000... but they have to have ground teams in every
| city in the world. Connections with every airport.
| Connections with almost every restaurant in the world.
| Customer support and safety (okay I know they don't nail
| this, but still). They need to pay out drivers in each
| country. The app needs to work in hundreds of countries,
| all with different laws, currencies, languages and more.
| Some places let you pick up anywhere, others require
| specific locations. And that's not even including
| marketing, partnerships, HR, finance, etc.
|
| I don't think the employees are the problem with Uber,
| it's the shareholders. They need to make X back, so that
| delta is where drivers get squeezed.
| redcobra762 wrote:
| Aren't you actually arguing in _favor_ of profit-driven
| behavior? You're not disagreeing with profit as a
| motivator, you're questioning if the 30,000 employees is
| the maximal way to achieve profit.
| tdb7893 wrote:
| It's semantics but I like to separate money from profits.
| You need money to pay people and to survive but you don't
| need to be raking in endlessly growing piles of it. This is
| something that was really demoralizing about working for a
| big company: they could be making like 50 billion a year
| in just profits but still be ruthless in getting more. Like
| I just want to make a product I'm proud of and I'm happy
| living a simple life, I am happier now making less money
| but not feeling like I'm endlessly milking customers.
| cyberax wrote:
| On the other hand, running something like BlueSky is not
| terribly expensive. A foundation with a reasonable
| endowment can do that indefinitely.
|
| Initially, it can be funded by selling tools that do
| analytics or by donations (like Wikipedia).
| bbor wrote:
| Yes! If the venture capitalists that are already involved
| stick to their stated principles and don't demand eternal
| growth (which... fingers crossed?), I think bsky has an
| extremely feasible, promising future.
|
| They've intentionally kept a low footprint to keep
| expenses down, and while income via donation is out of
| the picture (unless AT Proto grows into a full ecosystem,
| I suppose?), cosmetics are a tried-and-true model for
| supporting something that most users use for free, but
| that some power users spend all day on and want shiny
| stuff for. They'll probably end up exploring Discord-
| esque paywalled features for power users as well, which
| isn't necessarily _ideal_ but is leagues better than
| getting on the currently-dying vicious cycle of Display
| Ads, IMO.
| jarjoura wrote:
| If Bluesky ever gets close to becoming a serious threat
| to Meta's walled garden, the effort to fight back against
| them will take a lot of capital. Just the legal battles
| alone will cost a fortune.
|
| Wikipedia isn't a threat to anyone, they just have to
| generate enough capital to exist.
| impossiblefork wrote:
| Yes, but there is a path, and it's simplicity.
|
| Take Lichess: is it bad? It basically solves the whole problem.
| A well-designed distributed social media site could be
| something like that. Donations are enough to support one
| guy at least.
| bbor wrote:
| I totally get/relate to your perspective, but to be the
| annoying leftie in your ear:
|
| A) Sustainable revenue is a requirement for any company,
| yes, but the unlimited (above-inflation) growth demanded by
| most large corporations is absolutely not. Lots and lots of
| companies operate for a long time without expecting massive
| growth, raises n' all. MBAs pejoratively call such
| companies "lifestyle businesses"--as in "just pays for
| people to live"--but I'd call them "normal, healthy
| companies".
|
| B) More fundamentally: the idea that a social media network
| can only be built by a single corporation owned by
| investors is an omnipresent, yet extremely toxic,
| assumption. Mastodon represents another extreme end of the
| capital<->labor spectrum where anyone can contribute to the
| network at any time with their own instance, but I think
| Bluesky is a hint of a less-pure--and therefore more
| feasible--future.
|
| To use the language of my favorite dream, Chomskian
| Anarcho-Syndicalism: imagine a social media network
| organized by a democratic non-profit entity akin to the
| Python or Linux Foundations, that then contracts out work
| to a hierarchy of smaller, purpose-built teams
| ("syndicates"), each of which may in turn contract w/ other
| teams. Each team would have to attract talent and negotiate
| enough income to pay them sufficiently still, of course,
| but there would be no team leader to make a surplus profit
| from the system -- any "surplus" would stay at the non-
| profit level, and thus necessarily be reinvested back into
| the product.
|
| In the current system, the reason Bluesky didn't do this
| off the bat is obvious: no one would loan them startup
| funds, as ownership investment is the de facto universal
| way to start up an unproven venture. But we can dream
| bigger and better, IMHO; both on a smaller scale by
| building upon already-proven open protocols like AT Proto,
| and on a larger scale by structuring the state & economy to
| support this kind of model equally, if not primarily.
| jarjoura wrote:
| All of the big tech companies today are the result of
| 100s of smaller, well intentioned tech companies that got
| acquired into these behemoths.
|
| I always look at how WhatsApp played out as the company.
| They were the good guys, and didn't want to get acquired.
| Zuckerberg almost bankrupted FB at the time by giving in to
| all of the ridiculous demands WhatsApp made. No one at
| WhatsApp thought it was going to happen, until it did and
| did result in a once-in-a-lifetime transfer of wealth to
| several hundred employees.
| autobodie wrote:
| > _but without profit, people can't get paid, and they'll
| stop building_
|
| I wholeheartedly disagree. People build things all the
| time for reasons other than profit. In fact, most of the
| greatest things ever built were a loss for those who built
| them.
|
| Dignity is the best motivator. Profit only supersedes
| dignity when dignity is not on offer.
| krapp wrote:
| Profit supersedes dignity when one needs to eat, because
| one cannot eat dignity.
|
| Being able to spend a significant amount of time and
| effort on passion projects is a luxury most people can't
| afford.
| jarjoura wrote:
| There's no reason Bluesky has to emulate what FB Newsfeed and
| Twitter/X did to solve engagement by promoting certain items
| over others.
|
| At the very least, they do have hindsight to learn from.
| pessimizer wrote:
| Bluesky is a private for-profit company that has taken $37M
| in venture capital.
|
| https://www.piratewires.com/p/interview-with-jack-dorsey-mik...
|
| > That was the second moment I thought, uh, nope. This is
| literally repeating all the mistakes we made as a company.
| This is not a protocol that's truly decentralized. It's
| another app. It's another app that's just kind of following
| in Twitter's footsteps, but for a different part of the
| population.
|
| > Everything we wanted around decentralization, everything we
| wanted in terms of an open source protocol, suddenly became a
| company with VCs and a board. That's not what I wanted,
| that's not what I intended to help create.
| mullingitover wrote:
| They've done an incredible job running with an extremely low
| headcount and crazy efficient use of hardware. It would be easy
| to 10x their expenses if they were blindly following the
| standard cloud deployment playbook. Hopefully this level of
| efficiency means they don't have to work as hard and can stay
| pre-revenue, a pure play, for a very long time.
| culi wrote:
| I love Mastodon but I have to admit that BlueSky has clearly
| out-engineered them. Of course they started with much more
| expertise and resources. I hope ActivityPub compatibility comes
| soon to unite the two.
| knallfrosch wrote:
| Anyone following hundreds of thousands of users is obviously a
| bot account scraping content. I'd ban them and call it a day.
|
| However, I do love reading about the technical challenge. I think
| Twitter has a special architecture for celebrities with millions
| of followers. Given Bluesky is a quasi-clone, I wonder why they
| did not follow in these footsteps.
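The celebrity special case is commonly described as hybrid fan-out: most authors' posts are pushed into followers' timelines at write time, while posts from accounts above a follower threshold are pulled and merged at read time. A minimal sketch of that general technique, not of Twitter's or Bluesky's actual code; the threshold and class name are hypothetical. It addresses authors with huge audiences, whereas the lossy timelines discussed here target the opposite case of users who follow huge numbers of accounts.

```python
from collections import defaultdict


class HybridTimelines:
    """Toy hybrid fan-out: push for ordinary authors, pull for huge ones."""

    def __init__(self, celebrity_threshold: int = 1_000_000):
        self.celebrity_threshold = celebrity_threshold  # hypothetical cutoff
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> [(ts, post)] pushed on write
        self.outbox = defaultdict(list)    # author -> [(ts, post)] they wrote

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author: str, ts: int, post: str) -> None:
        self.outbox[author].append((ts, post))
        if not self.is_celebrity(author):
            for user in self.followers[author]:  # fan-out on write
                self.inbox[user].append((ts, post))

    def timeline(self, user: str, limit: int = 20) -> list[str]:
        # Fan-out on read: merge the pushed inbox with celebrity outboxes.
        items = set(self.inbox[user])  # a set also drops duplicate entries
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.outbox[author])
        return [post for _, post in sorted(items, reverse=True)[:limit]]


t = HybridTimelines(celebrity_threshold=2)
t.follow("u1", "star")
t.follow("u2", "star")
t.follow("u1", "bob")
t.publish("star", 1, "s1")  # star has 2 followers: stored once, pulled later
t.publish("bob", 2, "b1")   # bob has 1 follower: pushed into u1's inbox
print(t.timeline("u1"))     # ['b1', 's1']
```

The trade-off is that a celebrity post costs one write instead of millions, while every read for their followers pays a small merge.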
| psionides wrote:
| You don't need to follow anyone (or even have an account) to
| scrape content... Someone following a huge amount of accounts
| usually wants to get a lot of followers quickly this way
| through follow-backs.
| ruined wrote:
| if you want to scrape all the content, that's what the firehose
| is for, and it's allowed.
|
| the only reason to mass-follow is for spam purposes.
| Retr0id wrote:
| This does assume that scrapers are smart, and often they're
| really not. They have infrastructure for scraping HTML from
| webpages at scale and that is the hammer they use for all
| nails. (e.g. Wikipedia has to fight off scraper traffic
| despite full archives being available as torrents, etc.)
|
| In this case I agree though, they're all spammers and/or
| "clout farmers", or trying to make an account seem more
| authentic for future scams. They want to generate follow
| notifications in the hope that some will follow them back
| (and if they don't, they unfollow again after some interval).
| sarchertech wrote:
| 100%. I ran a job board where we provided a nice machine
| readable XML feed of all of our jobs, but we had bots that
| insisted on using the standard search box. Searching by
| city using an alphabetized list.
|
| Geographic search was the most expensive thing they
| could have done and no matter what we did we couldn't get
| them to use the XML feed.
|
| I even tried returning a link to the feed when we detected
| a bot. No dice. They just kept working around the bot
| detection.
| culi wrote:
| Maybe not hundreds of thousands but I'd follow anybody that
| looks remotely interesting and then primarily use customized
| feeds. E.g. if I wanna hear about union news, my personal irl
| network, etc I check that feed
| tshaddox wrote:
| Or just enforce a maximum number of followed accounts.
| ARandumGuy wrote:
| No matter how high you set a maximum limit for interactions
| on social media (followers, friends, posts, etc), _someone_
| will reach the limit and complain about it. I can see why
| Bluesky would prefer a "soft limit", where going above the
| limit will degrade the experience. It gives more flexibility
| to adjust things later, and prevents obnoxious complaints
| from power users with outsized influence.
| tshaddox wrote:
| I'm skeptical that the people who would complain about that
| wouldn't find something else to complain about if you
| resolved the first complaint. I'd recommend implementing
| product features that you think are reasonable and
| accepting the fact that you will get complaints from people
| who disagree.
| steveklabnik wrote:
| > Given Bluesky is a quasi-clone, I wonder why they did not
| follow in these footsteps.
|
| There are only six users with over a million followers, and
| none with two million yet.
|
| I'm sure they'll get there.
| ChuckMcM wrote:
| As a systems enthusiast I enjoy articles like this. It is really
| easy to get into the mindset of "this must be perfect".
|
| In the Blekko search engine back end we built an index that was
| 'eventually consistent' which allowed updates to the index to be
| propagated to the user facing index more quickly, at the expense
| that two users doing the exact same query would get slightly
| different results. If they kept doing those same queries they
| would eventually get the exact same results.
|
| Systems like this bring in a lot of control systems theory
| because they have the potential to oscillate if there is positive
| feedback (and in search engines that positive feedback comes from
| the ranker which is looking at which link you clicked and giving
| it a higher weight) and it is important that they not go crazy.
| Some of the most interesting, and most subtle, algorithm work was
| done keeping that system "critically damped" so that it would
| converge quickly.
|
| This description of how users' timelines are sharded, with
| the same sorts of feedback loops (in this case 'likes' or
| 'reposts'), sounds like a pretty interesting problem space to
| explore.
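| A toy illustration of the oscillation point, assuming a single
| weight nudged toward a target by a fixed fraction (`gain`) of the
| error each round. This is a first-order analogy, not Blekko's
| actual ranker; `simulate` and its parameters are hypothetical.

```python
def simulate(gain: float, target: float = 1.0, start: float = 0.0, steps: int = 20) -> list:
    """Repeatedly move a weight toward `target` by `gain` times the current error."""
    w = start
    history = [w]
    for _ in range(steps):
        error = target - w
        w += gain * error  # feedback step: a larger gain reacts harder to each observation
        history.append(w)
    return history

for g in (0.5, 1.5, 2.5):
    print(g, [round(x, 3) for x in simulate(g)[:6]])
```

| The error is multiplied by (1 - gain) each step, so it decays
| smoothly for a gain below 1, overshoots and rings for a gain between
| 1 and 2, and grows without bound above 2.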
| culi wrote:
| What became of Blekko?
| an_ko wrote:
| > It was acquired by IBM in March 2015, and the service was
| discontinued.
|
| -- https://en.wikipedia.org/wiki/Blekko
|
| Perhaps GP has a more interesting answer though.
| ChuckMcM wrote:
| That's the correct answer, IBM wanted the crawler mostly to
| feed Watson. Building a full search engine (crawler,
| indexer, ranker, API, web application) for the English
| language was a hell of an accomplishment but by the time
| Blekko was acquired Google was paying out tens of billions
| of dollars to people to send _them_ and only them their
| search queries. For a service that nominally has to live on
| advertising revenue getting humans to use it was the only
| way to be net profitable, and you can't spend billions
| buying traffic and hope to make it back on advertising as
| the #3 search engine in the English speaking markets.
|
| There are other ways to monetize search (look at Kagi for
| example) than advertising. Blekko missed that window
| though (it was too early: Google needed to get as crappy as it is
| today for a spam-free search engine to be
| desirable).
| chrisweekly wrote:
| Not my Q but thanks for the interesting history.
|
| Also, (for other readers), I'm a huge fan of Kagi. Highly
| recommended.
| PaulHoule wrote:
| An airline reservation system has to be perfect (no slack in
| today's skies), a hotel reservation can be 98% perfect so long
| as there is some slack and you don't mind putting somebody up
| in a better room than they paid for from time to time.
|
| A social media system doesn't need to be perfect at all. It was
| clear to me from the beginning that Bluesky's feeds aren't very
| fast (not that they are crazy slow), but if it saves money or
| effort, it's no problem if notifications are delayed 30s.
| singleshot_ wrote:
| Does the fact that an airline booking system must be perfect
| explain why so many flights are overbooked or cancelled?
| rconti wrote:
| No, overbooking is a business decision justified by the
| fact that, statistically, not all passengers will actually
| show up for their flight, and lower load factors cost
| money.
| josefresco wrote:
| What is the "no show" rate?
| nightpool wrote:
| A 2019 study of 5 European airports found no-show
| rates of 14.4%: https://www.ozion-
| airport.com/product/comparative-analysis-n...
|
| However, my understanding is that airlines have much more
| sophisticated per-flight and per-passenger models that
| calculate the predicted no-show factor based on the
| historical rates for that particular route (e.g. you're
| more likely to get no-shows in business class flying
| from NYC to SF than among holiday travelers with a
| reservation to the Florida Keys)
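| To make the overbooking arithmetic concrete, here is a rough sketch
| that treats every ticket holder as showing up independently with the
| same probability (a big simplification compared with the per-route,
| per-passenger models described above) and finds how many tickets can
| be sold while keeping the chance of more shows than seats under a
| chosen risk. The function names and the capacity, show rate, and
| risk in the usage line are illustrative assumptions.

```python
from math import comb

def overflow_probability(tickets: int, capacity: int, p_show: float) -> float:
    """P(more than `capacity` of `tickets` holders show up), each independently with p_show."""
    return sum(
        comb(tickets, k) * p_show**k * (1 - p_show) ** (tickets - k)
        for k in range(capacity + 1, tickets + 1)
    )

def max_tickets(capacity: int, p_show: float, risk: float) -> int:
    """Largest ticket count whose overflow probability stays at or below `risk`."""
    n = capacity
    while overflow_probability(n + 1, capacity, p_show) <= risk:
        n += 1  # selling one more ticket is still within the accepted risk
    return n

# e.g. a 180-seat plane, the 14.4% no-show rate from the study, 5% acceptable risk
print(max_tickets(capacity=180, p_show=1 - 0.144, risk=0.05))
```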
| SteveNuts wrote:
| That blows my mind, I would expect maybe 1 or 2
| passengers per plane at most. I'm trying to think of what
| factors would cause that many no-shows, it has to be
| mostly missed connections?
|
| I can't imagine spending hundreds of dollars and just not
| showing up.
| lhoff wrote:
| A friend of mine works for a management consultancy firm,
| and they have full-flex tickets: if they miss the 8pm
| flight home, they can take the next one or fly back the
| next morning, all without additional fees. So I believe
| business travel is the biggest factor when it comes to
| missed flights.
|
| Side note: His employer is the biggest client of a major
| European airline.
| shagie wrote:
| Delays getting to the airport and missing the plane.
| Cancelations with full refund. "Hidden city" ticketing.
| Layover delays. Businesses booking blocks. Flexible
| flights ( https://www.travelperk.com/guides/flexible-
| travel/flexible-f... ). Changing / rebooking flights for
| an earlier or later time.
| packetlost wrote:
| I'm sure other factors such as sudden illness and
| migratable tickets make up a sizeable chunk too.
| vidarh wrote:
| Keep in mind they sell a lot of tickets where one of the
| features that allows for a premium price is that they
| allow late cancellations or changes to other flights.
| Holiday travelers are pretty "reliable", but business
| travelers might have changed needs at the drop of a hat
| (say you meet another prospective client on a business
| trip and decide to stay another day to fit in a face-to-
| face meeting).
| artee_49 wrote:
| I think you'll have to pay a team millions to figure that
| out, it is unlikely to be a static rate but rather
| decided based on multiple traits like time of year, time
| of flight, distance of flight, cost of ticket, etc.
| rconti wrote:
| Especially for a free service!
|
| Think about other ad-supported sites. If you're an engineer
| working on an ad-supported product, the perfect consistency
| you strive for in your code is not the product. The product
| is the sum of all of the content the user sees. And the costs
| of the tradeoffs you make are paid for by ads.
|
| Am I willing to see 10x more ads for perfect consistency?
| Definitely not.
| darknavi wrote:
| It's funny because from my experience airline systems are
| very imperfect (timing wise).
|
| I (unwisely) tried to purchase an Icelandair ticket via the
| Chase travel portal. I would get a reservation number, go buy
| seats on Icelandair's website, and a few days later the
| entire reservation would vanish into the ether. Rinse and
| repeat 3x.
|
| I can't remember the exact verbiage, but basically tickets
| can be "reserved" and "booked". One means the ticket is
| allocated, and one means the ticket is actually paid for. I
| eventually sat on the phone with an executive support person
| as they booked the ticket and got it all the way through. It
| turns out Chase reserves a ticket on an airline but has an SLA
| of ~3 days to actually pay for the ticket. Icelandair
| requires a ticket to be paid within 24 hours, so it was
| timing out.
| gamedever wrote:
| Miscommunication leads to bad outcomes. One missed message
| out of order could easily lead to a fight, a lawsuit, a flash
| mob, threats of violence - that then need to be taken
| seriously, swatting, DOXxing, etc...
|
| Msg 1: I hate ___insert_controversial_person_category_here___
|
| Msg 2: Is the kind of statement that really sets me off
|
| Msg 1 has a very different meaning if you don't see Msg 2.
| pjc50 wrote:
| This can already happen without help from the platform.
| gamedever wrote:
| Sure, but that doesn't mean the platform should make it
| worse.
|
| Trying to have a conversation on a flaky platform is hell.
| gregw134 wrote:
| Would you be willing to share more about how you guys did click
| ranking at Blekko? It's an interesting problem.
| snailmailman wrote:
| I guess I hadn't considered that search engines could be
| reranking pages on the fly as I click them. I've been seeing my
| DuckDuckGo results shuffle around for a while now, thinking it was
| an awful bug.
|
| Like I click one page, don't find what I want, and go back
| thinking "no, I want that other result that was below" and it's
| an entirely different page with shuffled results, missing the
| one that I think might have been good.
| PaulHoule wrote:
| That's connected with a basic usability complaint about
| current web interfaces, that ads and recommended content
| aren't stable. You very well might want to engage with an ad
| after you are done engaging what you wanted to engage with
| but you might never see it again. Similarly, you might see
| two or three videos that you want to click on on the side of
| a YouTube video you're watching but you can only click on one
| (though if you are thinking ahead you can open these in
| another tab.)
|
| On top of that immediate frustration, the YouTube style
| interface here
|
| https://marvelpresentssalo.com/wp-
| content/uploads/2015/09/id...
|
| collects terrible data for recommendations because, even
| though it gives them information that you liked the thumbnail
| for a video, they can't come to any conclusion about whether
| or not you liked any of the other videos. TikTok, by focusing
| on one video at a time, collects much better information.
| cgriswald wrote:
| I don't use DDG, but in my (very limited, just now) testing
| it doesn't seem to shuffle results unless you reload the page
| in some way. Is it possible your browser is reloading the
| page when you go back? If so, setting DDG to open links in
| new tabs might fix this problem.
| dwedge wrote:
| Similar to how Google images loads lower quality blurred
| thumbnails towards the bottom of the window at first so that
| the user thinks they loaded faster
| sphars wrote:
| When I go directly to a user's profile and see all their posts,
| sometimes one of their posts isn't in my timeline where it should
| be. I follow fewer than 100 users on Bluesky, but I guess this
| explains why I occasionally don't see a user's post in my
| timeline.
|
| Lossy indeed.
| Eric_WVGG wrote:
| Are you using an app, website, or combination?
|
| Various clients (I'm writing one) interpret the timeline
| differently, since a feed that shows literally everything would
| include things that most people would find undesirable or
| irrelevant (replies to strangers, replies to replies to
| replies, etc.).
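| As a rough sketch of the kind of rule a client might apply (the
| `Post` fields and the depth cutoff are hypothetical, not Bluesky's
| schema or any particular client's logic):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    author: str
    reply_to_author: Optional[str] = None  # None for a top-level post
    reply_depth: int = 0                   # 0 top-level, 1 direct reply, 2 reply to a reply, ...

def show_in_timeline(post: Post, following: set, max_depth: int = 1) -> bool:
    """Hypothetical client rule: hide replies to strangers and deep reply chains."""
    if post.reply_to_author is None:
        return True
    if post.reply_to_author not in following:  # reply to someone the reader doesn't follow
        return False
    return post.reply_depth <= max_depth
```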
| Retr0id wrote:
| If another user you follow reposted or replied to a post, it
| can affect its order in your following feed. You shouldn't be
| seeing any loss as described in the article from following only
| 100 users.
| trhway wrote:
| So the system design puts the burden on what seems to be
| synchronous, not queued, writes to get easy reads. I usually
| prefer simpler cheaper writes at the cost of more complicated
| reads as the reads scale and parallelize better.
| pfraze wrote:
| you're underestimating the read load, by a lot
| skybrian wrote:
| This design makes sense if you didn't previously have any limit
| on the number of people an account could follow. But why not have
| a limit?
| whyrusleeping wrote:
| people get so up in arms when you suggest there might be a
| limit on how many people they can follow.
| timewizard wrote:
| > This process involves looking up all of your followers, then
| inserting a new row into each of their Timeline tables in reverse
| chronological order with a reference to your post.
|
| Seriously? Isn't this the nut of your problem right here?
| jsnell wrote:
| What alternative design did you have in mind, given that a
| Twitter-like data model of individual follows is likely a
| strict product requirement?
|
| There are obviously other ways of doing it (doing the timeline
| propagation in a batch job, fanning out the reads rather than
| the writes), but they've got their own problems. Probably worse
| ones.
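| For readers unfamiliar with the two options, here is a minimal
| in-memory sketch contrasting them. The dictionaries stand in for
| Timeline tables and per-author post lists, and every class and
| method name is hypothetical. Fan-out on write does one insert per
| follower at post time; fan-out on read does a single insert and
| merges the followed authors' newest-first post lists at request time.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import islice

class Graph:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

class FanoutOnWrite(Graph):
    def __init__(self):
        super().__init__()
        self.timelines = defaultdict(list)  # user -> [(-ts, post_id)], newest first

    def post(self, author, post_id, ts):
        for f in self.followers[author]:  # one write per follower
            bisect.insort(self.timelines[f], (-ts, post_id))
        return len(self.followers[author])  # number of writes performed

    def read(self, user, limit=50):
        return [pid for _, pid in self.timelines[user][:limit]]  # one cheap lookup

class FanoutOnRead(Graph):
    def __init__(self):
        super().__init__()
        self.posts = defaultdict(list)  # author -> [(-ts, post_id)], newest first

    def post(self, author, post_id, ts):
        bisect.insort(self.posts[author], (-ts, post_id))
        return 1  # one write, regardless of follower count

    def read(self, user, limit=50):
        # one lookup per followed author, merged into a single newest-first stream
        merged = heapq.merge(*(self.posts[a] for a in self.following[user]))
        return [pid for _, pid in islice(merged, limit)]
```

| The trade-off shows up in where the work lands: the write path's
| cost scales with the author's follower count, the read path's with
| how many accounts the reader follows.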
| pphysch wrote:
| Wouldn't a hybrid approach make sense?
|
| Periodically classify users as hot/cold based on their
| activity, build hot-follower timelines on write, and build
| cold-follower timelines on read.
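| A rough sketch of that hot/cold split, assuming the set of hot users
| is recomputed elsewhere (for example, from recent activity) and
| passed in; all names are hypothetical. Hot followers get the post
| written into a materialized timeline, while cold followers pay for a
| merge at read time instead.

```python
import bisect
import heapq
from collections import defaultdict
from itertools import islice

class HybridTimelines:
    def __init__(self, hot_users):
        self.hot = set(hot_users)           # assumed to be refreshed periodically elsewhere
        self.followers = defaultdict(set)   # author -> followers
        self.following = defaultdict(set)   # user -> followed authors
        self.posts = defaultdict(list)      # author -> [(-ts, post_id)], newest first
        self.timelines = defaultdict(list)  # materialized timelines, hot users only

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, post_id, ts):
        entry = (-ts, post_id)
        bisect.insort(self.posts[author], entry)
        hot_followers = self.followers[author] & self.hot
        for f in hot_followers:  # the write path only touches hot followers
            bisect.insort(self.timelines[f], entry)
        return len(hot_followers)

    def read(self, user, limit=50):
        if user in self.hot:
            entries = iter(self.timelines[user])
        else:  # cold users are served by merging followed authors' posts on demand
            entries = heapq.merge(*(self.posts[a] for a in self.following[user]))
        return [pid for _, pid in islice(entries, limit)]
```

| A user who turns hot would need their timeline backfilled, which
| this sketch ignores.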
| jsnell wrote:
| You'd still have exactly the same hot write path, it'd just
| have maybe 50% of the load. That could be a legit
| optimization, but not having it hardly warrants an
| incredulous "seriously?" like the OP's.
|
| (And the same for the inverse hybrid strategy of
| quarantining the writes of highly followed users and
| handling their fan-out at read time. A neat optimization,
| and maybe even absolutely necessary once you have accounts with 100M
| followers. But the vast majority of posts would still be
| handled via the original strategy.)
| rakoo wrote:
| Ok I'm curious: since this strategy sacrifices consistency, has
| anyone thought about something that is neither full fan-out on
| reads nor on writes?
|
| Let's imagine something like this: instead of writing to every
| user's timeline, it is written once for each shard containing at
| least one follower. This caps the fan-out at write time to
| hundreds of shards. At read time, getting the content for a given
| user reads that hot slice and filters for actual followers. It
| definitely has more load, but:
|
| - the read is still colocated inside the shard, so latency
| remains low
|
| - for mega-followers the page will not see older entries anyway
|
| There are of course other considerations, but I'm curious about
| what the load for something like that would look like (and I
| don't have the data nor infrastructure to test it)
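| A small sketch of that per-shard idea, assuming users map to a fixed
| number of shards through a stable hash (CRC32 here) and that posts
| arrive in timestamp order. `NUM_SHARDS`, `shard_of`, and the class
| are hypothetical, and a real system would also bound how far back a
| read scans.

```python
import zlib
from collections import defaultdict
from itertools import islice

NUM_SHARDS = 4  # hypothetical; the comment imagines hundreds

def shard_of(user: str) -> int:
    return zlib.crc32(user.encode()) % NUM_SHARDS  # stable across processes, unlike hash()

class ShardFanout:
    def __init__(self):
        self.followers = defaultdict(set)
        self.following = defaultdict(set)
        self.shard_feed = defaultdict(list)  # shard -> [(ts, author, post_id)], oldest first

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, post_id, ts):
        shards = {shard_of(f) for f in self.followers[author]}
        for s in shards:  # one write per shard containing at least one follower
            self.shard_feed[s].append((ts, author, post_id))
        return len(shards)  # never more than NUM_SHARDS

    def read(self, user, limit=50):
        follows = self.following[user]
        feed = self.shard_feed[shard_of(user)]
        # scan the shard's slice newest-first, keeping only authors this user follows
        matches = (pid for _, author, pid in reversed(feed) if author in follows)
        return list(islice(matches, limit))
```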
| rconti wrote:
| > Additionally, beyond this point, it is reasonable for us to not
| necessarily have a perfect chronology of everything posted by the
| many thousands of users they follow, but provide enough content
| that the Timeline always has something new.
|
| While I'm fine with the solution, the wording of this sentence
| led me to believe that the solution was going to be imperfect
| chronology, not dropped posts in your feed.
| artee_49 wrote:
| I am a bit perplexed, though, as to why they have implemented
| fan-out in a way that each "page" blocks fetching further pages;
| they would not have been affected by the high tail latencies if
| they had not done this:
|
| "In the case of timelines, each "page" of followers is 10,000
| users large and each "page" must be fanned out before we fetch
| the next page. This means that our slowest writes will hold up
| the fetching and Fanout of the next page."
|
| This basically means that they block on each page, process all the
| items on the page, and then move on to the next page. Why
| wouldn't you rather decouple page fetcher and the processing of
| the pages?
|
| A page fetching activity should be able to continuously keep
| fetching further set of followers one after another and should
| not wait for each of the items in the page to be updated to
| continue.
|
| Something that comes to mind would be to have a fetcher component
| that fetches pages, stores each page in S3 and publishes the
| metadata (content) and the S3 location to a queue (SQS) that can
| be consumed by timeline publishers which can scale independently
| based on load. You can control the concurrency in this system
| much better, and you could also partition based on the shards
| with another system like Kafka by utilizing the shards as keys in
| the queue to even "slow down" the work without having to
| effectively drop tweets from timelines (timelines are eventually
| consistent regardless).
|
| I feel like I'm missing something and there's a valid reason to
| do it this way.
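| One way to sketch that decoupling in-process: a bounded
| `queue.Queue` stands in for SQS/Kafka and threads stand in for
| independently scaled publishers. `fan_out_pipelined`, `fetch_page`,
| and `write_page` are hypothetical names, and error handling is left
| out. The fetcher keeps pulling pages while workers write, and the
| queue's `maxsize` provides backpressure if the writers fall behind.

```python
import queue
import threading

def fan_out_pipelined(fetch_page, write_page, num_workers=4, max_pending=8):
    """fetch_page(cursor) -> (followers, next_cursor or None); write_page(followers) -> None."""
    pages = queue.Queue(maxsize=max_pending)  # bounded: the fetcher blocks when writers lag
    done = object()  # sentinel telling a worker to exit

    def fetcher():
        cursor = None
        while True:
            followers, cursor = fetch_page(cursor)
            pages.put(followers)  # hand the page off without waiting for its writes
            if cursor is None:
                break
        for _ in range(num_workers):
            pages.put(done)  # one sentinel per worker

    def worker():
        while True:
            page = pages.get()
            if page is done:
                return
            write_page(page)  # a slow page only delays this one worker

    threads = [threading.Thread(target=fetcher)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```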
| abound wrote:
| I interpreted this as a batch write, e.g. "write these 10k
| entries and then come back". The benefit of that is way less
| overhead versus 10k concurrent background routines each writing
| individual rows to the DB. The downside is, as you've noted,
| that you can't "stream" new writes in as older ones finish.
|
| There's a tradeoff here between batch size and concurrency, but
| perhaps they've already benchmarked it and "single-threaded"
| batches of 10k writes performed best.
| exabrial wrote:
| I honestly am annoyed to use websites and services like this.
| Annoys the crap out of me and everyone else, but since it's pretty
| much forced down their throats, the "eventually" is "eventually
| everyone stops complaining".
| einpoklum wrote:
| Centrally-controlled social media platforms are not a good thing,
| period. Neither Twitter/X, nor BlueSky. Let's not fete them.
| mifydev wrote:
| "Hot Shards in Your Area" - 10/10 heading
| NoGravitas wrote:
| The funny thing is that all of the centralization in Bluesky is
| defended as being necessary to provide things like global search
| and all replies in a thread, things that Mastodon simply punts on
| in the name of decentralization. But then ultimately, Bluesky has
| to relax those goals after all.
| arcastroe wrote:
| I found it odd to base the loss-factor on the number of people
| you follow, rather than a truer indication of timeline-update-
| frequency. What if I follow 4k accounts, but each of those
| accounts only posts once a decade? My timeline would become
| unnecessarily lossy.
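| To make the comparison concrete: a sketch of a follow-count-based
| keep probability, assuming the form min(limit / follows, 1) with an
| illustrative limit of 2,000, next to a hypothetical rate-based
| variant that looks at how much the followed accounts actually post.
| Neither function is taken from Bluesky's code, and the constants are
| placeholders.

```python
import random

def follow_count_keep_fraction(num_follows: int, limit: int = 2000) -> float:
    """Keep fraction that depends only on how many accounts the reader follows."""
    return min(limit / num_follows, 1.0) if num_follows else 1.0

def rate_based_keep_fraction(posts_per_day: list, target_per_day: float = 5000.0) -> float:
    """Hypothetical alternative: scale by the expected number of incoming posts per day."""
    incoming = sum(posts_per_day)
    return min(target_per_day / incoming, 1.0) if incoming else 1.0

def should_write(keep_fraction: float, rng: random.Random) -> bool:
    """Decide whether a single fan-out write to this timeline happens."""
    return rng.random() < keep_fraction

# 4,000 follows that each post about once a decade
quiet_follows = [1 / 3650] * 4000
print(follow_count_keep_fraction(len(quiet_follows)))  # 0.5: about half the writes skipped
print(rate_based_keep_fraction(quiet_follows))         # 1.0: nothing skipped
```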
| crabbone wrote:
| Anecdotally, I ran into a similar solution "by chance".
|
| Long ago, I worked for a dating site. Our CTO at the time was a
| "guest of honor" who was brought in by a family friend who was
| working in marketing at the time. The CTO was a university
| professor who took on a job as a courtesy (he didn't need the
| money nor fame, he had enough of both, and actually liked
| teaching).
|
| But he instituted a lot of experimental practices in the company,
| such as switching roles every now and then (anyone in the company
| could apply for a different role, except administration, and try
| wearing a different hat), or having company-wide
| discussions of problems where employees would have to prepare a
| presentation on their current work (that was very unusual at the
| time, but the practice became more institutional in larger
| companies afterwards).
|
| Once he announced a contest for the problem he was trying to
| solve. Since we were building a dating site, the obvious problem
| was matching. The problem was that the more properties there were
| to match on, the longer it would take (among other
| problems). So, the program was punishing site users who took time to
| fill out the questionnaires as well as they could and favored the
| "slackers".
|
| I didn't have any bright ideas on how to optimize the matching /
| search for matches. So, ironically, I asked "what if we just
| threw away properties beyond certain threshold randomly?" I was
| surprised that my idea received any traction at all. And the
| answer was along the lines of "that would definitely work, but I
| wouldn't know how to explain this behavior to the users". Which,
| at the time, I took to be yet another eccentricity of the old
| man... but hey, the idea stuck with me for a long time!
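| The idea of randomly discarding properties past a threshold is easy
| to sketch; `properties_to_match`, `threshold`, and the profile shape
| are hypothetical, and the seeded `random.Random` only makes the
| choice reproducible.

```python
import random

def properties_to_match(profile: dict, threshold: int, rng: random.Random) -> dict:
    """Use every property if there are few enough; otherwise a random subset of `threshold`."""
    if len(profile) <= threshold:
        return dict(profile)
    kept = rng.sample(sorted(profile), threshold)  # sorted() so a given seed picks the same keys
    return {k: profile[k] for k in kept}
```

| Matching cost is now capped regardless of how thorough the user was,
| but which of their answers count is partly random, which is exactly
| the behavior that would be hard to explain to users.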
| cavisne wrote:
| AWS has a cool general approach to this problem (one badly
| behaving user affecting others on their shard)
|
| https://aws.amazon.com/builders-library/workload-isolation-u...
|
| The basic idea is to assign each user to multiple shards,
| decreasing the chances of another user sharing _all_ their shards
| with the badly behaving user.
|
| Fixing this issue as described in the article makes sense, but if
| they did shuffle sharding in the first place it would cover any
| new issues without affecting many other users.
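| A minimal sketch of the shuffle-sharding idea: each user is
| deterministically assigned a small random subset of shards (seeded
| from a hash of the user id), so another user shares _all_ of them
| with probability 1 / C(n, k). The function names and shard counts
| are illustrative.

```python
import hashlib
import random
from math import comb

def shards_for(user: str, num_shards: int, shards_per_user: int) -> frozenset:
    """Deterministically pick `shards_per_user` distinct shards for a user."""
    seed = int.from_bytes(hashlib.sha256(user.encode()).digest()[:8], "big")
    return frozenset(random.Random(seed).sample(range(num_shards), shards_per_user))

def full_overlap_probability(num_shards: int, shards_per_user: int) -> float:
    """Chance that another user's random shard set is identical to yours."""
    return 1 / comb(num_shards, shards_per_user)

print(sorted(shards_for("alice", 8, 2)), sorted(shards_for("bob", 8, 2)))
print(full_overlap_probability(8, 2))  # 1 / 28, about 0.036
```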
| artee_49 wrote:
| I think shuffle sharding is beneficial for read-only replica
| cases, not for writing scenarios like this. You'll have to
| write to the primary and not to a "virtual node". Right? Or am
| I understanding it incorrectly? I just read that article now.
| pornel wrote:
| I wonder why timelines aren't implemented as a hybrid
| gather-scatter strategy chosen by account popularity (a
| combination of fan-out to followers and a lazy fetch of popular
| followed accounts when a follower's timeline is served).
|
| When you have a celebrity account, instead of fanning out every
| message to millions of followers' timelines, it would be cheaper
| to do nothing when the celebrity posts, and later when serving
| each follower's timeline, fetch the celebrity's posts and merge
| them into the timeline. When millions of followers do that, it
| will be a cheap read-only fetch from a hot cache.
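| A sketch of that popularity-based hybrid, assuming a fixed follower
| threshold (tiny here so the example is easy to follow; the names and
| cutoff are hypothetical). Ordinary posts are fanned out on write;
| posts from accounts at or above the threshold are stored once and
| merged into each follower's timeline when it is read. Duplicates are
| dropped at read time in case an account crosses the threshold after
| some of its posts were already fanned out.

```python
import bisect
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical; a real cutoff would be far larger

class CelebrityAwareTimelines:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> followers
        self.following = defaultdict(set)   # user -> followed authors
        self.posts = defaultdict(list)      # author -> [(-ts, post_id)], newest first
        self.timelines = defaultdict(list)  # fanned-out entries from non-celebrity authors

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, post_id, ts):
        entry = (-ts, post_id)
        bisect.insort(self.posts[author], entry)  # stored once per author
        if self.is_celebrity(author):
            return 0  # no fan-out: followers pull these at read time
        for f in self.followers[author]:
            bisect.insort(self.timelines[f], entry)
        return len(self.followers[author])

    def read(self, user, limit=50):
        celebs = [a for a in self.following[user] if self.is_celebrity(a)]
        merged = heapq.merge(self.timelines[user], *(self.posts[a] for a in celebs))
        out, seen = [], set()
        for _, pid in merged:
            if pid not in seen:  # skip entries that were both fanned out and pulled
                seen.add(pid)
                out.append(pid)
                if len(out) == limit:
                    break
        return out
```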
| ericvolp12 wrote:
| This is probably what we'll end up with in the long-run. Things
| have been fast enough without it (aside from this issue) but
| there's a lot of low-hanging fruit for Timelines architecture
| updates. We're spread pretty thin from an engineering-hours
| standpoint atm so there's a lot of intense prioritization going
| on.
| locusofself wrote:
| Why do they "insert" even non-celebrity posts into each
| follower's timeline? That is not intuitive to me.
| JadeNB wrote:
| I understand that it's a different point, but how can someone
| write a whole essay called "When imperfect systems are good"
| without once mentioning Gabriel or
| https://en.wikipedia.org/wiki/Worse_is_better?
___________________________________________________________________
(page generated 2025-02-19 23:00 UTC)