[HN Gopher] Jetstream: Shrinking the AT Protocol Firehose by >99%
       ___________________________________________________________________
        
       Jetstream: Shrinking the AT Protocol Firehose by >99%
        
       Author : keybits
       Score  : 142 points
       Date   : 2024-09-24 10:50 UTC (12 hours ago)
        
 (HTM) web link (jazco.dev)
 (TXT) w3m dump (jazco.dev)
        
        | out_of_protocol wrote:
        | Why provide a non-compressed version at all? This is a new
        | protocol, so there's no need for backwards compatibility. The
        | dictionary could be baked into the protocol itself, fixed for
        | a specific version: e.g. protocol v1 uses a fixed v1
        | dictionary. That would also be useful for replaying stored
        | events on both sides.
        
         | jacoblambda wrote:
         | A non-compressed version is almost certainly cheaper for
         | anything local (ex self-hosting your own services that consume
         | the firehose on the same machine or for testing).
         | 
         | There's not really a good reason to do compression if the
         | stream is just going to be consumed locally. Instead you can
         | skip that step and broadcast over memory to the other local
         | services.
        
            | out_of_protocol wrote:
            | It could be a flag, normally disabled. Also, I'm not sure
            | about the "cheaper" side, since disk ops are not free;
            | maybe decompressing zstd IS cheaper than writing and
            | reading huge blobs from disk and exchanging info between
            | apps.
        
         | cowsandmilk wrote:
         | This isn't compression, they are throwing features of the
         | original stream out.
        
            | out_of_protocol wrote:
            | We're comparing outputs here: plain JSON and the same
            | JSON, zstd-compressed.
        
         | ericvolp12 wrote:
         | Jetstream isn't an official change to the Protocol, it's an
         | optimization I made for my own services that I realized a lot
         | of other devs would appreciate. The major driving force behind
         | it was both the bandwidth savings but also making the Firehose
         | a lot easier to use for devs that aren't familiar with AT Proto
         | and MSTs. Jetstream is a much more approachable way for people
         | to dip their toe into my favorite part of AT Proto: the public
         | event stream.
        
            | out_of_protocol wrote:
            | As I understand the article, there are two new APIs
            | (unofficial, for now):
            | 
            | - Jetstream(1) (no compression) and
            | 
            | - Jetstream(2) (zstd compression).
            | 
            | And my comment means (1) is not really needed, except in
            | some specific scenarios.
        
             | ericvolp12 wrote:
             | It's impossible to use the compressed version of the stream
             | without using a client that has the baked-in ZSTD
             | dictionary. This is a usability issue for folks using
             | languages without a Jetstream client who just want to
             | consume the websocket as JSON. It also makes things like
             | using websocat and unix pipes to build some kind of
             | automation a lot harder (though probably not impossible).
             | 
             | FWIW the default mode is uncompressed unless the client
             | explicitly requests compression with a custom header. I
             | tried using per-message-deflate but the support for it in
             | the websocket libraries I was using was very poor and it
             | has the same problem as streaming compression in terms of
             | CPU usage on the Jetstream server.
        
               | slantedview wrote:
               | > It also makes things like using websocat and unix pipes
               | to build some kind of automation a lot harder
               | 
               | Would anybody realistically be using those tools with
               | this volume of data, for anything but testing?
        
               | stonogo wrote:
               | It's not that much data. Certainly nothing zstdcat can't
               | handle.
        
       | wrigby wrote:
       | I thought this was going to be a strange read about the Hayes
       | command set[1] at first glance.
       | 
       | 1: https://en.m.wikipedia.org/wiki/Hayes_AT_command_set
        
         | zwirbl wrote:
         | I thought exactly the same
        
         | rapnie wrote:
         | My guess was NATS Jetstream [0].
         | 
         | [0] https://docs.nats.io/nats-concepts/jetstream
        
       | gooseus wrote:
       | I thought this was going to be about NATS Jetstream, but it is
       | not.
       | 
       | https://docs.nats.io/nats-concepts/jetstream
        
         | fakwandi_priv wrote:
          | Why is this being downvoted? It seems like a valid point to
          | raise if you find two pieces of software with somewhat the
          | same functionality.
        
           | kylecazar wrote:
           | It has the same name, not the same functionality. I am not a
           | downvoter... but it's probably because reading a few
           | sentences of this blog post would reveal what it is.
        
             | dboreham wrote:
             | Perhaps the complaint is more about project namers spending
             | zero time checking for uniqueness.
        
               | gs17 wrote:
               | A problem shared with AT as well.
        
           | gooseus wrote:
           | Was I being downvoted?
           | 
           | It wasn't even a criticism, just an observation for anyone
           | else who was thinking the same or was interested in another
           | popular project with a similar name (and seemingly similar
           | functions? didn't look too hard).
           | 
           | Naming things is hard and we all kinda share one global tech
           | namespace, so this is gonna inevitably happen.
        
         | Kinrany wrote:
         | I thought this was about BlueSky using NATS!
        
       | JoshMandel wrote:
       | Server-Sent Events (SSE) with standard gzip compression could be
       | a simpler solution -- or maybe I'm missing something about the
       | websocket + zstd approach.
       | 
        | SSE benefits: standard HTTP protocol, built-in gzip
        | compression, simpler client implementation.
        
         | jeroenhd wrote:
         | Well-configured zstd can save a lot of bandwidth over gzip at
         | this scale without major performance impact, especially with
         | the custom dictionary. Initialising zstd with a custom
         | dictionary also isn't very difficult for the client side.
         | 
          | As for application development, I think websocket APIs
          | generally have better library support and are easier to use
          | than SSE. I agree that SSE is the more appropriate
          | technology here, but it's used so little that the tooling
          | isn't good. Just about every language has a dedicated
          | websocket client library, but SSE is usually implemented as
          | a weird side effect of an HTTP connection you need to keep
          | alive manually.
         | 
         | The stored ZSTD objects make sense, as you only need to
         | compress once rather than compress for every stream (as the
         | author details). It also helps store the data collected more
         | efficiently on the server side if that's what you want to do.
        
          | qixxiq wrote:
          | I don't have an in-depth understanding of SSE, but one of
          | the points the post argues for is compressing once (using a
          | zstd dictionary) and sending that to every client.
          | 
          | The dictionary allows for better compression without
          | needing a large amount of data, and sending every client
          | the same compressed binary data saves a lot of CPU time on
          | compression. Streams usually require running the
          | compression separately for each client.
        
       | scirob wrote:
       | Was expecting Nats Jetstream but this is also cool
        
         | vundercind wrote:
         | Was expecting the Hayes modem command language.
        
       | pohl wrote:
       | The "bring it all home" screenshot shows a CPU Utilization graph,
        | and the unit of measurement on the vertical axis appears to
        | be milliseconds. Could someone help me understand what that
        | measurement might be?
        
         | anamexis wrote:
         | The graph is labeled - CPU seconds per second.
        
           | pohl wrote:
           | Missed that, thank you.
        
       | Ericson2314 wrote:
       | I gotta say, I am not very excited about "let's throw away all
       | the security properties for performance!" (and also "CBOR is too
       | hard!")
       | 
       | If everyone is on one server (remains to be seen), and all the
       | bots blindly trust it because they are cheap and lazy, what the
       | hell is the point?
        
         | evbogue wrote:
          | Or why can't one verify a msg on its own, isolated from all
          | of the other events on the PDS?
        
           | ericvolp12 wrote:
           | The full Firehose provides two major verification features.
           | First it includes a signature that can be validated letting
           | you know the updates are signed by the repo owner. Second, by
           | providing the MST proof, it makes it hard or impossible for
           | the repo owner to omit any changes to the repo contents in
           | the Firehose events. If some records are created or deleted
           | without emitting events, the next event emitted will show
           | that something's not right and you should re-sync your copy
           | of the repo to understand what changed.
        
         | hinkley wrote:
         | If you're going to try data reduction and compression, _always
         | try compression first_. It may reveal that the 10x reduction
         | you were looking at is only 2x and not worth the trouble.
         | 
          | Doing reduction first may make the compression look less
          | useful than it is. Verbose, human-friendly protocols,
          | compressed, win out in maintenance tasks, and it's a
          | marathon, not a sprint.
        
           | jonathanyc wrote:
           | As a corollary, if you try to be too clever with your data
           | reduction strategy, you might walk yourself into a dead end /
           | local maximum by making the job of off-the-shelf compression
           | algorithms more difficult.
        
         | skybrian wrote:
         | Centralization on trusted servers is going to happen but if
         | they speak a common protocol, at least they can be swapped out.
         | For JetStream, anyone can run an instance, though it will cost
         | them more.
         | 
         | It's sort of like the right to fork in Open Source; it doesn't
         | mean people fork all the time or verify every line of code
         | themselves. There's still trust involved.
         | 
         | I wonder if some security features could be added back, though?
        
         | wmf wrote:
         | We've seen this over and over. If you do things the "right" way
         | devs just don't show up because it's too much work.
        
           | therein wrote:
           | Depends on what the right way is to be honest. If the "right
           | way" is moving the project into zero-knowledge proofs
           | territory, it does push out a lot of the developers. It is
           | not that cryptography in general is pushing people out in
           | that case but ZKP complexity is.
        
         | steveklabnik wrote:
         | > If everyone is on one server (remains to be seen)
         | 
         | Even internally, BlueSky runs a number of "servers." It's
         | already internally federated. And you can transfer your account
         | to your own server if you want to, though that is very much
         | still beta quality, to be fair.
         | 
         | You're not really "on a server" in the same sense as other
         | things. It's closer to "I create content addressable storage"
         | than "The database on this instance knows my
         | username/password."
        
           | apitman wrote:
          | And if you're curious what your Bluesky server is:
          | 
          |   DID=$(curl -s https://<username>.bsky.social/.well-known/atproto-did)
          |   curl https://plc.directory/$DID | jq '.service[0].serviceEndpoint'
          | 
          | Or if you're using a custom domain (a la example.com), you
          | can get your DID from the DNS:
          | 
          |   dig +short _atproto.example.com TXT
        
             | steveklabnik wrote:
             | Thanks for this! I'd never checked. Turns out I'm on
             | https://morel.us-east.host.bsky.network. I do want to host
             | my own PDS someday, but then I'd be responsible for keeping
             | it up...
        
               | apitman wrote:
               | Indie hosted PDS is on my list as well.
               | 
                | I'm also currently trying to understand the
                | tradeoffs of did:plc more. It's unclear to me just
                | how centralized it is. Will it always require a
                | single central directory, or is it more like
                | Certificate Transparency? Based on what I've heard
                | about the recovery process, I believe it's the
                | latter, but I still need to dig into it more.
        
               | steveklabnik wrote:
               | I don't know myself, but given that the discussion there,
               | from what I've heard, is along the lines of "should be
               | moved into an independent foundation," my assumption is
               | that it will always require a directory. But this is
               | probably the part of the tech stack that I know the least
               | details about.
        
       | madduci wrote:
       | Nice feat!
       | 
        | I wonder if a rewrite of this in C++ would bump the
        | performance even further and optimise the overall system.
        
         | szundi wrote:
         | Or in rust haha
        
       | xbar wrote:
       | I'm never not going to look for Hayes command set topics when
       | people talk about BlueSky.
        
         | gs17 wrote:
          | You're not the only one. I don't get why they couldn't have
          | named it something that wasn't so similar to something
          | that's been around for several decades, or at least
          | insisted on the shortened ATproto name (one word, lowercase
          | p). Sure, in practice, no one will actually confuse them,
          | but that could be said for Java and JavaScript.
        
       | S0y wrote:
       | >Before this new surge in activity, the firehose would produce
       | around 24 GB/day of traffic. After the surge, this volume jumped
       | to over 232 GB/day!
       | 
       | >Jetstream is a streaming service that consumes an AT Proto
       | com.atproto.sync.subscribeRepos stream and converts it into
       | lightweight, friendly JSON.
       | 
        | So let me get this straight: if you did want to run Jetstream
        | yourself, you'd still need to be able to handle the 232
        | GB/day of bandwidth?
        | 
        | This has always been my issue with Bluesky/AT Protocol. For
        | all the talk about their protocol being federated, it really
        | doesn't seem realistic for anyone to run any of the
        | infrastructure themselves. You're always going to be reliant
        | on a big player that has the capital to keep everything
        | running smoothly. At this point I don't really see how it's
        | any different than being on any of the old centralized social
        | media.
        
         | PhilippGille wrote:
          | Based on the article, OP runs his Jetstream instance with
          | 12 consumers (subsets of the full stream, if I understand
          | correctly) on a $5 VPS from OVH.
        
         | pfraze wrote:
         | Old social media never gave full access to the firehose so
         | there's a pretty big difference.
         | 
         | If you want large scale social networks, you need to work with
         | a large scale of data. Since federated open queries aren't
         | feasible, you need big machines.
         | 
         | If you want a smaller scale view of the network, do a crawl of
         | a subset of the users. That's a perfectly valid usage of
         | atproto, and is how ActivityPub works by nature.
        
           | S0y wrote:
           | >Old social media never gave full access to the firehose so
           | there's a pretty big difference.
           | 
           | That is good, but it's still a centralized source of truth.
           | 
           | >If you want large scale social networks, you need to work
           | with a large scale of data. Since federated open queries
           | aren't feasible, you need big machines.
           | 
            | That's simply not true. ActivityPub does perfectly well
            | without any bulky machine or node acting as a relay for
            | the rest of the network. Every ActivityPub service only
            | ever interacts with other discovered services. Messages
            | aren't broadcast through a central firehose; they're sent
            | directly to whoever needs to receive them. This is a
            | fundamental difference between how the two protocols
            | work. With ATProto you NEED to connect to some
            | centralized relay that will broker your messages for you.
            | With ActivityPub, there is no middleman; instances just
            | talk directly to each other. This is why ActivityPub has
            | a discovery problem, by the way, but that's just a
            | symptom of real federation.
           | 
           | >and is how ActivityPub works by nature.
           | 
           | It's not. See Above.
        
             | str4d wrote:
             | > > If you want large scale social networks, you need to
             | work with a large scale of data. Since federated open
             | queries aren't feasible, you need big machines.
             | 
             | > Thats just simply not true.
             | 
             | > [snip]
             | 
             | > This is why ActivityPub has a discovery problem by the
             | way, but it's just a symptom of real federation.
             | 
             | You're actually agreeing with them! The "discovery problem"
             | is because "federated open queries aren't feasible".
             | 
             | > With ATProto you NEED to connect to some centralized
             | relay that will broker your messages for you.
             | 
             | You can connect to PDSs directly to fetch data if you want;
             | this is exactly what the relays do!
             | 
             | If you want to build a client that behaves more like
             | ActivityPub instances, and does not depend on a relay, you
             | could do so:
             | 
             | - Run your own PDS locally, hosting your repo.
             | 
             | - Your client reads your repo via your PDS to see the
             | accounts you follow.
             | 
             | - Your client looks up the PDSs of those accounts (which
             | are listed in their DID documents).
             | 
             | - Your client connects to those PDSs, fetches data from
             | them, builds a feed locally, and displays it to you.
             | 
             | This is approximately a pull-based version of ActivityPub.
             | It would have the same scaling properties as ActivityPub
             | (in fact better, as you only fetch what you need, rather
             | than being pushed whatever the origins think you need). It
             | would also suffer from the same discovery problem as
             | ActivityPub (you only see what the accounts you follow
             | post).
             | 
             | At that point, you would not be consuming any of the
             | _output_ of a relay. You would still want relays to connect
             | to your PDS to pull data into their _input_ in order for
             | other users to see your posts, but that's because those
             | users have chosen to get their data via a relay (to get
             | around the discovery problem). Other users could instead
             | use the same code you're using, and themselves fetch data
             | directly from your PDS without a relay, if they wanted to
             | suffer from the discovery problem in exchange for not
             | depending on a relay.
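The pull-based client outlined above could be sketched roughly like this. All names here are hypothetical and the network calls are stubbed out; a real client would resolve each PDS from the account's DID document and fetch records over XRPC (e.g. com.atproto.repo.listRecords).

```python
from typing import Callable

def build_feed(
    follows: list[str],
    resolve_pds: Callable[[str], str],
    fetch_posts: Callable[[str, str], list[dict]],
) -> list[dict]:
    """Pull posts for each followed DID from its own PDS, newest first."""
    feed: list[dict] = []
    for did in follows:
        pds = resolve_pds(did)               # PDS host from the DID document
        feed.extend(fetch_posts(pds, did))   # only fetch what we need
    # ISO 8601 timestamps sort correctly as plain strings
    return sorted(feed, key=lambda p: p["createdAt"], reverse=True)

# Stub data standing in for real PDS responses.
stub_posts = {
    "did:plc:alice": [{"author": "alice", "createdAt": "2024-09-24T10:00:00Z"}],
    "did:plc:bob": [{"author": "bob", "createdAt": "2024-09-24T11:00:00Z"}],
}

feed = build_feed(
    ["did:plc:alice", "did:plc:bob"],
    resolve_pds=lambda did: f"https://pds.example/{did}",
    fetch_posts=lambda pds, did: stub_posts[did],
)
print([p["author"] for p in feed])  # → ['bob', 'alice']
```

This has the scaling property described above: cost grows with how many accounts you follow, not with the whole network, at the price of the discovery problem.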
        
               | S0y wrote:
                | It doesn't change the fact that if someone were to do
                | that, it wouldn't be supported by anyone, let alone
                | the main Bluesky firehose. I think it's pretty
                | disingenuous to just say "you can do it" when what
                | you're suggesting is so far off the intended usage of
                | the protocol that it might as well be a brand new
                | implementation. As a matter of fact, people DO
                | already do this: they use ActivityPub and talk to
                | Bluesky using a bridge.
                | 
                | The core of the issue is that Bluesky's current model
                | is unsustainable. The cost of running the main relay
                | is going to keep rising, and the barrier to discovery
                | keeps getting higher. It might cost $150/month now to
                | mirror the relay, but what's going to happen when
                | it's $1,000?
        
               | CyberDildonics wrote:
               | Blue Sky supposedly has 5.5 million active users.
               | 
               | https://en.wikipedia.org/wiki/Bluesky
               | 
                | By your own numbers it averages 2.7 MB/s. This is
                | manageable with a good cable internet connection or a
                | small VPS. That's a small number for 5.5 million
                | active users.
               | 
               | What happens _if_ it expands to 10 times its current
               | active users? Who knows, maybe only the 354 million
               | people around the world with access to gigabit broadband
               | can run a _full_ server at home and the rest of the
               | people and companies that want to run _full_ servers will
               | have to rent a vps.
               | 
                | https://www.telecompetitor.com/gigabit-availability-report-1...
               | 
               | The point here is that this is not a practical problem.
               | How many of these servers do you really need?
        
             | rudyfraser wrote:
              | AP also has relays as a part of the architecture, just
              | less well documented:
              | https://joinfediverse.wiki/index.php?mobileaction=toggle_vie...
        
               | S0y wrote:
               | They may share a name, but they both work in very
               | different ways.
        
             | pfraze wrote:
             | > That is good, but it's still a centralized source of
             | truth.
             | 
             | It's not. It's a trustless aggregator. The PDSes are the
             | sources of truth, and you can crawl them directly. The
             | relay is just an optimization.
             | 
             | > Messages aren't broadcast through a central firehose
             | 
             | ATProto works like the web does. People publish information
             | on their servers, and then relays crawl them and emit their
             | crawl through a firehose.
             | 
             | > ActivityPub does perfectly without the need of any bulky
             | machine or node acting as a relay for the rest of the
             | network
             | 
                | ActivityPub doesn't do large-scale aggregated views
                | of the activity. The peer-wise exchange means views
                | get localized; this is why there's no network-wide
                | search, metrics, or algorithms.
             | 
             | > This is why ActivityPub has a discovery problem by the
             | way,
             | 
             | right
             | 
             | > but it's just a symptom of real federation.
             | 
             | "real" ?
        
               | S0y wrote:
               | >The PDSes are the sources of truth, and you can crawl
               | them directly. The relay is just an optimization.
               | 
               | This is such a massive understatement. the relay is the
               | single most important piece in the entire Bluesky stack.
               | 
                | Let me ask you this: is it possible for me to connect
                | to a PDS directly, right now, via the Bluesky app? Or
                | is this something that will be possible in the
                | future?
               | 
               | >ATProto works like the web does. People publish
               | information on their servers, and then relays crawl them
               | and emit their crawl through a firehose.
               | 
               | >ActivityPub doesn't do large scale aggregated views of
               | the activity.
               | 
               | So are relays really just an optimization or an integral
               | part of how ATProtocol is supposed to work? ActivityPub
               | doesn't require relays to function properly. This is why
               | I say it's real federation. You can't truly be federated
               | if you require centralization.
        
               | orf wrote:
               | > You can't truly be federated if you require
               | centralization.
               | 
               | I'm not so sure: isn't the certificate transparency log a
               | pretty good example of a federated group of disparate
               | members successfully sharing a view of the world?
               | 
               | That requires some form of centralization to be useful
               | (else it's not really a log, more of a series of
               | disconnected scribbles), and it's definitely a true
               | federated network.
        
               | pfraze wrote:
               | > Let me ask you this, is it possible for me to connect
               | to a PDS directly, right now, via the bluesky app?
               | 
               | Well, yes, that's what you do when you log in. If you
               | open devtools you'll see that the app is communicating
               | with your PDS.
               | 
               | > So are relays really just an optimization or an
               | integral part of how ATProtocol is supposed to work?
               | 
                | I think the issue here is that you're mentally
                | slicing the stack in a different way than atproto
                | does. You expect each node to be a full instance of
                | the application, and that the network gets
                | partitioned by a bunch of applications exchanging
                | peerwise.
                | 
                | A better mental model of atproto is that it's a
                | network of cross-org microservices.
                | https://atproto.com/articles/atproto-for-distsys-engineers
                | gives a decent intuition about it.
        
               | apitman wrote:
               | > If you open devtools you'll see that the app is
               | communicating with your PDS.
               | 
               | That's pretty cool. Does bsky.app do a DID resolve or
               | does it have a faster back channel for determining PDS
               | addresses for bsky.social?
        
         | CyberDildonics wrote:
         | That's only about 2.7 MB/s on average.
         | 
         | If someone wants to run a server, they probably would pay for a
         | VPS with a gigabit connection, which would be able to do 120
         | MB/s.
         | 
         | You might need to pay for extra bandwidth, but it is probably
         | less than a night out every month.
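A quick check of the arithmetic in this subthread, assuming decimal units (the 120 MB/s figure above is a slightly conservative round-down of the theoretical line rate):

```python
# 232 GB/day of firehose traffic, expressed as an average rate.
bytes_per_day = 232e9
mb_per_second = bytes_per_day / 86_400 / 1e6  # 86,400 seconds per day
print(round(mb_per_second, 1))  # → 2.7

# A full gigabit link moves at most 1e9 / 8 bytes per second.
gigabit_mb_per_second = 1e9 / 8 / 1e6
print(gigabit_mb_per_second)  # → 125.0
```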
        
         | rudyfraser wrote:
         | > It really doesn't seem realistic for anyone to run any of the
         | infrastructure themselves. You're always going to be reliant on
         | a big player that has the capital to keep everything running
         | smoothly
         | 
         | I run a custom rust-based firehose consumer on the main
         | firehose using a cheap digitalocean droplet and don't cross 15%
         | CPU usage even during the peak 33 Mb/s bandwidth described in
         | this article.
         | 
          | The core team seem to have put a lot of intention into
          | making the different parts of the network hostable. The
          | most resource-intensive of these is the relay, which
          | produces the event stream that Jetstream and others consume
          | from. One of their devs did a breakdown showing you could
          | run it for $150 per month, which is pricey but not
          | unattainable with grants or crowdfunding:
          | https://whtwnd.com/bnewbold.net/entries/Notes%20on%20Running...
        
           | purlane wrote:
           | I can attest secondhand that it's possible to run a relay for
           | ~EUR75/month, which is well within the range of many
           | hobbyists.
        
         | AlienRobot wrote:
         | Sometimes I think all of this could have been avoided if people
         | knew how to use RSS.
        
       | hinkley wrote:
       | Given that "AT Protocol" already has a definition in IT that's as
       | old as OP's grandma, what is this AT Protocol they are talking
       | about here?
       | 
       | Introduce your jargon before expositing, please.
        
         | gs17 wrote:
         | The article kind of assumes you know what it is in order to be
         | interested in it, but it's the protocol used by Bluesky instead
         | of ActivityPub.
        
           | hinkley wrote:
           | Which makes it a bad submission for HN. If you want exposure,
           | prepare for it.
        
             | skrtskrt wrote:
             | you sound like you're a blast to be around
        
         | marssaxman wrote:
         | I wondered something similar when I clicked the link: "who is
         | still using enough AT commands that a compressed representation
         | would matter, and how would you even DO that?" But this is
         | clearly something else.
        
           | vardump wrote:
           | Anyone who writes software that uses GSM modems for example.
           | Like in embedded systems.
        
             | jpm_sd wrote:
             | Iridium satellite modems too.
        
               | hinkley wrote:
               | I'm shocked those things are still up there. That project
               | tried to fail so many times.
        
               | jpm_sd wrote:
               | "those things" meaning the all-new Iridium constellation,
               | launched in the late 2010s?
               | 
                | https://en.wikipedia.org/wiki/Iridium_satellite_constellatio...
        
             | marssaxman wrote:
             | Oh, for sure - I've done some of that myself - but I would
             | never associate the word "firehose" with such low-powered
             | systems!
        
           | hinkley wrote:
           | And then I had to look up CBOR too, which at least is a thing
           | Wikipedia has heard of. I mostly use compressed wire
           | protocols and ignore the flavor of the month binary
           | representations.
        
         | wmf wrote:
          | You mean the Hayes AT command set? We didn't call it a protocol
          | back in the day.
        
           | dboreham wrote:
           | But it still is a protocol.
        
         | purlane wrote:
          | The Wikipedia article does a pretty good job of giving an
          | overview of it.
         | 
         | https://en.wikipedia.org/wiki/AT_Protocol
        
         | dboreham wrote:
         | Had same confusion. Wondered why it would need compression...
        
       | londons_explore wrote:
       | > Before this new surge in activity, the firehose would produce
       | around 24 GB/day of traffic.
       | 
       | The firehose is all public data going into the network right?
       | 
       | Isn't that pretty tiny for a worldwide social network?
       | 
       | And the fact one country can cause a 10x surge in traffic also
       | suggests its worldwide footprint must be tiny...
        
         | londons_explore wrote:
         | 24GB/day is very much a "we could host this on one server with
         | a bit of read caching" scale.
         | 
         | Every year or so, you plop in one more 10TB SSD.
        
           | ericvolp12 wrote:
           | Yes the actual record content on the network isn't huge at
           | the moment but the firehose doesn't include blobs (images and
           | videos) which take up significantly more space. Either way,
           | yeah it's pretty lightweight. Total number of records on the
           | network is around 2.5Bn in the ~1.5 years Bluesky has been
           | around.
           | 
            | I aggregated some stats when we hit 10M users here -
            | https://bsky.app/profile/did:plc:q6gjnaw2blty4crticxkmujt/po...
        
         | ericvolp12 wrote:
         | The 10x surge in traffic was us gaining 3.5M new users over the
         | course of a week (growing the entire userbase by >33%) and all
         | these users have been incredibly active on a daily basis.
         | 
         | Lots of these numbers are public and the impact of the surge
         | can be seen here:
         | https://bskycharts.edavis.dev/static/dynazoom.html?plugin_na...
         | 
         | Note the graphs in that link only show users that take a
         | publicly visible action (i.e. post, like, follow, etc.) and
         | won't show lurkers at all.
        
         | str4d wrote:
         | > The firehose is all public data going into the network right?
         | 
         | It's the "main subset" of the public data, being "events on the
         | network": for the Bluesky app that's posts, reposts, replies,
         | likes, etc. Most of those records are just metadata (e.g. a
         | repost record references the post being reposted, rather than
         | embedding it). Post / reply records include the post text
         | (limited to 300 graphemes).
         | 
         | In particular, the firehose traffic does _not_ include images
         | or videos; it only includes references to "blob"s. Clients
         | separately fetch those blobs from the PDSs (or from a CDN
         | caching the data).
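To make this concrete: Jetstream (unlike the CBOR firehose) delivers each event as plain JSON over a websocket, so a consumer is just a JSON filter. A rough sketch in Python of picking post text out of an event stream (the field names here are illustrative of the event shape described in the article; check the Jetstream docs for the exact schema):

```python
import json

# A Jetstream-style event, one JSON object per websocket message.
# Field names are illustrative; see the Jetstream repo for the schema.
SAMPLE_EVENT = json.dumps({
    "did": "did:plc:example",
    "time_us": 1727172600000000,
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": "app.bsky.feed.post",
        "rkey": "3kexample",
        "record": {
            "text": "hello from the firehose",
            "createdAt": "2024-09-24T10:50:00Z",
        },
    },
})

def extract_post_text(raw: str):
    """Return the post text if the event is a newly created
    app.bsky.feed.post record, else None."""
    evt = json.loads(raw)
    commit = evt.get("commit") or {}
    if (evt.get("kind") == "commit"
            and commit.get("operation") == "create"
            and commit.get("collection") == "app.bsky.feed.post"):
        return commit.get("record", {}).get("text")
    return None

print(extract_post_text(SAMPLE_EVENT))  # -> hello from the firehose
```

With the full firehose you would instead be decoding DAG-CBOR blocks out of CAR files before you could do any of this, which is a big part of why the JSON stream is friendlier for small consumers.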
        
       | bcrl wrote:
       | I find it baffling that the difference in cost of serving
       | 41GB/day vs 232GB/day is worth spending any dev time on. We're
       | talking about a whopping 21.4Mbps on average, which costs me
       | roughly CAD$3.76/month in transit (and my transit costs are about
       | to be cut in half for 2 x 10Gbps links thanks to contracts being
       | up and the market being very competitive). 1 hour of dev time is
       | upwards of 2 years of bandwidth usage at that rate.
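A quick sanity check of the averaging in this comment, assuming decimal gigabytes:

```python
# Average bitrate of serving 232 GB/day to a single consumer.
gb_per_day = 232
bits_per_day = gb_per_day * 1e9 * 8       # decimal GB -> bits
mbps = bits_per_day / 86_400 / 1e6        # 86,400 seconds per day
print(f"{mbps:.1f} Mbps")                 # -> 21.5 Mbps
```

So the 21.4 Mbps figure is in the right ballpark for one subscriber; as the reply below notes, the relay serves hundreds of subscribers, which multiplies this accordingly.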
        
         | ericvolp12 wrote:
         | The current relay firehose has more than 250 subscribers. It's
         | served more than 8.5Gbps in real-world peak traffic sustained
         | for ~12 hours a day. That being said, Jetstream is a lot more
         | friendly for devs to get started with consuming than the full
         | protocol firehose, and helps grow the ecosystem of cool
         | projects people build on the open network.
         | 
         | Also, this was a fun thing I built mostly in my free time :)
        
           | tecleandor wrote:
            | Also it's not just those concrete 190GB a day. It's the 6x
            | traffic you can fit in the same pipe :D
        
             | ericvolp12 wrote:
             | Yeah exactly! The longer we can make it without having to
             | shard the firehose, the better. It's a lot less complex to
             | consume as a single stream.
        
         | wkat4242 wrote:
         | It's a benefit on the receiving side too. And it has ecological
         | benefits. Nothing to sneeze at.
        
         | steveklabnik wrote:
         | A thing I have admired about the BlueSky development team is
         | that they're always thinking about the future, not just the
         | present. One area in which this is true is the system design:
         | they explicitly considered how they would scale. Sure, at the
         | current cost, this may not be worth it, but as BlueSky
         | continues to grow, work like this will be more and more
         | important.
         | 
         | Also, it's just a good look towards the community. Remember,
         | ideally not all of ATProto's infrastructure is run by BlueSky
         | themselves, but by a diverse set of folks who want to be able
         | to control their own data. These are more likely to be
         | individuals or small groups, not capitalized startups. While
         | the protocol itself is designed to be able to do this, "what
         | happens when BlueSky is huge and running your own infra is too
          | expensive" is a current question that some folks are reasonably
         | skeptical about: it's no good being a federated protocol if
         | nobody can afford to run a node. By doing stuff like this, the
         | BlueSky team is signaling that they're aware of and responsive
         | to these concerns, so don't look at it as trying to save money
         | on bandwidth: look at it as a savvy way to generate some good
         | marketing.
         | 
         | That's my take anyway.
         | 
         | EDIT: scrolling down, you can see this kind of sentiment here:
         | https://news.ycombinator.com/item?id=41637307
        
       | ChicagoDave wrote:
       | I'm just popping in here to say this (and BlueSky and atproto)
       | are two of the coolest technical feats in today's tech world.
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:00 UTC)