[HN Gopher] Why is IRC distributed across multiple servers?
___________________________________________________________________
Why is IRC distributed across multiple servers?
Author : rain1
Score : 120 points
Date : 2021-09-12 10:56 UTC (12 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| throwthere wrote:
| I don't know if the numbers are realistic here. First and most
| importantly, messages are only sent to clients in the same
| chatroom, not server-wide. Second, 10% of users are only very
| rarely going to send messages at once; for "rarely" you can
| probably substitute "never". Third, these are simple, very small
| text messages where seconds of lag don't really matter -- why
| would it be hard to manage tens of thousands of concurrent
| connections? WhatsApp crushed millions of connections on a
| single server back in 2012 --
| https://web.archive.org/web/20140501234954/https://blog.what...
| prdonahue wrote:
| So you can nick collide people, obviously.
| unixhero wrote:
| Engineering IRC networks is so much fun.
| H8crilA wrote:
| Just remember that "netsplits" exist in every distributed system,
| be it a chat app or a database. It's just the CAP theorem. IRC
| has chosen to sacrifice C (consistency).
|
| The only thing that has changed in modern times is that
| partitions (P) are extremely rare in modern high-octane cloud
| infrastructure. Also, modern systems often decide to sacrifice
| A (availability) by returning an error saying "we're aware of
| the problem and we're working on a solution". This is what
| happened quite recently when Google authentication went down and
| half of the internet went dark, while under the hood they had a
| simple out-of-quota situation on one of the replicas of their
| core authentication system. The system was programmed to
| sacrifice A (availability) and reject all authentication
| requests.
| moonchild wrote:
| > IRC has chosen to sacrifice C (consistency)
|
| Hm? Hasn't it sacrificed partition tolerance? A netsplit is a
| partition.
| mappu wrote:
| In CAP, the P happens whether you like it or not, and you get
| to choose between C-but-not-A or A-but-not-C.
|
| IRC is an AP system. It stays up (+A) in a netsplit (+P) but
| the resulting servers are not consistent.
| Twisol wrote:
| It tolerates partitions just fine; I've been through many
| netsplits where folks just kept talking on our side of the
| split until the network healed.
|
| Partition tolerance doesn't mean partitions don't affect the
| system, or that they can't happen. It just means the system
| has to choose whether to become unavailable or inconsistent
| (since it can't have both in the presence of a partition).
| IRC chooses to remain available, at the cost of losing
| messages for people on the wrong side of the split.
| j56no wrote:
| If it had sacrificed P, IRC would stop working in case of a
| netsplit. Instead it keeps working in an inconsistent state.
| wlonkly wrote:
| No, "stop working" is Availability.
| throwaway20371 wrote:
| > "netsplits" exist in every distributed system, be it a chat
| app or a database, it's just the CAP theorem
|
| Well, let's not get carried away. Network partitions happen
| everywhere, but not everything is about the CAP theorem. The
| CAP theorem is a very specific model that a lot of apps (even
| ACID databases) don't conform to. Comparing IRC to the CAP
| theorem is like comparing it to ACID and saying, "IRC decided
| to sacrifice transaction integrity".
|
| IRC didn't explicitly sacrifice the C in CAP, they designed a
| simple server protocol. They could have added a bunch of
| weirdness to hide splits from users, but it would have been
| unnecessarily complicated and not contributed significantly to
| the user experience.
| H8crilA wrote:
| I'm sorry but I don't think you realise how simple and
| fundamental the CAP theorem is. It's almost a tautology. And
| yes it applies fully.
|
| The most basic case is if there's absolutely no method of
| exchanging information from point A to point B. Then agents
| at A and B will not be able to communicate. That's it. Any
| system built to facilitate information exchange will either
| have to deliver incomplete information (C) or will have to
| refuse to operate (A).
|
| Now then, as I said, nowadays it's extremely unlikely that
| there's truly no connection between any two major Internet
| hubs (though it can happen, hello BGP). It still happens in
| specific systems that rely not on any available method of
| information transfer but on specific methods of information
| transfer. The IRC example requires specific servers to be up,
| not just functioning IP routing between the end clients. If
| some server is not up then (at least
| temporarily) from IRC's point of view there's no way to
| deliver information from A to B. The Google auth outage
| example requires (among most likely many other things) disk
| space availability on specific servers for information
| exchange to happen.
| TheDong wrote:
| > I don't think you realise how simple and fundamental the
| CAP theorem is
|
| May I recommend reading "A Critique of the CAP Theorem" by
| Martin Kleppmann, available as a PDF here:
| https://arxiv.org/abs/1509.05393
|
| As that paper points out, your definition of CAP theorem is
| simplified and incomplete to the point of being wrong, as
| many are.
|
| As it also points out, the CAP theorem doesn't really account
| for eventual consistency well.
|
| I would argue that a chat protocol is a good place to
| perform eventual consistency, and those tradeoffs work
| well. During network partitions, have both sides of the
| partition continue to accept messages. Have the client mark
| messages with random unique IDs, and have each server mark
| messages with a server timestamp. The well-defined merge
| operation is now to sort by server-time and dedupe by
| message ID, such that if a message is sent to two servers
| it only displays once.
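The merge operation described here can be sketched in a few lines. This is a hypothetical illustration, not code from IRC or Matrix; the `(server_time, msg_id, text)` tuple shape and the `merge_logs` name are assumptions made for the example:

```python
def merge_logs(log_a, log_b):
    """Merge two servers' message logs after a partition heals.
    Each entry is a (server_time, msg_id, text) tuple: server_time
    is the server's timestamp, msg_id the client-generated unique
    ID. Sort by server time, dedupe by message ID."""
    seen = set()
    merged = []
    for entry in sorted(log_a + log_b):
        _, msg_id, _ = entry
        if msg_id not in seen:          # a message that reached both
            seen.add(msg_id)            # sides of the split shows once
            merged.append(entry)
    return merged
```

A message relayed to both sides of a split appears in both input logs with the same ID, so it survives the merge exactly once.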
|
| This doesn't work for IRC traditionally, since messages do
| not have unique IDs, and so no merge operation can
| deduplicate them, and servers do not store messages during
| netsplits (or at any time really), so they cannot be re-
| sent.
|
| However, a similar system exists for other chat systems.
| matrix is a federated system of multiple servers, and when
| partitions occur, each server will still accept new
| messages, and later those messages will be made available
| to other servers and merged in at the appropriate time.
|
| I think that the CAP theorem's results are less interesting if
| you consider application-level resolutions to network
| issues (i.e. eventual consistency), and, as I believe the
| paper also implies, trotting it out constantly when talking
| about practical systems gets old fast.
| H8crilA wrote:
| If you can always merge reordered edits/messages then CAP
| does not apply because you don't need C (as defined in
| CAP); you may instead talk about partitions/connectivity
| issues as if they were some anomalous sources of large
| latencies in the system. You have your own, different
| definition of C. There are some very very large scale
| systems out there that work under the assumption that any
| edits can arrive reordered, and it's OK for the
| observable properties of the system.
|
| Here's what's "inconsistent" in an eventually consistent
| chat app: your typed responses might have been different
| had you seen in time what the other party had to say. To
| some degree the "computation" happens in your head. A
| "fully consistent" / "fully synchronous" chat app would
| sometimes refuse to send a message because the other
| party might have said something in the meantime. Like
| you'd expect from a fully-synchronous bank account
| balance handling system that wants to keep >= 0 balance
| at all times, rejecting overdraft transactions.
|
| (And I agree that this is completely acceptable behavior
| for a chat app; we as people are built to tolerate this
| kind of a problem in async person to person
| communication; just pointing out what does C in CAP
| exactly mean; the "fully synchronous" chat app would be
| just an occasional pain in the ass with little benefit)
| throwaway20371 wrote:
| > The most basic case is if there's absolutely no method of
| exchanging information from point A to point B. Then agents
| at A and B will not be able to communicate. That's it.
|
| That's not it. The most basic case is if there's no
| _linearizability_ between A and B. A and B can continue
| communicating but fail the C in CAP if linearizability
| fails. Hence we shouldn't compare everything to CAP.
| 300bps wrote:
| One thing I haven't seen covered is multiple servers =
| redundancy.
|
| If a server goes down, having a net split is a lot better than
| having the entire network down.
| k__ wrote:
| So its users can get fun netsplits.
|
| I remember we would all try to get on the same server in our
| channel, but some less technical people would use a web client
| that assigned different ones every time.
| nathias wrote:
| The side effect was also great for communities on .net servers
| that didn't have services like user accounts and channels.
| Channel ops were battle-won and people who had them were much
| better at not sucking completely.
| Sunspark wrote:
| ChanOps have always been a problem. Anyone who becomes one is
| a dictator for life. There is no recourse, the only option is
| to either be on their good side, or go to a different channel
| or network.
|
| I like IRC as an open technology. I don't like the lack of
| accountability from the gatekeepers.
|
| It is the same problem on online forums like reddit. If the
| mods do not look upon you with favour, you are banned, even if
| the rules have not been broken.
| nathias wrote:
| Yes, chanops make channels into properties of individuals,
| but without them they are property of the community that uses
| them.
| jchw wrote:
| Well, from my historical reading of it, initially, IRC was a
| federated network of servers that were essentially one network,
| the way email is one network: there was no shared administration
| or anything. Anyone could run a server and jump into the network.
| Due to abuse, servers began restricting who they peered with, and
| it fractured into multiple networks.
|
| So really, I suspect it was designed to be distributed and
| federated, and it just became what it is by accident.
| albertgoeswoof wrote:
| What would be the abuse issues from open peering? How were we
| able to solve them for email, but not IRC?
| nickelpro wrote:
| We didn't, email spam exists to this day. The solution has
| been to ban entire swaths of domains and even IP ranges by
| chucking all mail from them into spam folders
| Ekaros wrote:
| Many other services also used to be like this. Think of Usenet,
| aka news. It is an effective model when you think of the
| Internet as a network of networks, back when there was a real
| difference between connecting to your local area network,
| metropolitan area network, or even wide area network.
|
| Actually, we have come quite far from those days, and full-speed
| point-to-point links between most points are somewhat realistic.
| unilynx wrote:
| It was never open to attach your server to a network, unlike
| email. A server connection was way too powerful for that. You
| needed an existing server admin to allow your server to
| connect.
| ghancock wrote:
| I wasn't there but I have seen multiple histories say that
| there were servers that accepted connections from anyone
| (most famously eris.berkeley.edu but not only that one). For
| example, https://about.psyc.eu/IRC
| jimjams wrote:
| In reality only the people in the same channel get sent the
| messages... if messages are spread between even a few channels
| the actual numbers are much more manageable for one server.
| Ologn wrote:
| > One of the problems of having multiple servers is that
| netsplits can occur.
|
| In the early/mid 1990s, the IRC servers in Australia would split
| from the IRC servers in the US all of the time (sometimes Europe
| would break from the US as well). The Internet connection between
| the US and Australia was slower and flakier back then. It made
| lots of sense for Australians to be on Australian IRC servers and
| Americans to be on US IRC servers, and to all be talking together
| when the link was working (the majority of the time) and to not
| be when the link broke (fairly regularly). The CAP theorem says
| something has to go in those cases, and the thing that went was
| consistency between US and Australian (or European) messages sent
| to a channel - the messages from the other side of the split
| would be dropped during the split.
|
| I don't remember many technical netsplits on Freenode or Libera
| in recent years, so it is less of a thing now. IRC servers were
| always federated, so there was the original split of Anet and
| EFnet, and the Undernet split, then the EFnet/IRCnet split which
| revolved around those US/Europe/Australia issues. More recently
| there was the Freenode/Libera split.
|
| IRC's model always worked for me.
| throwaway20371 wrote:
| Why are Linux distributions hosted on multiple mirror servers
| that they don't own?
|
| 1) money 2) availability 3) trust 4) security
|
| 1) If you don't have a lot of money, you take the servers you can
| get. Donated mirrors means you don't have to pay the bandwidth or
| hosting bills.
|
| 2) If you have multiple servers, it's less likely that one server
| going down will tank your project. When GitHub, AWS, or even
| Level3 has an outage, Linux distros keep on chugging like nothing
| happened. Traditional server maintenance is also easier when
| everyone can just switch to a different server.
|
| 3) Maintainers can use their PGP keys to create signed packages
| and downloads. Their public keys are distributed on mirrors, as
| well as embedded in the downloads they've signed. Once downloaded
| by users, the distribution can verify its own integrity. But how
| does the user know they started with the real maintainers' public
| key? The public key is distributed on a hundred geographically-
| distributed servers all owned by different people; the user can
| check them all. So short of compromising a maintainer's key,
| it's practically impossible to compromise end-user security.
| (this one is more Linux-specific than IRC-specific)
|
| 4) If you only have one server & it gets compromised, it can be
| hard to tell. By comparing its operating state to the other
| servers, you can sometimes more quickly identify the compromise.
| And if you do find a compromise, you can remove the compromised
| server quickly, close the hole on the other servers, and start
| regenerating keys. It's an eventuality every large project should
| be prepared for, and IRC servers do get compromised. Linux
| mirrors don't matter in this regard, but the build servers etc do
| matter.
|
| IRC comes from the same time and place, and has some (but not
| all) of the same considerations.
| LinuxBender wrote:
| I can somewhat answer this. Apologies, this became a bit long-
| winded and I have barely touched on several historical, technical
| and logistical reasons.
|
| Part of the answer is historical and part of this _was_
| technical. IRC has been around for a very long time. As such, the
| earlier versions of the servers and daemons could not accept tens
| of thousands of client connections (select vs. epoll). The
| connections between servers are multiplexed and not directly
| related to the number of people connected to the server. There
| was also a matter of latency. Servers in a region would keep the
| messages local to that region, as only people in the same channel
| get the messages and it was less common to have people in the
| same channel all over the world. This also changed with time. If
| there was a split, you lost other regions. This was not always
| the case, so of course I am over-generalizing since there were
| many different IRC networks designed by many different people.
| These being long-running services, I have seen a great deal of
| hesitation to re-architect anything on the fly on at least some
| of the networks, even after epoll and modern hardware made it
| possible to have tens of thousands of people on one server. Some
| of the smaller IRC networks indeed consolidated into fewer or a
| single server.
|
| Another facet is logistics and ownership. Many of the bigger
| networks are comprised of servers owned and managed by different
| people and organizations. The servers are linked as a matter of
| trust. That trust can be revoked. Most of the early IRC networks
| were run by people doing this in their free time with their own
| money and/or limited resources. In some other cases,
| organizations prefer to have their own servers so that their own
| people do not suffer splits in their local communication.
| There are a myriad of other use-cases and reasons why some
| organizations had their own servers. Sometimes there was a need
| to give LocalOps special permissions that would not be permitted
| network-wide. Despite the technical capability to have fewer
| servers, some organizations are not going to give up their local
| nodes.
|
| One issue not mentioned is permission losses on splits. The issue
| with splits and permission changes has more to do with the way
| services are integrated into IRC, or more specifically, aren't.
| Services are treated like bots with higher privilege and most if
| not all of them were not written to be multi-master. Rather than
| dealing with moving services around or pushing for read-only
| daemons, they just lived with the possibility that there would be
| splits and they would eventually resolve themselves. I personally
| would have preferred to see a more common integration with
| OpenLDAP. Some of the IRC daemons can use LDAP, but it is more of
| an after-thought, or bolt on. This would have allowed splits to
| occur without losing channel permissions and clients could be
| configured to quickly attach to another server in another region
| and that is just DNS management. This could have been further
| improved by amending or replacing the IRC RFCs to allow SRV
| records. This may have been done by now for all I know. I shut
| down my last public server some time ago.
|
| There is a lot more to this than I could sum up on HN. Anyway,
| today you can fire up an IRCd of your choice on modern hardware
| and accept tens of thousands if not hundreds of thousands of
| people on a single server if you wish. It is technically
| possible. I would still design the network to have multiple
| servers, as you will eventually hit a bottleneck. If you really
| want to do this, you will have to de-tune the anti-ddos counter
| measures to allow the thundering herd to join your standby server
| or make code changes to permit the thundering herd briefly on
| fail-over.
| wayoutthere wrote:
| As someone who ran IRC servers in the 90s the technical
| limitation was the number of file descriptors. I think Linux at
| the time was limited to 1024 and the biggest server on our
| network was a DEC Alpha with 4096. The entire network (DALnet
| at the time) was in the 20-30k user range so we absolutely
| needed multiple servers.
| throwaway20371 wrote:
| I'm pretty sure even back then you could edit the hard-coded
| limit in the source code and recompile. I remember us doing
| something like this as it was too expensive to just keep
| buying servers and our apps were connection-happy.
| wayoutthere wrote:
| 1024 was the max you could boost it to; 256 was the default
| as I recall. Linux 1.x was pretty bootstrappy.
| blibble wrote:
| there was also no efficient io multiplexing
|
| an ircd with a few thousand clients was cpu bound on
| poll()/select()
|
| /dev/poll and kqueue/epoll were game changing
| jlokier wrote:
| It was actually possible to delegate subsets of descriptors
| to child processes doing the poll()/select(), making
| polling have the same time complexity as /dev/poll and
| kqueue/epoll, and avoid being CPU bound. Even better if you
| delegated cold subsets, and kept a hot subset in the main
| process.
|
| But few knew the trick so it didn't catch on.
| blibble wrote:
| mind explaining how?
|
| with poll()/select() I don't see how you can avoid
| checking every FD at least once (poll's fd counter
| aside), vs. epoll() only returning those in the desired
| state
|
| (and I don't think you could do tricks like epoll_wait()
| on an epoll fd)
| jlokier wrote:
| Sure.
|
| Fork some child processes, and keep an AF_UNIX
| socketpair() open to them so you can pass them file
| descriptors with SCM_RIGHTS.
|
| Have the main process divide up the fds it is waiting on
| into a "hot" subset and cold subsets of size at most N,
| and for each cold subset pick a child process P. fds can
| be moved between hot and cold at any time, and generally
| you will move them to hot after they have woken and been
| used, and move them to cold after a few consecutive poll-
| cycles where they were not ready. Don't move fds to cold
| subsets belonging to child processes that you don't want
| to wake, though.
|
| When the main process is ready to "poll everything", have
| it iterate over each child process _that is not already
| sleeping_ , and send a message over the socketpair(),
| containing a list of fd_set additions and removals to
| that child's wait-for subset, including the type of poll
| (read, write, etc).
|
| For each fd where the child doesn't have the real file
| descriptor yet, pass that over the socketpair() as part
| of the message. (If threads are usable instead of
| processes, there's no need to send the file descriptor.
| But on old systems, the system threads were often
| implemented by userspace multiplexing with poll/select
| anyway, so it wasn't a good idea to use threads with this
| technique.)
|
| As well as a list of changes, this message tells the
| child process to run poll/select on its subset, and then
| reply with the set of fds that are ready (and their
| readiness type).
|
| After issuing all the child process messages, the main
| process does its own poll/select, to wait for hot fds and
| replies from the child processes.
|
| The reason this has different scaling properties, despite
| the overheads, is that each child handles a limited size
| subset, messages scale with the amount of change activity
| not the size of sets, and ideally the "coldest" fds end
| up gathered together in child processes that _continue to
| sleep between a large number of main process polls_ , so
| the number of active child processes and messages scales
| with the amount of change activity as well.
|
| Keep in mind, even active fds are removed from the wait-
| for subset if they've recently reported they are ready
| and the poll loop hasn't read/written them yet. So it has
| similar algorithmic properties to epoll.
|
| As a bonus in the case of select(), the fds in the child
| processes have smaller values than in the main process.
| So in addition to the number of fds polled per cycle
| scaling with the amount of activity instead of the total
| number of fds, the fd_set bitset size does not grow with
| the total number of fds either. In the main process the
| bitset size does grow, but it's possible to juggle fd
| values with dup2() to overcome that.
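The hot/cold bookkeeping at the heart of this scheme can be shown in a much-simplified, single-process sketch. This omits the fork and SCM_RIGHTS fd passing described above and only demonstrates the demotion logic; all names (`poll_hot`, `max_idle`) are made up for illustration:

```python
import select
import socket

def poll_hot(hot, cold, idle_count, max_idle=3):
    """Poll only the 'hot' subset each cycle. Sockets that stay idle
    for max_idle consecutive cycles are demoted to 'cold' -- the set
    that, in the full scheme, a sleeping child process waits on."""
    ready, _, _ = select.select(list(hot), [], [], 0)
    for sock in list(hot):
        if sock in ready:
            idle_count[sock] = 0          # active: keep it hot
        else:
            idle_count[sock] = idle_count.get(sock, 0) + 1
            if idle_count[sock] >= max_idle:
                hot.discard(sock)         # demote: delegate to a child
                cold.add(sock)
    return ready
```

The point of the trick is that each main-process poll cycle touches only the hot set, so per-cycle cost scales with activity rather than with the total number of connections.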
| wpietri wrote:
| People whose only experience is with modern hardware and
| networks really have a hard time getting the first point. As
| somebody who started coding around the time IRC was created,
| hardware and networks are _amazingly good_ compared to what we
| had at the time.
|
| In the mid-90s, years after IRC was written, I set up a
| distributed system for the financial traders I was working for.
| Our between-cities links were 64 kb/s guaranteed and could burst
| all the way up to 256 kb/s. And those links were not super
| reliable. These were connecting systems with Pentium processors
| running at ~90 MHz with ~8 MB RAM. They did very, very little
| compared with even the cheapest server slice you can get with
| AWS.
|
| This is one of those things, like George Washington never
| knowing about dinosaurs, where it's just hard to comprehend how
| people thought back in the olden days.
| rain1 wrote:
| thanks, really appreciate this comment.
| LinuxBender wrote:
| My pleasure. I am sure others could add a great deal more.
| There is a very long history and there are many pieces of
| history I am leaving out. A big part I left out is the
| individual server rate limits vs. network link rate limits
| and network topology and that is both a technical and
| logistical issue.
| spinax wrote:
| One positive thing I'd add - as a user - under logistics is
| high availability. Life is messy, servers go down planned
| or unplanned for whatever reasons - IRC networks are in a
| sense truly 'federated' in that the client will get a new
| server on reconnect attempts much like webservers behind a
| load balancer. You never have to worry about your 'home
| instance' being unavailable, as they're all your home
| instance. (I speak about the public networks like Libera or
| OFTC)
| Sophira wrote:
| > _Services are treated like bots with higher privilege_
|
| A slight correction: Services normally link as a _server_ to
| the network, which is how they get the higher privilege that
| they do (because only servers, not clients, get the ability to
| kill users from the network, etc).
|
| And to add to this for others who may be curious: typically
| there is some special configuration on the IRC server side to
| allow the link, and some additional configuration to disallow
| clients from changing their nickname to names like "NickServ",
| etc (but to still allow the names when a server on the network
| broadcasts a user with that nick). Normal non-Services IRC
| bots, on the other hand, connect as regular clients.
| duskwuff wrote:
| Services also need to perform actions which aren't possible
| for ordinary users, like knowing when a user connects,
| forcibly changing a user's nick, or changing a user's
| permissions in a channel without being an operator in the
| channel.
| hrpnk wrote:
| Ah, netsplits were so eventful. I still remember the split-wars
| where groups would wait for a split to happen and gain operator
| permissions only to take over a channel on the merge [1].
|
| [1] https://en.wikipedia.org/wiki/IRC_takeover#Riding_the_split
| sterlind wrote:
| wouldn't ChanServ fix things once the split resolves?
| hrpnk wrote:
| I did not experience that, but you're right:
| https://en.wikipedia.org/wiki/IRC_services#ChanServ
| lnxg33k1 wrote:
| Not all networks have services, for example that happened a
| lot on IRCNet which doesn't(? maybe now has?)
| techrat wrote:
| ChanServ is a relatively modern addition to IRC. For a good
| while -- and still to this day on some networks -- services did
| not exist.
| rdpintqogeogsaa wrote:
| Lots of correct and insightful information here, but I'd like
| to pick out one specific aspect here.
|
| > _[...] clients could be configured to quickly attach to
| another server in another region and that is just DNS
| management. This could have been further improved by amending
| or replacing the IRC RFC 's to allow SRV records. This may have
| been done by now for all I know._
|
| To set the stage: Larger IRC networks balance their global
| servers. A DNS A query for irc.example.com will yield a list of
| geographically local servers, possibly shuffled on each query
| as well.
|
| I know of at least one IRC network that refuses to send even
| the list of all geographically local servers, only sending a
| subset, as a measure to avoid trivial DDoS attacks if people
| don't go around collecting the DNS records ahead of time. I'm
| told that this actually works because the threat actors are not
| the sophisticated kind.
|
| Incidentally, I have also noted that some networks will shuffle
| the order of A records for each query because the clients
| cannot be trusted to select a random DNS response. Considering
| something _this trivial_ already doesn't work, I dread to
| imagine how much a DNS SRV implementation would go wrong,
| considering it needs both sorting and a weighted random
| sampling[1] to really work.
|
| [1] https://datatracker.ietf.org/doc/html/rfc2782 page 3 _et
| seq._
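The sorting-plus-weighted-sampling the RFC requires is indeed easy to get subtly wrong. A minimal sketch of the RFC 2782 selection rule, with an assumed `(priority, weight, target)` record shape (real SRV records also carry a port, and the RFC additionally says weight-0 records should be ordered first within a group):

```python
import random

def pick_srv(records):
    """Pick one target per RFC 2782: take the lowest-priority group,
    then do a weighted random selection within it -- pick a number in
    [0, total weight] and take the first record whose running weight
    sum reaches it."""
    lowest = min(priority for priority, _, _ in records)
    group = [r for r in records if r[0] == lowest]
    total = sum(weight for _, weight, _ in group)
    threshold = random.randint(0, total)
    running = 0
    for _, weight, target in group:
        running += weight
        if running >= threshold:
            return target
    return group[-1][2]
```

Higher-priority (numerically larger) records only come into play as fallbacks when every target in the lowest-priority group has failed, which a full client would also have to implement.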
| samsquire wrote:
| On some operating systems getaddrinfo sorts the DNS response
| by IPv6 distance, breaking load balancing!
|
| https://access.redhat.com/solutions/22132
| toast0 wrote:
| This is not exclusive to IPv6, I've seen it on v4 as well.
| If you've got short DNS TTLs and can return 2-4 records out
| of a larger pool, that can help, but if your TTLs are
| longer, you have to consider the handful of recursive DNS
| servers that serve a large number of users... You want to
| give them more records to balance that traffic better.
|
| OTOH, current IRC usage numbers are pretty low, a beefy
| single server should work, except for the disruption
| potential of single servers. Latency can be a bit of an
| issue too, depending on where your users are; not great if
| users are in South Asia and the only server is on the US
| east coast.
| hcykb wrote:
| Because when IRC was popular servers and routes went down often
| and a single server couldn't handle all the users a network would
| have. Neither of those are a concern anymore.
| jbverschoor wrote:
| Because many of the early protocols, including IP, were designed
| with network failures in mind.
| mvanbaak wrote:
| > via round robin DNS (meaning that when people resolve the DNS
| it gives them a random server from the set of 20 to connect to)
|
| Most of the times, it's not simple round-robin, but also geo-
| based. This means clients will get ip addresses of the servers
| closest to them.
| magila wrote:
| My experience with Freenode/Libera.Chat is that they either
| don't implement geo DNS or don't do a very good job of it. I'm
| on the US west coast and lookups to irc.libera.chat often
| return servers in Europe.
|
| Edit: Double-checking Libera.Chat's website I see that they have
| added regional hostnames so I guess that's their solution.
| pvtmert wrote:
| if they're using aws route53, your isp needs to support edns.
|
| otherwise, your netblock might have been falsely advertised
| in the dns provider's geoip database. (eg. maxmind)
| pushrax wrote:
| They're using Cloudflare. When I resolve them from the east
| coast, I got a San Francisco server once and a server in
| Budapest once. They have a server in Toronto, Ashburn,
| Montreal, and other places that are closer.
|
| I know geodns works here since I use it for some of my own
| deployments.
| melony wrote:
| Does IRC predate distributed state machines? Why can't the
| servers sync up the chat via Paxos or Raft?
| manquer wrote:
| Chat is not as complex as other distributed applications;
| you probably don't need Raft. Both Paxos and Raft are very
| complex algorithms to implement.
|
| A CRDT-based append-only implementation is probably more than
| enough. Data is never modified, only added/removed, in typical
| chat workflows.
|
| Reading the Discord engineering blog over the years, it looks
| like scaling the pub/sub for the consumers in large channels
| is a lot harder than the DB/store itself being distributed.
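An append-only CRDT of the kind hinted at here can be as simple as a two-phase set: each replica keeps a grow-only "added" set and a grow-only "removed" tombstone set, and merging is set union on both. A toy sketch, not taken from any real chat implementation:

```python
class TwoPhaseSet:
    """Toy 2P-set CRDT: adds and removes are both grow-only sets,
    so replicas can merge in any order and still converge."""
    def __init__(self):
        self.added = set()
        self.removed = set()

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        self.removed.add(item)  # tombstone: item can never be re-added

    def contains(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        # Union both sets; merge is commutative, associative, idempotent.
        self.added |= other.added
        self.removed |= other.removed
```

Because merge is just union, two servers that diverged during a netsplit can exchange state in either order and end up identical, with no consensus round required.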
| giantrobot wrote:
| Distributing state wasn't the goal on IRC, only relaying
| messages. If you miss a message you miss a message. You can
| use client-side tools (bots, bouncers, etc) to record state
| but the protocol itself doesn't care.
| duskwuff wrote:
| Implementing Paxos would mean that stateful operations (like
| connecting to the server, joining a channel, or changing
| modes) become impossible on a server, or a group of servers,
| that have lost quorum.
| sterlind wrote:
| hard to do Paxos over large geographical distances
| efficiently, but... it's IRC, so..
|
| I just assume it was from an earlier internet where
| distributed systems weren't as well understood. I don't think
| it necessarily predates Paxos but it definitely predates
| Paxos being a household name.
| X6S1x6Okd1st wrote:
| Paxos was first described in 1989, but not popularized for a
| long while after:
| https://en.m.wikipedia.org/wiki/Paxos_(computer_science)
|
| irc 1988: https://en.m.wikipedia.org/wiki/Internet_Relay_Chat
|
| Earliest reference for raft I can find is 2013.
| rawoke083600 wrote:
| Would be fun to revisit the old problems (like this one) with a
| modern toolset. Say golang with channels (not the /join type of
| channels) :p
___________________________________________________________________
(page generated 2021-09-12 23:01 UTC)