[HN Gopher] Understanding Round Robin DNS
       ___________________________________________________________________
        
       Understanding Round Robin DNS
        
       Author : hyperknot
       Score  : 190 points
       Date   : 2024-10-26 16:46 UTC (6 hours ago)
        
 (HTM) web link (blog.hyperknot.com)
 (TXT) w3m dump (blog.hyperknot.com)
        
       | latchkey wrote:
       | > "It's an amazingly simple and elegant solution that avoids
       | using Load Balancers."
       | 
       | When a server is down, you have a globally distributed / cached
       | IP address that you can't prevent people from hitting.
       | 
       | https://www.cloudflare.com/learning/dns/glossary/round-robin...
        
         | arrty88 wrote:
         | The standard today is to use a relatively low TTL and to health
         | check the members of the pool from the dns server.
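         | 
         | For example, a health-checked pool published with a 60-second
         | TTL might look like this (hostname and addresses below are
         | illustrative):
         | 
         |   $ dig +noall +answer app.example.com A
         |   app.example.com.  60  IN  A  192.0.2.10
         |   app.example.com.  60  IN  A  198.51.100.20
         |   app.example.com.  60  IN  A  203.0.113.30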
        
           | latchkey wrote:
           | That's like saying there are traffic rules in Saigon.
           | 
           | Exact implementation of TTL is a suggestion.
        
         | wongarsu wrote:
         | All clients tested in the article behaved correctly and chose
         | one of the reachable servers instead.
         | 
         | Of course somebody will inevitably misconfigure their local DNS
         | or use a bad client. Either you accept an outage for people
         | with broken setups or you reassign the IP to a different server
         | in the same DC.
        
           | latchkey wrote:
           | If you know all of your clients, then you don't even need
           | DNS. But, you don't know all of your clients. Nor do you
           | always know your upstream DNS provider.
           | 
           | Design for failure. Don't fabricate failure.
        
             | zamadatix wrote:
             | Why would knowing your clients change whether or not you
             | want to use DNS? Even when you _control_ all of the clients
             | you'll almost always want to keep using DNS.
             | 
             | A large number of services successfully achieve their
             | failure tolerances via these kinds of DNS methods. That
             | doesn't mean all services would or that it's always the
             | best answer, it just means it's a path you can consider
             | when designing for the needs of a system.
        
               | latchkey wrote:
               | I'm replying to the comment above. If the article picks a
               | few clients and it happens to work, that is effectively
               | "knowing your clients". At which point, it means you have
               | control over the client/server relationship and if we are
               | trying to simplify by not using load balancers, we might
               | as well simplify things even further, and not use DNS.
               | 
               | It is an absurd train of thought that nobody in their
               | right mind would consider... just like using DNS-RR as a
               | replacement for load balancing.
        
               | zamadatix wrote:
               | I must be having trouble following your train of thought
               | here - many large web services like Cloudflare and Akamai
               | serve large volumes of content through round robin DNS
               | balancing; what's absurd about their success? They
               | certainly don't know every client that'll ever connect to
               | a CDN on the internet... it just happens to work almost
               | every time anyway. That very few clients might not
               | instantly flip over isn't always a design failure worth
               | deploying full load balancers over. I'm also still not
               | following why the decisions for whether or not you need a
               | load balancer are supposed to be in any way equivalent to
               | the decisions of when using DNS would make sense or not?
        
               | latchkey wrote:
               | We are not talking about "large web services"; we are
               | talking about small end users spinning up their own DNS-
               | RR "solution".
               | 
               | LWS get away with it because of Anycast...
               | 
               | https://www.cloudflare.com/en-
               | gb/learning/cdn/glossary/anyca...
        
               | zamadatix wrote:
               | Anycast is certainly a nice layer to add but it's not a
               | requirement for DNS round robin to work reliably. It does
               | save some of the concern around relying on the client to
               | select an efficiently close choice, though, and it can
               | be a good option for failover.
               | 
               | More directly - is there some set of common web clients
               | I've been missing for many years that just doesn't follow
               | DNS TTLs or try alternate records? I think the article
               | gets it right with the wish list at the end containing an
               | Amazon Route 53-like "pull dead entries automatically"
               | note, but maybe I'm missing something else? I've used this
               | approach (pull the dead server entries from DNS, wait for
               | TTL) and never caught any unexpected failures during
               | outages but maybe I haven't been looking in the right
               | places?
               | 
               | If you mean it's possible to design something with round-
               | robin DNS in a way that more clients than you expect will
               | fail then absolutely, you can do things the wrong way
               | with most any solution. Sometimes you can be fine with a
               | subset of clients not always working during an outage or
               | you can be fine with a solution which provides slower
               | failover than an active load balancer. What I'm trying to
               | find is why round-robin DNS must always be the wrong
               | answer in all design cases.
        
               | latchkey wrote:
               | > _is there some set of common web clients I've been
               | missing for many years that just doesn't follow DNS TTLs
               | or try alternate records?_
               | 
               | Yes. There are tons of people with outdated and/or buggy
               | software still using the internet today.
        
               | zamadatix wrote:
               | What % did you find to be "tons" with these specific
               | bugs? I'm assuming it was quite a significant number (at
               | least 10%?) that broke badly quite often given the
               | certainty it's the wrong decision for all solutions. Any
               | idea how to help me identify which clients I've been
               | missing or might run into? DNS TTLs are also pretty
               | necessary for most web systems to work reliably,
               | regardless of load balancer or not, so what ways do you
               | work around having large numbers of clients which don't
               | obey them (beyond hoping to permanently occupy the same
               | set of IPs for the life of the service of course)?
        
               | latchkey wrote:
               | The percentage is kind of irrelevant. The issue is that
               | if you're running something like an e-commerce site and
               | any percentage of people can't hit your site because of a
               | TTL issue with one of your down servers, you're likely to
               | never know how much lost revenue you've had. Site is
               | down, go to another store to buy what you need. You also
               | have no control over fixing the issue, other than to get
               | the server back up and running. This has downstream
               | effects: how do you cycle the server for upgrades or
               | maintenance?
               | 
               | I don't understand why anyone would argue for this as a
               | solution when there are better, near-zero-effort ways of
               | doing this that don't have any of the downsides.
        
               | buzer wrote:
               | > More directly - is there some set of common web client
               | I've been missing for many years that just doesn't follow
               | DNS TTLs or try alternate records?
               | 
               | I don't know if there is such a list but older versions
               | of Java are pretty famous for caching the DNS responses
               | indefinitely. I don't hear much about it these days so I
               | assume it was probably fixed around Java 8.
        
         | toast0 wrote:
         | Skipping an unnecessary intermediary is worth considering.
         | 
         | Load balancing isn't without cost, and load balancers subtly
         | (or unsubtly) messing up connections is an issue. I've also
         | used providers where their load balancers had worse
         | availability than our hosts.
         | 
         | If you control the clients, it's reasonable to call the
         | platform DNS API to get a list of IPs and shuffle and iterate
         | through them in an appropriate way. Even better if you have a
         | few stably allocated IPs you can distribute in client binaries
         | for _when_ DNS is broken; but DNS is often not broken and it's
         | nice to use for operational changes without having to push new
         | configuration/binaries every time you update the cluster.
         | 
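         | A rough shell sketch of that shuffle-and-iterate pattern
         | (hostname, health path and fallback address are all made up):
         | 
         |   #!/bin/sh
         |   # resolve, shuffle, then append a hard-coded fallback IP for
         |   # the case where DNS itself is broken
         |   ips="$(dig +short app.example.com A | shuf) 192.0.2.10"
         |   for ip in $ips; do
         |     # pin the connection to this IP while keeping the right
         |     # Host/SNI, and give up quickly on dead servers
         |     if curl -sf --connect-timeout 2 \
         |          --resolve app.example.com:443:"$ip" \
         |          https://app.example.com/healthz >/dev/null; then
         |       echo "using $ip"
         |       break
         |     fi
         |   done
         | 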
         | If your clients are browsers, default behavior is ok; they
         | usually use IPs in order, which can be problematic [1], but
         | otherwise, they have good retry behavior: on connection refused
         | they try another IP right away; in case of timeout, they try at
         | least a few different IPs. It's not ideal, and I'd use a load
         | balancer for browsers, at least to serve the initial page load
         | if feasible, and maybe DNS RR and semi-smart client logic in JS
         | for websockets/etc; but DNS RR is workable for a whole site
         | too.
         | 
         | If your clients are not browsers and not controlled by you,
         | best of luck?
         | 
         | I will 100% admit that sometimes you have to assume someone
         | built their DNS caching resolver to interpret the TTL field as
         | a number of days, rather than number of seconds. And that
         | clients behind those resolvers will have trouble when you
         | update DNS, but if your load balancer is behind a DNS name,
         | _when_ it needs to change addresses, you'll deal with that
         | then, and you won't have the experience.
         | 
         | [1] one of the RFCs suggests that OS apis should sort responses
         | by prefix match, which might make sense if IP prefixes were
         | hierarchical, as a proxy for reaching the server with the
         | least network distance. But in the real world, numerically
         | adjacent /24s are often not network adjacent, and if your
         | servers have widely disparate addresses, you may see traffic
         | from some client ips gravitate towards numerically similar
         | server ips.
        
           | ectospheno wrote:
           | > I will 100% admit that sometimes you have to assume someone
           | built their DNS caching resolver to interpret the TTL field
           | as a number of days, rather than number of seconds.
           | 
           | I've run a min ttl of 3600 on my home network for over a
           | year. No one has complained yet.
        
             | toast0 wrote:
             | That's only because there's no way for service operators to
             | effectively complain when your clients continue to hit
             | service ips for 55 minutes longer than they should. And if
             | there was, we'd first yell at all the people who continue
             | to hit service ips for weeks and months after a change...
             | by the time we get to complaining about one home using an
             | hour ttl, it's not a big deal.
        
               | ectospheno wrote:
               | I take the point of view that if me not honoring your 60
               | second ttl breaks your site for me then I want to know so
               | I stop going there.
        
       | easylion wrote:
       | https://www.cloudflare.com/en-gb/learning/cdn/glossary/anyca...
        
       | easylion wrote:
       | Did you try running a simple bash curl loop instead of manually
       | printing? The data and statistics would become much clearer. I
       | ask because I want to understand how to ensure my clients get
       | the nearest edge data center.
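       | 
       | Something like this is what I mean (the URL and the way the
       | response identifies the serving host are assumptions):
       | 
       |   # hit the site repeatedly and count which server answered,
       |   # assuming each response carries some server identifier
       |   for i in $(seq 1 100); do
       |     curl -s https://app.example.com/ | grep -o 'server-[a-z0-9]*'
       |   done | sort | uniq -c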
        
       | tetha wrote:
       | > As you can see, all clients correctly detect it and choose an
       | alternative server.
       | 
       | This is the nasty key point. The reliability is decided client-
       | side.
       | 
       | For example, systemd-resolved at times enacted maximum technical
       | correctness by always returning the lowest IP address. After all,
       | DNS-RR is not well-defined, so always returning the lowest IPs is
       | not wrong. It got changed after some riots, but as far as I know,
       | Debian 11 is stuck with that behavior, or was for a long time.
       | 
       | Or, I deal with many applications with shitty or no retry
       | behavior. They go "Oh no, I have one connection refused, gotta
       | cancel everything, shutdown, never try again". So now 20% - 30%
       | of all requests die in a fire.
       | 
       | It's an acceptable solution if you have nothing else. As the
       | article notes, if you have quality HTTP clients with a few
       | retries configured on them (like browsers), DNS-RR is fine to
       | find an actual load balancer with health checks and everything,
       | which can provide a 100% success rate.
       | 
       | But DNS-RR is no load balancer, and load balancers are better.
        
         | latchkey wrote:
         | > _It's an acceptable solution if you have nothing else._
         | 
         | I'd argue it isn't acceptable at all in this day and age and
         | that there are other solutions one should pick today long
         | before you get to the "nothing else" choice.
        
           | toast0 wrote:
           | Anycast is nice, but it's not something you can do yourself
           | well unless you have large scale. You need to have a large
           | number of PoPs, and direct connectivity to many/most transit
           | providers, or you'll get weird routing.
           | 
           | You also need to find yourself some IP ranges. And learn BGP
           | and find providers where you can use it.
           | 
           | DNS round robin works as long as you can manage to find two
           | boxes to run your stuff on, and it scales pretty high too.
           | When I was at WhatsApp, we used DNS round robin until we
           | moved into Facebook's hosting where it was infeasible due to
           | servers not having public addresses. Yes, mostly not
           | browsers, but not completely browserless.
        
             | latchkey wrote:
             | Back in 2013, that might have been the best solution for
             | you. But there were still plenty of headlines...
             | https://www.wamda.com/2013/11/whatsapp-goes-down
             | 
             | We're talking about today.
             | 
             | The reason I said Anycast is because the vast majority of
             | people trying to solve the need for having multiple servers
             | in multiple locations will just use CF or any one of the
             | various anycast-based CDN providers available today.
        
               | toast0 wrote:
               | Oh sure, we had many outages. More outages on the one
               | service where we tried using load balancers, because the
               | load balancers would take a one-hour break every 30 days
               | (which is pretty shitty, but that was the load balancer
               | available, unless we wanted to run a software load
               | balancer, which didn't make any sense).
               | 
               | We didn't have many outages due to DNS, because we had
               | fallback ips to contact chat in our clients. Usage was
               | down in the 24 hours after our domain was briefly
               | hijacked (thanks Network Solutions), and I think we lost
               | some usage when our DNS provider was DDoSed by 'angry
               | gamers'. But when FB broke most of their load balancers,
               | that was a much bigger outage. BGP based outages broke
               | everything, DNS and load balancers, so no wins there.
        
               | latchkey wrote:
               | > We didn't have many outages due to DNS, because we had
               | fallback ips to contact chat in our clients.
               | 
               | Exactly! When you control the client, you don't even need
               | DNS. Things are actually even more secure when you don't
               | use it, nothing to DDoS or hijack. When FB broke one set
               | of LB's, the clients should have just routed to another
               | set of LB's, by IP.
        
               | toast0 wrote:
               | FB likes to break everything all at once anyway... And
               | health checking the load balancers wasn't working either.
               | So DNS to regional balancers was sending people to the
               | wrong place, and the anycast ips might have worked if you
               | were lucky, but you might have gotten a PoP that was
               | broken.
               | 
               | The servers behind it were fine, if you could get to one.
               | You could push broken DNS responses, I suppose, but it's
               | harder than breaking a load balancer.
        
         | nerdile wrote:
         | It's putting reliability in the hands of the client, or
         | whatever random caching DNS resolver they're sitting behind.
         | 
         | It also puts failover in those same hands. If one of your
         | regions goes down, do you want the traffic to spread evenly to
         | your other regions? Or pile on to the next nearest neighbor? If
         | you care what happens, then you want to retain control of your
         | traffic management and not cede it to others.
        
       | meindnoch wrote:
       | So half of your content is served from another server? Sounds
       | like a recipe for inconsistent states.
        
         | ChocolateGod wrote:
         | You can easily use something like an object store or shared
         | database to keep data consistent.
        
       | jgrahamc wrote:
       | Hmm. I've asked the authoritative DNS team to explain what's
       | happening here. I'll let HN know when I get an authoritative
       | answer. It's been a few years since I looked at the code and a
       | whole bunch of people keep changing it :-)
       | 
       | My suspicion is that this is to do with the fact that we want to
       | keep affinity between the client IP and a backend server (which
       | OP mentions in their blog). And the question is "do you break
       | that affinity if the backend server goes down?" But I'll reply to
       | my own comment when I know more.
        
         | delusional wrote:
         | > I'll let HN know when I get an authoritative answer
         | 
         | Please remember to include a TTL so I know how long I can cache
         | that answer.
        
           | jgrahamc wrote:
           | Thank you for appreciating my lame joke.
        
         | mlhpdx wrote:
         | So many sins have been committed in the name of session
         | affinity.
        
           | jgrahamc wrote:
           | Looks like this has nothing to do with session affinity. I
           | was wrong. Apparently, this is a difference between our paid
           | and free plans. Getting the details, and finding out why
           | there's a difference, and will post.
        
             | asmor wrote:
             | Well, CEO said there is none, get on it engineering :)
        
       | cybice wrote:
       | Cloudflare results with worker as a reverse proxy can be much
       | better.
        
         | easylion wrote:
         | But won't it add an additional hop, and hence additional
         | latency, to every single request?
        
           | rodcodes wrote:
           | Nah, because Cloudflare Workers run at the closest edge
           | location and are really fast.
           | 
           | The real solution with Cloudflare is to use their Load
           | Balancing (https://developers.cloudflare.com/load-balancing)
           | which is a paid feature.
        
       | specto wrote:
       | Chrome and Firefox use the OS DNS server by default, which in
       | most OSes has caching as well.
        
       | urbandw311er wrote:
       | What a great article! It's often easy to forget just how flexible
       | and self-correcting the "official" network protocols are. Thanks
       | to the author for putting in the legwork.
        
       | zamalek wrote:
       | Take a look at SRV records instead - they are very intentionally
       | designed for this, and behave vaguely similarly to MX. Creating a
       | DNS server (or a CoreDNS/whatever module) that dynamically
       | updates weights based on backend metrics has been a pending pet
       | project of mine for some time now.
        
         | jeroenhd wrote:
         | Until the HTTP spec gets updated to include SRV records, using
         | SRV records for HTTP(S) is technically spec-incompliant and
         | practically useless.
         | 
         | However, as is common with web tech, the old SRV record has
         | been reinvented as the SVCB record with a smidge of DANE for
         | good measure.
        
       | teddyh wrote:
       | One of the early proposed solutions for this was the SRV DNS
       | record, which was similar to the MX record, but for every
       | service, not just e-mail. With MX and SRV records, you can
       | specify a list of servers with associated priority for clients to
       | try. SRV also had an extra "weight" parameter to facilitate load
       | balancing. However, the SRV authors did not want the political
       | fight of effectively hijacking every standard protocol to force
       | all clients of every protocol to also check SRV records, so they
       | specified that SRV should _only_ be used by a client if the
       | standard for that protocol explicitly specifies the use of SRV
       | records. This technically prohibited HTTP clients from using SRV.
       | Also, when the HTTP /2 (and later) HTTP standards were being
       | written, bogus arguments from Google (and others) prevented the
       | new HTTP protocols from specifying SRV. SRV seems to be
       | effectively dead for new development, only used by some older
       | standards.
       | 
       | The new solution for load balancing seems to be the new HTTPS and
       | SVCB DNS records. As I understand it, they are standardized by
       | people wanting to add extra parameters to the DNS in order to
       | jump-start the TLS 1.3 handshake, thereby making fewer roundtrips.
       | (The SVCB record type is the same as HTTPS, but generalized like
       | SRV.) The HTTPS and SVCB DNS record types both have the priority
       | parameter from the SRV and MX record types, but HTTPS/SVCB lack
       | the weight parameter from SRV. The standards have been published,
       | and support seems to have been implemented in some browsers, but
       | not all have enabled it. We will see what browsers will actually
       | do in the near future.
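       | 
       | For reference, the record shapes look roughly like this (names,
       | ports, priorities and parameters are only illustrative; SRV is
       | priority, weight, port, target, while HTTPS/SVCB is priority,
       | target, parameters, with no weight field):
       | 
       |   $ dig +noall +answer SRV _imaps._tcp.example.com
       |   _imaps._tcp.example.com. 3600 IN SRV 10 60 993 mx1.example.com.
       |   _imaps._tcp.example.com. 3600 IN SRV 10 40 993 mx2.example.com.
       |   $ dig +noall +answer HTTPS example.com
       |   example.com. 3600 IN HTTPS 1 . alpn="h2,h3"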
        
         | jsheard wrote:
         | > The new solution for load balancing seems to be the new HTTPS
         | and SVCB DNS records. As I understand it, they are standardized
         | by people wanting to add extra parameters to the DNS in order
         | to jump-start the TLS 1.3 handshake, thereby making fewer
         | roundtrips.
         | 
         | The other big advantage of the HTTPS record is that it allows
         | for proper CNAME-like delegation at the domain apex, rather
         | than requiring CNAME flattening hacks that can cause routing
         | issues on CDNs which use GeoDNS in addition to or instead of
         | anycast. If you've ever seen a platform recommend using a www
         | subdomain instead of an apex domain, that's why, and it's part
         | of why Akamai pushed for HTTPS records to be standardized since
         | they use GeoDNS.
        
           | teddyh wrote:
           | Oh yes[1]. This is an advantage shared by all of MX, SRV and
           | HTTPS/SVCB, though.
           | 
           | 1. <https://news.ycombinator.com/item?id=38420555>
        
       | metadat wrote:
       | _> This allows you to share the load between multiple servers, as
       | well as to automatically detect which servers are offline and
       | choose the online ones._
       | 
       | To [hesitantly] clarify a point of pedantry regarding "DNS
       | automatic offline detection":
       | 
       | Out of the box, RR-DNS is only good for load balancing.
       | 
       | Nothing automatic happens on the availability state detection
       | front unless you build smarts into the client. TFA's introduction
       | does sort of mention this, but it took me several re-reads of the
       | intro to get its meaning (which, to be fair, could be PEBKAC).
       | Then I read the rest of TFA, which is all about the smarts.
       | 
       | If the 1/N server record selected by your browser ends up being
       | unavailable, no automatic recovery / retry occurs at the protocol
       | level.
       | 
       | p.s. "Related fun": Don't forget about Java's DNS TTL [1] and
       | `.equals()' [2] behaviors.
       | 
       | [1] https://stackoverflow.com/questions/1256556/how-to-make-
       | java...
       | 
       | [2] https://news.ycombinator.com/item?id=21765788 (5y ago, 168
       | comments)
        
         | encoderer wrote:
         | We accomplish this on Route53 by having it pull servers out of
         | the dns response if they are not healthy, and serving all
         | responses with a very low ttl. A few clients out there ignore
         | ttl but it's pretty rare.
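         | 
         | One way to get that behavior is Route 53 multivalue answer
         | records with a health check attached to each one. A sketch
         | (zone ID, health check ID and address are placeholders):
         | 
         |   aws route53 change-resource-record-sets \
         |     --hosted-zone-id ZONEID \
         |     --change-batch '{"Changes":[{"Action":"UPSERT",
         |       "ResourceRecordSet":{
         |         "Name":"app.example.com","Type":"A","TTL":30,
         |         "SetIdentifier":"server-1",
         |         "MultiValueAnswer":true,
         |         "HealthCheckId":"HEALTH_CHECK_ID",
         |         "ResourceRecords":[{"Value":"192.0.2.10"}]}}]}'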
        
           | ChocolateGod wrote:
           | I once achieved something similar with PowerDNS, where you
           | can use LUA rules to do health checks on a pool of servers
           | and only return healthy servers as part of the DNS response,
           | but I found odd occurrences of clients not respecting the TTL
           | on DNS records and caching for too long.
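           | 
           | For anyone curious, the zone side of that looks something
           | like a LUA record, with LUA records enabled on the
           | authoritative server (name and addresses are placeholders):
           | 
           |   ; only hand out addresses whose port 443 currently answers
           |   app 60 IN LUA A "ifportup(443, {'192.0.2.1','192.0.2.2'})"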
        
             | tetha wrote:
             | You usually do this with servers that should be rock-solid
             | and stateless. HAProxy, Traefik, F5. That way, you can pull
             | the DNS record for maintenance 24 - 48 hours in advance. If
             | something overrides DNS TTLs that much, there is probably
             | some reason.
        
           | d_k_f wrote:
           | Honest question to somebody who seems to have a bit of
           | knowledge about this in the real world: several (German, if
           | relevant) providers default to a TTL of ~4 hours. Lovely if
           | everything is more or less finally set up, but usually our
           | first step is to decrease pretty much everything down to 60
           | seconds so we can change things around in emergencies.
           | 
           | On average, does this really matter/make sense?
        
             | stackskipton wrote:
             | Lower TTLs are cheap insurance so you can move hostnames
             | around.
             | 
             | However, you should understand that not ALL clients will
             | respect those TTLs. There are resolvers that apply a
             | minimum TTL threshold (if TTL < threshold, then TTL =
             | threshold), which is common with some ISPs, and there may
             | also be cases where browsers and operating systems ignore
             | TTLs or fudge them.
        
       | hypeatei wrote:
       | The browser behavior is really nice; good to know that it falls
       | back quickly and smoothly. Round robin DNS has always been
       | referred to as a "poor man's load balancer", which it seems to be
       | living up to.
       | 
       | > Curl also works correctly. First time it might not, but if you
       | run the command twice, it always corrects to the nearest server.
       | 
       | This took two tries for me, which raises the question of how
       | curl is keeping track of RTTs (round-trip times). Interesting.
        
       | V__ wrote:
       | This seems like a nice solution for zero-downtime updates. Clone
       | the server, add the specified IP, deny access to the main one,
       | upgrade, and turn the cloned server off.
        
       | unilynx wrote:
       | > So what happens when one of the servers is offline? Say I stop
       | the US server:
       | 
       | > service nginx stop
       | 
       | But that's not how you should test this. A client will see the
       | connection being refused, and go on to the next IP. But in
       | practice, a server may not respond at all, or accept the
       | connection and then go silent.
       | 
       | Now you're dependent on client timeouts, and round robin DNS will
       | suddenly look a whole lot less attractive as a way to increase
       | reliability.
        
         | Joe_Cool wrote:
         | Yeah, SIGSTOP or just an iptables/nftables DROP would be a much
         | more realistic test.
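         | 
         | For example, something like this on the test box (table name
         | and port are placeholders) makes new connections hang instead
         | of being refused:
         | 
         |   # freeze the web server processes without closing sockets
         |   pkill -STOP nginx
         |   # or: silently drop incoming HTTPS instead of sending RSTs
         |   nft add table inet blackhole
         |   nft add chain inet blackhole input \
         |     '{ type filter hook input priority 0; }'
         |   nft add rule inet blackhole input tcp dport 443 drop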
        
       | rebelde wrote:
       | I have used round robin for years.
       | 
       | Wish I could add instructions like:
       | 
       | - random choice #round robin, like now
       | 
       | - first response # usually connects to closest server
       | 
       | - weights (1.0.0.1:40%; 2.0.0.2:60%)
       | 
       | - failover: (quick | never)
       | 
       | - etc: naming countries, continents
        
       | edm0nd wrote:
       | The dark remix version of this is fast flux hosting and what a
       | lot of the bulletproof hosting providers use.
       | 
       | https://unit42.paloaltonetworks.com/fast-flux-101/
        
       | stackskipton wrote:
       | As an SRE, I get a chuckle out of this article and some of the
       | responses. Devs mess this up constantly.
       | 
       | DNS has one job: hostname -> IP. Nothing further. You can mess
       | with it on the server side, like checking whether the HTTP server
       | is up before handing out the IP, but once the IP is given, the
       | client takes over and DNS can do nothing further, so behavior
       | will be wildly inconsistent IME.
       | 
       | Assuming DNS RR means the standard setup where a hostname
       | returns multiple IPs, it's only useful for load balancing across
       | datacenters with similar latency. If you want fancy stuff like
       | geographic load balancing or health checks, you need a fancy DNS
       | server, but at the end of the day you should only return a
       | single IP so the client will target the endpoint you want it to
       | connect to.
        
         | lysace wrote:
         | I've never ever come up with a scenario where RR DNS is useful
         | for achieving high availability. I'm similarly
         | mystified.
         | 
         | What can be useful: dynamically adjusting DNS responses
         | depending on what DC is up. But at this point shouldn't you be
         | doing something via BGP instead? (This is where my knowledge
         | breaks down.)
        
           | stackskipton wrote:
           | Yeah, an Anycast IP like what Cloudflare does is the best.
           | 
           | If you want cheaper load balancing and are ok with some
           | downtime while DNS reconfigures, a DNS system that returns
           | IPs based on which datacenter is up works. Examples of this
           | are Route53 and Azure Traffic Manager, and I assume Google
           | has a solution; I just don't know what it is.
        
             | lysace wrote:
             | Worked on implementing a distributed-consensus driven DNS
             | thing like 15 years ago. We had 3 DCs around the world for
             | a very compute-intensive but not very stateful service. It
             | actually just worked without any meaningful testing on the
             | first single DC outage. In retrospect I'm amazed.
        
       | realchaika wrote:
       | May be worth mentioning that Zero Downtime Failover is a Pro or
       | higher feature, I believe; that's how it was documented before
       | as well, back when the "protect your origin server" docs were
       | split by plan level. So you may see different behavior/retries.
        
       | nielsole wrote:
       | > Curl also works correctly. First time it might not, but if you
       | run the command twice, it always corrects to the nearest server.
       | 
       | I always assumed curl was stateless between invocations. What's
       | going on here?
        
         | barrkel wrote:
         | My hypothesis: he's running on macOS and he's seeing the same
         | behavior from Safari as from curl because they're both using
         | OS-provided name resolution which is doing the lowest-latency
         | selection.
         | 
         | Firefox and Chrome use DNS over HTTPS by default I believe,
         | which may mean they use a different name resolution path.
         | 
           | The above is entirely conjecture on my part, but the guess is
         | heavily informed by the surprise of curl's behavior.
        
           | hyperknot wrote:
           | Correct. I'm on macOS and I tried turning off DoH in Firefox
           | and then it worked like Safari.
        
       | jkrauska wrote:
       | Check out what happens when you use IPv6 addresses. RFC 6724 is
       | awkward about ordering with IPv6.
       | 
       | How your OS sorts DNS responses also comes into play, depending
       | on how your browser makes its DNS requests.
        
       | mlhpdx wrote:
       | Interesting topic for me, and I've been looking at anycast IP
       | services and latency based DNS resolvers as well. I even made a
       | repo[1] for anyone interested in a quick start for setting up AWS
       | global accelerator.
       | 
       | [1] https://github.com/mlhpdx/cloudformation-
       | examples/tree/maste...
        
       | why-el wrote:
       | Hm, I thought Happy Eyeballs (HE) was mainly concerned with IPv6
       | issues and falling back to IPv4. I didn't think it was this RFC
       | in which finally some words were said about round-robin
       | specifically, but it looks like it was (from this article).
       | 
       | Is it true then that before HE, most round-robin implementations
       | simply cycled and no one considered latency? That's a very
       | surprising finding.
        
       | freitasm wrote:
       | Interesting. The author starts by discussing DNS round robin but
       | then briefly touches on Cloudflare Load Balancing.
       | 
       | I use this feature, and there are options to control Affinity,
       | Geolocation and others. I don't see this discussed in the
       | article, so I'm not sure why Cloudflare load balancing is
       | mentioned if the author does not test the whole thing.
       | 
       | Their Cloudflare wishlist includes "Offline servers should be
       | detected."
       | 
       | This is also interesting because when creating a Cloudflare load
       | balancing configuration, you create monitors, and if one is down,
       | Cloudflare will automatically switch to other origin servers.
       | 
       | These screenshots show what I see on my Load Balancing
       | configuration options:
       | 
       | https://cdn.geekzone.co.nz/imagessubs/62250c035c074a1ee6e986...
       | 
       | https://cdn.geekzone.co.nz/imagessubs/04654d4cdda2d6d1976f86...
        
         | hyperknot wrote:
         | I briefly mention that I don't go into L7 Load Balancing
         | because it'd be cost prohibitive for my use case (millions of
         | requests).
         | 
         | Also, the article is about DNS-RR, not the L7 solution.
        
       | bar000n wrote:
       | Hey! So I've got a CDN for video made of 4 bare-metal servers.
       | Two are newer and more powerful, so I give them each 2 IP
       | addresses out of the 6 addresses returned by DNS for the A
       | record. But from a very diverse pool of devices (proprietary
       | set-top boxes, smart TVs, iOS and Android mobile clients, web
       | browsers, etc.) I still get ~40% of traffic on the older servers
       | instead of the expected 33%, given that 2 out of 6 IP addresses
       | resolve to those hosts. Why?
        
       | kawsper wrote:
       | 37signals/Basecamp wrote about something similar on their blog;
       | they saw traffic switching almost immediately:
       | https://signalvnoise.com/posts/3857-when-disaster-strikes and in
       | the comments it was hinted that it was just a DNS update with
       | low TTLs.
        
       ___________________________________________________________________
       (page generated 2024-10-26 23:00 UTC)