hngopher.com

       [HN Gopher] Akamai Edge DNS was down
       ___________________________________________________________________
        
       Akamai Edge DNS was down
        
       Author : vhab
       Score  : 444 points
       Date   : 2021-07-22 16:12 UTC (6 hours ago)
        
 (HTM) web link (edgedns.status.akamai.com)
 (TXT) w3m dump (edgedns.status.akamai.com)
        
       | swarnie_ wrote:
       | I love seeing these issues reverberate around the internet.
       | 
       | This time i think /r/sysadmin pegged the issue first, great sub.
        
       | geocrasher wrote:
       | People don't believe me when I say how much DNS matters. So I
       | wrote a song about it.
       | 
       | https://soundcloud.com/ryan-flowers-916961339/dns-to-the-tun...
        
         | wpasc wrote:
         | dns, DNS, dns, dns. The start of every process, dns.
         | 
         | Love this.
        
           | geocrasher wrote:
           | Thank you! I'm glad that landed where I wanted it to. It was
           | a lot of fun to put together. I keep threatening to make a
           | video. I need a collection of DNS memes so that I can just
           | slideshow them.
        
             | wpasc wrote:
             | Haha please do, a video would be great. Your song reminded
             | me of the song: "Find the Longest Path" which you may get a
             | kick out of:
             | 
             | https://www.youtube.com/watch?v=a3ww0gwEszo
        
         | mvanbaak wrote:
         | Awesome! Thank you.
        
         | patleeman wrote:
         | I just teared up
        
           | geocrasher wrote:
           | LOL! great comment thank you!
        
         | ricardo81 wrote:
         | Brilliant. NOERROR for this.
        
         | brianjking wrote:
         | lol, thanks for the laugh.
        
         | zyberzero wrote:
         | Thank you!
        
         | southerntofu wrote:
         | Sounds amazing! Do you maybe have a direct link? Soundcloud
         | doesn't want us privacy-conscious users browsing their website
         | :(
        
         | Frost1x wrote:
         | This made my day, thanks!
        
           | geocrasher wrote:
           | And this, mine! Thanks!
        
         | [deleted]
        
         | kevando wrote:
         | lol
        
         | pololee wrote:
         | Thank you!! lol
        
         | dolni wrote:
         | > People don't believe me when I say how much DNS matters.
         | 
         | That's weird to me. I have been working in sysadmin/DevOps for
         | over a decade, but it did not take me very long to learn that
         | DNS outages cause massive problems.
        
           | geocrasher wrote:
           | Right, but everybody has to learn that at some point. And I
           | happen to be somebody who teaches such things. The importance
           | of DNS is hard to overstate, but I go to great lengths to do
           | exactly that, to make a point ;)
        
       | throwawaysha wrote:
       | I ran DNS servers, among other things, in the late 90s with
       | better uptime than these "multi-DC/AZ/geo redundant" services
       | everyone uses these days.
        
       | memco wrote:
       | Was just browsing a website where the first page of a query
       | worked, but visiting page 2 of the results was returning a DNS
       | error. Was curious how and why only part of the site was down,
       | but it looks like this was the problem as now the whole site is
       | down.
        
         | katbyte wrote:
         | aren't short DNS TTLs great?
        
           | sebmellen wrote:
           | Is this a serious argument for long TTLs? Always wondered why
           | they exist... How interesting.
        
             | slim wrote:
             | Yes it is. The longer the TTL the longer you stay
             | independent from third parties. It's what makes the
             | internet stable.
        
               | remram wrote:
               | Long TTL makes you independent from DNS third parties, in
               | that your name is still known by clients if DNS is down.
               | 
               | Short TTL makes you independent from hosting third
               | parties, in that you can quickly change which hosting
               | provider your domain name points to.
               | 
               | You can't win this one by only changing your TTL. The
               | best solution is to use short TTLs and multiple
               | nameservers on different providers.
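
The trade-off described above can be sketched as a tiny simulation. This is only an illustration under stated assumptions: the `CachedRecord` type, the `can_resolve` helper, the addresses, and the provider names are all hypothetical, and real resolvers behave in more complicated ways.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    address: str
    fetched_at: float  # seconds since the start of the simulation
    ttl: float         # seconds the cached answer may be reused


def can_resolve(record: CachedRecord, now: float, authoritative_up: bool) -> bool:
    """Return True if a client holding `record` can still get an address at `now`."""
    if now - record.fetched_at < record.ttl:
        return True          # cache is still fresh, no lookup needed
    return authoritative_up  # cache expired, so a fresh lookup is required


# Hypothetical scenario: both records are fetched at t=0 and the DNS
# provider goes down immediately afterwards.
short = CachedRecord("192.0.2.10", fetched_at=0, ttl=60)
long_ = CachedRecord("192.0.2.10", fetched_at=0, ttl=7200)

probe = 30 * 60  # 30 minutes into the outage
short_ok = can_resolve(short, probe, authoritative_up=False)
long_ok = can_resolve(long_, probe, authoritative_up=False)

# With nameservers on two providers (names are made up), a lookup
# succeeds as long as any one of them answers.
providers_up = {"provider-a": False, "provider-b": True}
short_ok_multi = can_resolve(short, probe, authoritative_up=any(providers_up.values()))

print(short_ok, long_ok, short_ok_multi)
```

The short-TTL record only survives the outage in the last case, which is the "short TTLs plus multiple providers" combination suggested above.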
        
       | lowbloodsugar wrote:
       | So many sites being reported as down, but change your DNS to
       | something else (e.g. Google 8.8.8.8 and 8.8.4.4) and, after
       | flushing your DNS cache, the sites are available. I was unable to
       | get to ups.com or newegg.com (why yes, I am expecting a new toy),
       | but after switching DNS and flushing DNS cache, I was able to get
       | to both.
       | 
       | Specifically, 1.1.1.1 provided bad addresses (as opposed to no
       | addresses), and removing 1.1.1.1 fixed my problem. By then it had
       | returned a bunch of bad addresses and I had to flush my DNS
       | cache.
        
       | didjathinkmess wrote:
       | Cyberpolygon already? Thought we had at least a month or two
        
         | penultimatebro wrote:
         | Shh, normies are not ready for that.
         | 
         | It's just a completely random DNS outage, nothing more.
        
       | gianpaj wrote:
       | https://www.interactivebrokers.co.uk/ , a Trading Platform, is
       | also down :(
       | 
       | How am I going to sell my AMC stock...
        
         | swarnie_ wrote:
         | You don't, you hold the dumb, over priced stock as a reminder
         | for future, better informed investing.
        
       | thunfisch wrote:
       | Yep, all our EdgeDNS zones as well as DSD edgekeys are just
       | returning SERVFAILs. Many big German websites are down right now.
        
         | zhdc1 wrote:
         | Several unrelated websites I was trying to visit are down. I
         | figured I would find the answer on HN : )
        
           | mariusseufzer wrote:
           | Same haha
        
       | cbeley wrote:
       | I wonder if this is why LastPass is down. It has completely
       | locked me out of my vault. You'd think it'd continue to work
       | offline in a case like this. :/
        
         | zxcvbn4038 wrote:
         | When it comes to password managers, 1password is the one to
         | beat. Much better experience in every regard.
        
         | eunai wrote:
         | I switched to BitWarden and haven't looked back. You can use it
         | on the phone and pc (browser). As well as a desktop client.
        
           | fredski42 wrote:
           | And with vaultwarden you can go self hosted with a very
           | lightweight server written in rust.
        
             | AnIdiotOnTheNet wrote:
             | Switched to vaultwarden at work for password management,
             | only have minor gripes so can recommend.
        
           | benburleson wrote:
           | Yeah, my path was LastPass -> Bitwarden -> 1Password.
           | 
           | Both Bitwarden and 1Password are great.
        
             | decrypt wrote:
             | Same path. It'll be very hard to move away from 1Password.
             | App experience, sync, security features like key in
             | addition to master password, family organizer-based
             | recovery of an account, these are a few things that stand
             | out.
        
               | raffraffraff wrote:
               | I prefer the browser addon for bitwarden over 1Password.
               | Try editing a site in 1Password. It forces you to log
               | into the full site, whereas bitwarden can do almost
               | everything right there in the addon.
        
               | judge2020 wrote:
               | This is also possible with the 1Password X extension,
               | however there's a lot of feature segmentation and unclear
               | messaging between the Desktop app-based version and
               | 1Password X so I don't blame you for using the old one.
        
               | macintux wrote:
               | Yeah, I use 1Password for every critical bit of
               | information (SSN numbers, physical access codes) and a
               | whole lot of less-critical stuff. I expect to be a
               | customer for life.
        
               | revscat wrote:
               | Can you explain what family organizer-based recovery
               | means? It sounds like dad or mom could recover a kids
               | password?
        
               | eddieroger wrote:
               | That's about right for what it is, or at least how I
               | think about it. There's no magic "unlock vault" button
               | (by design), but an Organizer can kick off a workflow to
               | reset a vault if need be. I have a few of the more tech-
               | savvy family members set as organizers in my family in
               | case something ever happens to me.
        
               | chewmieser wrote:
               | https://support.1password.com/recovery/
        
               | chewmieser wrote:
               | My favorite feature personally is the built-in 2FA
               | support. Click and it logs into your account and copies
               | the 2fa code to clipboard so just paste on next screen.
               | 
               | Multiple vaults too is nice but I know others have ways
               | to limit exposure of passwords in similar manners.
        
               | arnado wrote:
               | Bitwarden offers this as well, but I don't really
               | understand why you would want it. If someone compromises
               | your password manager, 2FA is now worthless. Or am I
               | misunderstanding how it works?
        
               | decrypt wrote:
               | Your understanding is correct. 1Password requires a key
               | in addition to the master password. And finally,
               | 1Password can have 2FA for itself, which is stored on my
               | Authy. These are reasons why I am comfortable storing my
               | 2FA codes on it.
               | 
               | Bitwarden has 2FA support too, but does not have the
               | unique key feature that 1Password has.
        
             | JonathanMerklin wrote:
             | Then what was the impetus to switch off of Bitwarden?
        
       | nowahe wrote:
       | I'm in the middle of a migration from Akamai to Cloudfront, time
       | to take a break I guess
        
       | blondie9x wrote:
       | Looks like it is fixed now!
        
       | [deleted]
        
       | fredski42 wrote:
       | I thought DNS was supposed to be resilient
        
         | topspin wrote:
         | DNS is _designed_ to be fault tolerant. Such a design, however,
         | is often not leveraged correctly; the implementation of DNS can
         | be and frequently is subject to SPOFs.
        
       | realSaddy wrote:
       | This is affecting Steam as well
        
         | ssully wrote:
         | It is impacting a lot of things: https://downdetector.com/
        
       | 00deadbeef wrote:
       | Well it's been an hour now since I first noticed the effects and
       | their service status still has no useful information or ETA for a
       | fix. It's just an "emerging issue".
        
         | jonnyone wrote:
         | The affected sites that I use are now working. Check again.
        
       | bpye wrote:
       | This is apparently why I can't book my COVID vaccine
       | appointment...
        
         | _joel wrote:
         | Yes, was trying to do the same. Getting this 2nd jab has been a
         | nightmare. Places listed as walk-in having Moderna, don't and
         | they ran out of it when I went to get my scheduled jab.
         | Ringing 119 just ends up in a dead line, then this outage. Fun.
        
       | Scoundreller wrote:
       | All yuor data are belong to us
        
       | cbono1 wrote:
       | Why would Google and Amazon be on the downdetector list or
       | experiencing issues? Don't they have their own DNS / nameservers
       | separate from Akamai?
        
         | sathackr wrote:
         | because the way downdetector works is it just basically counts
         | how many people are searching/visiting for <site> down and if
         | it's much higher than typical it flags the site as down.
         | 
         | So if everyone searched "is google down" and visited the link
         | on downdetector that was returned in the search, that would add
         | to the downdetector count for that site.
         | 
         | Downdetector doesn't actually know if the site is up or down.
        
           | brentm wrote:
           | A more proper name might be PeopleThinkItsDownDetector.com
        
             | cbono1 wrote:
             | Not nearly as SEO friendly
        
           | k1t wrote:
           | I found this hard to believe, but it's correct.
           | 
           |  _Downdetector only reports an issue if a significant number
           | of users are impacted. To that end, Downdetector calculates a
           | baseline volume of typical problem reports for each service
           | monitored, based on the average number of reports for that
           | given time of day over the last year. Downdetector's incident
           | detection system compares the current number of problem
           | reports to this baseline and only reports an issue if the
           | current volume significantly exceeds the typical volume of
           | reports._
           | 
           | https://www.speedtest.net/insights/blog/how-downdetector-wor...
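
The baseline comparison in the quoted description can be sketched roughly as follows. This is not Downdetector's actual algorithm: the `factor` and `min_reports` thresholds and the sample numbers are assumptions made up for illustration, since the quote only says the current volume must "significantly exceed" the typical volume.

```python
from statistics import mean


def is_incident(current_reports, historical_reports, factor=3.0, min_reports=10):
    """Flag an incident when current reports far exceed the usual volume.

    historical_reports: report counts for the same time of day on past days.
    factor, min_reports: hypothetical thresholds, not published values.
    """
    baseline = mean(historical_reports)
    return current_reports >= min_reports and current_reports > factor * baseline


# Made-up numbers: a service that normally gets ~25 reports at this hour.
usual = [20, 25, 30]
quiet_day = is_incident(28, usual)
outage_day = is_incident(500, usual)
print(quiet_day, outage_day)
```

Note that nothing in this check probes the service itself, which matches the point above: a spike in people reporting or searching is enough to trip it.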
        
           | mc32 wrote:
           | So how do they reset status? The number of queries going down
           | signifies return to normal status?
        
             | dylan604 wrote:
             | Some CEO calls another CEO and makes a deal?
        
       | Eikon wrote:
       | This is affecting apple as well
       | 
       | https://www.apple.com/go/
        
         | iruoy wrote:
         | For some reason that url doesn't work for me, but
         | https://www.apple.com/ and https://www.apple.com/nl/ do.
        
         | remram wrote:
         | That fails with a 404 for me, which is probably not related to
         | DNS at all?
         | 
         | archive.org seems to indicate there was never anything there...
        
       | mvanaltvorst wrote:
       | What role does Akamai Edge DNS play in normal internet traffic?
       | DNS responses usually get cached, as far as I understand
       | correctly. And it is usually possible to change your DNS server
       | to e.g. Google's and circumvent the outage. Does Akamai Edge DNS
       | play a role on the server side?
        
         | carlsborg wrote:
         | Looks like this: the affected subdomains are CNAMEd to the
         | akamai CDN, and the Nameserver for those are/were down.
         | 
         | So for example:
         | 
         | Top level domain for nvidia resolved fine..
         | 
         | dig @1.1.1.1 nvidia.com => status: NOERROR, Nameservers are
         | ns6.dnsmadeeasy.com
         | 
         | But the website didn't. dig @1.1.1.1 www.nvidia.com => status:
         | SERVFAIL,
         | 
         | The nameserver for www.nvidia resolved to the akamai
         | nameserver which had a problem..
         | 
         | dig @1.1.1.1 www.nvidia.com NS => CNAME
         | e33907.a.akamaiedge.net.
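
A toy model of the chain described above, as a minimal sketch. The record table, the `zone_of` helper, and the placeholder addresses are assumptions for illustration; only the names come from the dig output in the comment.

```python
# Toy record table loosely modelled on the dig output above.
# Addresses are documentation placeholders, not real answers.
RECORDS = {
    "nvidia.com": ("A", "192.0.2.1"),
    "www.nvidia.com": ("CNAME", "e33907.a.akamaiedge.net"),
    "e33907.a.akamaiedge.net": ("A", "192.0.2.2"),
}


def zone_of(name):
    """Hypothetical helper: which provider's nameservers answer for `name`."""
    return "akamaiedge.net" if name.endswith("akamaiedge.net") else "nvidia.com"


def resolve(name, zones_down=frozenset(), max_hops=8):
    """Follow CNAMEs until an A record is found or a nameserver fails."""
    for _ in range(max_hops):
        if zone_of(name) in zones_down:
            return "SERVFAIL"      # the nameserver for this link is down
        rtype, value = RECORDS.get(name, (None, None))
        if rtype is None:
            return "NXDOMAIN"
        if rtype == "A":
            return value
        name = value               # CNAME: continue with the target name
    return "SERVFAIL"              # chain too long, give up


outage = {"akamaiedge.net"}
apex = resolve("nvidia.com", outage)
www = resolve("www.nvidia.com", outage)
print(apex, www)
```

The apex name still resolves during the simulated outage, while `www` fails at the last link of the chain even though its own zone answered correctly.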
        
         | NeckBeardPrince wrote:
         | > What role does Akamai Edge DNS play in normal internet
         | traffic?
         | 
         | Clearly a big one.
        
         | r1ch wrote:
         | The trend these days is DNS TTLs of 60 - 300 seconds, to allow
         | "Cloud agility" or something, so sites are exposed to a much
         | larger risk of authoritative nameservers going down.
        
           | jameshart wrote:
           | You say that like it's a bad idea.
           | 
           | Services like Akamai use short TTLs for their edge services
           | for a variety of reasons, not least because if one of their
           | edge servers goes offline (for planned or unplanned reasons)
           | it lets them sub in a new one and have it receive traffic
           | immediately, rather than have a bunch of clients continue
           | trying to talk to a dead node. So sure, you can increase
           | those TTLs to trade 'what if the DNS server goes down?' risk
           | with 'what if the edge server goes down?' risk...
           | 
           | But keeping the edge servers up and running is probably a lot
           | harder - they need to scale more to handle traffic load, they
           | have to actually handle client data, TLS termination, much
           | more complex configuration.... so if I'm placing bets on
           | which of those things is more likely to die on me, it's the
           | edge node, not the DNS server.
        
         | uncertainrhymes wrote:
         | If you use a CDN to front your traffic, you need the CNAME for
         | www (or whatever) to be pointing at their DNS infrastructure,
         | so they can return whichever closest POP is going to serve your
         | traffic.
         | 
         | e.g. dig @1.1.1.1 www.nvidia.com +trace
         | 
         | ... various things from the root ...
         | 
         | www.nvidia.com. 7200 IN CNAME www.nvidia.com.edgekey.net. ;;
         | Received 83 bytes from 208.94.148.13#53(ns5.dnsmadeeasy.com) in
         | 35 ms
         | 
         | So the main DNS is fine, but it'll never get an A record
         | because the last link in the chain is toast -- edgekey being
         | Akamai in this case, but all CDNs do this so they can route
         | traffic. Normally, this is a good thing so they can shift
         | traffic within 30 seconds on their side. Unfortunately, it also
         | means it would take nvidia two hours to point away from
         | Akamai.
        
       | _joel wrote:
       | So that's why the NHS website is down
        
         | jamespwilliams wrote:
         | Back up now by the looks of it
        
       | [deleted]
        
       | tyingq wrote:
       | You can see this on a lot of sites right now. You get the Akamai
       | style error with something like:
       | 
       |     Reference: #11.453a2f17.1393u44848484.3aee33433
       | 
       | At the bottom of a very bland looking error page.
        
         | halfmatthalfcat wrote:
         | You could argue Akamai is the blandest of the CDN bunch; their
         | UIs are atrocious.
        
           | chrisweekly wrote:
           | Their APIs are (or, were, last I suffered their use a few
           | years ago) also terrible, eg blanket policy of refusing to
           | cache any resource in the presence of "Vary" header,
           | regardless of its value, and failure to honor standard HTTP
           | headers... thankfully there are many other options for CDN,
           | which are SO MUCH BETTER.
        
             | youngtaff wrote:
             | Surely it depends what you vary on?
             | 
             | Content-Encoding should be well supported, User-Agent less
             | so and for very good reasons (there's too much variation in
             | UA strings)
        
               | acdha wrote:
               | It wasn't that simple -- IIRC, for a while Vary meant
               | "don't cache anything, ever, under any circumstances"
               | unless you made some custom configuration changes. Over
               | time they _added_ support for just "Vary: Accept-
               | Encoding" (IIRC less than a decade ago) and that was
               | fragile. They improved that over time but it was painful
               | for a number of years because there were various failure
               | modes which meant things wouldn't be cached, or (IIRC)
               | compression would be disabled for certain URLs
               | sporadically if the first request for the option did not
               | request transfer compression.
        
               | judge2020 wrote:
               | https://learn.akamai.com/en-us/webhelp/adaptive-media-delive...
               | 
               | > AMD automatically strips these headers out of requests
               | to support caching for faster delivery.
               | 
               | > I need the Vary HTTP headers: AMD can cache the
               | associated object if the Vary HTTP header contains only
               | "Accept-Encoding" and "Gzip" is present in the Content-
               | Encoding header
               | 
               | (AMD in this case standing for Akamai Media Delivery)
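
The rule in the quoted documentation can be written down as a small predicate. This is a sketch of the quoted sentence only, not Akamai's implementation; the `is_cacheable` name, the plain-dict header representation, and the treatment of responses without a Vary header are assumptions.

```python
def is_cacheable(headers):
    """Apply the quoted rule to a dict of response headers.

    Assumes header names use their canonical capitalisation.
    """
    vary = headers.get("Vary")
    if vary is None:
        return True  # assumption: without Vary, this rule does not block caching

    vary_fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    encodings = {
        e.strip().lower()
        for e in headers.get("Content-Encoding", "").split(",")
        if e.strip()
    }
    # Cacheable only if Vary names Accept-Encoding and nothing else,
    # and the response body is gzip-encoded.
    return vary_fields == {"accept-encoding"} and "gzip" in encodings


gzip_only = is_cacheable({"Vary": "Accept-Encoding", "Content-Encoding": "gzip"})
user_agent = is_cacheable({"Vary": "Accept-Encoding, User-Agent", "Content-Encoding": "gzip"})
not_gzip = is_cacheable({"Vary": "Accept-Encoding"})
print(gzip_only, user_agent, not_gzip)
```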
        
             | zxcvbn4038 wrote:
             | Akamai is their own worst enemy most of the time. Their
             | prices are the highest, they trail on features, their
             | documentation is opaque, it takes an hour to propagate
             | changes, etc. Only a few years ago you could only use SSL
             | if you purchased their ridiculously expensive pci-dss plan
             | - I thought they would defend that to their grave.
             | 
             | Better alternatives are Cloudflare, Fastly, AWS CloudFront.
             | 
             | Google Cloud CDN always seems to have very good latency but
             | a very bare bones feature set and no edge compute I can
             | identify. Support is always a huge red mark for Google
             | anything.
        
           | dylan604 wrote:
           | yeah, but only tech nerds see it, so it's okay. maybe it's a
           | ploy to get the users to go to the real command set via CLI.
           | make it so shitty nobody wants the UI, and goes back to the
           | terminal. "if you're not a CLI ninja, then you shouldn't be
           | using our product anyways!"
        
         | lowbloodsugar wrote:
         | What's frustrating is that DNS is returning an address, instead
         | of just failing, and so macos is caching that value (though it
         | might be cloudflare doing that).
        
           | space_ghost wrote:
           | Wildcard DNS should be a prosecutable crime, punishable by no
           | less than 20 years of hard labor. (Edit: Probably should have
           | made it clear that this was a joke)
        
             | gokhan wrote:
             | Wildcard DNS helps me to handle multitenancy easily. What's
             | wrong with it?
        
             | dylan604 wrote:
             | When did congress members start posting to HN?
        
             | adamdoran wrote:
             | Presumably you're referring to the practice of answering
             | queries for nonexistent records with an A record belonging
             | to an advertisement page? (instead of doing the right thing
             | answering NXDOMAIN, presuming no records of another type
             | also exist for the queried name.)
             | 
             | dnsmasq has a really useful feature for dealing with this:
             | --bogus-nxdomain
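
The idea behind that option, turning replies that contain a known ad-server address into "no such domain", can be sketched in a few lines. This illustrates the concept only and is not dnsmasq's code; `BOGUS_ADDRESSES`, `filter_answer`, and the addresses are hypothetical.

```python
# Hypothetical address that an ISP resolver returns for nonexistent names.
BOGUS_ADDRESSES = {"203.0.113.50"}


def filter_answer(addresses, bogus=BOGUS_ADDRESSES):
    """Return "NXDOMAIN" if any returned address is a known bogus one."""
    if any(addr in bogus for addr in addresses):
        return "NXDOMAIN"
    return list(addresses)


hijacked = filter_answer(["203.0.113.50"])
genuine = filter_answer(["192.0.2.10"])
print(hijacked, genuine)
```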
        
             | breakingcups wrote:
             | I don't see how wildcard DNS is related to this? Nor how
             | it's bad?
        
           | LeoPanthera wrote:
           | To empty the macOS DNS cache:
           | 
           | sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
        
       | dbsmith83 wrote:
       | https://downdetector.com/archive
       | 
       | So many sites down... and unfortunately not one of them is
       | Twitter
        
         | cpgeier wrote:
         | Amazing that down detector manages to stay up during these
         | kinds of outages. Noticed it has been a little slow but they
         | really have done a good job keeping it up even though large
         | portions of the internet are down right now.
        
           | mindcrime wrote:
           | Who detects if Down Detector is down? Is there a
           | isdowndetectordown.com site?
        
             | cube00 wrote:
             | "I dunno. Coast Guard?"
        
             | SahAssar wrote:
             | Sounds like when Fuckedcompany put itself on Fuckedcompany.
        
             | mcintyre1994 wrote:
             | It's parked by GoDaddy, but unfortunately their website is
             | fubar by this outage if you try to click through to see how
             | much they want for it :)
        
             | ksec wrote:
             | I guess the mother of all Network Downtime checker is HN.
        
         | dheera wrote:
         | Is there a way to tell your system to fall back to the last
         | known IP address if DNS server isn't reachable?
         | 
         | Basically soft-invalidate your local DNS cache but bring it back
         | from the cache graveyard if DNS is down.
        
           | elithrar wrote:
           | You could run a local resolver like dnsmasq or Unbound that
           | can "serve stale" on upstream failures, but that assumes the
           | DNS failure is a client-facing resolver one.
           | 
           | From what I observed here, it was more internal DNS related:
           | Newegg was serving an opaque "DNS failure" error page _from
           | Akamai's front-end_ which is likely because their infra was
           | failing to resolve names internally.
        
           | bombcar wrote:
           | It should be possible to set your cache so it lives forever
           | but still checks for a new IP at normal expiring time.
        
           | [deleted]
        
           | TimWolla wrote:
           | Unbound has a 'serve-expired' option:
           | https://nlnetlabs.nl/documentation/unbound/unbound.conf/#ser...
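
The "serve stale" behaviour discussed in this subthread can be sketched as a small cache wrapper. This is a minimal illustration of the idea, not how Unbound or dnsmasq implement it; `StaleServingCache`, `UpstreamError`, and the injected `lookup` callable are hypothetical names.

```python
import time


class UpstreamError(Exception):
    """Raised by the lookup callable when the upstream resolver fails."""


class StaleServingCache:
    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # callable: name -> (address, ttl_seconds)
        self._clock = clock
        self._entries = {}     # name -> (address, expires_at)

    def resolve(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]          # fresh answer straight from the cache
        try:
            address, ttl = self._lookup(name)
        except UpstreamError:
            if entry is not None:
                return entry[0]      # upstream is down: serve the expired answer
            raise                    # nothing cached, so the failure is passed on
        self._entries[name] = (address, now + ttl)
        return address
```

Expired entries are kept rather than deleted, so they are refreshed at the normal expiry time when the upstream works and reused only when it does not. As noted above, this only helps when the failure sits between the client and its resolver's upstream, not when a site's own infrastructure cannot resolve names internally.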
        
         | mcintyre1994 wrote:
         | It's interesting that they report an AWS outage but there don't
         | seem to be any issues there. Looks like their methodology is a
         | bit too reliant on those speculative tweets from the first 5
         | minutes of all these sites going down.
         | https://downdetector.com/status/aws-amazon-web-services/
         | 
         | > So many websites are down, are AWS servers down or something?
         | 
         | > Amazon web services is down which is affecting a lot of
         | company web sites and services. Not sure what is going on.
         | 
         | > Miss us? @aldotcom and a whole bunch of other folks have been
         | knocked off the internet by what appears to be an AWS
         | attack/system failure. We'll be back. ?
        
           | mandelbrotwurst wrote:
           | It's just based on user reports, so this is people
           | mischaracterizing it as an AWS outage.
        
             | jacob019 wrote:
             | cloudfront was down too
        
             | mcintyre1994 wrote:
             | Yep that's my point. I'm guessing that for a lot of sites
             | they can verify if there's an outage pretty easily when
             | they see a spike in reports, but for something like AWS
             | unless they updated their status page (lol) or downdetector
             | ran a bunch of stuff on there just to check with, I guess
             | they don't have a good way to verify it.
        
               | mandelbrotwurst wrote:
               | Gotcha, yeah I guess I always just considered that out of
               | scope for their service and that it's just a report
               | aggregator but I suppose you would expect it to be at
               | least a little bit clever based on the "detector" name
        
         | 1f60c wrote:
         | > Unfortunately not one of them is Twitter
         | 
         | Please keep comments like this off HN
        
         | grawprog wrote:
         | You got your wish, looks like Twitter's on the list now too.
        
       | jdlyga wrote:
       | Oops, someone unplugged the DNS machine
        
       | mvelie wrote:
       | Akamai believes they have it fixed. We've seen our traffic return
       | to normal. https://twitter.com/Akamai/status/1418251400660889603
        
         | roody15 wrote:
         | hmmm does not appear fixed here in the Midwest
        
       | 00deadbeef wrote:
       | Figured this out almost 30 minutes before they bothered to update
       | their status page.
        
       | SjorsVG wrote:
       | Many bank systems are disrupted by this in the Netherlands
        
         | ricardo81 wrote:
         | My UK bank (HBOS) seemed to have 'online banking unavailable'
         | though their site was up. No doubt related.
        
       | SjorsVG wrote:
       | Many banks in the Netherlands are affected by this.
        
       | schemathings wrote:
       | Possibly related .. Verizon peering issues / ASN701 at Equinix
       | NY2 in Secaucus NJ
        
       | rvz wrote:
       | Probably Akamai needs to use Kubernetes.
       | 
       | EDIT: So HN can't even take a joke after this? [0]
       | 
       | [0] https://news.ycombinator.com/item?id=27893482
        
         | mdtancsa wrote:
         | Sheesh, So yesterday! :)
        
         | unemphysbro wrote:
         | come on, this is funny. HN needs to lighten-up.
        
         | whitepoplar wrote:
         | Probably caused by Kubernetes
        
           | rvz wrote:
           | That's even worse if true; despite HNers creating a storm in
           | a tea cup on DOSing a blog of a service not using K8s when
           | having a blog is not their main service. [0].
           | 
           | Either way, the joke is now on the HNers in that thread.
           | 
           | [0] https://news.ycombinator.com/item?id=27893482
        
       | knaik94 wrote:
       | I am surprised financial institutions don't have any regulation
       | for redundancy. The one that stuck out to me is the Navy Federal
       | Credit Union website being down. I have not had any issues
       | logging into mobile though for some of the reported sites.
        
         | toomuchtodo wrote:
         | Commercial banks are held to a different operational resiliency
         | standard than financial infrastructure.
         | 
         | (a component of my consulting work is reporting to financial
         | regulators for institutions)
        
         | deckard1 wrote:
         | this is prime shit Hacker News says right here. Wait until you
         | learn banks close on Sunday. Or have maintenance windows for
         | their website, ATM, etc.
        
         | christophilus wrote:
         | I'm not sure how easy it would be to regulate. But yeah. I've
         | got a few short term trades in my brokerage account, and
         | outages really throw a wrench into those.
        
           | xyzzy21 wrote:
           | The way to regulate is like anything else: if they fail to meet
           | QoS uptimes, they get fined in 6-8 figures for every minute
           | of loss.
        
         | cryptoz wrote:
         | All major Canadian banks were down.
        
         | Terretta wrote:
         | > _financial institutions don't have any regulation for
         | redundancy_
         | 
         | As CTO of a bank, I wasn't aware of this. So either we wasted a
         | ton of money and time constantly upgrading redundancy and
         | business continuity technologies to satisfy our regulators...
         | or this statement could be mistaken.
        
         | brentm wrote:
         | CapitalOne has a broken login which is pretty surprising to me.
        
       | [deleted]
        
       | tjpnz wrote:
       | Just got booted out of Netflix on the PS4 because the console
       | could no longer connect to Sony's license server. Netflix was
       | working just fine by the way.
        
         | vmception wrote:
         | Ah, that's what's going on. Happened to me as well, I just assumed
         | that Sony is neglecting PS4 performance with its new system,
         | while bogging it down with bloatware.
        
         | lxgr wrote:
         | Was the app installed/running using a secondary PSN account by
         | any chance? This shouldn't be happening on a primary
         | account/console pair.
        
           | tjpnz wrote:
           | It should be my primary although I've often seen it revert
           | back after setting it. I did try setting it as my primary
           | again but you know.
        
         | hackerbrother wrote:
         | Yup, I learned Hulu on Xbox One relies heavily on some
         | Microsoft authentication during a recent Office 365 or Azure
         | outage (not sure which).
        
       | [deleted]
        
       | simonswords82 wrote:
       | I'm sick and tired of these types of services (I'm looking at you
       | too Cloudflare) going down and taking otherwise healthy websites
       | down with them.
        
         | ceejayoz wrote:
         | Most websites using Akamai aren't gonna be "otherwise healthy"
         | without the CDN handling most of the load.
        
         | tootie wrote:
         | It was fastly last time.
        
           | simonswords82 wrote:
           | True but cloudflare have been guilty of downtime too.
        
             | ceejayoz wrote:
             | There aren't many sites that aren't, including "otherwise
             | healthy websites" hosted without a CDN.
        
             | TheSwordsman wrote:
             | I think this is a factually true statement if your business
             | uses any computers. ;)
        
         | sammy2244 wrote:
         | Cloudflare hasn't had an outage in a long time. And when they do
         | they are upfront about it, and post a detailed post-mortem.
        
       | davidjgraph wrote:
       | Serious question, has anyone properly solved the issue of DNS as
       | a single point of failure?
        
         | tyingq wrote:
         | It's an interesting question, as it's always been solved on the
         | server side. All of the current problem is client side. That
         | is, client resolvers that aren't using diverse providers, and
         | only do things like round-robin with long timeouts.
        
           | kokey wrote:
           | Anycast for the DNS IPs deals with most of the problems of
           | clients not failing over elegantly when their primary DNS
           | server is broken.
        
             | citrin_ru wrote:
             | From a client (DNS recursor) point of view there is no
             | primary server. There are just multiple NS records which are
             | equal. If one of them is down it can introduce resolving
             | delays, but they are usually small. At least if something
             | like Unbound or Bind is used. Unbound, e.g., maintains an
             | infra-cache where it tracks RTT and errors for each server
             | and avoids servers which are down.
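             | 
             | A minimal sketch of that kind of per-server bookkeeping, in
             | Python. This is not Unbound's actual algorithm: the class
             | name, the smoothing factor and the failure penalty are all
             | assumptions made up for illustration.

```python
class InfraCache:
    """Toy per-server cache: smoothed RTT, plus a large penalty after a failure."""

    def __init__(self, servers, alpha=0.25, penalty_ms=10_000.0):
        self.alpha = alpha            # weight of the newest RTT sample (assumed value)
        self.penalty_ms = penalty_ms  # RTT assigned to a server that timed out (assumed)
        self.rtt_ms = {server: None for server in servers}  # None = never tried

    def record_success(self, server, sample_ms):
        old = self.rtt_ms[server]
        if old is None or old >= self.penalty_ms:
            # First sample, or first answer after a failure: start fresh.
            self.rtt_ms[server] = sample_ms
        else:
            # Exponentially weighted moving average of the RTT.
            self.rtt_ms[server] = (1 - self.alpha) * old + self.alpha * sample_ms

    def record_failure(self, server):
        self.rtt_ms[server] = self.penalty_ms

    def pick(self):
        # Untried servers sort first so they get probed; otherwise lowest RTT wins.
        def cost(server):
            rtt = self.rtt_ms[server]
            return -1.0 if rtt is None else rtt
        return min(self.rtt_ms, key=cost)
```

             | With two equal NS records, a timeout on one simply makes
             | pick() return the other until the failed server answers
             | again.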
        
         | hk1337 wrote:
         | The problem isn't DNS though, is it? The problem is that people
         | don't necessarily use the redundancies on DNS?
         | 
           | The whole reason it takes a domain 24h to fully work with DNS
           | is because it propagates the information to other DNS servers,
           | thus making it not a centralized service.
        
           | unilynx wrote:
           | That differs per TLD though. In .nl updates are usually fully
           | processed within the hour (they update the zone file twice
           | per hour)
        
           | jameshart wrote:
           | DNS doesn't 'propagate' except in the very limited case of
           | zone-transfer publication, which... nobody really relies on
           | these days. Registrars tell you it takes 24 hours to
           | propagate to stop you from complaining to them about your
           | ISP's DNS caching policy. The reality is: recursing DNS
           | servers have caches, they respect TTLs, and for the most part
           | that means that DNS changes should fully wash through within
           | an hour for most changes (less if you keep your TTLs
           | shorter).
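           | 
           | A minimal sketch of the caching behaviour described here, in
           | Python. The class and its injectable clock are hypothetical;
           | a real recursive resolver does far more (negative caching,
           | per-record-type entries, TTL caps).

```python
import time


class TtlCache:
    """Toy resolver cache: an answer is reused until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock   # injectable so the behaviour can be tested without waiting
        self.entries = {}    # name -> (value, expiry timestamp)

    def put(self, name, value, ttl_seconds):
        self.entries[name] = (value, self.clock() + ttl_seconds)

    def get(self, name):
        entry = self.entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # TTL expired: drop the entry so the next lookup goes upstream.
            del self.entries[name]
            return None
        return value
```

           | Nothing is pushed anywhere: a change becomes visible to a
           | given resolver only once its cached copy expires, which is
           | why a shorter TTL shortens the wait.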
        
         | arberx wrote:
         | Yes: https://ens.domains/
        
         | jakeschaeffer wrote:
         | https://handshake.org is the only project I've seen that
         | actually solves the issue with a decentralized root zone file.
         | 
         | https://namebase.io is a "registrar" for it.
        
           | airstrike wrote:
           | Why does this need to have the whole NFT / crypto / auction
           | angle?
           | 
           | https://learn.namebase.io/starting-from-zero/how-to-get-a-
           | na...
           | 
           | This is so convoluted it actually makes the whole thing a
           | non-starter
        
             | fwip wrote:
             | Decentralized control of a centralized finite resource
             | (domain names) requires consensus. For example, Joe Smith
             | and Joe Blow both want joe.com.
             | 
             | You want a protocol that gives consistent "global" state
             | without any centralized / trusted users -
             | blockchain/bitcoin is one of the only technical solutions
             | to provide that.
             | 
             | I agree that it's a garbage solution in practice, but
             | that's why it's got cryptoshit bundled in.
             | 
             | A potential different solution to DNS monopoly, if that is
             | a problem that needs solving, is multiple name-resolution
             | providers that have differing records on what name points
             | where. (The tradeoff is that an owner may need to register
             | their name with multiple different providers).
        
         | sakisv wrote:
         | Depending on where you draw the line for "single point of
         | failure", you could use multiple providers for your dns.
         | 
         | GOV.UK for example uses both aws and gcp for DNS
        
           | davidjgraph wrote:
           | So, NS entries pointing to both? But then take the example
           | your domain was in Route53 and AWS goes down. You can't
           | configure the NS entries to avoid AWS DNS servers. Is the
           | idea that child DNS servers detect the outage and cache the
           | values in the name server(s) that remain up?
           | 
           | But then, the cached values from AWS take a while to clear,
           | TTL never seems to be applied properly. It always feels like
           | the worst case in such a scenario is you can point everyone
           | at the right thing within 24 hours.
        
             | corobo wrote:
             | Have them all hot and live rather than any sort of failover
             | system. Keep everything in sync with OctoDNS or similar
             | 
             | https://github.com/octodns/octodns
             | 
             | DNS is fastest first* rather than main/failover. If AWS DNS
             | was down your GCP DNS would have replied (if all is well)
             | sooner than {timeout} so your visitor would still have a
             | response
             | 
             | * Sort of. I think if the client doesn't get a reply from
             | the server it picked randomly in 1s they move on to the
             | next server, repeat until all fail
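             | 
             | A minimal Python sketch of the behaviour in the footnote,
             | assuming a random pick and a fixed per-server timeout. Real
             | resolvers differ in how they choose and retry; the query
             | callback is a hypothetical stand-in for an actual DNS
             | lookup.

```python
import random


def resolve(name, nameservers, query, timeout_s=1.0, rng=random):
    """Try nameservers in random order until one of them answers.

    `query(server, name, timeout_s)` is supplied by the caller; it returns an
    answer or raises TimeoutError. It stands in for a real DNS lookup.
    """
    order = list(nameservers)
    rng.shuffle(order)
    for server in order:
        try:
            return query(server, name, timeout_s)
        except TimeoutError:
            continue  # this server is down or slow; move on to the next one
    raise TimeoutError(f"no nameserver answered for {name}")
```

             | With NS records at two providers, an outage at one costs at
             | most one timeout per lookup as long as the other still
             | answers.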
        
             | NotEvil wrote:
             | I think if route53 was down, your dns provider wouldn't be
             | able to go there. So it will go to the root, which will give
             | the gcp one too. So your dns provider might try that.
             | 
             | (I don't know if this is how it works, but I think that's
             | how it's supposed to work)
        
               | zxcvbn4038 wrote:
               | You typically have four name servers for a domain, but
               | they don't all have to be hosted with the same company.
               | Very handy when your DNS provider decides to brag they
               | are unhackable and the hackers reply by immediately
               | hacking them followed by DDoSing them to death.
        
             | tpetry wrote:
             | You set both services in your ns records. So every day they
             | share the load for dns resolution. If one day one of them
             | is down the client can/will use a different nameserver from
             | your configuration.
        
             | wongarsu wrote:
             | Configuring two NS entries is pretty standard, so surely
             | most resolvers try one of the two, and if it's down try the
             | other one? What else would be the point of having multiple
             | nameservers? Then you just have to get two nameserver
             | providers and make sure their settings stay synced, and
             | point your domain to one nameserver from each.
             | 
             | Of course that requires the server to properly fail, i.e.
             | stop responding to requests. That doesn't seem to be the
             | case here
        
           | paradite wrote:
           | Last time I tried setting NS to both cloudflare and digital
           | ocean in my domain registry, cloudflare sent me an email
           | saying the configuration is invalid and asked me to revert.
           | Am I doing something wrong?
        
           | gregsadetsky wrote:
           | gov.uk's traffic seems to be handled by Fastly, a well known
           | CDN.
           | 
           | What I'm a bit surprised / unsure of is what happens when I
           | run "dig ns gov.uk". The results are:
           | 
           |     gov.uk.     21559 IN  NS  ns1.surfnet.nl.
           |     gov.uk.     21559 IN  NS  auth50.ns.de.uu.net.
           |     gov.uk.     21559 IN  NS  ns3.ja.net.
           |     gov.uk.     21559 IN  NS  ns2.ja.net.
           |     gov.uk.     21559 IN  NS  ns0.ja.net.
           |     gov.uk.     21559 IN  NS  auth00.ns.de.uu.net.
           |     gov.uk.     21559 IN  NS  ns4.ja.net.
           | 
           | Who are ja.net, uu.net and surfnet.nl?
           | 
           | EDIT: I see that ja.net i.e. jisc.ac.uk "manages the second
           | level domain .gov.uk" --
           | https://www.jisc.ac.uk/domain-registry . I imagine that
           | uu.net and surfnet.nl are there for redundancy
        
             | PaywallBuster wrote:
             | whois ja.net
             | 
             |     Domain Name: JA.NET
             |     Registry Domain ID: 499794_DOMAIN_NET-VRSN
             |     Registrar WHOIS Server: whois.demys.com
             |     Registrar URL: http://www.demys.com
             | 
             | "Demys is a leading provider of corporate domain name
             | management and an ICANN accredited registrar"
             | 
             | whois uu.net
             | 
             |     Domain Name: UU.NET
             |     Registry Domain ID: 5486163_DOMAIN_NET-VRSN
             |     Registrar WHOIS Server: whois.markmonitor.com
             | 
             | surfnet is just an ISP in Netherlands
             | 
             | https://www.surf.nl/
        
               | gregsadetsky wrote:
               | Thanks
               | 
               | Is it possible to see if/where gov.uk is using GCP or AWS
               | for its domain zones? From what I can see -- that's not
               | the case? Or am I looking in the wrong place?
        
               | PaywallBuster wrote:
               | I think you did the right query, maybe they're using it
               | for different domain names?
        
         | grishka wrote:
         | And then there are Cloudflare and other Centralized Downtime
         | Networks as another point of failure.
        
           | andoma wrote:
           | Loled at this.
        
         | toddh wrote:
         | You can still hardcode IP addresses. Not sure most people
         | realize DNS isn't actually needed, you know, except for
         | convenience and all that.
        
           | tyingq wrote:
           | The "Host:" header in http[s] pretty much killed that. Half
           | the internet would be a Cloudflare error page if we moved
           | back to ip addresses :)
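           | 
           | A minimal Python sketch of why the bare IP isn't enough: the
           | address only decides where the TCP connection goes, while the
           | Host line tells a shared server which site is wanted. The
           | helper name and hostnames are made up for illustration.

```python
def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request for `host`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # the only thing that distinguishes sites on one IP
        "Connection: close",
        "",                    # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

           | Two sites behind the same CDN address get requests that
           | differ only in that Host line, so a request without a known
           | hostname lands on the provider's default or error page.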
        
         | citrin_ru wrote:
         | It is relatively easy to make DNS highly redundant: just put
         | multiple DNS servers in data centers which are as independent as
         | possible (different geo locations, different ISPs). You can
         | also use different DNS software and different OSes (say
         | BSD+Linux) to exclude correlated bugs. Root DNS servers AFAIK
         | use different software for this reason.
         | 
         | Problems start when you want to easily make frequent changes and
         | introduce complex software to manage DNS zones (and complexity
         | usually comes with bugs).
        
       | foobarbazetc wrote:
       | Absolutely amazing how many billion $+ companies are single homed
       | for DNS.
       | 
       | I wonder how much they spend on multi-AZ redundant
       | architectures...
        
         | zxcvbn4038 wrote:
         | Most CDNs offer huge incentives for sending them more traffic;
         | a lot of the time you end up in a contract obligated to handle X
         | requests and Y gigabytes of traffic per month. But personally I
         | believe you should never have a single provider for anything -
         | particularly when it's acceptable for a company to cut you off
         | with no warning or recourse.
        
         | nexuist wrote:
         | Might be survivorship bias. Multi-AZ arch protects against all
         | other failures, so the only one that remains visible to the
         | outside world is DNS.
        
         | toast0 wrote:
         | Using multiple providers for mostly static DNS is easy, pick
         | one as primary and AXFR to the other and notifications and
         | whatever. Or it's not too hard to keep a zone file in source
         | control and sync it to the providers.
         | 
         | Using multiple providers for fancy DNS, like only providing IPs
         | that pass healthchecks or geotargetting users to datacenters
         | gets pretty hard, because the different providers have similar
         | capabilities, but no uniform interface, so you've either got to
         | do it manually, or you have to build out your own abstraction
         | that is probably limiting.
         | 
         | If possible, insourcing DNS makes the most sense to me, because
         | if you can't keep your service online, it's not the worst if
         | your DNS is offline; and if you can keep your service online,
         | you probably won't mess up your DNS too badly.
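         | 
         | A minimal Python sketch of the "zone file in source control,
         | synced to the providers" approach for static records. It only
         | computes the difference between the desired records and what
         | one provider currently serves; the data layout is an
         | assumption, and fetching or pushing records through a real
         | provider API is left out.

```python
def diff_records(desired, actual):
    """Compare two {(name, type): set_of_values} mappings.

    Returns (missing, extra, changed), all relative to `desired`.
    """
    # Records in the source-of-truth zone that the provider does not have.
    missing = {k: v for k, v in desired.items() if k not in actual}
    # Records the provider serves that are not in the zone any more.
    extra = {k: v for k, v in actual.items() if k not in desired}
    # Records present on both sides but with different values.
    changed = {
        k: (desired[k], actual[k])
        for k in desired.keys() & actual.keys()
        if desired[k] != actual[k]
    }
    return missing, extra, changed
```

         | Running the same diff against each provider gives a quick
         | drift check; the fancy features (health checks, geo
         | targeting) are exactly what does not fit a flat mapping like
         | this.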
        
           | jfoutz wrote:
           | So much this. Keeping feature by feature parity is the tricky
           | part.
        
         | orblivion wrote:
         | So here's a weird question: Supposing companies multi-home for
         | DNS, or whatever other essential service, via multiple service
         | providers.
         | 
         | Whatever multi-home means, why can't there just be one service
         | provider that does _that_? And are we sure that these service
         | providers aren't already doing that as best we might hope for?
         | (For instance, Amazon already has multiple zones, etc.)
         | 
         | I suppose the one thing this can't protect against is some sort
         | of political (broadly defined) threat related to the company
         | itself.
        
           | lxgr wrote:
           | > Whatever multi-home means, why can't there just be one
           | service provider that does that?
           | 
           | Many of these outages are due to pushing broken artifacts or
           | configuration to production.
           | 
           | A single provider can pretty easily offer geographic or
           | network topological redundancy, but administrative and/or
           | technological independence is pretty hard to achieve in a
           | single company.
        
             | orblivion wrote:
             | I mean, I guess what I'm saying is that in theory a single
             | provider could purposely keep two different departments
             | that manage their own artifacts independently.
        
             | knute wrote:
             | I believe EasyDNS can automatically push DNS settings to
             | Route53 to host DNS in AWS. Doesn't protect you from fat-
             | fingering a change, but you should be resilient to either
             | EasyDNS or Route53 going down.
             | 
             | https://kb.easydns.com/knowledge/easyroute53/
        
       | tru3_power wrote:
       | Any idea on cause? Ddos or hardware failure?
        
         | MrRadar wrote:
         | Widespread issues like this on major CDNs tend to be
         | configuration errors.
        
           | tootie wrote:
           | Cloudflare seems to be struggling too. Not sure if they have
           | some dependency on Akamai or if this portends something much
           | worse
        
       | aliswe wrote:
       | Not only that, their support telephone line (in Sweden) was down
       | as well
        
       | twalichiewicz wrote:
       | Posted this in the thread about the travel websites being down,
       | but seems Fidelity is entirely impossible to sign in to / trade
       | right now.
        
       | sebyx07 wrote:
       | The good parts of centralisation
        
       | [deleted]
        
       | conqrr wrote:
       | Affecting Airbnb search
        
       | SandroG wrote:
       | Is this related to:
       | 
       | Multiple websites including DraftKings, Airbnb, FedEx, Delta and
       | others appear to be experiencing issues.
       | 
       | https://www.bloomberg.com/news/articles/2021-07-22/multiple-...
        
       | testplzignore wrote:
       | Strange thing about the duration of this outage... From logs I
       | have, it seems to have lasted _exactly_ one hour, from 15:38 to
       | 16:38. Their Twitter account also said "disruption lasted up to
       | an hour", though they incorrectly said it started at 15:46 (did
       | it take 8 minutes for their monitoring to notice?).
       | 
       | That makes me think that whatever the fix was, it had to wait for
       | some one-hour cache to expire before it took effect. I'm very
       | interested to find out what the cache issue was, more so than
       | what the original bug was.
        
       | xyzzy21 wrote:
       | And people wonder why I try to avoid depending on online
       | anything...
        
       | delgaudm wrote:
       | Lastpass is down, so if you use lastpass the effect is
       | significantly compounded.
        
         | mcintyre1994 wrote:
         | Do they not cache everything locally? I'd have thought a
         | password manager/secure data store would work offline.
        
           | stusmall wrote:
           | They do.
        
         | sammy2244 wrote:
         | Having your passwords only accessible by internet is a stupid
         | idea anyway
        
         | nonfamous wrote:
         | It still works in offline mode. You can't update passwords, but
         | you can retrieve them.
        
           | compscistd wrote:
           | To enable offline mode, I had to turn on airplane mode on my
           | phone before logging in.
        
       | soheil wrote:
       | App Store on MacOS is down!
        
       ___________________________________________________________________
       (page generated 2021-07-22 23:02 UTC)