hngopher.com

       [HN Gopher] AWS us-east-1 down
       ___________________________________________________________________
        
       AWS us-east-1 down
        
       The status page says everything is fine though.
        
       Author : rurp
       Score  : 549 points
       Date   : 2023-06-13 19:08 UTC (3 hours ago)
        
       | jdlyga wrote:
       | Most stuff is back up, but AWS MediaConvert still seems to be
       | down.
        
       | ltcoleman wrote:
       | We are seeing console, codebuild, etc. access issues. Possibly
       | all using Lambdas, foundationally?
        
         | waz0wski wrote:
         | the last big us-east-1 outage was ... DNS - and it's usually
         | DNS or software-defined core networking causing these cascading
         | failures
         | 
         | Loss of DNS causes inter-service api calls to fail, then IAM
         | and all other services fail. Anything not built to handle those
         | situations with backoff causes a 'stampeding herd' of
         | failure/retry and exacerbate the outage
         | 
         | Review the AWS statements about outages here -
         | https://aws.amazon.com/premiumsupport/technology/pes/
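
A minimal sketch of the backoff idea mentioned above: capped exponential backoff with random jitter, so that many clients do not retry in lockstep. The function name, parameters, and defaults are illustrative assumptions, not taken from any AWS SDK.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` with capped exponential backoff and full jitter.

    All names and defaults are hypothetical; tune them per service.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap the exponential window, then sleep for a random fraction
            # of it so that retries from many clients spread out over time.
            window = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * window)
```

The `sleep` and `rng` parameters exist only so the behaviour can be exercised without real waiting; a production client would more likely rely on the retry support of whatever SDK it already uses.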
        
       | andrewinardeer wrote:
       | Is this possibly why my Ring doorbell emitted a phantom chime
       | about two hours ago at 4am?
        
       | bastard_op wrote:
       | Maybe azure was hosting some services there, getting ddos'd and
       | all.
        
       | super_linear wrote:
       | Seems like it's time for end of the quarter AWS demos and someone
       | got a little too eager for launch.
        
       | throw03172019 wrote:
       | My Whole Foods grocery pickup order was affected by this outage.
       | They couldn't check me in. Groceries were packed in the fridge
       | but they told me to come back later. What a waste of time.
        
       | dijit wrote:
       | I wonder if this is a coincidence or if us-east-1 is simply down
       | enough that I'm just experiencing selection bias; but I posted a
       | poll on twitter earlier today:
       | https://twitter.com/dijit/status/1668678588713824257
       | 
       | Contents:
       | 
       | > Has anyone ever _actually_ had customers accept an outage
       | because AWS was down; or is this just cloud evangelicalism
       | copium?
       | 
       | > [ ] Yeah, outages free pass
       | 
       | > [ ] No, they say to use AZ's
        
         | kobalsky wrote:
         | it's important to inform customers about the resiliency of
         | their systems and let them pick how far they are going to
         | invest for it.
         | 
         | then you get to eat popcorn when stuff explodes.
          | * single server event.   $
          | * multi server event.    $$
          | * single az event.       $$$
          | * multi az event.        $$$$
          | * global provider event. $$$$$
          | * cross provider event.  $$$$$$
          | * alien invasion.        $$$$$$$$$$$$$$
        
           | Endy wrote:
           | Always be prepared for alien invasion
        
           | matwood wrote:
           | Back when we had servers in an onsite DC we lost a raid card
           | and the system I was developing went down. We had the fancy
           | support so a tech was out with the card replaced in a couple
           | hours, then we had to restore from tape backup. All in all, a
           | non-critical system was down for most of a business day. My
            | boss's boss stormed in, upset he couldn't pull a report, and
           | asked how do we prevent this in the future. I responded at a
           | minimum we had to double the cost for a hot standby, and he
           | said 'never mind' and walked out.
        
             | d8dUf7KjBEYCk7Q wrote:
             | That sounds nice. My boss's boss is usually the one
             | storming in, and he usually says "okay let's do it", and
             | then I have to implement it in a week...
        
               | bombcar wrote:
               | That's why you always BofH the estimates to include some
               | fun toys for yourself, too
        
           | xyst wrote:
           | Just need to deploy your service on Mars AND Earth. Duh
        
             | dmckeon wrote:
             | And you thought the current time zone confusion was bad.
             | Now you have two sets of time zones, and a varying delay of
             | about 5 to 21 minutes between them. Oh, the joy!
        
               | kobalsky wrote:
               | note to self: synchronous replication may be a problem
        
           | underbluewaters wrote:
           | This should be logarithmic
        
             | svieira wrote:
             | The nice thing is that any graph without a unit _can_ be
             | log-scale - so in a way, it already is.
        
           | bell-cot wrote:
           | "Briefly describe the '$$$$$$$' through '$$$$$$$$$$$$$'
           | situations. Can't leave money lying on the table."
           | 
           | - memo from Enterprise Sales Dept.
        
         | paulddraper wrote:
         | Depends on your customers.
         | 
         | If your customers are tech, they're too busy running around
         | with their hair on fire too.
        
         | robrtsql wrote:
         | > No, they say to use AZ's
         | 
         | Using 3 AZs in us-east-1 won't save you.
         | 
         | I guess a demanding customer would have said 'you should have
         | implemented disaster recovery so you could failover to us-
         | east-2' but that's easier said than done. The more regional AWS
         | services you adopt, the bigger the impact is. How does one
         | recover from a regional outage if their pipeline is in that
         | region?
        
           | fbellag wrote:
           | What I did once I was in the position of _having_ to provide
           | that level of support, was to run the pipeline in a third
           | region, different from the "prod" ones. That way, worst case
           | you can't do deployments during the outage...
           | 
           | Another alternative studied was to use a thirdparty ci/cd
           | service, outside of our network. It was discarded bc you
           | never know where that would actually run
        
             | robrtsql wrote:
             | > It was discarded bc you never know where that would
             | actually run
             | 
             | Yep, I considered that switching to GitHub Actions would
             | _theoretically_ eliminate the need for disaster recovery
             | for CI/CD (since the handling of disasters is out of your
             | hands) but in practice their SLA is far worse than just
             | running CodePipeline in a single region.
        
               | fbellag wrote:
               | Yeah, that's why we went with a third region instead.
               | But, at the end of the day, if _only_ changes are
               | affected for a couple of hours, that wouldn't impact the
               | service that much
        
           | fnordpiglet wrote:
           | I've worked for several systemically important megacorps
           | where certain things had to not only run cross region but
           | also cross provider. It's absurdly difficult, and only should
           | be done if you _need_ five or more 9's of availability.
           | Almost nothing actually does.
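
To put "five or more 9's" in concrete terms, here is a small worked calculation of the yearly downtime budget for a given number of nines. It assumes a 365-day year and counts all downtime equally; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines.

    For example, nines=3 means 99.9% availability.
    """
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions five nines leaves a budget of a little over five minutes per year, which is shorter than many incident response times.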
        
         | mrobins wrote:
         | I can think of more times where a whole AZ has had issues than
         | times where just one AZ went dark and failover happened
         | seamlessly.
        
           | paulddraper wrote:
           | s/whole AZ/whole region/
        
         | kinghajj wrote:
         | AZs don't really help when it's AWS' own services across the
         | entire region that break. Anecdotally, we have had customers
         | accept outages that were out of our control without penalty.
        
           | vultour wrote:
           | AZs also don't help with natural disasters at all. I believe
           | AWS is the only one doing geographically distributed AZs, for
           | the others it just means different connections and placed
           | somewhere else in the building.
           | 
           | edit: turns out AWS is the one with geo distribution, not
           | Azure
        
             | kapilvt wrote:
              | AWS AZs are also distributed geographically within a region
              | with separate power and network lines. From the docs:
              | "Availability Zones are distinct locations within an AWS
              | Region that are engineered to be isolated from failures in
              | other Availability Zones."
        
               | vultour wrote:
               | Ah, you are probably right. I was thinking of the
               | incident a few weeks back where the fire suppression took
               | out multiple AZs, but that was actually GCP.
        
           | dijit wrote:
           | Wild, that wouldn't have flown with datacenter providers
           | having issues for my previous companies.
           | 
           | AWS really does have an easier time than old school
           | datacenter providers. I guess the complexity is higher but
           | it's shocking that they can charge so much yet we hold them
           | to a lower standard.
        
             | nostrebored wrote:
             | Outage rates are also wildly different. When you're using
             | dozens of managed services and have a few prod-impacting
             | outages with any reasonable (cross-AZ) design, customers
              | are less sensitive than when they are dependent on dozens
              | of products that have independent failure modes with
             | potentially cascading impact.
        
             | gtirloni wrote:
             | DCs are pretty static and offer way fewer services than AWS
             | or any other public cloud.
             | 
             | I worked for one for some time and whenever we had issues,
             | some people would call and ask if we were going bankrupt.
             | It gave me a feeling they also have way smaller customers
             | that might not understand the underlying stack.
        
               | aeyes wrote:
               | If all you use in AWS is static EC2 instances you would
               | have to go back a looooong time to find an outage which
               | affected their availability. Even in us-east-1.
        
               | TonyCoffman wrote:
               | December 22, 2021 was the last partial impact we had in
               | us-east-1 for EC2 instances. They had power issues in
               | USE1-AZ4 that took a while to sort out.
        
         | jmacjmac wrote:
         | Maybe cheaper regions have more users and have higher outage
         | rates
        
         | Johnny555 wrote:
         | My employer lets customers choose which of our supported
         | regions to run in and exempts cloud provider outages from our
         | SLA (we're on the hook for staying up for single AZ outages,
         | but not multi AZ or region outages). We provide tools to help
         | customers replicate their data so they can be multi-region or
         | even multi provider if they want to.
        
       | [deleted]
        
       | uberism wrote:
       | It appears to have been fixed and resolved for us.
        
       | monero-xmr wrote:
       | You can't even access the tools (web or CLI) in order to put your
       | own system into maintenance mode...
        
       | chaosmachine wrote:
       | If you're having problems accessing the console, the workaround
       | is just to use a different region, eg:
       | 
       | https://ca-central-1.console.aws.amazon.com/console/home
       | 
       | This assumes you don't actually need anything from us-east-1,
       | though :)
        
       | mirkodrummer wrote:
       | eu-west-1 having some issue as well for me
        
         | lonnyk wrote:
         | can you elaborate?
        
       | theshrike79 wrote:
       | Guesses, which one is it this time:
       | 
       | * DNS
       | 
       | * Misconfigured switch
       | 
       | It's always one of those two.
        
         | mdaniel wrote:
         | * Cert expiry
        
       | whoisjuan wrote:
       | Why is it always us-east-1 though?
       | 
       | I have always stayed away from that region because it seems
       | significantly less reliable than other regions.
        
         | nicoffeine wrote:
         | I thought I read that this is where they deploy new changes
         | first. Can anyone confirm?
        
           | shitlord wrote:
           | When I worked there, there were few hard and fast rules.
           | Every team had its own release processes, so there was a lot
           | of variance. It has been a couple of years, so this may have
           | changed.
           | 
           | Typically, a team would group their regions into batches and
           | deploy their change to one batch at a time. Usually they
           | follow a geometric progression, so the first batch has one
           | region, the second batch has two regions, the third batch has
           | four regions, and so on. This batching was performed for the
           | sake of time; nobody wants to wait a month for a single
           | change to finish rolling out.
           | 
           | One reason not to deploy to us-east-1 in the first batch is
           | so you don't blow up your biggest region. The fewer customers
           | you break, the better.
           | 
           | One reason not to deploy to us-east-1 in the last batch is
           | that there are a lot of batches. If a problem is uncovered
           | after deploying the last batch, then someone has to initiate
           | rollbacks for every single region.
           | 
           | Some teams tried to compromise and put us-east-1 in one of
           | the earlier batches.
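
The geometric batching described above can be sketched as follows. This is only an illustration of the idea, not Amazon's tooling; the function name and the assumption that regions are passed in the desired rollout order are hypothetical.

```python
def geometric_batches(regions, first_batch_size=1, factor=2):
    """Split `regions` into batches whose sizes grow geometrically (1, 2, 4, ...).

    `regions` is assumed to be ordered already (for example smallest first).
    The final batch holds whatever is left over, so it may be smaller.
    """
    if first_batch_size < 1 or factor < 1:
        raise ValueError("first_batch_size and factor must be at least 1")
    batches = []
    size = first_batch_size
    start = 0
    while start < len(regions):
        batches.append(regions[start:start + size])
        start += size
        size *= factor
    return batches
```

With doubling, ten regions end up in four batches of sizes 1, 2, 4 and 3. The number of batches grows roughly with the logarithm of the region count, which is the time saving the comment refers to.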
        
           | shepherdjerred wrote:
           | No definitely not. Usually pipelines deploy over 1-2 week
           | periods, and they don't deploy on Fridays/holidays/high-
           | traffic periods like December.
           | 
           | Deployments start off very conservative, maybe 1-2 small
           | regions on the first day of deployments. As you gain
           | confidence, the pipeline deploys to more regions/bigger
           | regions.
           | 
           | A pipeline that deploys to 22 regions over one week might go
           | from 2 small regions on monday, 4 small/medium regions on
           | tuesday, 8 medium/large regions on wednesday, 8 regions on
           | thursday.
           | 
           | us-east-1 is usually going to be deployed to on the
           | wednesday/thursday in this example, but that isn't always the
           | case because sometimes deployments are accelerated for
           | feature launches (especially around re:invent), or retried
           | because of a failure.
           | 
           | There are best practice guides within Amazon that very
           | closely detail how you should deploy, although it is up to
           | the teams to follow them, which they usually do an okay job
           | of.
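
As a rough sketch of the week-long rollout in the example above, the following assigns waves of regions to consecutive deploy days and skips blocked days. The blocked-day set (Friday through Sunday), the function name, and the placeholder region names are assumptions for illustration; holidays and high-traffic periods are not modelled.

```python
from datetime import date, timedelta

BLOCKED_WEEKDAYS = {4, 5, 6}  # Friday, Saturday, Sunday (Monday is 0)


def schedule_waves(regions, wave_sizes, start):
    """Assign consecutive slices of `regions` to deploy days.

    `regions` is assumed to be ordered from smallest to largest, and
    `wave_sizes` must cover every region exactly once.
    """
    if sum(wave_sizes) != len(regions):
        raise ValueError("wave sizes must cover every region exactly once")
    plan = []
    day = start
    index = 0
    for size in wave_sizes:
        # Move forward until we land on a day where deploys are allowed.
        while day.weekday() in BLOCKED_WEEKDAYS:
            day += timedelta(days=1)
        plan.append((day, regions[index:index + size]))
        index += size
        day += timedelta(days=1)
    return plan


if __name__ == "__main__":
    regions = [f"region-{i:02d}" for i in range(22)]  # placeholder names
    for day, wave in schedule_waves(regions, [2, 4, 8, 8], date(2023, 6, 12)):
        print(day.strftime("%a %Y-%m-%d"), len(wave), "regions")
```

Starting on a Monday, the 2/4/8/8 split from the comment lands on Monday through Thursday, so the largest regions fall in the last two waves.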
        
           | temp_praneshp wrote:
           | From observing my wife's teams over the years, they deploy
           | new _products_ early to that region, but deploying code
           | changes starts in smaller regions.
        
           | mparnisari wrote:
           | When i worked at aws, IIRC, us-east-1 was one of the last
           | regions we deployed to. So this is very confusing to me
        
           | throwaway019254 wrote:
           | I don't believe it's true. I was working on one of the
           | biggest AWS services and we always deployed to small regions
           | first.
           | 
           | @dijit is right:
           | https://news.ycombinator.com/item?id=36315736
        
         | arbitrage wrote:
         | Because it was one of the first, and it shows its age and less
         | than rigorous rollout compared to the other zones.
        
         | gtirloni wrote:
         | I suspect it's where they concentrate a lot of their control
         | plane.
        
         | [deleted]
        
         | JoshTriplett wrote:
         | us-east-1 is AWS's oldest region, and has the most legacy
         | infrastructure, in ways that many other regions do not.
        
         | dijit wrote:
         | It's the:
         | 
         | * Largest (DDoS'd most, most complex, scaling issues etc)
         | 
         | * Oldest (More time for weird idiosyncrasies to take hold)
         | 
         | * Where most testing happens
         | 
         | * Where new products are deployed first
        
           | Matthias247 wrote:
           | 1) and 2) certainly apply. 3) and 4) don't. Testing in the
           | largest region is one of the biggest anti-patterns.
        
             | jedberg wrote:
             | 4 is still generally true. Most new features drop in us-
             | east-1 on launch day.
        
               | shepherdjerred wrote:
               | Usually us-east-1 is deployed to after several smaller
               | regions. Usually it'll fall in the middle of the week
               | depending on the pipeline.
               | 
               | Just because a feature is there on launch day doesn't
               | mean it was deployed to first. Features are often hidden
               | behind flags that are switched for launch.
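
A minimal sketch of the flag-gating idea above: the code ships to every region first, and a separate flag decides when customers actually reach it. The class, flag name, and return strings are hypothetical; a real system would read flags from a configuration service rather than from memory.

```python
class FeatureFlags:
    """In-memory flag store keyed by (flag, region). Illustrative only."""

    def __init__(self):
        self._enabled = set()

    def enable(self, flag, region):
        self._enabled.add((flag, region))

    def is_enabled(self, flag, region):
        return (flag, region) in self._enabled


def handle_request(flags, region):
    # The new code path is already deployed; the flag controls exposure.
    if flags.is_enabled("new-feature", region):
        return "new behaviour"
    return "old behaviour"
```

Deploying and launching become two separate events: the deployment order across regions says nothing about which region sees the feature first.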
        
               | jedberg wrote:
               | I'm well aware of that, but the point is that when the
               | feature is ungated to the public, it's in us-east-1 and
               | gets all that load, and more load than the rest because
               | of the fact that a lot of big customers are based in us-
               | east-1, including much of Amazon itself.
        
             | grumple wrote:
             | AWS doesn't test there last I checked, they roll out to
             | smaller regions first.
        
           | mcast wrote:
           | Most AWS engineering is closest to (and tested in) us-west-2
           | (PDX) or us-east-2 (Ohio)
        
           | abraae wrote:
           | Looking forward to Auckland coming online, which should be
           | the opposite to most of these factors, and will make game
           | streaming bearable (for me)
        
           | paulddraper wrote:
           | It's also the home of single region services...
           | 
           | IAM, Cloudfront ACM certs, etc
        
           | theshrike79 wrote:
           | It's also
           | 
           | * The only place where the IAM dashboard can be accessed
           | from. I need to access it NOW. I can't.
        
         | mike_d wrote:
         | us-east-1 is the largest region, so it is where changes meet
         | scale.
         | 
         | It is also a massively complex beast in itself spanning dozens
         | of datacenters with massive amounts of fiber between them. Much
         | more fragile than having everything in a single building and as
         | you scale up the number of components you increase the rate of
         | failure.
        
           | sofixa wrote:
           | No AWS region is in a single building, they aren't amateurs
           | like Azure. Each region is at least 3 AZs, which is at least
           | one physical DC.
        
             | yanellena wrote:
             | And yet it's AWS that's down.
        
               | sofixa wrote:
               | Touche. Still I'd rate the overall reliability of AWS
               | higher than Azure; and even if that weren't the case,
               | security issues make Azure look like a very poor choice.
        
         | esotericimpl wrote:
         | [dead]
        
         | colinbartlett wrote:
         | I actually just wrote about this very thing. It's not just that
         | it SEEMS less reliable, it absolutely is:
         | 
         | https://statusgator.com/blog/is-north-virginia-aws-region-th...
        
       | kyleee wrote:
       | The AWS status page is now down as well
        
         | WalterSobchak wrote:
         | https://health.aws.amazon.com/health/status seems to be working
         | fine for me (been refreshing it every couple of minutes).
        
       | hatmanstack wrote:
       | [flagged]
        
       | jmacjmac wrote:
       | Can we find stats somewhere, something like number of outages by
       | regions?
        
       | throaway87c10f0 wrote:
       | Mysterious lack of "AWS is bad for the internet because it is so
       | centralized" dialog up in here.
       | 
       | edit: for those that would downvote: HN _just_ yesterday:
       | https://news.ycombinator.com/item?id=36295352
       | https://news.ycombinator.com/item?id=36295305
        
         | SamuelAdams wrote:
         | Ok fine. Running your own datacenter in 2023 is incredibly
         | risky. There's the upfront server cost and the ongoing
         | maintenance cost. There's patches and staffing and disaster
          | planning and all the other things that go into it. Plus
         | there's the cyberinsurance and protections and security
         | components too.
         | 
         | Do you really think other (smaller) orgs can do a better job at
         | hosting a datacenter than Amazon / Google / Microsoft /
         | Cloudflare? They have some of the brightest minds in the
         | industry working there, and they can price things at a much
         | better price than anything you can build yourself.
         | 
         | Yes, I get it. All the computer processing power in a handful
          | of actors' hands is probably not the most fantastic thing.
         | However with the price of some cloud vendors compared to the
         | DIY approach, it's hard for organizations to ignore.
         | 
         | If you really want to combat this, make the cost of running
         | your own data center less. Reduce risk. Reduce the amount of
         | money it costs for hiring good people or MSP's. Reduce the cost
         | of acquiring and installing hardware.
         | 
         | Organizations pay attention to dollars so if you want the trend
         | to shift, come up with a less costly alternative to the current
         | cloud offerings.
        
         | dijit wrote:
          | it's just tired at this point.
         | 
         | everyone knows, nobody seems to care.
         | 
         | Another comment of mine in this thread asks the question if you
         | can excuse downtime of your service due to AWS outages.
         | 
         | Consensus seems to be: yes
         | 
         | which is a pretty huge deal, well worth the insane cost
         | increase of AWS by itself. No other hosting provider would
         | grant you such an excuse.
         | 
          | I would weep for the centralised future of the internet, but
          | it's already here, so there's no point.
        
           | klysm wrote:
           | Even if people _do_ care, there isn't much to do about it.
        
             | hosteur wrote:
             | If people do care they could use other hosting providers
             | such as Hetzner or OVH, no?
        
         | ulrashida wrote:
         | Why a throwaway for this post? Not like this is some deep
         | whistleblowing or career risk.
        
           | chrisco255 wrote:
           | Maybe they work for Amazon.
        
         | hx833001 wrote:
         | Too techie and doing things the right way, so CF shouldn't be
         | successful? Therefore... jealousy? That's my guess as to why
         | all the hacker news hate.
        
         | andersrs wrote:
         | It's a mob mentality. Safety in numbers. "Oh well, my site is
         | down but so is my neighbour's so nobody will be that mad about
         | it."
        
           | jjk166 wrote:
           | which is legitimate - if only you're down then you're losing
           | business to your competitors and failing those who rely upon
           | you; if everyone's down it's a wash. And frankly it's not
           | like you're going to have significantly better uptime by
           | going against the crowd.
        
         | [deleted]
        
       | gawshinde wrote:
       | https://health.aws.amazon.com/health/status now it is showing.
       | Lot of services are impacted.
        
       | thedigitalone wrote:
       | Toast POS is down 100%, don't go out to lunch.
        
         | xyst wrote:
         | I can see some restaurants just comp'ing the tickets out and
         | having toast foot the bill in lost sales
        
           | thedigitalone wrote:
           | Toast offline mode captures the credit cards and processes
           | them later, no reason to turn away sales, it is just a
           | hassle.
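
The capture-now, process-later behaviour described above is a store-and-forward pattern. The sketch below is a generic illustration and not Toast's implementation; the class name, the `processor` callable, and the `"queued"` return value are all assumptions, and a real system would also have to persist and protect the queued data.

```python
from collections import deque


class OfflinePaymentQueue:
    """Queue payments while offline and replay them in order later."""

    def __init__(self, processor):
        self._processor = processor  # hypothetical callable that charges one payment
        self._pending = deque()

    def charge(self, payment, online):
        if online:
            return self._processor(payment)
        self._pending.append(payment)
        return "queued"

    def pending_count(self):
        return len(self._pending)

    def flush(self):
        """Process queued payments oldest first.

        If the processor raises, the failing payment stays at the front of
        the queue so that a later flush can retry it.
        """
        processed = 0
        while self._pending:
            self._processor(self._pending[0])
            self._pending.popleft()
            processed += 1
        return processed
```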
        
           | dymk wrote:
           | You say that as if it's as easy as sending Toast the bill and
           | Toast just going "Yeah okay we'll pay".
           | 
           | When the POS system goes down, restaurants take down credit
           | card numbers, and then charge them later when the POS comes
           | back up.
        
         | jjice wrote:
         | As a side note, I wonder if businesses won't even accept cash
         | if they can't go through their POS system. If not, it's a shame
         | that these modern internet connected POSs lock out stuff like
         | that.
        
           | inconceivable wrote:
           | some restaurants have their owner or manager run square or
           | stripe on their cell phone.
        
             | JoBrad wrote:
             | I was at a swim meet last week, and one of the food trucks
             | was using Apple Tap to Pay because Toast didn't have a
             | solution that worked for them, on site. After they finish
             | up at an area, they then enter a single transaction for all
             | of the day's business into Toast.
        
             | xyst wrote:
             | I don't know about Stripe but Square used to offer the CC
             | swipe dongle you can connect to your phone. Then process
             | payments through their app
        
           | p_l wrote:
           | Depending on country and exact POS setup, they might not be
           | able to take cash if POS is down.
           | 
           | For example, in Poland, your typical restaurant or shop needs
           | to generate tax receipt (as well as properly calculate the
           | tax), and uses either a separate receipt POS device, or POS
           | with appropriate receipt printer (the devices are certified
           | and for example do simultaneous two prints - one for client
           | one for seller - or use digitally-signed storage for seller
           | copy).
           | 
           | If the POS isn't designed properly to operate in case of
           | network failure... welp, can't take cash either, at least not
           | legally.
        
       | rotten wrote:
       | Looks like it is a problem with lambdas.
        
       | trnsfrmrsr wrote:
       | def broken for us
        
         | trnsfrmrsr wrote:
         | issues with STS and IAM galore #thisIsFine
        
       | jdlyga wrote:
       | I posted a comment about AWS Media Convert down below, but it's
       | back working for me.
        
       | [deleted]
        
       | vvoyer wrote:
       | Worth knowing: today and tomorrow is AWS re:Inforce 2023
       | https://reinforce.awsevents.com/.
        
       | ciguy wrote:
       | It appears to be an outage in IAM which is trickling down to
       | every service which relies on IAM auth.
        
         | pliuchkin wrote:
         | But IAM is supposed to be Global, not us-east-1
        
           | Androider wrote:
           | All the regions are equal, but some regions are more equal
           | than others.
        
           | paulddraper wrote:
           | In that it globally depends on us-east-1.
        
       | [deleted]
        
       | gerenuk wrote:
       | Netlify is down as well. https://www.netlifystatus.com
        
         | [deleted]
        
       | [deleted]
        
       | smnirven wrote:
       | yep can't even get into console to diagnose / troubleshoot / fix
        
         | bovermyer wrote:
         | CLI works OK still.
        
       | dveeden2 wrote:
       | https://health.aws.amazon.com/health/status reports:
       | 
       | Increased Error Rates and Latencies
       | Jun 13 12:08 PM PDT We are investigating increased error rates
       | and latencies in the US-EAST-1 Region.
       | 
       | They list Lambda as the only affected service
        
         | amir734jj wrote:
         | the status page doesn't even open.
        
         | notatoad wrote:
         | status page won't load for me. are they still hosting their
         | status page on their own infrastructure?
        
           | j-sizz wrote:
           | Perhaps they should host it on GCP
        
             | xenospn wrote:
             | Ouch
        
         | ciguy wrote:
         | They've added a dozen or so more as potentially down now.
         | Anything that uses IAM, which I suspect is the core of the
         | issue.
        
           | ramranch wrote:
           | doesn't every service use IAM?
        
           | rotten wrote:
           | They have 41 services listed now.
        
         | ishjoh wrote:
         | certificate manager also down (I know because I tried to update
         | an ssl cert for cloudfront which only allows US-East-1 ssl
         | certs, maybe someone will eventually fix that to allow any
         | region to have the ssl cert for cloudfront)
        
           | heleninboodler wrote:
           | > cloudfront which only allows US-East-1 ssl certs
           | 
           | This seems like an odd limitation. Do you know the technical
           | reason?
        
         | NameError wrote:
         | I suppose "increased error rates and latencies" is technically
         | true when the error rate is 100% and the latency is "until we
         | fix it"
        
       | grumple wrote:
       | It's fun watching each service fail sequentially while the aws
       | service dashboard just updates them to "Informational" status,
       | whatever that means.
       | 
       | Even management console is down, and their suggested region
       | specific workaround does not work, at least for us-east-1. I can
       | see some processes via api but I don't have code prepared for
       | monitoring every service from my local.
        
         | grumple wrote:
         | And now the service health page is down.
        
       | FullyFunctional wrote:
       | I "love" it when my vacuum stops working because an online
       | bookseller's servers went down. #modernlife
       | 
       | This is a good reminder to avoid cloud-centric products, but they
       | are getting harder and harder to avoid.
        
       | uberism wrote:
       | s3 is down for us
        
       | 89vision wrote:
       | We're on use1 and haven't seen any degradation
        
       | assimpleaspossi wrote:
       | My son delivers part-time for Amazon and all the drivers at his
       | warehouse were sent home. So if your delivery is late or non-
       | existent today....
        
         | prfssnl wrote:
         | Amazon Flex?
        
           | assimpleaspossi wrote:
           | Yep
        
       | issafram wrote:
       | I love how their status says that services are just degraded
        
       | dangoodmanUT wrote:
       | Yes
        
       | arixzajicek wrote:
       | this happened less than an hour after I altered our prod schema,
       | thought I brought down production, what a relief
        
         | thefourthchime wrote:
         | I had a similar reaction. Oh no WTF, how did i break that?!
         | Then my buddy texted me about us-east-1 being down. Then i
         | thought "Oh thank god, this shitshow is someone else's fault."
        
         | taylodl wrote:
         | No, you brought down us-east-1 instead! Thanks!
        
           | spicybright wrote:
           | This is why you don't give prod credentials to the new guy!
        
       | Max-Ganz-II wrote:
       | I kicked off a Redshift cluster in every region, they've all run
       | and completed, except for `us-east-1`, which is stuck creating
       | the cluster. Been about an hour now.
        
       | dangoodmanUT wrote:
       | What's interesting is that I can still access my EKS cluster, but
       | none of the deployments are "ready" that have LBs attached to
       | them. Pods can create fine though!
        
       | mlhpdx wrote:
       | How many folks actually use multi-region deployments with
       | automatic failover (e.g. latency based routing in route 53)?
        
         | klysm wrote:
         | The difficulty of making this work in practice is pretty high.
         | It also isn't cheap, so I would guess not many.
        
       | [deleted]
        
       | last_responder wrote:
       | No issues on our Linodes .
        
       | arixzajicek wrote:
       | [flagged]
        
       | samwillis wrote:
       | Other submission linking to AWS status page:
       | https://news.ycombinator.com/item?id=36315441
       | 
       | And what a surprise it's US-EAST-1 again...
        
       | nathants wrote:
       | finally an opportunity to test a full deploy from scratch, and
       | restore from backup, in a new region.
       | 
       | i wonder if it will work first try? the true test of devops
       | culture.
        
         | danryan wrote:
         | Good luck! My own attempt failed because SSO is down.
        
       | rafamcc wrote:
       | anyone getting Gateway Time-out?
        
       | rafamcc wrote:
       | getting Gateway Time-out
        
       | totaldude87 wrote:
       | much more reliable:
       | https://downdetector.com/status/aws-amazon-web-services/
        
       | neodypsis wrote:
       | Is there an estimate for the time they will take to solve this?
        
         | JoBrad wrote:
         | 2.5 hours, from start of incident.
        
       | JTbane wrote:
       | b-but muh five nines of reliability...
        
       | JohnMakin wrote:
       | I'm not sure it's the case here, but the issue with these cloud
       | providers is they use their own services to maintain their
       | infrastructure - that's why when something like lambda gets
       | degraded, which I wouldn't be shocked to learn they use everywhere,
       | you start to see random crap like console and IAM go down as
       | well.
        
       | johnnyApplePRNG wrote:
       | I can't even update my billing information right now :/
        
       | rbosinger wrote:
       | It's always during the demos to the stakeholders, isn't it?
        
         | mardifoufs wrote:
         | I'm so glad my demo today was specifically about local
         | inference on... Windows. I guess I finally found an
         | upside to doing ML outside Linux ; we don't have Windows VMs on
         | AWS!! :)
        
         | [deleted]
        
       | noradbase wrote:
       | I guess I'll use the downtime to see what's new on Reddi... oh...
       | yeah.
        
       | forgetfulness wrote:
       | Tried to log on to OkCupid and it was down, I guess I should
       | thank Jeff for taking me off the Skinner Box for a while.
        
       | jarym wrote:
       | us-east-1 seems to be very 'special' compared to the other
       | regions - I wonder if they will ever align it with the rest of
       | them.
        
         | thefourthchime wrote:
         | I think a _lot_ of companies just do everything there and pinky
         | promise one day they'll go multi-region.
        
           | rotten wrote:
           | I worked with a devops person who moved everything we had set
           | up in other regions _to_ US-East-1 because that is where you
           | are supposed to run stuff. According to him, the other
           | regions were just for DR stuff.
        
             | jarym wrote:
             | Surely not an AWS certified devops person? I don't think
             | they teach mythology!
        
         | goodells wrote:
         | Yep, it has issues so frequently. I wonder how many
         | companies/teams start using AWS and blindly choose us-east-1
         | without realizing what they're getting into.
         | 
         | <rant>
         | 
         | It's also quite annoying sometimes that some things _need_ to
         | be in us-east-1, and if e.g. you are using Terraform and
         | specify a different default region, AWS will happily let you
         | create useless resources in regions that aren't us-east-1 that
         | then mysteriously break stuff because they aren't in this one
         | blessed region. AWS Certificate Manager (ACM) certificates are
         | like this, I believe.
         | 
         | </rant>
        
           | mschuster91 wrote:
           | ACM certificates themselves can be had in any region (and you
           | can use them for stuff like ELBs), but since the Cloudfront
           | control plane is in us-east-1, if you want Cloudfront (and
           | IIRC, also if you want custom domain names for an S3 bucket,
           | but don't quote me on that) you'll have to create an
           | additional certificate in us-east-1.
           | 
           | Sigh.
        
       | adubashi wrote:
       | It doesn't matter if your infra is in another region, because
       | there will almost always be transitive dependencies on us-east-1.
       | IAM is deployed in us-east-1 and there will always be a
       | transitive dependency on us-east-1
        
         | jedberg wrote:
         | Usually it only prevents changes, but the runtime isn't
         | affected.
        
         | mooreds wrote:
         | Control plane will almost always be impacted, I agree.
         | 
         | Our data plane was fine (for example, ec2 instances and s3
         | buckets in other regions were fine).
        
         | luhn wrote:
         | I have never had a production issue in other regions due to a
         | us-east-1 outage. The worst that ever happened was I had to
         | wait to update a Cloudfront distribution because the control
         | plane (based in us-east-1) was down, but the existing
         | configuration continued working fine throughout.
         | 
         | I don't know what the architecture of IAM looks like, but
         | somehow it's never suffered a global outage.
         | 
         | AWS is really, really good at regional isolation.
        
           | TheSoftwareGuy wrote:
           | I think the data plane is regional
        
           | jcims wrote:
           | >I don't know what the architecture of IAM looks like, but
           | somehow it's never suffered a global outage.
           | 
           | Authentication possibly, but the control plane has gone down
           | preventing changes.
        
         | shepherdjerred wrote:
         | I thought there was some recent shift on making IAM multi-
         | region?
        
       | impulser_ wrote:
       | Why does everyone keep deploying their products to this one
       | region when it always seems like the one that fails?
       | 
       | We don't use big cloud where I work, so maybe I'm missing
       | something. Does East-1 offer something others don't?
        
         | 8b16380d wrote:
         | Instance types, for one. us-east-1 has all the latest instance
         | types and more of them. We could not run some of our workloads
         | in any other region.
        
         | andrewmcwatters wrote:
         | There's a lot of software, iirc even Amazon's own dashboards,
         | that simply defaults to us-east-1.
        
           | paulddraper wrote:
           | FWIW, mine defaults to Ohio. Has happened multiple times. IDK
           | if it is geographic or what.
        
           | akira2501 wrote:
           | That's my favorite part of Amazon's console. That miniature
           | heart attack you have when you ask "WHERE ARE ALL MY LAMBDAS
           | AND DYNAMO INSTANCES?"
           | 
           | Then you realize that they just switched you back to us-
           | east-1 for some reason and a wave of familiar relief washes
           | over you.
        
         | xyst wrote:
         | Because $$$
        
         | holler wrote:
         | I used it because early on in the project I wanted to use
         | features for IoT that were only available on us-east-1
         | initially, as well as lambda@edge which was on us-east-1 only
         | at the time.
        
         | maxcan wrote:
         | There are some services (cloudfront for example) which require
         | this region. It's not that much harder to have multiple regions
         | in your deployment but putting everything in one is simpler for
         | smaller startupy orgs.
        
         | jmull wrote:
         | Best latency from where almost 90% of my users are?
        
         | pirsquare wrote:
         | It's dirt cheap.
         | 
         | But I still prefer EU region =).
        
         | MajimasEyepatch wrote:
         | You generally want to use a region close to your users, so
         | right off the bat, us-east-1 and us-east-2 are the obvious
         | choices for most East Coast companies. If I were starting a new
         | project, I'd probably go us-east-2, but if your company has
         | been on the cloud long enough, us-east-2 might not have existed
         | when your foundational infrastructure was created. And for most
         | companies, going multi-region is an expensive, difficult
         | proposition that might not be worth it.
         | 
         | Plus, as others have noted, there are critical AWS services in
         | their control plane that only run in us-east-1 behind the
         | scenes. So you're kind of out of luck.
        
         | paulddraper wrote:
         | Features.
         | 
         | For example, do you want your Cloudfront CDN to have a custom
         | (secure) domain?
         | 
         | Then you have to host your ACM cert in us-east-1.
        
         | gui77aume wrote:
         | It has the best latency from Chile and I think from other
         | countries of South America
        
       | compumike wrote:
       | [flagged]
        
         | CodesInChaos wrote:
         | "eventually" is doing _a lot_ of work in that sentence. Our sun
         | will last another 5 billion years, and the heat death is
         | something like 100 trillion years away.
        
           | compumike wrote:
           | Sure :)
           | 
           | Though a lot of practical thermal-related causes of
           | electronics failure seem to operate on timescales of years to
           | decades, like electromigration
           | https://en.m.wikipedia.org/wiki/Electromigration or even just
           | cooling fan bearing failure. And I don't think it would be a
           | huge stretch to point to electromigration as a case of
           | diffusion, a natural entropy increasing process, re-
           | randomizing the arrangement of atoms within a transistor (and
           | therefore making it fail eventually).
        
       | BluePen7 wrote:
       | They've just acknowledged degradation with lambda
        
       | intsunny wrote:
       | For those wondering: Currently PDT is 7 hours behind UTC.
       | 
       | AWS can do so many things, reporting critical outage updates in
       | UTC is not one of those things.
        
         | messe wrote:
         | > AWS can do so many things, reporting critical outage updates
         | in UTC is not one of those things.
         | 
         | Thank you for reminding me about one of my biggest mildest
         | annoyances from working at AWS.
        
         | andrelaszlo wrote:
         | After spending some time in the Canary Islands I realized how
         | nice it was to be in UTC all the time and now I have my laptop
         | clock set to UTC. Still contemplating whether I should set
         | Google Calendar and my smartwatch to UTC as well. 8-)
        
         | takeda wrote:
         | I thought it uses your browser time zone, does it not?
        
           | paulddraper wrote:
           | No. It's all PDT.
        
           | [deleted]
        
           | meepmorp wrote:
           | Says the same here and I'm on the other coast.
        
         | rurp wrote:
         | The inconsistency with timezones across different services in
         | the AWS console has always baffled and annoyed me. Some places
         | have a time without a timezone and I can never tell right away
         | if it's utc, local time, or region time.
        
           | [deleted]
        
           | xp84 wrote:
           | > The inconsistency [of everything, everywhere] in the AWS
           | console
           | 
           | ftfy
           | 
           | AWS is powerful and very popular, but for the console, "it
           | functions" must be the only condition the UI has to satisfy.
           | Should every page use a unique table and sorting widget and
           | UI language? Yes, please!
           | 
           | I'm assuming this helps them move fast, not having to
           | coordinate with anybody or wait for a UI designer to tell
           | them how it should look. But it's striking when compared to
           | GCP.
        
           | [deleted]
        
         | stephenc123 wrote:
         | I've set the option on the below page to UTC
         | 
         | https://health.aws.amazon.com/health/status#settings
         | 
         | As I'm logged in, it persists across browser sessions.
        
         | kroltan wrote:
         | Semi-related: if you ever feel the need to report times to a
         | global audience, not only make sure to always report the
         | timezone (even if it is the same as the user's), but also use
         | UTC offsets rather than timezone names.
         | 
         | Life is too short to remember what each timezone name means and
         | convert to it; UTC offsets are much easier on the mental
         | calculator.
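The advice above can be sketched in a few lines of Python. This is only an illustration, not anything AWS does: the helper name `with_utc_offset` is made up here, and PDT is modelled as a fixed UTC-7 offset. The sample time is this thread's posting time.

```python
from datetime import datetime, timedelta, timezone


def with_utc_offset(moment: datetime) -> str:
    """Render an aware datetime with a numeric UTC offset instead of a zone name."""
    if moment.tzinfo is None:
        # A naive datetime carries no offset, so there is nothing honest to print.
        raise ValueError("naive datetime: the UTC offset is unknown")
    return moment.isoformat(timespec="minutes")


posted = datetime(2023, 6, 13, 19, 8, tzinfo=timezone.utc)
pacific_daylight = timezone(timedelta(hours=-7))  # assumption: PDT as fixed UTC-7

print(with_utc_offset(posted))                                # 2023-06-13T19:08+00:00
print(with_utc_offset(posted.astimezone(pacific_daylight)))   # 2023-06-13T12:08-07:00
```

Both strings name the same instant, and a reader anywhere only has to add or subtract the printed offset.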
        
           | JJMcJ wrote:
           | Nothing worse than people who say "9 AM my time". I suppose
           | it's OK if it's Pacific vs Mountain but even there Arizona
           | doesn't observe Daylight, and parts of Eastern Oregon are
           | Mountain, not Pacific.
           | 
           | Never mind dealing with India, Australia, etc etc.
           | 
           | OK to use local time in your statement, just say what that
           | time is.
        
           | perlgeek wrote:
           | It's also not too complicated to add a few lines of
           | javascript that show the date/time in the user's local time
           | zone (via Date.getTimezoneOffset) as well.
        
             | bob1029 wrote:
             | We do this in all of our web apps. It's pretty simple and
             | dramatically improves UX when you have customers that are
             | doing a lot of scheduling.
             | 
             | Showing both at the same time is peak design for me
             | personally. UTC compares for relative sequencing, local
             | time for "was that before or after I ate lunch".
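The "show both at the same time" idea might look roughly like this in Python rather than browser JavaScript. The function name `utc_and_local` is hypothetical, and the local half of the output depends on the zone configured on the machine running it, so no expected output is shown.

```python
from datetime import datetime, timezone


def utc_and_local(moment: datetime) -> str:
    """Show one instant twice: in UTC and in the machine's local zone."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: cannot tell which instant is meant")
    utc = moment.astimezone(timezone.utc)
    local = moment.astimezone()  # no argument: convert to the system local zone
    # Always label both halves so nobody has to guess which zone they are reading.
    return f"{utc:%Y-%m-%d %H:%M} UTC (local: {local:%Y-%m-%d %H:%M} UTC{local:%z})"


print(utc_and_local(datetime(2023, 6, 13, 19, 8, tzinfo=timezone.utc)))
```

The UTC half is what you compare across log lines; the local half answers the "before or after lunch" question.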
        
             | bawolff wrote:
             | Honestly this is the worst when there is no timezone marker
             | and some times are in browser time and others aren't
        
               | bombcar wrote:
               | Or when logged times in the past change depending on
               | today's daylight savings setting.
        
             | blibble wrote:
             | given how long it took AWS to add support for Ed25519 ssh
             | keys (literally just fix the validation regex), I wouldn't
             | hold your breath
        
             | andrelaszlo wrote:
             | As long as you still show which tz it is! :)
             | 
             | GCP's various products have gotten a lot better at this
             | lately, but just a few months ago I could click around
             | between various dashboards and explorers, some showing the
             | time in UTC, some in your browser's tz, and some in your
             | profile's tz (if I recall correctly). Some of them were
             | showing the tz, and for some you had to guess. Sometimes
             | you had multiple tzs on the same page. Sometimes the date
             | picker for a control was in one tz and the widget it was
             | controlling in another (leading to quite a lot of
             | confusion).
             | 
             | The worst offence IMO was not showing the tz at all.
             | Especially given the overall lack of consistency.
        
           | yreg wrote:
           | And if it's on a forum debating an event that's about to
           | happen soon, I find the following extremely convenient:
           | 
           | - the keynote will start when this post is 5 hours old
           | 
           | - the rocket launch is scheduled for when this comment is 30
           | hours old
        
           | colanderman wrote:
           | And even if the time is UTC, please indicate this.
        
           | dylan604 wrote:
           | just report it in epoch seconds
        
             | conductr wrote:
             | that's based on UTC, so just use UTC?
        
               | dylan604 wrote:
               | based on is not the same thing though is it?
               | 
               | UTC is human readable even if it is not calculated
               | correctly. yes, i'm saying that if you can read epoch
               | seconds, you're not human. 1970-01-01 00:00:00 is always
               | a giveaway that something is afoot
        
               | conductr wrote:
               | "anchored on" then? I might be wrong but we're both
               | talking about showing time as distance from the same
               | starting point are we not? One's just more human readable
               | so that's why I say why not just use that? Seconds since
               | can be miscalculated too, especially if current time
               | isn't known/reliable
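For what it's worth, the two representations in this exchange convert into each other losslessly. A small Python sketch, with helper names invented for the illustration:

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> str:
    """Turn Unix epoch seconds into a human-readable UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


def utc_to_epoch(moment: datetime) -> int:
    """Turn an aware datetime into whole Unix epoch seconds."""
    return int(moment.timestamp())


print(epoch_to_utc(0))  # 1970-01-01 00:00:00 UTC
print(utc_to_epoch(datetime(2023, 6, 13, 19, 8, tzinfo=timezone.utc)))  # 1686683280
```

An unexpected `1970-01-01 00:00:00` is simply epoch second 0, which is why it usually signals an unset or zeroed timestamp.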
        
             | dramm wrote:
             | Or stardate ¯\_(ツ)_/¯
        
             | drivers99 wrote:
             | Or Swatch Internet time (.beat time). No time zones, it's
             | always UTC+1, with the day divided into 1000 beats.
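Taking that description literally (a fixed UTC+1 clock, 1000 beats per day), a beat is 86.4 seconds, and the conversion is a one-liner. A rough Python sketch; the names `BIEL_MEAN_TIME` and `swatch_beats` are chosen for this example only.

```python
from datetime import datetime, timedelta, timezone

# Assumption from the comment above: a fixed UTC+1 clock with no daylight saving.
BIEL_MEAN_TIME = timezone(timedelta(hours=1))


def swatch_beats(moment: datetime) -> float:
    """Convert an aware datetime to .beat time (0 <= beats < 1000)."""
    local = moment.astimezone(BIEL_MEAN_TIME)
    seconds_into_day = local.hour * 3600 + local.minute * 60 + local.second
    return seconds_into_day / 86.4  # 86400 seconds per day / 1000 beats


beats = swatch_beats(datetime(2023, 6, 13, 19, 8, tzinfo=timezone.utc))
print(f"@{int(beats):03d}")  # @838
```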
        
             | inopinatus wrote:
             | I will pass you the address of a struct timespec, please
             | fill it in.
        
             | queuebert wrote:
             | Julian Date
        
           | TremendousJudge wrote:
           | It's also usually extremely US-centric. Nobody outside of
           | North America has any idea what "PDT" or "Mountain Time"
           | means.
        
             | jen20 wrote:
             | Not to mention they conflict. CST can be "Central Standard
             | Time", "China Standard Time" or "Cuba Standard Time" and so
             | forth...
        
             | jjtheblunt wrote:
             | And if you've been American since birth, and live in
             | Arizona, one might still not know, since PDT and Mountain
             | Time alternate covering Arizona seasonally. ("Ask me how i
             | know.")
        
               | Rebelgecko wrote:
               | It can also vary within Arizona... one of the most
               | confusing times in my life was driving from California
               | through the Navajo Reservation in AZ on my way to an
               | appointment. Was my cell phone giving me the local time
               | on the reservation? Was it connecting to a cell tower
               | just outside the reservation, giving me DST-less Arizona
               | time? Or a tower slightly further away in Utah (DST?) Or
               | was it giving me the time on the Hopi reservation, which
               | is an enclave totally surrounded by the Navajo
               | Reservation which uses AZ time?
        
             | donalhunt wrote:
             | I mostly struggle with Irish Standard Time (used for DST in
             | Ireland) and Indian Standard Time which have the same
             | acronym. :(
             | 
             | Thankfully, I learnt a long time ago to use ISO 8601 and
             | UTC for dates and times. I still revert to PST/PDT if my
             | audience is primarily left coast based.
        
               | unmole wrote:
               | > I mostly struggle with Irish Standard Time (used for
               | DST in Ireland) and Indian Standard Time which have the
               | same acronym. :(
               | 
               | Heh. After the first few instances of confusion, we
               | switched to saying _Bangalore time_ and _Dublin time_.
        
               | macksd wrote:
               | And I can't say it's ever actually caused a problem, but
               | something about Indian Standard Time being a half-hour
               | offset from UTC has always bothered me so much... But now
               | we're fully off-topic.
        
               | WirelessGigabit wrote:
               | Left coast? That's a term I've never heard of.
        
             | NoZebra120vClip wrote:
             | 3 years ago, when I started work for my current employer, I
             | noticed in Slack that everyone was reckoning time in
             | "Standard Time" year-round. Now imagine my chagrin because
             | I live in Arizona, and "Mountain Standard Time" does not
             | change for DST. Therefore, all my coworkers were citing
             | nonsensical, nonexistent time zones and it was messing up
             | my ability to convert back and forth.
             | 
             | Come to find out that this was some sort of entrenched,
             | company-wide standard that was deliberately imposed. I made
             | a lot of noise about this and appealed to some rather
             | highly-placed directors, because I felt like it was wildly
             | inaccurate and deceiving people; if you schedule a meeting
             | in EDT but you say it's in EST, and we have employees all
             | around the world, who's going to know? You're inviting off-
             | by-one errors. Especially with me who lives permanently in
             | MST.
             | 
             | 3 years on, I've been unable to change this fundamentally;
             | while a few people acknowledge DST, 90% of the company
             | still adheres to this crazy false standard.
        
               | WirelessGigabit wrote:
               | I just had someone asking me if I'm available at 5pm EST.
               | 
               | Also, your clock can get confused driving North from PHX
               | to Zion National Park.
               | 
               | In summer you start in Mountain Standard Time, drive into
               | the Navajo Nation which does observe Mountain Daylight
               | Time, continuing through the Hopi Reservation, which is
               | Mountain Standard Time. Then you end up back in Navajo
               | Nation with Mountain Daylight Time. You keep on driving
               | towards Page which is in Mountain Standard Time. However,
               | when you cross the state-border of AZ/UT you're back in
               | Mountain Daylight Time.
               | 
               | My clock threw a segmentation fault.
        
               | cj wrote:
               | This is why I always write ET instead of EDT/EST.
               | 
               | I encourage everyone at my company to do the same. Easy
               | way to eliminate errors while typing 1 less key stroke!
        
               | randallsquared wrote:
               | This is the way.
        
               | testplzignore wrote:
               | One of the saddest pieces of code I ever wrote was to
               | treat "MST" as always meaning America/Denver. I'm sorry.
        
               | dyingkneepad wrote:
               | How does this generate off-by-one errors? I am also part
               | of a company with employees in pretty much every
               | timezone, but when they create a meeting the meeting
               | invitation is programmed with the correct timezone so in
               | my Calendar it always shows what time the meeting is
               | going to be for me. I never even have to think what
               | timezone the organizer is...
        
               | jlick wrote:
               | The off-by-one error occurs when you announce an event in
               | Standard time but really mean Daylight time, or vice
               | versa. While those local to the time zone will often
               | automatically correct this mistake either consciously or
               | unconsciously, those in other time zones (especially
               | where Daylight time isn't used or is on a different
               | schedule) will tend to rely on time conversion tools
               | which will take a literal interpretation of the scheduled
               | time and result in the person being an hour early or an
               | hour late.
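The mix-up described above is easy to reproduce. In this Python sketch, EST and EDT are modelled as fixed offsets (UTC-5 and UTC-4), and the 5pm meeting on the day of this thread is a made-up example:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed-offset stand-ins for the two Eastern abbreviations.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# The invite says "5pm EST" in June, but the organizer's wall clock is on EDT.
announced = datetime(2023, 6, 13, 17, 0, tzinfo=EST)  # what a converter takes literally
intended = datetime(2023, 6, 13, 17, 0, tzinfo=EDT)   # what the organizer meant

print(announced.astimezone(timezone.utc))  # 2023-06-13 22:00:00+00:00
print(intended.astimezone(timezone.utc))   # 2023-06-13 21:00:00+00:00
print(announced - intended)                # 1:00:00
```

Anyone who trusts the literal "EST" shows up an hour after the meeting started, which is the error the comment describes.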
        
               | Travis_Pastrami wrote:
               | It's the same at my company. Teams and Zoom both
               | automatically schedule meetings in every attendee's own
               | time zone. Maybe that person's company still does phone
               | meetings or something.
        
               | NoZebra120vClip wrote:
               | We don't use any automatic scheduling with Zoom or Google
               | Calendar. Management doesn't send invites to those
               | meetings, they just publish the link on Slack and we have
               | to figure out how to get it into our calendars.
               | 
               | Trust me, at least once I missed a meeting because I was
               | late by an hour due to time zone confusion.
        
             | inopinatus wrote:
             | Unnecessary reflux obtained during US-Australian
             | collaboration from insufficiently specific references to
             | "east coast time".
        
             | lkbm wrote:
             | Probably if you're using AWS, you do, but it would be much
             | more convenient if they just used UTC by default with an
             | option to localize.
        
             | tazjin wrote:
             | Everyone knows "Mountain Time". It is when you go to the
             | mountains on vacation, and don't spend much time adhering
             | to a strict schedule, instead taking leisurely strolls
             | around the fields and promising vague things like "I'll try
             | to be back for dinner".
        
               | jacobr1 wrote:
               | Closely related, yet distinct from, Island Time
        
           | benced wrote:
           | Also report it using IANA time zones (America/Los_Angeles) at
           | least in addition (I'd argue instead of) those abbreviations
           | which are completely unstandardized and not unique.
        
             | mananaysiempre wrote:
             | If the world were fair, we'd be calling these Eggert time
             | zones, as Paul Eggert (longtime tzdata maintainer until the
             | copyright trolls came) invented them; but it isn't.
        
           | throw0101c wrote:
           | Basically just use the output of `date -u`.
        
             | isbvhodnvemrwvn wrote:
             | It's locale-specific, which is not great.
        
             | yawaramin wrote:
             | Use `date -u -Iseconds`, please ;-)
        
               | drivers99 wrote:
               | date: illegal option -- I
               | usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
               |             [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
        
               | yawaramin wrote:
               | https://www.gnu.org/software/coreutils/manual/html_node/Opti...
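Since the `-I` flag is evidently not available in every `date` implementation, one portable alternative is to produce the same kind of timestamp from Python's standard library. A minimal sketch; the function name is invented here, and the output should have roughly the same shape as what GNU `date -u -Iseconds` prints.

```python
from datetime import datetime, timezone


def utc_iso_seconds() -> str:
    """Current time in UTC as ISO 8601 with second precision and an explicit offset."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


print(utc_iso_seconds())  # shape: YYYY-MM-DDTHH:MM:SS+00:00
```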
        
         | mulmen wrote:
         | Technically PDT is always 7 hours behind UTC. PST is always 8
         | hours behind. We just change which one we use twice a year.
         | Pacific time makes sense when you realize Fremont is the center
         | of the universe.
        
           | sph wrote:
           | True in theory, in practice people often get it wrong and use
           | the incorrect one.
        
             | rlpb wrote:
             | Indeed. There are Americans who will tell me PST, when they
             | meant PDT but forgot to mention that. Now I have to track
             | the American DST calendar as well as European DST calendar
             | to do the conversion.
             | 
             | There are also people who tell me GMT (because they think
             | that term means "the time in London") when they meant BST
             | (because in summer, London doesn't operate on GMT).
        
         | cogogo wrote:
         | The outage is in Virginia so PDT isn't even local time. On
         | their status page they are asking users to access the console
         | via a region specific endpoint like
         | https://us-west-2.console.aws.amazon.com. Wonder if the PDT
         | timestamp is because they have to serve the status page from US
         | West right now.
        
         | joshuanapoli wrote:
         | The fact that we're debating which timezone is used in the announcement is a
         | sign of progress... AWS announced it pretty quickly, gave nice
         | updates, and seems to have fixed the problem quickly enough.
         | I'm interested to see the postmortem...
        
       | chelobaka wrote:
       | Both Vercel and Netlify went down with it.
       | 
       | I wonder what % of the internet went down because of the us-
       | east-1 today.
        
       | eis wrote:
       | Seems like it took IMDB with it. Surprised that Amazon is not
       | able to keep their own property up when one of their zones goes
       | down. Not a great example.
        
       ___________________________________________________________________
       (page generated 2023-06-13 23:02 UTC)