[HN Gopher] Ask HN: Azure has run out of compute - anyone else a...
       ___________________________________________________________________
        
       Ask HN: Azure has run out of compute - anyone else affected?
        
       Last week we at n8n ran into problems getting a new database from
       Azure. After contacting support, it turns out that we can't add
       instances to our k8s cluster either. Azure has told us they'll have
       more capacity in April 2023(!) -- but we'll have to stop accepting
       new users in ~35 days if we don't get any more. These problems seem
       limited to the German region, but setting up in a new region would be
       complicated for us.  We never thought our startup would be
       threatened by the unreliability of a company like Microsoft, or
       that they wouldn't proactively inform us about this.  Is anyone
       else experiencing these problems?
        
       Author : janober
       Score  : 511 points
       Date   : 2022-11-25 15:58 UTC (7 hours ago)
        
       | ethotool wrote:
       | There is no such thing as unlimited when it comes to resources
       | and/or scalability in the hosting market. You might want to find
       | a local colocation provider, buy a few network switches and
       | servers as a secondary production and backup environment for your
       | startup. Deploying your own infrastructure gives you full control
       | over your startup. Yes it will raise your overhead and yes it's
       | not cheap but for a sustainable operation it's a requirement in
       | my opinion. I currently use Azure but I also have my own
       | deployment with my own IP addresses and ASN, on which I keep
       | spare capacity and some important servers in case something
       | happens with Azure. Definitely helps me sleep better at night.
        
       | cfeduke wrote:
       | I worked briefly in an enterprise facing sales organization that
       | targeted multi-cloud deployments. Azure always had capacity
       | problems.
       | 
       | As ridiculous as it sounds, having an enterprise's applications
       | exist on multi-cloud isn't terrible if the application is mission
       | critical - not only does this get around Azure's constant
       | provisioning issues but protects an organization from the rare
       | provider failure. (Though multi-region AWS has never been a
       | problem in my experience, there is a first time for everything.)
       | Data transfer pricing between clouds is prohibitively expensive,
       | especially when you consider the reason why you may want multi-
       | cloud in the first place (e.g., it's easier to provision 1000+
       | instances on AWS than Azure for an Apache Spark cluster for a few
       | minutes or hours execution - mostly irrelevant if your data lives
       | in Azure Data Lake Storage).
        
       | Jamie9912 wrote:
       | Not the first time this has happened to Azure, they are always
       | under-provisioned. Move to AWS
        
       | lmeyerov wrote:
       | As part of launching our global GPU edge network, we need to
       | support low-volume regions, which means a small number of T4 GPUs
       | in different timezones. Azure ran out last Christmas, or at least
       | refused us capacity, and is only adding the next tier of A10s
       | (~2x+ costlier?). We haven't had as much of a problem getting
       | GPUs of different grade on GCP + AWS. I get a form email every 2w
       | from Azure IT that they are working on it. Not as much of an
       | issue for bigger GPUs.
       | 
       | (Also... If into k8s, python, GPUs, graphs, viz, MLOps, working
       | with sec/fraud/supplychain/gov/etc customers on cool deploys, and
       | looking for a remote job, we are hiring for someone to take
       | ownership here!)
        
       | Epa095 wrote:
       | In Norway East, Azure was unable to provision new VMs for
       | several (4-5) days, caused by some IP issue. The only solution was
       | 'try to provision in the night, and don't turn it off if you get
       | one'. Their status page showed green through the whole period
       | though, even though nothing needing compute worked. So that was
       | cool....
        
       | arecurrence wrote:
       | This is not as rare as public clouds may lead people to believe.
       | I have had to move workloads around since AWS began (even between
       | public clouds on occasion).
       | 
       | In particular, GPU availability has been a continuing problem.
       | Unlike interchangeable x64 / arm64 instances with some
       | adjustments based on the new core and ram count... if no GPU
       | instances are available then I simply cannot run the job. AMD's
       | improved support has increasingly provided an alternative in some
       | situations but the problem persists.
       | 
       | I recommend doing the work to make the business somewhat cloud
       | agnostic, or at the very least multi-region capable. I realize
       | this is not an option for some services that have no equivalent
       | on other clouds but you mentioned databases and k8s clusters
       | which are both supported elsewhere.
        
         | andrewstuart wrote:
         | GPUs are better run in your own office.
         | 
         | All cloud providers charge much, much more for GPUs than if you
         | run a local machine.
         | 
         | Cloud GPUs are also a lot slower than state of the art consumer
         | GPUs.
         | 
         | Cloud GPUs: much slower, less available, much more expensive.
        
           | pclmulqdq wrote:
           | This is generally true for all accelerators (I work with
           | cloud and on-prem FPGAs for my startup, Arbitrand).
           | 
           | However, lots of people only need those accelerators once in
           | a while, so time sharing (aka cloud computing) makes a lot of
           | sense and saves a ton of money overall. For FPGAs and some
           | compute GPU applications, not having to handle support for
           | your accelerators is also nice.
        
           | bushbaba wrote:
           | Say you want 100 GPUs all interconnected to your
           | multi-petabyte data lake that's being fed by your production
           | workload.
           | 
           | Sure, you could buy all that equipment, but I'd wager it's
           | cheaper and more agile, with greater velocity, in the cloud.
        
             | cosmic_quanta wrote:
             | I would argue that the cost profile is different.
             | 
             | Local GPUs are a big up-front cost. But assuming that your
             | workload is stable, in the long run I think local GPUs end
             | up being cheaper per-hour than cloud.
             | 
             | For startups, it doesn't make sense to make the up-front
             | purchase, fine. But if you're optimizing for long-term
             | (amortized) costs, I'd be curious if cloud is cost-
             | effective.
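
The amortization argument above can be framed as a break-even calculation. This is only a rough sketch: the function name and every dollar figure are hypothetical placeholders, not numbers from this thread or from any provider's price list.

```python
def break_even_hours(upfront_cost, local_hourly_cost, cloud_hourly_price):
    """Return the hours of use after which owning beats renting.

    upfront_cost:       one-time purchase price of the local hardware
    local_hourly_cost:  assumed running cost per hour (power, space, upkeep)
    cloud_hourly_price: assumed on-demand price per hour for a comparable GPU
    """
    saving_per_hour = cloud_hourly_price - local_hourly_cost
    if saving_per_hour <= 0:
        # Local hardware never pays for itself if it costs more per hour.
        return float("inf")
    return upfront_cost / saving_per_hour


if __name__ == "__main__":
    # Placeholder figures chosen only to make the arithmetic easy to follow.
    hours = break_even_hours(10_000.0, 0.50, 2.50)
    months = hours / (24 * 30)
    print(f"break-even after {hours:.0f} hours (about {months:.1f} months at 24/7 use)")
```

The result is very sensitive to utilization: hardware that sits idle most of the day takes proportionally longer to pay off, which is the time-sharing point made elsewhere in the thread.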
        
               | macrolime wrote:
               | The long run being three months
        
               | bushbaba wrote:
               | An Nvidia DGX box is roughly 40k. And that's not
               | including power/storage/rack space.
               | 
               | But yes, if a single workstation can meet your GPU
               | training needs then it'll be cheaper with sufficient
               | usage.
        
         | dehrmann wrote:
         | You want to be in a position where you can spin up in a nearby
         | region and pretend it's local and have things be good enough
         | for a while. Properly building out multi-region is hard, and
         | multi-cloud isn't worth it because it improves how you handle
         | rare events (where half the internet is already down) at the
         | cost of ongoing operational toil.
        
       | calltrak wrote:
       | I am so glad we made the decision to pull https://Bigger.Bio off
       | azure a while ago. It was nothing but problems on their platform.
        
       | geonaut wrote:
       | Any optimisations you can make? Will have the advantage of saving
       | you money across all platforms/regions
        
         | janober wrote:
         | Yes, some are possible and we are already doing that. Sadly,
         | it will only delay the time we run out of resources. If we
         | were talking about a few weeks, we could surely make it, but
         | over 4 months is sadly not possible.
        
       | option wrote:
       | I've heard from a friend who works at Microsoft that due to
       | energy crisis in Europe plus their data locality laws, Microsoft
       | is indeed running short on datacenter capacity there and can't do
       | anything about it no matter how much they are willing to spend.
        
         | janober wrote:
         | Very interesting. As mentioned in another post, I am sure they
         | are not trying to screw us or anybody else over. It is, after
         | all, surely not in their interest. But not flagging that to
         | users I do not get at all. I would expect to get at least a
         | warning email a week about that, plus a warning in the
         | dashboard, but there was literally nothing.
        
           | option wrote:
           | As I've heard, it actually affects much more than Azure: also
           | their entire cloud-based productivity suite.
        
         | MattGaiser wrote:
         | Couldn't even put up their own solar panels?
        
           | option wrote:
           | lol, what?
        
       | usgroup wrote:
       | Perhaps it's a per customer limit to ration capacity? If so maybe
       | you can legitimately work around it by creating multiple Azure
       | billing accounts.
        
         | janober wrote:
         | Could be possible. But as far as I know, two accounts in the
         | same data center would not work for us for technical reasons.
        
       | RajT88 wrote:
       | Get in touch with your CSAM. They will be able to get you
       | assigned a capacity manager, if you don't already have one
       | assigned.
       | 
       | It is the function of the capacity manager to help you plan ahead
       | based on what the data center capacities look like going into the
       | future.
       | 
       | Meet monthly with your capacity manager. Get representation
       | across different technology interests - database, compute,
       | storage, event hubs, etc. Don't ever skip these meetings.
        
         | andrewstuart wrote:
         | Wow.
         | 
         | It's crazy that this could be valid advice, but it is.
        
         | steelframe wrote:
         | > Get in touch with your CSAM
         | 
         | Well that's an unfortunate acronym collision.
        
         | wstuartcl wrote:
         | Not much better than "meet with Infrastructure in Nov to plan
         | next year's capacity and server purchases" for on prem -- has
         | Azure really degraded down to this?
        
           | RajT88 wrote:
           | It's quite a bit better than that, in fact. They talk to
           | their customers to learn about all the big deployments
           | coming up and work out whether there is going to be a crunch
           | at the region/AZ level.
           | 
           | I'd be surprised if other cloud providers _aren't_ doing
           | that in some form. I only have experience with Azure (so
           | far).
        
       | rickette wrote:
       | Reading these comments it looks like everyone runs into this all
       | the time. As a counterpoint: I've never run into this on Azure,
       | scaling up/down 20-30 VMs a day. Hope it stays that way...
        
       | bri3d wrote:
       | Every cloud provider will have these issues with specific
       | instance types in specific regions, although the Azure Germany
       | situation sounds perhaps a bit more dire. At my past (much
       | larger) employers we've always run into hardware capacity issues
       | with AWS too - we're just able to work around them.
       | 
       | Building on cloud requires a lot of trade offs, one being a need
       | for very robust cross-region capability and the ability to be
       | flexible with what instance types your infrastructure requires.
       | 
       | I'd use this as a driver to invest in making your software
       | either multi-regional or cloud agnostic. Multi-regional will be
       | easier. If you're already on k8s you should have a head start
       | here.
        
         | Innominate wrote:
         | As much as this happens, I don't feel it's something to be
         | expected or even okay.
         | 
         | The major cloud services are expensive. This extra cost is
         | supposed to provide for cloud services' high level of
         | flexibility. Running out of capacity should be a rare event and
         | treated as a high priority problem to be fixed asap.
         | 
         | Without the ability to rapidly and arbitrarily scale, they're
         | just overpriced server farms.
        
           | unionpivo wrote:
           | > Without the ability to rapidly and arbitrarily scale,
           | they're just overpriced server farms.
           | 
           | I mean that's what cloud is (outsourced server farm). Sure
           | they also offer services on top, but that's mostly because
           | they want to lock you in and can charge more for them, so
           | it's a win-win for them.
           | 
           | And there is no magic here: someone has to get the chips,
           | build servers and connect them to the network. And while they
           | will often overbuild for capacity, they will never do it to a
           | degree where they can't run out, because that would be way
           | too expensive and not financially viable.
           | 
           | I don't think any cloud will ever be able to guarantee to
           | never run out of resources.
        
             | remus wrote:
             | > I don't think any cloud will ever be able to guarantee to
             | never run out of resources.
             | 
             | I agree with this, but clearly there's a disconnect between
             | how often people expect these kinds of issues and how often
             | they actually happen. The whole point of the cloud is you
             | pay a premium for the added flexibility. If it turns out
             | that flexibility isn't there when you need it then
             | maintaining your own servers becomes a lot more attractive.
        
           | dharmab wrote:
           | > they're just overpriced server farms
           | 
           | That's exactly what a cloud is. It's someone else's
           | datacenter with an API.
        
             | lanstin wrote:
             | A really good API that makes it close to a software defined
             | everything world. Which has promise.
        
           | bagels wrote:
           | Some problems can't be fixed (eg. chip supply chain problems)
           | even if you have more money.
        
             | Havoc wrote:
             | >Some problems can't be fixed (eg. chip supply chain
             | problems) even if you have more money.
             | 
             | They can't magic chips into existence, but leaving a major
             | region like Germany high & dry for almost half a year
             | sounds like planning went wrong frankly. If it were a
             | matter of chips I would have thought on a 3+ month
             | timescale they can steal a few from another region that has
             | a bit of fat
        
         | PaulHoule wrote:
         | There is a "minimal viable product" of documenting the
         | configuration of your system so you can (1) run development,
         | test, staging instances, (2) jump to another region when
         | necessary, (3) recover from other disasters.
         | 
         | Ideally you have a script that goes from credentials to the
         | service to a complete working instance.
        
       | Kalanos wrote:
       | it's a european site during the world cup haha
        
       | purebscloudoff wrote:
       | The Batch Service schedule history monitor sucks. It is
       | inaccurate and doesn't sync the job order correctly. You can call
       | them, they will get on the phone and then say they fixed it. Then
       | you call them again because they didn't and they give you the
       | same answer. Can't blame them, most of them are on H1B's. Nobody
       | wants to be the squeaky wheel in that position. So you will just
       | get the runaround all the time.
        
       | orik wrote:
       | What sort of nodes are you using, can you add a node pool with a
       | different SKU?
        
       | pwarner wrote:
       | Azure, despite being smaller than AWS, I think has more regions.
       | So each one must be smaller, which likely means less spare
       | capacity.
       | 
       | I also sort of suspect the spot market is less robust there. Lots
       | of Azure is lift and shift on premises workloads, and those
       | aren't using spot. Without people using spot, it's even harder to
       | have spare capacity...
        
         | pclmulqdq wrote:
         | Azure uses much smaller datacenters than AWS or GCP. Microsoft
         | wasn't a big compute user before cloud, and it's a lot easier
         | to manage and build for smaller DCs. Amazon and Google both
         | needed huge DCs before being clouds.
        
       | exelib wrote:
       | M$ just doesn't want your money. We have experienced this problem
       | many times in the Ireland and German regions. Never experienced it
       | with Hetzner or AWS.
        
       | jeffbee wrote:
       | EC2 us-east-1 is chronically stocked out, too. Black Friday is
       | the worst day of the year for this. At work, we pre-allocated
       | tons of EC2 machines we don't really need, to hedge against EC2
       | stockout coinciding with some kind of incident. Yes, we are part
       | of the problem.
        
         | Jamie9912 wrote:
         | Part of what problem? I don't remember us-east-1 ever running
         | out of instances
        
         | don-code wrote:
         | In a former role, I used EC2 in us-east-1 to host the front
         | door e-commerce site for a consumer electronics company. AWS
         | suggested that we go through the Infrastructure Event
         | Management process
         | (https://aws.amazon.com/premiumsupport/programs/iem/) for Black
         | Friday and Cyber Monday, so that staff on Amazon's side could
         | guarantee that they'd have capacity to run our system at its
         | forecasted peak.
         | 
         | The strategy they helped us arrive at was two-pronged:
         | 
         | 1. Pre-launch all needed infrastructure. Yes, for all their
         | "cloud scale", it was actually suggested that we preallocate
         | all of our servers the week before, rather than rely on
         | autoscaling.
         | 
         | 2. Order capacity reservations for all of those instances
         | (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capa...).
         | This ensured that, if any of those instances went bad, we'd be
         | able to relaunch them without going to the back of the line
         | and finding out that there was no more compute capacity
         | available.
        
       | unixhero wrote:
       | Big fan of n8n!
        
         | janober wrote:
         | Thanks a lot! That is always great to hear!
        
       | andrewstuart wrote:
       | Message to cloud providers:
       | 
       | List what you do have available so we can choose.
       | 
       | Do not force users to randomly guess and be refused until
       | eventually finding something available.
        
         | RajT88 wrote:
         | Imagine if they did this in realtime. There's already DDOS
         | attacks happening which are abusing the cloud free trials at
         | scale - this would give them another attack vector.
         | 
         | I can see why they wouldn't want to do this.
        
         | layer8 wrote:
         | Why would they make any promises, or be upfront about their
         | resources at the risk of becoming less attractive compared to
         | competitors with more resources? It's not like many people are
         | shunning the cloud for that reason today (although maybe they
         | should).
        
           | bushbaba wrote:
           | Your price point and the cloud's margin are tied to not
           | sitting on lots of unused instances. You want there to be
           | adequate capacity, not excessive capacity.
        
             | layer8 wrote:
             | It goes both ways: cloud providers don't want to make
             | promises about capacity, and cloud users don't want to make
             | promises about usage.
             | 
             | I don't know about price point. Dedicated servers can be
             | cheaper than cloud in many cases, if you have the
             | appropriate know-how, and the cloud business is very
             | profitable for a reason.
        
         | crmd wrote:
         | I need big m4n instances with 100gbe for product demos, and
         | spinning them up lately is like trying to get Taylor Swift
         | tickets on Ticketmaster. We end up wasting money running them
         | for days at a time instead of on demand because we're afraid of
         | losing them.
         | 
         | It's infuriating that AWS doesn't have an API that returns a
         | list of AZs with available inventory for a given instance type.
        
           | andrewstuart wrote:
           | Why not run them elsewhere?
           | 
           | There's lots of providers apart from AWS/Azure/GCP.
           | 
           | Or buy a machine and put it in your office.
           | 
           | Self hosting can often be cheaper and more available and
           | probably faster than using a cloud.
        
             | crmd wrote:
             | We used to run demos on a local hardware cluster, but we
             | found that prospective customers were reacting negatively
             | to demos that were not on the same platform they would be
             | running in production (AWS).
        
             | senderista wrote:
             | I've had great experiences running bare metal instances on
             | packet.io but haven't used them since the acquisition. For
             | accurate benchmarking it was fantastic (and much cheaper
             | than EC2 bare metal instances).
        
           | sammy2255 wrote:
           | Which regions are you trying?
        
       | rockylhotka wrote:
       | My understanding is that the German region is not run by
       | Microsoft, but a German company. This provides a legal shield
       | required by Germany to try and prevent the US government from
       | accessing data on those servers.
        
       | wizwit999 wrote:
       | In the future, build on clouds not clowns.
        
         | autophagian wrote:
         | Clowns are much more solid than clouds, which are famously
         | density-light. Given their traditional proximity to solid
         | ground, too, clowns are a much better choice of foundational
         | substrate than a cloud to build on.
        
           | mianas wrote:
           | Clown-car cluster sounds like it'd be a good name for a
           | compute product.
        
       | a99c43f2d565504 wrote:
       | While you're at it at making your "infrastructure as code" cloud
       | agnostic perhaps take a look at tools like Terraform (the only
       | one I'm familiar with). I've just started the work of defining
       | whatever we need to provision in their notation with the
       | objective that it can be done with a single push of a button in
       | the future.
        
         | janober wrote:
         | It has actually been done that way. Sadly, for technical
         | reasons, a move to a new data center is very complicated and
         | time-consuming even with that.
        
         | scarface74 wrote:
         | There is nothing "cloud agnostic" about using Terraform. Anyone
         | who says this has no experience actually trying to implement
         | it.
         | 
         | Terraform has different providers for each cloud provider and
         | the code is not transferable any more than saying if you use
         | Python to script your infrastructure it will be transferable.
        
           | robertlagrant wrote:
           | Agreed. I've advised people the same before. You can build to
           | be Kubernetes cluster-agnostic (mostly), but the stuff that
           | gets you to that point will be very cloud-specific.
           | 
           | The reason for Terraform, and it's a good one, is that your
           | Terraform-related tooling doesn't have to change (e.g. if you
           | route all your infra change approvals through Terraform
           | Cloud), and you can coordinate multi-service changes, e.g.
           | update Auth0 infra to do X, then AWS to do Y.
        
       | sabujp wrote:
       | this is due to the energy crisis in europe caused by the war
        
       | robjan wrote:
       | We've been having this problem in Singapore for a couple of years
       | now. Can't add any VMs to our k8s cluster and can't provision a
       | number of services which made our multi-region BCP more
       | complicated.
        
         | janober wrote:
         | Years?!?! Guess I have to be happy then that in our case it is
         | "just" around 4 months.
        
       | craigkerstiens wrote:
       | This is nothing new, Azure has been having capacity problems for
       | over a year now[1]. Germany is far from the only region affected;
       | it's the case for a number of instance types in some of
       | their larger US regions as well. In the meantime you can still
       | commit to reserved instances, there is just not a guarantee of
       | getting those instances when you need them.
       | 
       | The biggest advice I can give is: 1. Keep trying and grabbing
       | capacity continuously, then run with more than what you need. 2.
       | Explore migrating to another Azure region that is less
       | constrained. You mention a new region would be complicated, but
       | it is likely much easier than another cloud.
       | 
       | 1. https://www.zdnet.com/article/azures-capacity-limitations-
       | ar...
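
The "keep trying and grabbing capacity" advice above could be automated with a retry loop and exponential backoff. This is a minimal sketch, not any provider's API: `try_provision` and `CapacityUnavailable` are hypothetical stand-ins for whatever call and error your own tooling uses to request one VM.

```python
import time


class CapacityUnavailable(Exception):
    """Hypothetical error: the provider reported no capacity for this request."""


def grab_capacity(try_provision, wanted, max_attempts=10,
                  base_delay=1.0, max_delay=300.0, sleep=time.sleep):
    """Keep requesting instances until `wanted` are acquired or attempts run out.

    try_provision: hypothetical zero-argument callable that returns an
                   instance handle or raises CapacityUnavailable.
    Returns the list of handles acquired (possibly fewer than `wanted`).
    """
    acquired = []
    delay = base_delay
    for _ in range(max_attempts):
        if len(acquired) >= wanted:
            break
        try:
            acquired.append(try_provision())
            delay = base_delay  # success: reset the backoff
        except CapacityUnavailable:
            sleep(delay)  # failure: wait, then double the wait up to max_delay
            delay = min(delay * 2, max_delay)
    return acquired
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Setting `wanted` above your immediate need corresponds to "run with more than what you need".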
        
         | rsynnott wrote:
         | > In the meantime you can still commit to reserved instances,
         | there is just not a guarantee of getting those instances when
         | you need them.
         | 
         | ... wait, what? How are they defining 'reserved'?
        
           | alexeldeib wrote:
           | RIs are a billing concept (discounted rates for long term
           | commitment).
           | 
           | Dedicated capacity exists, but it's different (compute
           | reservation groups or dedicated hosts).
           | 
           | You can combine CRG/DH with RI for the desired effect,
           | although IMO it's a bit confusing.
           | 
           | (Azure employee)
        
           | robertlagrant wrote:
           | It's a billing mechanism. You pay less if you guarantee use.
           | Sadly, they don't guarantee availability of things to use :)
        
             | rsynnott wrote:
             | Yup, I'm aware of reserved instances (from an AWS PoV) but
             | I always assumed they were, at least theoretically, well,
             | reserved!
        
               | count wrote:
               | In the AWS context, they are, in fact. That's the
               | original point of them - so during big AZ failures your
               | reserved instances had first dibs on the available
               | capacity.
               | 
               | The billing thing became more of the point as big AZ
               | failures are so rare.
        
               | vageli wrote:
               | Non-zonal instance reservations do not reserve capacity.
               | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capa...
        
               | aeyes wrote:
               | On AWS, instances are only reserved if you reserve a
               | specific instance type in a specific zone. Reservations
               | across multiple zones or savings plans don't reserve
               | capacity.
        
               | sgerenser wrote:
               | Reminds me of the classic Seinfeld car reservation bit:
               | https://m.youtube.com/watch?v=4T2GmGSNvaM
        
               | alexeldeib wrote:
               | Great bit! Same with airline overbooking ;)
        
             | lr1970 wrote:
             | With dedicated tenancy AWS Reserved Instances are
             | physically reserved for you.
        
         | grepfru_it wrote:
         | >Azure has been having capacity problems for over a year now
         | 
         | This is also a problem internally for Microsoft. GitHub and
         | LinkedIn still operate in private datacenters due to Azure
         | capacity issues
        
       | zxcvbn4038 wrote:
       | I'm sure Microsoft is just as surprised as you are. Almost every
       | European facility I ever worked with was constrained by either
       | space or power so you had to be really on top of your capacity
       | management. Facilities in the US seem to have unlimited power and
       | floor space so you never have to deal with either issue.
        
       | unreal37 wrote:
       | I remember, in the early days of the pandemic, that Azure
       | Australia ran out of compute too. It happens at the regional
       | level.
       | 
       | Are you stuck only to the German region, and can't go to other
       | European regions?
        
       | 1-6 wrote:
       | Infinite resources are only marketing, and no hyperscaler on the
       | market should ever promise that or give people that impression if
       | they haven't accomplished scaling throughout the entire
       | supply chain.
        
       | somenewaccount1 wrote:
       | If you want help duplicating your k8s cluster workload, hmu. I
       | love K8s and love contract work. $45/hr. Good luck!
        
       | api wrote:
       | Maybe they are doing this to push people into regions with lower
       | energy costs. Of course Northern Virginia or Canada is going to
       | give you much higher ping times.
        
         | sird wrote:
         | Interesting thought. It would be crazy if turning down business
         | was preferable to just raising prices to reflect increased
         | energy costs. I'm not a cloud expert, but maybe they don't have
         | the infra to price differently in some regions?
        
           | semicolon_storm wrote:
           | Azure definitely has the ability to charge differently per
           | region. They do it pretty frequently.
        
           | Scoundreller wrote:
           | That's the problem with charging average costs (assuming they
           | do that) but the new user costs are at the margin which can
           | be muuuuch higher.
        
         | danpalmer wrote:
         | Why wouldn't they just price higher in those regions? People
         | want/need regions for policy and compliance reasons, not just
         | for ping, particularly with Europe and Germany I'd expect.
        
         | 4ndrewl wrote:
         | And potential data residency issues
        
         | janober wrote:
         | I honestly do not think there is any bad intent behind it. I am
         | just surprised that this is happening at all (esp. with a
         | resolution time of multiple months). They must have known for a
         | long time that this would happen, so I would have expected an
         | early heads-up!
        
       | MildlySerious wrote:
       | This is a bit tangential, but now might be a good time to
       | experiment with raising the price of your product. It might
       | extend the time you have until you have to stop accepting new
       | users entirely, in case your migration is taking longer than
       | needed.
        
       | Animats wrote:
       | Is there a secondary market for reselling Azure capacity? Can you
       | bid against other Azure customers?
        
       | kccqzy wrote:
       | Stockouts have happened on both AWS and GCP too. Most of the time
       | the problem is no longer a problem if you build your
       | infrastructure not to rely on a single region or availability
       | zone. On EC2 especially, even if you can't change to a different
       | region, try changing to a new instance type and that might work.
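
A minimal sketch of the "try another instance type, then another
region" fallback described above. Everything here is hypothetical:
`try_allocate` stands in for whatever provisioning call a provider
exposes, and the region and size names are placeholders, not real SKUs.

```python
from itertools import product

def first_available(regions, sizes, try_allocate):
    """Return the first (region, size) pair that allocates, else None.

    Every size is tried within a region before moving on to the next
    region, since switching size is usually cheaper than switching region.
    """
    for region, size in product(regions, sizes):
        if try_allocate(region, size):
            return region, size
    return None

# Hypothetical capacity map used in place of a real provisioning API.
capacity = {("region-a", "size-2"), ("region-b", "size-1")}
choice = first_available(
    ["region-a", "region-b"],
    ["size-1", "size-2"],
    lambda region, size: (region, size) in capacity,
)
print(choice)  # ('region-a', 'size-2')
```

Ordering the candidate lists by preference keeps the policy explicit:
the preferred region stays in use as long as any acceptable size is
still in stock there.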
        
       | z3t4 wrote:
       | I think there is some general rule in business that you should
       | not _depend_ on a provider for whom losing your business would
       | cost less than one percent of their revenue. Or be ready when
       | they drop the guillotine.
        
         | ComputerGuru wrote:
         | For anyone getting started, that means no dependencies at all.
         | Even colocating would be out of the question, according to your
         | metric.
        
           | jodrellblank wrote:
           | Use Azure and AWS so that you're not dependent on either one.
           | 
           | (You could depend on another startup with no revenue).
        
           | iso1631 wrote:
           | In the olden days you used to buy computers from Dell and were
           | well under 1% of their revenue. But if they dropped you as a
           | customer, you bought them from HP instead, no problem.
        
       | mmcconnell1618 wrote:
       | I used to be a technical seller for Azure. This situation is
       | obviously not great for you as a customer but there are proactive
       | steps you can take to prevent this going forward. Reach out to
       | your sales team and work with them on your roadmap for compute
       | requirements going forward. The sales team has a forecast tool
       | that feeds back into the department that buys and racks the
       | equipment. If you can provide enough lead time, they will make
       | sure you have compute resources available in your subscriptions.
        
         | wstuartcl wrote:
         | What you describe is like the inverse of 90% of the reason
         | companies host in the cloud. What makes needing to forecast and
         | reach out to a sales guy to eventually stock hardware for your
         | needs (while now competing against other customers for those
         | resources) any better than hosting on prem?
         | 
         | AWS for sure has had resource constraints in different AZs
         | (especially during Black Friday and holiday loads) but I have
         | never had an issue finding resources to spin up especially if I
         | was willing to be flexible on vm type.
        
           | mmcconnell1618 wrote:
           | Under most circumstances, this isn't needed unless you have a
           | big ask. Say, you need 1,000 specific cores and GPUs, then
           | this process is the best way to ensure you have them
           | available.
           | 
           | The original poster probably has the ability to spin up other
           | instance types in their region. If there is no compute
           | capacity in the entire region, something went wrong
           | operationally.
           | 
           | I'm not suggesting you should put in a request for every new
           | resource you need, but if you have a specific instance type
           | or a large number needed, it helps. You're not losing the
           | ability to shut them down the next day if you don't need
           | them, you're just telling the Azure team that you expect to
           | spin some up around a certain time. If you're making a
           | significant request of compute capacity, the team has the
           | ability to reserve those instances for your subscriptions so
           | that you're not competing with others for those cores.
        
         | dustedcodes wrote:
         | Why work with a human in the Azure sales department and plan
         | cloud resources a year ahead? What's the point of the cloud at
         | this point? Then it just becomes a 100x more expensive version
         | of hiring an infrastructure person and planning your own
         | physical resources with them a year ahead.
        
         | janober wrote:
         | Thanks a lot, that is very helpful and great to know! Def.
         | something we will do in the future.
        
       | robertlagrant wrote:
       | Interesting semi-confirmed anecdote: when lockdown hit, Azure
       | began to refuse to allocate servers. One of the main reasons was
       | they prioritised servers in this way:
       | 
       | 1. Government/health/defence cloud customers
       | 
       | 2. Teams, which was exploding in use and they wanted to
       | capitalise on it
       | 
       | 3. Regular cloud customers
        
         | isoprophlex wrote:
         | Yeah this was real. I remember this. For a while they
         | selectively deprioritized customers, like you say. I'm not
         | judging, just confirming the observation.
        
       | fuzzy2 wrote:
       | Sort-of. I have a Postgres flexible database in the West Germany
       | Central region that can no longer be scaled. It was only created
       | for testing purposes, so no biggie. The backend is basically a
       | managed Compute resource.
       | 
       | If you need more reliability, I see only one way out: Go multi-
       | region or even multi-cloud.
        
       | chunk_waffle wrote:
       | Who else has heard countless times something like "with company
       | X's cloud platform you don't need to file a ticket and wait weeks
       | for another team to provision a physical server, just spin some
       | more up bro." The reality is you do, you've just outsourced the
       | problem.
        
       | cyptus wrote:
       | we have the same issue and escalated it through multiple azure
       | teams.
       | 
       | our quota was silently set to 0 while there were still
       | instances running. this worked fine until auto-scale scaled the
       | instances down in the night to 1. at the start of the day auto
       | scale was not able to scale back up to the initial amount, which
       | led to heavy performance issues and outages. we needed to
       | move the instances as azure support did not help us. after many
       | calls with azure and multiple teams involved, we still did not
       | get the quota approved (even though we already had it and were
       | not asking for "new" quota).
       | 
       | we also decided that we can no longer host in the German azure
       | region. Even if we could get the quota, not being able to scale
       | for unexpected traffic is a business risk we don't want to bear
       | anymore.
       | 
       | this is huge for us as our application requires German servers.
       | We are still in research where to host in future.
        
         | cyptus wrote:
         | Interesting is that you can get instances in dev/test
         | subscriptions without any trouble.
        
       | andrewstuart wrote:
       | Yes it's weird that you have to ask them for instances which some
       | actual physical person looks at your request, thinks about it and
       | says yes or no to.
       | 
       | Instead of providing you with a list of the resources they do
       | have, you have to play this weird game where you ask for specific
       | instances in specific regions and then within several hours
       | someone emails back to say yes or no.
       | 
       | If it's no, you have to guess again where you might get the
       | instance you want and email them again and ask.
       | 
       | I envisage going to an old shop, and asking the shopkeep for a
       | compute instance in a region. He hobbles out the back, and after
       | a long delay comes back and says "nope, don't have no more of
       | them, anything else you might want?".
       | 
       | It's surprising this is how it works. Not the auto scaling that
       | cloud computing used to bring to mind.
        
         | kulahan wrote:
         | I briefly worked on an Azure team, and what I remember hearing
         | (a few years ago) was that they were building out data
         | warehouses as fast as they can, but they simply cannot keep up
         | with demand. A good problem to have, I thought, but maybe not
         | in light of this news!
        
         | gtirloni wrote:
         | Is this a joke comment?
        
           | selckin wrote:
           | no, they have very low quotas by default, and you have to
           | request increases through the portal, which then get rejected
           | and you click the button to contact support/email and then
           | you sometimes have to negotiate with them
           | 
           | you have to do this for every single instance type they have,
           | can't even experiment or test other instance types cause it's
           | too much trouble to get quota
        
             | gtirloni wrote:
             | The comment I replied to was not talking about changing
             | quotas but actually creating instances.
             | 
             |  _> Yes it's weird that you have to ask them for instances
             | which some actual physical person looks at your request,
             | thinks about it and says yes or no to._
        
               | selckin wrote:
               | well can't create an instance without having quota
               | available
               | 
               | and low quota is low, like 10 cpu, so start a 2 node k8s
               | cluster with 8cpu each? nope, go request quota increase
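
The arithmetic behind that example, as a small sketch. The function
name and the `used_vcpus` parameter are illustrative assumptions; the
numbers (a 10 vCPU quota, two nodes with 8 vCPUs each) come from the
comment above.

```python
def fits_quota(node_count, vcpus_per_node, quota_vcpus, used_vcpus=0):
    """True if the requested nodes fit into the remaining vCPU quota."""
    requested = node_count * vcpus_per_node
    return used_vcpus + requested <= quota_vcpus

# Two 8-vCPU nodes need 16 vCPUs, which exceeds a quota of 10.
print(fits_quota(node_count=2, vcpus_per_node=8, quota_vcpus=10))  # False
# A single 8-vCPU node would still fit.
print(fits_quota(node_count=1, vcpus_per_node=8, quota_vcpus=10))  # True
```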
        
             | andrewstuart wrote:
             | In the future it will be possible to use computers to
             | figure out what's available and automatically give it to
             | customers.
             | 
             | 21st century man.... it's coming.
        
               | salawat wrote:
               | But who will determine when more computers are needed to
               | figure out what's available to give to more customers
               | because there's been a spike in demand?
               | 
               | Computers don't fix everything. They just allow you to
               | f*ck up bigger, harder, and faster, usually in the most
               | banal way imaginable.
        
           | oneplane wrote:
           | No, Microsoft still isn't up to the 'use what you want, pay
           | for your usage' level that other companies tend to be. They
           | even still mix "licensing" with "usage" so you have to pay
           | for something to then be allowed to pay for using it...
        
           | andrewstuart wrote:
           | No, this is my actual experience using azure.
        
             | gtirloni wrote:
             | I can go on Azure right now and create an instance and
             | nobody will check anything manually and email me back
             | something. Maybe you're confusing Azure with some other
             | small town colocation provider.
        
               | wstuartcl wrote:
               | "an instance" lol
        
               | andrewstuart wrote:
               | Nope, I went through this process exchanging more than 30
               | emails trying to get the instances I wanted.
        
               | tsimionescu wrote:
               | If you want 1 instance, you're right. If you want 10 - 20
               | instances of one type in a region, the other poster's
               | experience matches my own: you have to open a support
               | request to ask for a quota increase, and that is not an
               | automated process.
        
               | deathanatos wrote:
               | Accounts have instance count quotas; you can get them
               | raised, but it is a support ticket to do so.
               | 
               | And sometimes, that is hard. I've had Azure support not
               | able to understand what quota they need to raise / what
               | quota is being requested. I had to at least link them to
               | their own documentation on it... (partly the confusion is
               | that quota support tickets allow selecting the quota as a
               | piece of metadata on the ticket, _but only for some
               | quotas_ , and of course, mine was for one of the ones not
               | listed. Why they don't just list all of them is anyone's
               | guess.)
        
       | vaderade wrote:
       | Yes, my company found this out trying to add both a database and
       | a serverless app to our existing infrastructure in Germany West
       | Central in July. They had no ETA for more GWC capacity back then
       | and told us to move to the North and West Europe regions.
        
       | natch wrote:
       | >We never thought our startup would be threatened by the
       | unreliability of a company like Microsoft
       | 
       | Had you never heard about (and this is unfortunately not a joke)
       | Microsoft's music service they once had, shut down after a few
       | short years leaving customers without the ability to listen to
       | the music they had paid to listen to?
       | 
       | The service was called, this was the trademarked name, Microsoft
       | "Plays for Sure." You cannot make this stuff up.
        
         | userbinator wrote:
         | That's also the name of the DRM system it had.
        
       | jobhenri wrote:
       | help
        
       | jedberg wrote:
       | > but setting up in a new region would be complicated for us.
       | 
       | I've never done K8 on Azure, but my understanding is that Azure
       | is pretty good about coordinating between your own datacenter
       | running windows and Azure. Maybe you can spin up some windows
       | boxes in a cheap datacenter to make it work?
        
         | toomuchtodo wrote:
         | Hetzner has a German presence I believe, and would work for
         | running k8 on bare metal for n8n to burst to temporarily for
         | running their orchestration and/or workflow runners. Might even
         | be cheaper in the long run versus a cloud provider. Just gotta
         | wire up the helm charts, containers, and whatever message bus
         | is pushing their messages around. Can write to blob storage
         | from anywhere if that's a component of the app.
        
           | zwaps wrote:
           | > Hetzner has a German presence I believe
           | 
           | I sure hope so, as a German company
        
       | Moissanite wrote:
       | Azure Germany is a separate partition from the rest of Azure -
       | presumably for compliance reasons. This is distinct from AWS,
       | where Frankfurt is just another region, albeit one with high
       | demand.
        
         | option wrote:
         | this: compliance plus lack of energy for new datacenter
         | capacity. source: colleague who works at msft. they have a true
         | crisis there and it will get worse.
        
         | tjungblut wrote:
         | Yep, it's run by the Telekom entirely IIRC from my time back at
         | MSFT. Microsoft "just deploys" Azure on it.
        
         | Terretta wrote:
         | > _AWS .. Frankfurt is just another region_
         | 
         | Unlike GCP and Azure, all AWS regions are (were) partitioned by
         | design. This "blast radius" is (was) fantastic for resilience,
         | security, and data sovereignty. It is (was) incredibly easy to
         | be compliant in AWS, not to mention the ruggedness benefits.
         | 
         | AWS customers with more money than cloud engineers kept
         | clamoring for cross-region capabilities ("Like GCP has!"), and
         | in the last couple of years AWS has been adding some.
         | 
         | Cloud customers should be careful what they wish for. If you
         | count on it in the data center, and you don't see it in a well-
         | architected cloud service provider, perhaps it's a legacy
         | pattern best left on the datacenter floor. In this case, at
         | some point hard partitioning could become tough to prove to
         | audit and impossible to count on for resilience.
         | 
         | UPDATE TO ADD: See my123's link below, first published
         | 2022-11-16, super helpful even if familiar with their approach.
         | 
         | PDF: https://docs.aws.amazon.com/pdfs/whitepapers/latest/aws-
         | faul...
        
           | my123 wrote:
           | Cross-region extensibility points are few and far between.
           | See https://docs.aws.amazon.com/whitepapers/latest/aws-fault-
           | iso... for more details.
        
           | senderista wrote:
           | There is a reason why GCP and Azure have had many more global
           | outages than AWS. Fault isolation always entails some level
           | of inconvenience.
        
           | EE84M3i wrote:
           | AWS has several different levels of region isolation.
           | 
           | There are aws region partitions - general, china, us gov
           | cloud (public), us gov secret and us gov top-secret.
           | 
           | Inside a partition, there can be some regions that are opt-in
           | - see https://docs.aws.amazon.com/general/latest/gr/rande-
           | manage.h...
           | 
           | My understanding is that opt-in regions are even more
           | isolated inside a specific partition for partition-global
           | services like IAM and maybe some other stuff.
        
           | robertlagrant wrote:
           | > Unlike GCP and Azure, all AWS regions are (were)
           | partitioned by design. This "blast radius" is (was) fantastic
           | for resilience, security, and data sovereignty. It is (was)
           | incredibly easy to be compliant in AWS, not to mention the
           | ruggedness benefits.
           | 
           | Could you elaborate on this a little? We use AWS, but are
           | evaluating OCI for certain (very specific) cases, and I'd
           | love to know what questions to ask for comparison purposes.
        
             | Terretta wrote:
             | You likely won't get anywhere asking Oracle questions,
             | their sales is very good at (not) answering.
             | 
             | Here is how partitioned/isolated OCI is by design:
             | 
             | https://www.wiz.io/blog/attachme-oracle-cloud-
             | vulnerability-...
             | 
             | While that's fixed, it speaks volumes to the architecture.
             | Very little has changed since 2018:
             | https://www.brightworkresearch.com/how-to-understand-the-
             | pro...
             | 
             | As noted there, I'd argue OCI is more akin to
             | Softlayer/Bluemix than to GCP, Azure, or AWS, but depending
             | on your certain very specific cases OCI may still be
             | appropriate.
        
       | aftbit wrote:
       | >These problems seem only in the German region, but setting up in
       | a new region would be complicated for us.
       | 
       | This seems like your fundamental problem. If you design an
       | architecture that is limited to a single region of a single cloud
       | provider, you are very likely to encounter issues at some point.
       | 
       | Luckily you have a full month to solve this problem before it
       | will prevent you from accepting new users. My suggestion is to
       | start making your app multi-regional or multi-provider ASAP.
        
       | krmboya wrote:
       | Damn! It's leaky abstractions again
        
       | jcmontx wrote:
       | Why not just create a bigger DB instance in another region for
       | a few months? Sure, you'll take a performance hit, but 99% of
       | users won't notice or care
        
       | dharmab wrote:
       | I worked in a top 15 Azure customer. This is not unusual at all,
       | especially in the newer regions. Talk to your TAM before you
       | attempt major capacity changes in a region. They may have advice
       | on specific SKUs to use or which zones have capacity (e.g. when
       | australiaeast was being built 80%+ of the capacity was in one
       | zone for many months).
       | 
       | If you aren't a big spender you may not have a TAM who can get
       | this info for you. Welcome to Azure.
        
       | alexeldeib wrote:
       | What VM sizes?
       | 
       | Besides what's already been said, internal capacity differs
       | HUGELY based on VM SKU. If you need GPUs or something it'll be
       | tough. But a lot of the newer v4/v5 general compute SKUs
       | (D/Da/E/Ea/etc) have plenty of capacity in many regions.
       | 
       | If changing regions sounds like a pain, consider gambling on
       | other VM size availability.
       | 
       | (azure employee)
        
         | janober wrote:
         | Actually nothing fancy, for sure no GPUs. Just Standard_E4s_v4.
        
           | alexeldeib wrote:
           | Ah, bummer. If it helps, you can try this to list out VM
           | sizes with comparable capabilities and see if you have better
           | luck with any others (--all not really necessary since it
           | filters by NotAvailableForSubscription and similar):
           | 
           |     az vm list-skus -l germanynorth -r virtualMachines --all > germanynorth.json
           |     jq '.[] | select(
           |       any(.capabilities[]; .name == "vCPUs" and (.value | tonumber) >= 4)
           |       and any(.capabilities[]; .name == "MemoryGB" and (.value | tonumber) >= 32)
           |     )' germanynorth.json
           | 
           | 4/32 because that's what E4s_v4 would have.
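
For readers who would rather post-process the JSON in Python, here is a
rough equivalent of the jq filter above. It assumes the file has the
shape `az vm list-skus` emits (a list of SKUs, each with a
`capabilities` list of name/value pairs). The embedded sample is
illustrative data, not real command output, and unlike jq's `tonumber`
this sketch skips non-numeric values instead of failing.

```python
import json

def has_capability(sku, name, minimum):
    """True if the SKU lists capability `name` with a numeric value >= minimum."""
    for cap in sku.get("capabilities") or []:
        if cap.get("name") != name:
            continue
        try:
            if float(cap.get("value")) >= minimum:
                return True
        except (TypeError, ValueError):
            continue  # non-numeric value: ignore rather than abort
    return False

def comparable_skus(skus, min_vcpus=4, min_memory_gb=32):
    """Names of SKUs with at least the given vCPU count and memory."""
    return [
        sku["name"]
        for sku in skus
        if has_capability(sku, "vCPUs", min_vcpus)
        and has_capability(sku, "MemoryGB", min_memory_gb)
    ]

# Illustrative sample; in practice use json.load(open("germanynorth.json")).
SAMPLE = json.loads("""
[
  {"name": "Standard_E4s_v4", "capabilities": [
    {"name": "vCPUs", "value": "4"}, {"name": "MemoryGB", "value": "32"}]},
  {"name": "Standard_D4s_v4", "capabilities": [
    {"name": "vCPUs", "value": "4"}, {"name": "MemoryGB", "value": "16"}]},
  {"name": "Standard_E8s_v4", "capabilities": [
    {"name": "vCPUs", "value": "8"}, {"name": "MemoryGB", "value": "64"}]}
]
""")

print(comparable_skus(SAMPLE))  # ['Standard_E4s_v4', 'Standard_E8s_v4']
```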
        
             | janober wrote:
             | Thanks a lot! Just checked internally. Apparently there are
             | some instances which we could get but which would not work
             | cost-wise (they have, for example, a lot of CPUs, but we
             | mainly care about RAM). Additionally, there is still a
             | region-wide CPU limit that would cause us problems. So sadly
             | not a long-term solution. But thanks a lot!
        
           | cyptus wrote:
           | same goes for every App service, no matter which instance
           | size
        
           | [deleted]
        
       | habibur wrote:
       | Looks like github is down right now. Or is it only me?
        
       | TexanFeller wrote:
       | Infinite scaling clouds, they said. In AWS at work we spin up
       | large numbers of EMR nodes and every few days get stuck waiting
       | for availability of certain instance types in our region too. I
       | guess we could reserve more, but that defeats a lot of scale up
       | and down advantages.
        
         | mirekrusin wrote:
         | Serverless runs out of servers.
        
       | lyind wrote:
       | If this is a serious problem for your business, you use K8s and
       | require assistance quickly moving your workloads, consider
       | contacting:
       | 
       | https://www.giantswarm.io/
       | 
       | (I work at Giant Swarm.)
        
       | xwowsersx wrote:
       | Oof, that sucks and I feel for you. That said...
       | 
       | > setting up in a new region would be complicated for us.
       | 
       | Sounds to me like you've got a few weeks to get this working.
       | Deprioritize all other work, get everyone working on this little
       | DevOps/Infra project. You should've been multi-region from the
       | outset, if not multi-cloud.
       | 
       | When using the public cloud, we do tend to take it all for
       | granted and don't even think about the fact that physical
       | hardware is required for our clusters and that, yes, they can run
       | out.
       | 
       | Anyways, however hard getting another region set up may be, it
       | seems you've no choice but to prioritize that work now. May also
       | want to look into other cloud providers as well, depending on how
       | practical or how overkill going multi-cloud may or may not be for
       | your needs.
       | 
       | I wish you luck.
        
         | janober wrote:
         | Thanks a lot! You are totally right, it is for sure something
         | we will find a solution for. But honestly, I do not want to. As
         | a startup, you have very few resources and deliberately place
         | some exact bets. Deprioritizing everything to work on something
         | for a long time that was not prioritized, just to then end up
         | again where you were before (a working cloud solution) is the
         | last thing any startup should be forced to do. Anyway, it seems
         | like we do not have much choice here.
        
           | xwowsersx wrote:
           | I hear you. It's not a fun position to be in. And sometimes
           | you're correct to take calculated risks and maybe the
           | expected value was positive here, despite what ended up
           | happening.
           | 
           | Without knowing the details about your services and
           | infrastructure, it's hard for me to know what's involved in
           | going multi-region now. Are you sure it's such a gargantuan
           | effort? I would've thought one person working full-time on
           | this for a week or two would be enough, but again I don't
           | know the details of your setup.
           | 
           | One option would be to pay a consultant who is an expert in
           | Azure/cloud stuff to come in and help. May not be cheap, but
           | could be a lot better and quicker for you and better for the
           | business, especially if none of you are really big experts in
           | Azure.
           | 
           | I've been here before (I think)...had to wear many hats and
           | scramble to make sales, build the tech, act as de facto
           | DevOps person even without a lot of experience doing it, etc.
           | That is the way, but stuff happens.
           | 
           | Happy to chat about specifics if you want to bounce ideas off
           | of me or go through your particular situation. Can't promise
           | I'll have concrete advice, but happy to talk it through.
        
             | janober wrote:
             | Thanks a lot, that is really super nice of you and appreciated!
             | Luckily we have somebody very knowledgeable on our team.
             | Will tell him to reach out if he wants to have a peer to
             | brain-storm some ideas.
        
               | xwowsersx wrote:
               | Glad to hear you have the right people. Good luck, my
               | friend.
        
           | jnsaff2 wrote:
           | I feel for you. Also it sucks to be in this position.
           | 
           | Let the scar you get from this be a learning experience,
           | hopefully you will not fall into the same trap again to trust
           | this company.
           | 
           | In my career I'm in a place where anyone suggesting I do work
           | on Azure gets an instant doubling of my asking day-rate and I
           | really hope they will be put off and find another victim for
           | this gig.
           | 
           | That said, another learning experience would be to use
           | terraform or something (tbh for azure the only sane thing is
           | terraform, ARM templates are just garbage). Having
           | terraformed your one region switching to the other would be
           | much easier, tho not trivial.
        
           | martinald wrote:
           | I disagree with the other poster that you should have been
           | multiregion from the start. It adds a load of complexity and
           | failure cases for early stage Startups.
           | 
           | Very poor position to be in, apparently this happened in
           | azure UK recently too.
        
         | xwowsersx wrote:
         | I'll reply to my own comment in response to a since-deleted
         | reply that went something to the effect of "this is terrible
         | advice for a young startup trying to get to product market
         | fit":
         | 
         | I'm totally on board with the idea of being scrappy and taking
         | shortcuts in order to get to PMF as soon as possible. However,
         | it seems the proof is in the pudding here. If you can't service
         | customers due to lack of compute resources, you can't get to
         | PMF.
         | 
         | Also, yes there are certain infrastructure and network
         | topologies that would absolutely be overkill for a young
         | startup. I don't think multi-region is one of those things. I
         | don't have experience with Azure directly, but on every other
         | cloud provider, going multi-region is not something that
         | requires huge amounts of time or resources. You just need to be
         | mindful of it from the outset. And if you decide not to be,
         | then at least be intentional and conscious about the risk and
         | have a plan in place for what happens when you get bit by
         | deciding _not_ to go multi-region.
        
           | cheese123 wrote:
           | GDPR might be a problem here. But this brings us to an
           | important point: this is not your infrastructure, but someone
           | else's.
        
             | masom wrote:
             | GDPR isn't really related to the infrastructure, and isn't
             | a problem if you built your product knowing you'll need to
             | conform. Shopify is GDPR compliant, for all merchants, and
             | runs on Google Cloud in multiple regions.
        
               | nr2x wrote:
               | Privacy shield very much matters where your servers are.
               | EU cracking down hard on extra territorial transfers in
               | the past year with more to come.
               | 
               | Also, lots of companies assert GDPR compliance via
               | magical thinking. They most often are wholly wrong.
               | Shopify can say whatever they want, but there's no
               | certification body.
               | 
               | Source: I'm the person who evaluates and builds
               | compliance systems for a range of services you almost
               | definitely use.
        
               | revicon wrote:
               | There are specific requirements in Germany that require
               | user data to not leave the country. I believe that was
               | what OP was referring to.
        
               | pm90 wrote:
               | Genuine question: is knowledge on how to do this well
               | known? Without that accessibility, I'm picturing folks
               | operating in EU being unwilling to take the risk of not
               | being compliant and just hosting everything in a single
               | region.
        
               | nr2x wrote:
               | They're wrong.
        
               | unionpivo wrote:
               | GDPR is not some bogeyman, it can be a pain to do on
               | existing products that were built pre-GDPR, but if you
               | are starting a new project, being GDPR compliant is pretty
               | straightforward and not hard/time consuming, unless you
               | are explicitly trying to do something shady*
               | 
               | *privacy invasive that GDPR is explicitly set up to make
               | harder, so duh
        
             | xwowsersx wrote:
             | Good point, I didn't think about that.
        
             | soco wrote:
             | There are quite a few European regions, but we don't know
             | how the others stand with their computing limits...
        
           | arcturus17 wrote:
           | Also, n8n arguably has product-market fit so the advice was
           | impertinent to start with...
        
           | logisticpeach wrote:
           | I'd add to this by asking: how much more PMF can you get when
           | you have a two week horizon of new customers before you
           | literally run out of compute resource in a major cloud
           | provider data centre?
           | 
           | Sounds like customers are coming in thick and fast.
           | 
           | If this is the dynamic and the company can't spare a few
           | weeks to solve it, something has gone seriously wrong in a
           | very interesting way.
        
         | websap wrote:
         | Is it that much cheaper for you to build a new region on Azure
         | versus getting set up on AWS?
         | 
         | If you rely on Kubernetes for orchestration and have minimal
         | cloud API dependency, it may be worth evaluating this
         | option.
         | 
         | Also, do you have a TAM associated with your account? Are you
         | just going through regular support channels? Can they deliver
         | different instance types (not sure what the Azure parallel is),
         | can they deliver short term capacity, etc?
         | 
         | I would try to push Microsoft more here. It's not like they've
         | stopped on-boarding new customers into that region right? What
         | happens if you create a new account in that region?
        
           | janober wrote:
           | We already tried to push Microsoft; sadly, they have not been
           | very helpful. Still trying to get in contact with somebody
           | that can actually make a difference. After all, we are not
           | asking for a hundred machines. I really cannot imagine that
           | they cannot somehow make the resources we require
           | available.
        
             | [deleted]
        
             | dekhn wrote:
             | I'll ask again what the person above asked: do you have a
             | TAM?
             | 
             | If you don't, you're at a big disadvantage.
        
               | janober wrote:
               | Sorry, missed that. No, we do not.
        
               | websap wrote:
               | Get one asap. Your TAM is the insider and should push for
               | you.
        
               | cyptus wrote:
               | we tried escalating this through CASM and it did not
               | work. The region is blocked for every quota, even a
               | single instance.
        
             | jiggawatts wrote:
             | One trick you can try is to switch to a different SKU.
             | Most Azure databases have different generations of
             | underlying compute. They may be out of just one model. Try a
             | different one.
             | 
             | Similarly, just keep trying to change the size. Often it'll
             | go through when someone else decommissions something.
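A minimal sketch of the "try a different SKU, and keep trying" approach described above. It assumes a provisioning call that raises an error when a SKU has no capacity; `CapacityError`, `provision_with_fallback`, and the SKU names are hypothetical and not part of any Azure SDK.

```python
import time
from typing import Callable, Iterable, Optional


class CapacityError(Exception):
    """Hypothetical error: the requested SKU has no capacity right now."""


def provision_with_fallback(
    provision: Callable[[str], str],
    skus: Iterable[str],
    attempts_per_sku: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Try each SKU in preference order, repeating the whole list a few times.

    `provision` is an assumed callable that takes a SKU name and returns a
    resource identifier, or raises CapacityError when the SKU is exhausted.
    Returns the first identifier obtained, or None if every attempt failed.
    """
    sku_list = list(skus)
    for _ in range(attempts_per_sku):
        for sku in sku_list:
            try:
                return provision(sku)
            except CapacityError:
                continue  # this SKU is out of capacity; try the next one
        # Capacity may free up when someone else decommissions something.
        time.sleep(delay_seconds)
    return None
```

In practice `provision` would wrap whatever deployment tooling is already in use, and the retry delay would be minutes rather than zero.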
        
         | psychphysic wrote:
         | > Deprioritize all other work, get everyone working on this
         | little DevOps/Infra project.
         | 
         | This is doubly worthwhile as if this stumble kills the startup
         | (it can happen) this will be excellent experience to take to
         | the next employer :)
        
           | lanstin wrote:
           | Also a lesson here is don't create any prod assets manually
           | ever. Terraform or some other software to define your cloud
           | assets. Then this issue is just a matter of adding a top
           | level loop or maybe adding a region parameter to a layer of
           | software. Cloud is only efficient if you take software-defined
           | everything seriously. Otherwise it is premium
           | hosting where you are likely a small fish.
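The "region parameter plus a top level loop" idea can be sketched as follows. This is an illustration in Python rather than Terraform, and the `Deployment` fields, `plan_for_regions`, and the region names are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Deployment:
    """A declarative description of one environment (hypothetical fields)."""
    name: str
    region: str
    node_count: int
    db_tier: str


def plan_for_regions(template: Deployment, regions: List[str]) -> List[Deployment]:
    """Stamp out one deployment per region from a single template.

    Because the region is just a parameter, adding a region is a one-line
    change to the list rather than a manual rebuild.
    """
    return [
        replace(template, name=f"{template.name}-{region}", region=region)
        for region in regions
    ]


template = Deployment(name="app", region="region-a", node_count=3, db_tier="standard")
plans = plan_for_regions(template, ["region-a", "region-b"])
```

Each entry in `plans` would then be handed to whatever tool actually creates the resources.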
        
         | astrostl wrote:
         | "Multi-cloud from the outset" is probably the single-worst
         | generic cloud advice that I think anyone could be given. In
         | professional cloud consulting the rule of thumb is to do one
         | cloud with excellence before you even think about another one.
         | And even that is really just kicking the conversational can, as
         | both becoming excellent and actually needing multi-cloud
         | combined is one-in-a-billion.
        
           | gumby wrote:
           | That pretty much binds your hands since in our experience the
           | one _provider_ who can do "one cloud with excellence" is AWS.
           | 
           | (As an aside I also agree that multi cloud from the get go is
           | a YAGNI violation. Just keep in the back of your mind "could
           | we have an alternative to this?" when using your provider's
           | proprietary features.)
        
             | marcosdumay wrote:
             | That generalizes to every kind of lock-in: have a viable
             | escape plan, but only execute it if you need or it becomes
             | cheap enough that it won't harm you.
             | 
             | Just having the plan is already expensive enough.
        
             | jiggawatts wrote:
             | My experience is the opposite: AWS has more features on
             | paper but most of them exist only to tick a checkbox. Azure
             | has more integrations between their offerings, as well as
             | Azure Active Directory, and Microsoft 365.
        
               | 29athrowaway wrote:
               | And Active Directory integrates horribly with everything
               | outside Microsoft.
        
               | SgtBastard wrote:
               | Azure Active Directory has both SAML and OpenID Connect
               | endpoints... what's missing?
        
           | ghayes wrote:
           | Completely agree, though certain aspects, such as running on
           | k8s or Docker might make it easier to switch if you ever
           | decide to, versus say, being tightly coupled with many
           | bespoke cloud products.
        
             | rikthevik wrote:
             | My philosophy is to make switching to a new cloud possible.
             | It doesn't have to be easy. We just shouldn't nail our feet
             | to the floor.
        
             | d3ckard wrote:
             | Or you could just deploy on metal, which will be cheaper
             | and sufficient for vast majority of cases. Plus you can
             | always migrate to VMs with relatively low hassle.
        
           | xwowsersx wrote:
           | Who are you quoting? I said "multi-region from the outset"
           | and later acknowledged that multi-cloud would probably be
           | overkill.
        
           | aftbit wrote:
           | If you use generic enough services (container hosts, load
           | balancers, VMs, object stores, even hosted SQL DBs, etc),
           | then the multi-cloud journey is not that hard. The challenge
           | comes when you have built a whole architecture on top of some
           | AWS magic that simply does not have an easy alternative in
           | the non-cloud world.
        
           | elwell wrote:
           | Surely it would depend on the reliability demands of your
           | product.
        
           | askvictor wrote:
           | On top of which, startups often don't have that luxury; you
           | often need to ruthlessly prioritise your effort.
        
         | [deleted]
        
         | snotrockets wrote:
         | "Being multicloud from the outset" is a very silly idea for
         | most use cases.
         | 
         | The way to get more from most clouds is by becoming a partner,
         | not just a customer. And the way to do that is increase
         | dependency and usage.
        
           | salawat wrote:
           | t. Sales department of Cloud provider
        
         | rch wrote:
         | Thanks largely to k8s, running on multiple cloud providers and
         | your own hardware is a lot more convenient than it was a few
         | years ago. Component interfaces and protocols are a lot more
         | consistent across platforms as well.
        
         | theteapot wrote:
         | > everyone working on this little DevOps/Infra project.
         | 
         | Everyone? That's not going to help.
        
         | mhitza wrote:
         | I take offense with your comment. It's not the first time I'm
         | hearing about multi-region/multi-cloud in online tech forums,
         | however reality doesn't match.
         | 
         | I don't want to be snarky, but when large service providers
         | like AWS have their own cross-region downtime because one
         | snowflake of a service in us-east-1 is down, I kind of dismiss
         | the virtue signaling of high resilient multi-(az/region/cloud)
         | ever existing in practice.
         | 
         | If you can somehow have a separate database per region/cloud,
         | sure, I can understand that, but if you have to shard your
         | database across many clouds, I'd dread having to tame such a
         | beast, especially within a startup.
        
           | panarky wrote:
           | _> dismiss the virtue signaling of high resilient ..._
           | 
           | So you're saying it's impossible to improve reliability from
           | 97% to 99% because you can never make it to 100%.
        
             | quesera wrote:
             | If your single-AZ, single-region cloud is not giving you 3
             | or 4 9's of reliability out of the box, you are using the
             | wrong cloud.
             | 
             | Multi-AZ and multi-region add complexity and cost much more
             | quickly than they add reliability.
             | 
             | Sometimes it is worth it. Sometimes it is not.
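The 97% versus 99% exchange above can be made concrete with a little arithmetic. This sketch assumes regions fail independently and that failover is instant and perfect; both are optimistic assumptions, and correlated failures are exactly the concern raised earlier in the thread. The function names are illustrative.

```python
def combined_availability(per_region: float, regions: int) -> float:
    """Availability when the service is up if at least one region is up.

    Assumes independent failures and perfect failover (an idealisation).
    """
    return 1.0 - (1.0 - per_region) ** regions


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


# Two independent regions at 97% each: 1 - 0.03 ** 2 = 0.9991
two_regions = combined_availability(0.97, 2)

# "Three nines" from a single region is about 8.76 hours of downtime a year.
three_nines_downtime = downtime_hours_per_year(0.999)
```

The gain from a second region shrinks quickly once the single-region figure is already at three or four nines, which is why the added complexity is sometimes worth it and sometimes not.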
        
               | champtar wrote:
               | Depends on your needs, but having your data & database
               | multi-az to ensure durability can avoid you having to
               | restart from backups. I'm thinking about an old AWS
               | incident where they actually lost EBS data:
               | https://www.bleepingcomputer.com/news/technology/amazon-
               | aws-... Also make sure your backups are in a different AZ
               | (thinking about OVH ...) or region or even at a different
               | provider.
        
       | dszoboszlay wrote:
       | Good news is that today is Black Friday, so the e-commerce
       | industry is running at peak capacity. In 30 days it will be
       | Christmas, and by then (the very latest!) everybody will scale
       | back, so you have a good chance to gain access to more compute
       | before you reach the end of your runway.
        
       | ig1 wrote:
       | Ask your VCs/angels for help, this is the kind of thing they can
       | definitely help with.
       | 
       | (Speaking from experience - one of our portfolio companies had a
       | similar challenge and we used our network to get to one of the
       | execs of the vendor involved)
        
         | janober wrote:
         | Thanks a lot. Yes that is also something we are trying in
         | parallel.
        
       | victor106 wrote:
       | I am sorry to say but at this point Azure is so f'ed up I think
       | it should only be considered after AWS and GCP.
       | 
       | The documentation is terrible and the Azure portal is so slow and
       | laggy I can't even believe it. Not to mention how unreliable
       | their stack is.
        
       | jenscow wrote:
       | Maybe Microsoft had just got their AWS bill?
        
         | usgroup wrote:
         | Well I thought that was funny :-)
        
       | jesseryoung wrote:
       | Ran into a similar issue last year in the East US region. We
       | contacted support and they gave a similar response. From my
       | understanding talking to people who use AWS and GCP this isn't
       | uncommon across cloud platforms.
       | 
       | While we could've just swapped a deployment parameter to deploy
       | to another region, we opted to just use a different SKU of VMs
       | for a short period and switch back to the VMs when they were
       | available again.
       | 
       | We haven't seen issues since.
        
         | wstuartcl wrote:
         | yeah AWS tends to have capacity issues during high volume
         | periods like black Friday (I think this is now actually because
         | most large users pre reserve a buffer pool of vms sitting
         | unused) -- but I have never had an issue where AWS has told me
         | there would be no capacity for months. It's usually swapping AZ
         | or regions or being slightly flexible on your sku. And if you
         | are sensitive to this and find it happening take a look at your
         | sku loadout you may be choosing a very high demand vm type and
         | shifting just slightly gets way more capacity.
         | 
         | ^^ and by capacity I am talking like 10's or 100s of vms being
         | available not 1.
        
       | omk wrote:
       | While some may immediately run a comparison between Azure, AWS
       | and GCP let it be noted that any cloud platform facing this and
       | making it to headlines is not good for the cloud industry over
       | all.
        
         | prmoustache wrote:
         | The thing is: if cloud vendors struggle getting new machines,
         | imagine your small company trying to order and get delivered
         | new on-prem servers quickly.
         | 
         | I worked for a company that worked mostly on-prem until 1y ago
         | and last time they had ordered machines availability from Dell
         | was scarce with huge delays.
        
       | l-p wrote:
       | > We never thought our startup would be threatened by the
       | unreliability of a company like Microsoft
       | 
       | You're new to Azure I guess.
       | 
       | I'm glad the outage I had yesterday was only the third major one
       | this year, though the one in August made me lose days of traffic,
       | months of back and forth with their support, and a good chunk of
       | my sanity and patience in face of blatant documented lies and
       | general incompetence.
       | 
       | One consumer-grade fiber link is enough to serve my company's
       | traffic and with two months of what we pay MS for their barely
       | working cloud I could buy enough hardware to host our product for
       | a year or two of sustained growth.
        
         | janober wrote:
         | We have actually used Azure for ~2 years now. It worked
         | reasonably well most of the time, even though we also had a few
         | issues. But our current issue + reading your and other comments
         | will probably result in us looking for a new home.
        
         | Twirrim wrote:
         | It's worth pointing out that every cloud is the same when it
         | comes to capacity / capacity risk. They all apply a lot of time
         | and effort to figuring out the optimal amount of capacity to
         | order based on track record of both customer demand and supply
         | chain satisfaction.
         | 
         | Too much capacity is money spent getting no return, up front
         | capex, ongoing opex, physical space in facilities etc.
         | 
         | On cloud scales (averaged out over all the customers) the
         | demand tends to follow pretty stable and predictable patterns,
         | and the ones that actually tend to put capacity at risk (large
         | customers) have contracts where they'll give plenty of heads-up
         | to the providers.
         | 
         | What has been very problematical over the past few years has
         | been the supply chains. Intel's issues for a few years in
         | getting CPUs out _really_ hurt the supply chains. All of the
         | major providers struggled through it, and the market is still
         | somewhat unpredictable. The supply chain woes that have been
         | wreaking havoc with everything from the car industry to the
         | domestic white goods industry are having similar impacts on the
         | server industry.
         | 
         | The level of unreliability in the supply chain is making it
         | _very_ difficult for the capacity management folks to do their
         | job. It's not even that predictable which supply chain is
         | going to be affected. Some of them are running far smoother and
         | faster and capacity lands far faster than you'd expect, while
         | others are completely messed up, then next month it's all
         | flipped around. They're being paranoid, assuming the worst and
         | _still_ not getting it right.
         | 
         | This is an area where buying physical hardware directly doesn't
         | provide any particular advantages. Their supply chains are just
         | as messed up.
         | 
         | The best thing to try to do is do your best to be as hardware
         | agnostic as is technically possible, so you can use whatever is
         | available... which sucks.
        
           | whoknew1122 wrote:
           | It may be a risk borne by every cloud provider, but why does
           | this only really happen to Microsoft among large providers?
           | 
           | As far as chip shortages, it probably helps that Amazon makes
           | its own chips. Microsoft could do the same rather than
           | running out of capacity and blaming chip shortages.
           | 
           | Microsoft had to know that at some point they were going to
           | run out of capacity. They should've either done something
           | about it or let customers know.
        
             | [deleted]
        
             | Spooky23 wrote:
             | They have a lot of technical debt. They have like 6
             | different clouds (at least 4 gov clouds alone) and SLA
             | commitments to things like O365 that silo their
             | infrastructure.
             | 
             | MS also makes all sorts of crazy deals and commitments, and
             | I wouldn't be surprised if being collocated with a
             | strategic customer may lead to local shortages of
             | resources.
        
             | Twirrim wrote:
             | There's all sorts of examples of AWS failing to be able to
             | provide capacity too. Just do a search for "aws
             | InsufficientInstanceCapacity" or similar. I remember
             | Fortnite talking about capacity limits in relation to an
             | incident, but I'm struggling to find the post-mortem I saw
             | it in.
             | 
             | Even when Microsoft was being open about Azure having
             | difficulty getting Intel chips, AWS, GCP etc. were in the
             | same position and just not really talking about it. From my
             | time in AWS there were some other times when some services
             | with specialised hardware came really, really close to
             | running out of capacity and had to scramble around with
             | major internal "fire drills" against services to recoup
             | capacity.
             | 
             | Most people won't run in to these issues, the clouds all
             | tend to be good at it, but they still happen.
             | 
             | There are also advantages of the economy of scale and brand
             | recognition. The more customers you have the more the
             | capacity trends smooth out, the easier it is to predict
             | need, even if you're still stuck with uncertainty on the
             | ordering side.
        
             | cosmotic wrote:
             | Amazon's own chips are ARM. ARM requires somewhat
             | specialized builds of software that are likely different
             | than development instances, CI/CD, and/or local dev
             | machines. It's not insurmountable but does certainly
             | complicate usage.
        
           | Spooky23 wrote:
           | > This is an area where buying physical hardware directly
           | doesn't provide any particular advantages. Their supply
           | chains are just as messed up.
           | 
           | Yup. And a few of the OEMs have stopped talking about supply
           | chain integrity. Many folks have observed more memory and
           | power supply problems since the pandemic.
        
           | more_corn wrote:
           | All cloud providers are NOT equal here. Amazon over-
           | provisions and sells the excess capacity as spot instances.
        
             | Twirrim wrote:
             | So does google, so does azure etc. etc.
             | https://cloud.google.com/spot-vms,
             | https://azure.microsoft.com/en-us/products/virtual-
             | machines/...
             | 
             | Spot instances exist just to try to turn over-provisions in
             | to not a complete loss. You're at least making some money
             | from your mistake.
             | 
             | edit: You should consider "spot instances" in general to be
             | a failure as far as a cloud provider is concerned. It means
             | you've got your guesses wrong. You always want a buffer
             | zone, but not _that_ much of a buffer zone. The biggest
             | single cost for cloud providers is the per-rack OpEx, the
             | cost of powering, cooling etc.
        
               | femto113 wrote:
               | Cloud providers aren't guessing at demand to plan
               | capacity, they're literally building new data centers and
               | then wheeling new racks into them as fast as they
               | physically can (short-term decisions are more likely made
               | at the other end, e.g. when to retire old systems, not
               | add new ones). AWS was born out of the fact that Amazon's
               | own compute needs are inherently variable so to meet peak
               | demand they had to "over-provision" compared to average
               | demand--this in turn meant they had a lot of excess
               | compute power most of the time. At the point when Amazon
               | still was a dominant consumer of AWS, spot instances were
               | actually a deliberate convenience to Amazon, since it
               | meant AWS could monetize resources while still ensuring
               | Amazon could claim them instantly when needed (later they
               | added a two minute warning, but early on they could
               | literally disappear at any moment, and regularly did).
        
               | Twirrim wrote:
               | You're talking to someone who has spent the last decade
               | working for major cloud providers, including AWS, on
               | infrastructure and services sides of things, including
               | work around data feeds for the capacity management teams.
               | I have more than a passing familiarity with the way
               | things actually work at a cloud.
               | 
               | They are _constantly_ guessing at cloud capacity. Short,
               | medium, and long term models with forecasting galore, all
               | under constant recalculation based on customer actions
               | (they literally take live feeds of creation/termination
               | actions), and yes they also take in to account hardware
               | failure and repair rates. Consolidating racks of
               | equipment is a pain in the neck and tends to be avoided,
               | unless you can safely live migrate away all instances.
               | 
               | They all build up various models, using all sorts of
               | forecasting techniques. The longer range forecasts are
               | involved in data center provisioning, along with other
               | business analysis, market research, legal analysis etc.
               | that helps define where future regions should be.
               | 
               | It's still a guess. They can't tell what the actual
               | demand will be, and they can't tell what is going to
               | happen with the supply chain (supply chain issues are the
               | biggest nightmare for capacity planning teams). Sometimes
               | they get it wrong.
               | 
               | The capacity management teams spend a lot of time and
               | expertise to keep the company just sufficiently ahead of
               | demand. It's a crucial part of keeping costs under
               | control.
        
             | jiggawatts wrote:
             | So does Azure.
        
           | marcinzm wrote:
           | In my experience there are differences between clouds so
           | while all have the same basic problem in practice some may be
           | better than others. I've never had issues getting GPUs on AWS
           | but GCP constantly has issues with GPU/TPU capacity.
        
             | indoorskier wrote:
             | Is this region dependent? In us-east I can't get them to
             | approve a quota for GPU instance families (G,P) for
             | anything more than 4 CPUs. At one point they rejected my
             | request citing "unprecedented demand". Of course this is
             | small time, just my personal account.
             | 
             | It is true I can get an instance most of the time, but not
             | if I need >16GiB GPU memory.
        
             | ajmurmann wrote:
             | There are probably different occurrence rates. We had to
             | modify how our test suite provisions instances, since we
             | used to regularly run into instance availability
             | constraints on EC2 during the holidays.
        
         | marcosdumay wrote:
         | New Microsoft customer at all.
        
         | adrr wrote:
         | Azure has some of the biggest outages like when they went down
         | on Feb 29th for the whole day.
         | 
         | https://azure.microsoft.com/en-us/blog/summary-of-windows-az...
        
           | Godel_unicode wrote:
           | 10 years ago, has there been something similar recently?
        
             | flippingbits wrote:
             | The last one I remember is this one from August this year:
             | https://redmondmag.com/articles/2022/08/30/microsoft-
             | blames-... It was not a complete outage but these DNS
             | issues caused a lot of pain.
        
         | rufius wrote:
         | Having worked for a company that's a very large customer of
         | AWS's, it's not much better.
         | 
         | I've worked with both Azure and AWS professionally and both
         | have had their fair share of "too many outages" or capacity
         | issues. At this point, you basically _must_ go multi-region to
         | ensure capacity and even better if you can go multi-cloud.
        
         | ethbr0 wrote:
         | Out of curiosity (from someone inexperienced with Azure), is it
         | a skill/ability chasm between MS engineering and outsourced
         | support?
         | 
         | TAMs tend to be a bandaid organizational sign that support-as-
         | normal sucks and isn't sufficient to get the job done (ie fix
         | everything that breaks and isn't self-serve).
        
           | Spooky23 wrote:
           | Microsoft support is really awful. Basically, if you need it
           | regularly, you just pay for resident engineers who can bypass
           | the wall between the product groups and you. I've had nothing
           | but great experiences with those guys.
           | 
           | Otherwise, especially if there's a broader problem, they play
           | lots of games with SLAs, etc.
        
         | roflyear wrote:
         | I have DB connection issues at least a few times a week.
         | Annoying.
        
         | Insanity wrote:
         | The common argument of "our own hardware would be more
         | profitable in X years" is typically countered with "but you
         | need to pay engineers to maintain it, which adds to the cost".
         | 
         | Another advantage of not having to own the hardware is that
         | it's easier to scale, and get started with new types of
         | services. (i.e, datawarehouse solutions, serverless compute,
         | new DB types,..).
         | 
         | I'm not trying to advocate for or against cloud solutions here,
         | but just pointing out that the decision making has more factors
         | apart from "hardware cost".
        
           | unionpivo wrote:
           | Depends on how stable your needs are, but sometimes it's
           | cheaper even when you consider total cost and not just for
           | big deployments.
           | 
           | In the past two or three years, we probably moved more services
           | off the cloud than the other way. That said one reason for that
           | is that most new services are built in the cloud, so there
           | are fewer services off the cloud than on it.
           | 
           | Cloud is best when you are starting out, when you don't know
           | what you need, need high velocity of adding new stuff, or
           | have very bursty demand for either traffic or CPU etc. Or
           | if you are just a small developer-only team.
           | 
           | But if you have applications that are relatively stable, are
           | mostly feature complete and you don't expect much sudden
           | growth etc, it's useful to run the numbers if cloud is still
           | something you want/need.
        
       | teaearlgraycold wrote:
       | And people roll their eyes when I say I'm dedicated to AWS.
        
       | lars_francke wrote:
       | We have had this issue in and since 2018
       | https://www.opencore.com/blog/2018/6/cloud-has-a-limit/
       | 
       | That said: We also had this issue on GCP last month.
       | 
       | We found that all three (AWS) are unreliable in their own ways.
        
       | jiggawatts wrote:
       | One of the biggest benefits of k8s is that you can easily mix in
       | pools of different hardware types without a "rebuild".
       | 
       | Something to try in scenarios like this is to add the "weird and
       | wonderful" VM SKUs that are less popular and may still have
       | capacity remaining.
       | 
       | For example, the HPC series like HBv2 or HBv3. Also try Lsv3 or
       | Lasv3.
       | 
       | Sure they're a bit more expensive, but you only have to use
       | them until April.
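       | 
       | A minimal sketch of that fallback, assuming a hypothetical
       | try_create_pool callback that wraps whatever provisioning call
       | you use. The first candidate is a made-up default; the others
       | are the series names above, not exact SKU identifiers.

```python
from typing import Callable, Optional, Sequence

# Preferred size first (hypothetical), then the less popular series
# named in the comment above. These are illustrative labels only.
CANDIDATE_SKUS = ["Standard_D4s_v5", "HBv2", "HBv3", "Lsv3", "Lasv3"]


def first_available_sku(
    candidates: Sequence[str],
    try_create_pool: Callable[[str], bool],
) -> Optional[str]:
    """Return the first SKU for which pool creation succeeds, else None."""
    for sku in candidates:
        if try_create_pool(sku):
            return sku
    return None


def fake_try_create_pool(sku: str) -> bool:
    # Stand-in for a real provisioning call (assumption): pretends
    # only these two series still have capacity in the region.
    return sku in {"HBv3", "Lsv3"}


if __name__ == "__main__":
    print(first_available_sku(CANDIDATE_SKUS, fake_try_create_pool))
```

       | In practice the callback would attempt to add a node pool
       | and report whether the region accepted it.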
        
       | deathanatos wrote:
       | I've seen this before. I think it was in us-west1, ran out of VMs
       | of the size we used for CI. Had to move to a different region.
       | (Never moved back...)
       | 
       | It is shocking to me that it happened at all. Capacity planning
       | shouldn't be so far behind in a cloud that wants to position
       | itself as being on par with AWS/GCP. (Which Azure absolutely
       | isn't.) To me, having capacity planning be solved is part of
       | what I am paying for in that higher price of the VM.
       | 
       | > _We never thought our startup would be threatened by the
       | unreliability of a company like Microsoft, or that they wouldn't
       | proactively inform us about this._
       | 
       | Oh my sweet summer child, welcome to Azure. Don't depend on them
       | being proactive about anything; even depending on them to _react_
       | is a mistake, e.g., they do not reliably post-mortem severe
       | failures. (At least, externally. But as a customer, I want to
       | know what you're doing to prevent $massive_failure from
       | happening again, and time and time again they're just silent on
       | that front.)
        
       | ttrrooppeerr wrote:
       | > We never thought our startup would be threatened by the
       | unreliability of a company like Microsoft
       | 
       | You will be threatened by your own unreliability of building
       | something that's dependent on one region or one cloud.
        
         | websap wrote:
         | This is an insidious argument to make. When building a startup
         | you should choose 1 reliable cloud provider and use their best
         | practices to support high availability.
        
           | gtirloni wrote:
           | Cross-region architecture will be the first thing you hear
           | about.
        
           | pclmulqdq wrote:
           | No matter the provider, their best practices all say to be
           | multi-region.
        
             | websap wrote:
             | Def not true with AWS, unless you reach a particular scale.
             | Not for product market fit. My technology choices would be
             | fully managed services so I could focus on my actual
             | business.
        
               | Brian_K_White wrote:
               | def true with everything. what a ridiculous statement.
        
               | bradknowles wrote:
               | Read the "Well Architected" paper. Go multi-region.
        
               | websap wrote:
               | Can you link me to the well architected paper that talks
               | about going multi-region for an early stage startup?
        
               | nixgeek wrote:
               | Unless eCF has made some major advancements in the last
               | couple years, Amazon's own retail business isn't multi-
               | region. So it'd be like the cobbler saying "Buy my
               | shoes!" while wearing none, if AWS were to push everyone
               | hard to have a multi-region strategy.
               | 
               | At least on AWS you typically can find capacity (outside
               | of accelerators) by being flexible on instance types (C,
               | M, R), instance sizes, and availability zones. Sounds
               | like this region OP is in for Azure is constrained such
               | that even this advice doesn't work.
        
             | manv1 wrote:
             | Multi-region in AWS means building it yourself.
             | 
             | I suspect that the skills for real HA are atrophying
             | because for 99% of the people multi-AZ is enough and most
             | of the AWS stuff supports multi-az automagically.
             | 
             | The problem with multi-region is that it means
             | configuration, and there are probably lots of services that
             | you can't actually configure to be multi-region. Cognito is
             | one off the top of my head. It looks like the various
             | aurora flavors do multi-region, but what about Neptune?
             | SQS? API Gateway? AWS Lambda? MediaLive?
             | 
             | Maybe you can hide all that behind DNS failover, maybe you
             | can't.
             | 
             | Real multi-region basically means going back to old-school
             | HA, and that was hard to do when it was your data centers.
             | On AWS it'll be even harder.
             | 
             | That isn't to say it's not possible, it's just a tremendous
             | amount of work.
             | 
             | I mean really, if us-east-1 is down 80% of the internet is
             | screwed...so from an expectations point of view does HA of
             | your particular service matter if that happens? Even for a
             | financial company, outages happen.
             | 
             | Once you have enough people it might be worth it. For a
             | non-mission-critical startup? No fucking way.
        
             | Spivak wrote:
             | Multi-AZ and then grow into multi-region if the need
             | arises. Multi-region is a huge lift the moment all your
             | data must live in two regions simultaneously. Very few
             | shops are experienced enough to run clusters across
             | datacenters in a way that can handle the unhappy paths.
        
               | [deleted]
        
         | janober wrote:
         | Totally agree, we could for sure have built multi-region and
         | multi-cloud from the get-go, but we had good reasons not to do
         | it. Depending on the product, technology, ... I would actually
         | also strongly recommend almost every startup do the same.
        
           | whimsicalism wrote:
           | Seems bold to recommend everyone do the same as you when you
           | are running into problems you can't solve because of this
           | exact choice you made.
        
             | janober wrote:
             | I am still 100% sure it was exactly the right decision. It
             | was, however, in hindsight probably the wrong one to choose
             | Azure and/or that data center.
        
               | ComputerGuru wrote:
               | If you go under because of this, will you still be 100%
               | sure?
               | 
               | Everything is for sure until it's not.
        
           | PaulHoule wrote:
           | No matter what stage of a service you are at you should have
           | a documented procedure (ideally running a script) that can
           | stand up a working instance of the system.
           | 
           | This has vast benefits for agility and fast development when
           | developers are not always fighting the build system and have
           | a "no fear" attitude about deployment.
           | 
           | If you have that, you can build a system in another region
           | and be able to migrate wholesale to another region with more
           | capacity and not be particularly concerned about the general
           | problem of coordinating the service across multiple regions
           | at the same time.
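           | 
           | A minimal sketch of such a procedure, assuming made-up step
           | names and a placeholder log_step helper. Each real step
           | would call your provider's tooling; only the shape (ordered
           | steps, region as a parameter) is the point.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[str], None]  # receives the target region


@dataclass
class StandUp:
    steps: List[Step] = field(default_factory=list)

    def execute(self, region: str) -> List[str]:
        """Run every step in order against `region`; return the names run."""
        done = []
        for step in self.steps:
            step.run(region)
            done.append(step.name)
        return done


def log_step(action: str) -> Callable[[str], None]:
    # Placeholder (assumption): a real step would provision resources here.
    def run(region: str) -> None:
        print(f"[{region}] {action}")
    return run


# Hypothetical step list; replace with whatever your system needs.
PROCEDURE = StandUp([
    Step("network", log_step("create network")),
    Step("database", log_step("create database and restore backup")),
    Step("cluster", log_step("create cluster and deploy services")),
    Step("smoke-test", log_step("run smoke tests")),
])

if __name__ == "__main__":
    PROCEDURE.execute("region-b")
```

           | Because the region is just an argument, the same procedure
           | that builds a dev instance can build the replacement in a
           | region that still has capacity.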
        
         | manv1 wrote:
         | A startup is about managing risks and spending your time/money
         | appropriately. Your cloud provider running out of capacity
         | isn't an obvious risk especially if it's just capacity for
         | general compute.
         | 
         | For some clouds that seem to be run on a manual process (IBM,
         | Oracle) that would be expected, since they're sort of clunky.
         | For other places (Rackspace, etc) it would be uncommon. For a
         | major provider like Azure, well, it's bizarre. I mean, the
         | whole point of cloud is that it's all-you-can-eat.
         | 
         | You would think that this would be something they would
         | advertise/talk about up-front. But who would sign up if that
         | was disclosed?
        
       | HeavyStorm wrote:
       | I'm having trouble getting an instance with a GPU in East US,
       | but that's always a problem.
        
       | jakear wrote:
       | People saying "shame on you for not being multi-region" are
       | missing the point: This is a German company with German customers
       | subject to German data residency laws. For them to store German
       | data in a region besides Germany requires getting informed
       | consent from the "data subject", who must be "pre-informed about
       | the potential risks involved in cross-border data transfer". [1]
       | This is why Azure has a dedicated German partition, just as it
       | has a dedicated Chinese partition.
       | 
       | Now, they could go the GDPR/Cookies route and prompt absolutely
       | every user on pageload, but doing so would annihilate the purpose
       | of the law into monotonous smithereens, just as it did with
       | Cookies. Good on them for defaulting to the "more secure" mode,
       | but yes this is a potential consequence.
       | 
       | Happy to hear from any German amigos present if I've got
       | something wrong. (But watch out... you might be putting HN at
       | risk - their servers aren't (likely) in Germany!)
       | 
       | [1]: https://incountry.com/blog/which-german-data-privacy-laws-
       | yo...
        
         | mikerg87 wrote:
         | Also note that the German Azure is in transition - I am not
         | 100% sure the new version of Azure unique to Germany, held by
         | a specific data trustee, is set up and ready just yet...
         | 
         | https://learn.microsoft.com/en-us/previous-versions/azure/ge...
        
         | dehrmann wrote:
         | This is a good point, but it's a reminder that a lot of these
         | privacy laws are impractical to deal with. When they're
         | universal, it's one thing, but if you're a medium-sized country
         | trying to flex your legislative might, you're going to make the
         | experience worse for your citizens and businesses.
        
         | pclmulqdq wrote:
         | Time to sign up for AWS or GCP, then. If you're using
         | kubernetes anyway, you'll be fine with the switch.
        
           | dilyevsky wrote:
           | I've had capacity issues before with both gcp and aws in
           | smaller regions so not a panacea
        
           | scarface74 wrote:
           | Says someone who has never done a large migration of any
           | type...
        
             | foobiekr wrote:
             | You should be good to go except for debugging, accounts,
             | billing, monitoring, habits, documentation, security
             | evaluation ...
        
         | solatic wrote:
         | No reason why Azure can't run multiple "regions" within Germany
         | proper. Their current region is in Frankfurt; no reason why
         | they couldn't launch regions in Munich or Hamburg as well. Then
         | German companies could go multi-region while staying fully
         | compliant with German data sovereignty laws.
        
       | whalesalad wrote:
       | > We never thought our startup would be threatened by the
       | unreliability of a company like Microsoft, or that they wouldn't
       | proactively inform us about this.
       | 
       | Yikes, this is totally the first thing you need to come to expect
       | when working with MSFT.
        
         | janober wrote:
         | Probably a good learning for the future ;-)
        
         | scotty79 wrote:
         | When Amazon S3 was a new thing, when I managed to convince my
         | company to move to it, when we had just moved to serving some
         | of our stuff from S3, first week, Amazon had an outage.
        
       | DannyBee wrote:
       | Most of Europe expects the winter to be quite painful from a
       | power perspective. It would not be surprising if cloud providers
       | (major power users) are being asked to not increase (or even
       | decrease) power usage.
       | 
       | The timeframe they gave would match that kind of ask.
       | 
       | I wonder whether you see the same behavior from other cloud
       | providers there (ie if you ask them whether new capacity is
       | available, what do they say)
        
         | arcturus17 wrote:
         | > It would not be surprising if cloud providers (major power
         | users) are being asked to not increase (or even decrease) power
         | usage.
         | 
         | I doubt it. It will be easier - and probably safer - to ask
         | citizens and physical industry (eg, factories) to bear the
         | brunt than to risk having problems in critical IT
         | infrastructure. Ask people and factories to turn the heat 3
         | degrees down and the effects will be more or less predictable.
         | Asking to shut compute power down at random will have
         | unpredictable consequences.
        
       | plantain wrote:
       | I'm baffled to read stories that suggest Azure is a viable
       | competitor to GCP/AWS - they're an absolute nightmare on
       | capacity.
       | 
       | It took me six months to get approved to start six instances!
       | With multiple escalations including being forcibly changed to
       | invoice billing - for which they never match the invoices
       | automatically, every payment requires we file a ticket.
        
       | AaronFriel wrote:
       | Surprised to see no mention of T-Systems, the subsidiary of
       | Deutsche Telekom, that operates Azure Germany.
        
       ___________________________________________________________________
       (page generated 2022-11-25 23:01 UTC)