[HN Gopher] Farewell to the Era of Cheap EC2 Spot Instances
___________________________________________________________________
Farewell to the Era of Cheap EC2 Spot Instances
Author : ericpauley
Score : 108 points
Date : 2023-05-03 13:30 UTC (2 days ago)
(HTM) web link (pauley.me)
(TXT) w3m dump (pauley.me)
| Zetice wrote:
| Yeah but welcome to the era of steeply discounted savings plans.
| Isn't it like half of the cost of on-demand, or something kind of
| crazy?
|
| Had a couple of calls with their "please don't leave us" team and
| they cut our bill more or less in half (more through fixing my
| boneheaded designs though).
| epberry wrote:
| I know at AWS they drill into folks that spot capacity isn't
| guaranteed capacity. I didn't even think about more demand for
| spot reducing supply or additionally AWS just simply not
| investing in as much hardware to accommodate spot because of lack
| of growth. That said, we're seeing leading indicators that this
| trend will reverse as cloud growth returns:
| https://www.vantage.sh/cloud-cost-report/2023-q1
| ericpauley wrote:
| The q1 on-demand metrics are definitely very interesting!
|
| One possible cause of this trend is actually still consistent
| with spot demand growth in aggregate: when customers deploy
| spot fleets with on-demand backup capacity and all spot prices
| get closer to on-demand, those fleets would become more likely
| to revert to on-demand, rather than a more expensive pool of
| spot instances.
|
| Measuring this would be a great opportunity to leverage the
| vantage point that your team has!
| kenhwang wrote:
| Even on-demand isn't guaranteed capacity. There have been a
| handful of times in my career where I've tried to spin up a lot
| more instances and got met with an out of capacity error.
| There's a special kind of reserved instance for guaranteed
| capacity.
| andrewstuart wrote:
| EDIT: if you really need GPU compute power, go buy a GPU from the
| computer store around the corner - they are cheap, fast and
| available.
|
| I gave up on ec2 when they started requiring you "request for
| quota" to start a gpu instance.
|
| You have to "request for quota" even if you want to run a single
| instance.
|
| You have to specify which specific instance type you want.
|
| You have to specify which region you want it in.
|
| Then you sit back and wait for some human to "approve your
| request".
|
| In my case _this took 24 hours_.
|
| In this time I literally could have walked (not driven) down to
| the local computer store, bought a computer, pushed it back to my
| house in a shopping cart, spent a few hours configuring it and
| still be left with 16 hours to have a sleep, eat and do some
| other things.
|
| AWS quota system is so far from "scalable" and "elastic" that
| it's effectively useless. You can't design any sort of
| infrastructure around that sort of quota system.
|
| I dumped AWS at that point. Mind you, Azure has exactly the same
| quota system.
|
| Seriously, just rent a server from Ionos or Hetzner or get a fast
| Internet connection and self host. It's faster and better and
| cheaper than any of the big clouds.
| toomuchtodo wrote:
| We've come full circle. Cloud providers have turned into
| dedicated server providers.
| andrewstuart wrote:
| Also, I felt that Amazon was playing sketchy little games with
| spot instances.
|
| It's not impossible but quite technically challenging to find
| out how much you are actually paying for a spot instance.
|
| It felt really dodgy to me that I might have started a spot
| instance thinking I was paying the minimum listed rate but
| somewhere amongst the formulas and terms and conditions AWS
| actually decided I would be paying maximum rate.
|
| They don't actually tell you up front what your spot instances
| are costing, no doubt it would not be hard for them to do so,
| but it seems a deliberate strategy to hide this information.
|
| It's just not worth the game playing when self hosting or
| renting a server from Ionos is cheaper and takes away the
| uncertainty.
| ericpauley wrote:
| I think it's tough to blame Amazon here. p4d server hardware
| costs around 100k and they were almost certainly having
| countless hours of use on these by stolen aws accounts and
| credit cards. This doesn't excuse the requirement to specify
| region and instance types in advance. The region part at least
| can be explained by the fact that Amazon tries to make regions
| as independent as possible.
| dylan604 wrote:
| >and still be left with 16 hours to have a sleep, eat and do
| some other things.
|
| I hope somewhere in those 16 hours you bother to return the
| shopping cart. Or are you one of those that just leaves it
| wherever because you can't be bothered? I will be bringing
| this up at the next tenants' meeting.
| my123 wrote:
| > I gave up on ec2 when they started requiring you "request for
| quota" to start a gpu instance.
|
| Sadly, the scourge of crypto mining combined with credit card
| fraud makes it really hard to do otherwise for any sizeable
| hoster. :/
| jacurtis wrote:
| Wow, never realized that that could be a great money
| laundering strategy.
|
| 1. Take cash, load it onto prepaid cards.
|
| 2. Buy GPU instances.
|
| 3. Mine crypto.
|
| 4. Pay for compute with #1
|
| 5. Sell crypto
|
| 6. "Clean" cash
|
| I just don't have the mind of a criminal I guess, it sounds
| like this was figured out a long time ago.
|
| There are (were) enough random kids in their basements making
| millions on crypto that it probably wouldn't raise too many
| eyebrows at the IRS either if you actually tried to report
| your crypto earnings cleanly; they would likely spend their
| efforts chasing down the people who weren't reporting.
| SXX wrote:
| Fortunately, your evil money laundering plan is going to fail
| at #1. It's basically impossible to get cash loaded onto prepaid
| cards anywhere. Also, AWS and other hosting providers wouldn't
| accept prepaid cards anyway. Instead you just buy crypto with
| cash P2P and report that as mined or whatever.
|
| On the other hand, credit card fraud is a real problem,
| especially because Amazon doesn't want to add any KYC burden on
| its customers or try to track down anomalous spending. After
| all, a good chunk of AWS profits comes from how bad people are
| at tracking their cloud costs.
|
| If Amazon implemented a built-in option to track suspicious
| jumps in AWS costs, it would cut not only fraud but also
| overall AWS profits.
| Negitivefrags wrote:
| Maybe they should let you bypass the quota system if you pay
| in advance via bank transfer then?
| andrewstuart wrote:
| You'd think my > 10 year old AWS account with usage history
| would be enough, but apparently not.
| ecnahc515 wrote:
| Devil's advocate: 10 year old accounts are probably just
| as likely as any to get hacked into and used for crypto
| mining, and honestly I bet a majority of their customers
| don't even use GPU instances.
| ericpauley wrote:
| Yep, Amazon's threat model has less to do with how
| reputable a given account is, since credential theft is
| rampant. Even at massive institutional customers on POs
| I've had to apply for quotas.
| angus-prune wrote:
| One could argue that older accounts are even more likely
| to get hacked, as they are more likely to have older
| passwords that are weaker that may have been leaked along
| the way, along with various other accumulated security
| issues (leaked API keys, out of date 2FA choices etc).
| dragonwriter wrote:
| Google Cloud does this, too, for GPUs on their compute
| instances, but the approval (I assume if there is manual
| approval, there are conditions which trigger it that I missed)
| was near instant in my experience.
| ajb wrote:
| I can understand why though. One employer's first use of a GPU
| instance was when we were hacked and someone fired up a few to
| mine crypto, only a day of which cost us a few thousand dollars
| that, fortunately, AWS refunded. It's quite likely that most users
| don't use them and that it is a good signal that you've been
| hacked.
|
| They have a quota system for sending emails too, and it's not
| because they need to purchase any hw for sending more emails.
| It's because those are also a magnet for hackers.
| jldugger wrote:
| AFAICT, it's more that they can't keep GPUs in stock. Even
| large users of GPU have to request quotas in new AZs.
| andrewstuart wrote:
| If AWS was really clear about what you were spending then it
| would be easy to be running an app that tells you the AWS
| usage and alerts you of anomalous patterns.
|
| My bank contacts me if there are any questionable
| transactions.
|
| Surely AWS is capable of this?
| leoxiong wrote:
| AWS Cost Explorer gives a pretty good breakdown. There's
| also Cost Anomaly Detection
| https://aws.amazon.com/aws-cost-management/aws-cost-anomaly-...
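| The kind of alert asked for above can be illustrated with a
| minimal sketch. This is not how AWS Cost Anomaly Detection works
| internally; it is a plain z-score check over made-up daily spend
| totals, and the function name, threshold, and numbers are all
| assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's spend when it sits more than `threshold` sample
    standard deviations above the mean of the previous days."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # flat history: any increase stands out
    return (today - mu) / sigma > threshold


# Hypothetical daily totals in dollars (not real billing data).
daily_spend = [41.0, 39.5, 40.2, 42.1, 40.8, 39.9, 41.3]

print(is_anomalous(daily_spend, 41.5))   # an ordinary day
print(is_anomalous(daily_spend, 400.0))  # e.g. someone started GPU instances
```

| A real setup would feed in billing data per service and per
| region, since a GPU spike can hide inside an account-wide total.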
| taeric wrote:
| I think you would be shocked at how often banks have
| similar restrictions on corporate accounts. You would also be
| surprised at how often companies are hacked.
| kmeisthax wrote:
| Damn it, there goes my plan of using spot GPU instances to train
| AI cheaply.
| [deleted]
| jeremyjh wrote:
| I was wondering if this could be a contributing factor. There
| is almost unlimited demand for training models - and a lot of
| value to be derived from doing so - and it seems like a perfect
| workload for spot.
| Zenst wrote:
| Seems like such jobs would be suited to a batch submit type
| service? Does anybody offer that? It does seem a viable area to
| offer if not.
| paulddraper wrote:
| > Spot instances are underpriced, as most tenants underestimate
| their tolerance for preemption or overestimate its likelihood.
|
| > In reality, instance preemption is rare in most instance
| families
|
| I have often seen the following:
|
| 1. The demand for spot instances increases for a period of time.
|
| 2. Supply is inelastic so prices quickly rise to the full bid
| amounts.
|
| 3. Most users are unwilling to go hours without any spot
| instances running, so they use very high bids.
|
| You either have to tolerate hours of "down time" every few
| months, or pay exorbitant prices once in a while.
|
| On multiple occasions, I've seen the spot instances priced
| significantly higher than on-demand.
|
| EDIT: Apparently AWS changed the Spot pricing a few years ago
| though. My anecdotes are 5 years old.
| ericpauley wrote:
| Interesting. While my data does show this being the case in a
| handful of pools (1-5 out of 13k pools at any given time), it
| appears to occur only exceedingly rarely.
| jacurtis wrote:
| To be fair, it's also easier than ever before to use spot
| instances.
|
| If you set up an EKS/K8s cluster and install Karpenter, you can
| configure it very easily to use spot instances anytime prices
| are available for less than on-demand and to use on-demand when
| spot instances are unavailable or too expensive.
|
| You end up never thinking about it but having full
| availability.
|
| In practice it means the ceiling on your bill is the on-demand
| price, but you usually average out much lower.
|
| I suspect as more and more people switch over to this model
| then the use of spot instances will stay closer to full
| saturation, with those discounts becoming negligible.
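| The "ceiling is the on-demand price" claim above can be checked
| with a small sketch. This is not Karpenter's implementation or
| configuration; it only models the stated policy (take spot when
| it is offered below on-demand, otherwise take on-demand) over
| made-up hourly quotes, and it ignores interruption and node
| replacement costs.

```python
from typing import Optional, Sequence


def hourly_charge(spot_price: Optional[float], on_demand_price: float) -> float:
    """Price paid for one hour under the fallback policy.

    spot_price is None when no spot capacity is offered that hour.
    """
    if spot_price is not None and spot_price < on_demand_price:
        return spot_price
    return on_demand_price


def average_cost(spot_prices: Sequence[Optional[float]], on_demand_price: float) -> float:
    """Mean hourly cost across a series of hourly spot quotes."""
    charges = [hourly_charge(p, on_demand_price) for p in spot_prices]
    return sum(charges) / len(charges)


ON_DEMAND = 1.00  # $/hr, hypothetical
# Hypothetical quotes: None = no spot capacity, 1.20 = spot above on-demand.
quotes = [0.30, 0.35, None, 1.20, 0.40, 0.90]

avg = average_cost(quotes, ON_DEMAND)
print(f"average ${avg:.4f}/hr, never above ${ON_DEMAND:.2f}/hr")
```

| Because every hour is charged at the smaller of the two prices,
| the average can never exceed on-demand, but it approaches it as
| spot quotes rise.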
| vasco wrote:
| > I've seen the spot instances priced significantly higher than
| on-demand
|
| How is the cap price not on-demand cost? Why would you not just
| swap to on-demand at the turning point?
| paulddraper wrote:
| Because you don't want to be pre-empted.
| idunno246 wrote:
| there's a cost to engineer something that automatically
| switches or to have someone go in and manually change it. so
| the spot price has to be higher than on-demand + switching
| costs. the new pricing models (a couple years old) though have
| mostly alleviated this
| jacurtis wrote:
| This shocks me too. When I first had to design a system
| around spot instances I assumed it caps out at the on-demand
| price. But in practice that is very much not the case. The
| spot price can routinely go above on-demand, with all the
| downsides of spot still being applied to the instance.
|
| It is because many systems are only built to support spot.
| When running a workload, they might not want to give up
| availability so they just pay the higher price. To be fair,
| this is a game that AWS engineers. They essentially want to
| kick people off the infrastructure to free it up for other
| uses. So this is a way to signal how badly you don't want to
| be kicked off. A lot of times after a price goes above on-
| demand pricing, it will dip down below fairly quickly as
| workloads all re-adjust. Some companies are willing to play
| that gamble, that a short period above on-demand is still
| cheaper in the long run when you average out the invoice at
| the end of the month.
| psanford wrote:
| What is the reason anyone would pay more for a spot instance
| than an on-demand instance?
| great_psy wrote:
| Maybe poorly architected software, where it's cheaper to
| increase the value of a variable than have a bunch of API
| calls that check the current onDemand price and call
| different APIs depending on which number is smaller.
| sokoloff wrote:
| If an instance on-demand is, for the sake of easy discussion,
| $1.00/hr and the spot rate is _usually_ $0.30/hr, it might
| make sense to bid $1.25/hr (or $1.26/hr) if you didn't want
| to get pre-empted.
|
| _Most_ hours, you'd pay something close to $0.30/hr. _Some_
| hours, you'd pay over $1.00/hr, but you'd save money
| overall against on-demand.
|
| (This ignores Reserved Instances and Savings Plans.)
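| The arithmetic above can be made concrete with a short sketch.
| The $1.00, $0.30, and $1.25 figures come from the comment; the
| 5% share of spike hours is an assumption picked for
| illustration, and billing every spike hour at the full bid is a
| deliberate worst case.

```python
def blended_hourly_cost(normal_rate, spike_rate, spike_fraction):
    """Average hourly cost when `spike_fraction` of hours bill at spike_rate."""
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    return (1.0 - spike_fraction) * normal_rate + spike_fraction * spike_rate


def break_even_spike_fraction(normal_rate, spike_rate, on_demand_rate):
    """Share of spike hours at which the blended cost equals on-demand."""
    return (on_demand_rate - normal_rate) / (spike_rate - normal_rate)


ON_DEMAND = 1.00   # $/hr, from the comment
USUAL_SPOT = 0.30  # $/hr, from the comment
MAX_BID = 1.25     # $/hr, from the comment (worst-case spike price)
SPIKE_SHARE = 0.05  # assumption: 5% of hours are spike hours

cost = blended_hourly_cost(USUAL_SPOT, MAX_BID, SPIKE_SHARE)
limit = break_even_spike_fraction(USUAL_SPOT, MAX_BID, ON_DEMAND)
print(f"blended ${cost:.4f}/hr vs on-demand ${ON_DEMAND:.2f}/hr")
print(f"spot stays cheaper while spike hours are below {limit:.1%}")
```

| Under these numbers the high bid only loses to on-demand if
| roughly three quarters of all hours are billed at the full bid.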
| milesward wrote:
| The analysis ignores the basics: AWS controls the supply. At some
| point there's a limit to that control via supply chain (meaning
| they wouldn't be able to make it cheaper when they want to) but
| if the number is going up, it's pretty likely that's totally
| intentional.
| moomoo11 wrote:
| I stopped using cloud hosting until _it is actually necessary_
|
| I have two home workstations that are collecting dust. I have 6
| and 16 core CPUs with HT hooked up to 32 and 128 GB of RAM.
|
| I figured out how to make my dynamic IP still serve requests
| accounting for IP changes using a background script. If someone
| has a better solution I'd appreciate that.
|
| Both these machines are more than enough to do everything I could
| dream of building and they host dozens of my little apps and
| services that I use for myself.
|
| I will only use cloud when necessary for commercial products and
| services.
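| A background script of the kind described above might look like
| the following minimal sketch. The commenter's actual script is
| not shown, so everything here is an assumption: the public IP
| lookup uses the api.ipify.org service, and `update_dns` is a
| hypothetical callback standing in for whatever DNS provider API
| is in use.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_public_ip(url: str = "https://api.ipify.org") -> str:
    """Ask an external "what is my IP" service for the current address."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("ascii").strip()


def sync_once(get_ip: Callable[[], str],
              update_dns: Callable[[str], None],
              last_ip: Optional[str]) -> str:
    """Run one check; call update_dns only when the address has changed."""
    current = get_ip()
    if current != last_ip:
        update_dns(current)
    return current


def run_forever(get_ip: Callable[[], str],
                update_dns: Callable[[str], None],
                interval_seconds: int = 300) -> None:
    """Poll on a fixed interval, surviving transient network errors."""
    last_ip = None
    while True:
        try:
            last_ip = sync_once(get_ip, update_dns, last_ip)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"check failed, will retry: {exc}")
        time.sleep(interval_seconds)
```

| Calling run_forever(fetch_public_ip, my_update) with a
| provider-specific my_update function would start the loop; many
| DNS hosts and routers also offer dynamic DNS clients that do the
| same job.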
| austinshea wrote:
| This seems to ignore the fact that the prices are consistently
| low if you are tracking the lesser used instance types/regions,
| even when talking explicitly within AWS.
|
| Why would that change?
| ericpauley wrote:
| It's tough to know for sure. Many of the instance types in
| these regions are at the price floor, so there could be
| increases in aggregate demand that just don't show up because
| the DCs are overbuilt. These markets are thinner so if demand
| shifts to them (big if because moving regions is hard) we might
| quickly see price spikes.
| cyberax wrote:
| Spot prices don't directly reflect capacity. EC2 Spot used to
| be a real spot market, with actual real auctions where higher
| bids displaced lower bids.
|
| But it was changed a while ago, so prices are now set
| algorithmically based on predicted demand/supply.
|
| We run stateless calculations on EC2 across regions, and we
| definitely see that instances are harder to come by. Especially
| instances with GPUs. And for many instance types, the price
| advantage of EC2 Spot compared to committed spending is not
| significant anymore.
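| The older auction behaviour described above, where higher bids
| displace lower ones, can be sketched as a textbook uniform-price
| auction. This is a simplified model, not AWS's actual mechanism:
| it assumes one instance per bidder, that every winner pays the
| lowest accepted bid, and that the price falls to a hypothetical
| floor when supply exceeds demand.

```python
def clear_spot_market(bids, supply, floor_price):
    """Return (clearing_price, winners) for a single-unit uniform-price auction.

    bids maps a bidder name to its maximum price per hour.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    eligible = [(name, bid) for name, bid in ranked if bid >= floor_price]
    winners = eligible[:max(supply, 0)]
    if winners and len(eligible) > supply:
        # Capacity is scarce: the lowest accepted bid sets everyone's price.
        price = winners[-1][1]
    else:
        # Everyone eligible is served (or nothing is sold): price sits at the floor.
        price = floor_price
    return price, [name for name, _ in winners]


# Hypothetical bids in $/hr.
bids = {"a": 1.25, "b": 0.50, "c": 0.35, "d": 0.20}

for supply in (5, 3, 2):
    price, winners = clear_spot_market(bids, supply, floor_price=0.10)
    print(f"supply={supply}: price ${price:.2f}/hr, running: {winners}")
```

| As supply shrinks from 5 to 2, the price climbs from the floor
| to $0.50 and the lowest bidders are displaced, while the $1.25
| bidder keeps running and pays only the clearing price.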
| MichaelZuo wrote:
| Huh, do you remember what the rationale for the change was?
___________________________________________________________________
(page generated 2023-05-05 23:00 UTC)