[HN Gopher] Farewell to the Era of Cheap EC2 Spot Instances
       ___________________________________________________________________
        
       Farewell to the Era of Cheap EC2 Spot Instances
        
       Author : ericpauley
       Score  : 108 points
       Date   : 2023-05-03 13:30 UTC (2 days ago)
        
 (HTM) web link (pauley.me)
 (TXT) w3m dump (pauley.me)
        
       | Zetice wrote:
       | Yeah but welcome to the era of steeply discounted savings plans.
       | Isn't it like half of the cost of on-demand, or something kind of
       | crazy?
       | 
       | Had a couple of calls with their "please don't leave us" team and
       | they cut our bill more or less in half (more through fixing my
       | boneheaded designs though).
        
       | epberry wrote:
       | I know at AWS they drill into folks that spot capacity isn't
       | guaranteed capacity. I didn't even think about more demand for
       | spot reducing supply or additionally AWS just simply not
       | investing in as much hardware to accommodate spot because of lack
       | of growth. That said, we're seeing leading indicators that this
       | trend will reverse as cloud growth returns:
       | https://www.vantage.sh/cloud-cost-report/2023-q1
        
         | ericpauley wrote:
         | The q1 on-demand metrics are definitely very interesting!
         | 
         | One possible cause of this trend is actually still consistent
         | with spot demand growth in aggregate: when customers deploy
         | spot fleets with on-demand backup capacity and all spot prices
         | get closer to on-demand, those fleets would become more likely
         | to revert to on-demand, rather than a more expensive pool of
         | spot instances.
         | 
         | Measuring this would be a great opportunity to leverage the
         | vantage point that your team has!
        
         | kenhwang wrote:
         | Even on-demand isn't guaranteed capacity. There have been a
         | handful of times in my career where I've tried to spin up a lot
         | more instances and got met with an out of capacity error.
         | There's a special kind of reserved instance for guaranteed
         | capacity.
        
       | andrewstuart wrote:
       | EDIT: if you really need GPU compute power, go buy a GPU from the
       | computer store around the corner - they are cheap, fast and
       | available.
       | 
       | I gave up on ec2 when they started requiring you to "request for
       | quota" to start a gpu instance.
       | 
       | You have to "request for quota" even if you want to run a single
       | instance.
       | 
       | You have to specify which specific instance type you want.
       | 
       | You have to specify which region you want it in.
       | 
       | Then you sit back and wait for some human to "approve your
       | request".
       | 
       | In my case _this took 24 hours_.
       | 
       | In this time I literally could have walked (not driven) down to
       | the local computer store, bought a computer, pushed it back to my
       | house in a shopping cart, spent a few hours configuring it and
       | still be left with 16 hours to have a sleep, eat and do some
       | other things.
       | 
       | AWS's quota system is so far from "scalable" and "elastic" that
       | it's effectively useless. You can't design any sort of
       | infrastructure around that sort of quota system.
       | 
       | I dumped AWS at that point. Mind you, Azure has exactly the same
       | quota system.
       | 
       | Seriously, just rent a server from Ionos or Hetzner or get a fast
       | Internet connection and self host. It's faster and better and
       | cheaper than any of the big clouds.
        
         | toomuchtodo wrote:
         | We've come full circle. Cloud providers have turned into
         | dedicated server providers.
        
         | andrewstuart wrote:
         | Also, I felt that Amazon was playing sketchy little games with
         | spot instances.
         | 
         | It's not impossible but quite technically challenging to find
         | out how much you are actually paying for a spot instance.
         | 
         | It felt really dodgy to me that I might have started a spot
         | instance thinking I was paying the minimum listed rate but
         | somewhere amongst the formulas and terms and conditions AWS
         | actually decided I would be paying maximum rate.
         | 
         | They don't actually tell you up front what your spot instances
         | are costing. No doubt it would not be hard for them to do so,
         | but it seems a deliberate strategy to hide this information.
         | 
         | It's just not worth the game playing when self hosting or
         | renting a server from Ionos is cheaper and takes away the
         | uncertainty.
        
         | ericpauley wrote:
         | I think it's tough to blame Amazon here. p4d server hardware
         | costs around 100k and they were almost certainly having
         | countless hours of use on these by stolen aws accounts and
         | credit cards. This doesn't excuse the requirement to specify
         | region and instance types in advance. The region part at least
         | can be explained by the fact that Amazon tries to make regions
         | as independent as possible.
        
         | dylan604 wrote:
         | >and still be left with 16 hours to have a sleep, eat and do
         | some other things.
         | 
         | I hope somewhere in those 16 hours you bother to return the
         | shopping cart. Or are you one of those that just leaves it
         | wherever because you can't be bothered? I will be bringing
         | this up at the next tenants' meeting.
        
         | my123 wrote:
         | > I gave up on ec2 when they started requiring you to "request
         | for quota" to start a gpu instance.
         | 
         | Sadly, the scourge of crypto mining combined with credit card
         | fraud makes it really hard to do otherwise for any sizeable
         | hoster. :/
        
           | jacurtis wrote:
           | Wow, never realized that that could be a great money
           | laundering strategy.
           | 
           | 1. Take cash, load it onto prepaid cards.
           | 
           | 2. Buy GPU instances.
           | 
           | 3. Mine crypto.
           | 
           | 4. Pay for compute with #1
           | 
           | 5. Sell crypto
           | 
           | 6. "Clean" cash
           | 
           | I just don't have the mind of a criminal I guess, it sounds
           | like this was figured out a long time ago.
           | 
           | There are (were) enough random kids in their basements making
           | millions on crypto that it probably wouldn't raise too many
           | eyebrows at the IRS either if you actually tried to report
           | your crypto earnings cleanly. They would likely spend their
           | efforts chasing down the people who weren't reporting.
        
             | SXX wrote:
             | Fortunately, your evil money laundering plan is going to fail
             | at #1. It's basically impossible to get cash loaded onto
             | prepaid cards anywhere. Also, AWS and other hosting providers
             | wouldn't accept prepaid cards anyway. Instead, you would just
             | buy crypto with cash P2P and report that as mined or whatever.
             | 
             | On the other hand, credit card fraud is a real problem,
             | especially because Amazon doesn't want to add any KYC burden
             | on their customers or try to track down anomalous spending.
             | After all, a good chunk of AWS profits comes from how bad
             | people are at tracking their cloud costs.
             | 
             | If Amazon were to implement a built-in option to track
             | suspicious jumps in AWS costs, it would cut not only fraud
             | but also overall AWS profits.
        
           | Negitivefrags wrote:
           | Maybe they should let you bypass the quota system if you pay
           | in advance via bank transfer then?
        
             | andrewstuart wrote:
             | You'd think my > 10 year old AWS account with usage history
             | would be enough, but apparently not.
        
               | ecnahc515 wrote:
               | Devil's advocate: 10 year old accounts are probably just
               | as likely as any to get hacked into and used for crypto
               | mining, and honestly I bet a majority of their customers
               | don't even use GPU instances.
        
               | ericpauley wrote:
               | Yep, Amazon's threat model has less to do with how
               | reputable a given account is, since credential theft is
               | rampant. Even at massive institutional customers on POs
               | I've had to apply for quotas.
        
               | angus-prune wrote:
               | One could argue that older accounts are even more likely
               | to get hacked, as they are more likely to have older,
               | weaker passwords that may have been leaked along the way,
               | along with various other accumulated security issues
               | (leaked API keys, out-of-date 2FA choices, etc.).
        
         | dragonwriter wrote:
         | Google Cloud does this, too, for GPUs on their compute
         | instances, but the approval (I assume if there is manual
         | approval, there are conditions which trigger it that I missed)
         | was near instant in my experience.
        
         | ajb wrote:
         | I can understand why though. One employer's first use of a GPU
         | instance was when we were hacked and someone fired up a few to
         | mine crypto. Just one day of that cost us a few thousand
         | dollars which, fortunately, AWS refunded. It's quite likely that
         | most users don't use them and that it is a good signal that
         | you've been hacked.
         | 
         | They have a quota system for sending emails too, and it's not
         | because they need to purchase any hw for sending more emails.
         | It's because those are also a magnet for hackers.
        
           | jldugger wrote:
           | AFAICT, it's more that they can't keep GPUs in stock. Even
           | large users of GPU have to request quotas in new AZs.
        
           | andrewstuart wrote:
           | If AWS was really clear about what you were spending then it
           | would be easy to be running an app that tells you the AWS
           | usage and alerts you of anomalous patterns.
           | 
           | My bank contacts me if there are any questionable
           | transactions.
           | 
           | Surely AWS is capable of this?
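           | 
           | A minimal sketch of that kind of alert, assuming you already
           | export one spend total per day: flag any day that sits far
           | above the trailing week. The function name, the 7-day window,
           | the 3-sigma threshold, the 5% noise floor and the sample
           | figures are all illustrative assumptions, not anything AWS
           | provides.

```python
from statistics import mean, stdev


def find_spend_anomalies(daily_spend, window=7, threshold=3.0):
    """Return indices of days whose spend is far above the trailing window.

    daily_spend is a plain list of per-day totals (assumed to come from
    whatever billing export you already have).
    """
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Use a floor of 5% of the mean so a perfectly flat history
        # (sigma == 0) does not flag every tiny fluctuation.
        limit = mu + threshold * max(sigma, 0.05 * mu)
        if daily_spend[i] > limit:
            anomalies.append(i)
    return anomalies


# Hypothetical daily totals in dollars; day 8 is the "someone is mining
# crypto on my account" day.
spend = [100, 102, 98, 101, 99, 103, 100, 104, 950, 101]
print(find_spend_anomalies(spend))
```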
        
             | leoxiong wrote:
             | AWS Cost Explorer gives a pretty good breakdown. There's
             | also Cost Anomaly Detection:
             | https://aws.amazon.com/aws-cost-management/aws-cost-anomaly-...
        
             | taeric wrote:
             | I think you would be shocked at how often banks have
             | similar restrictions on corporate accounts. You would also
             | be surprised at how often companies are hacked.
        
       | kmeisthax wrote:
       | Damn it, there goes my plan of using spot GPU instances to train
       | AI cheaply.
        
         | jeremyjh wrote:
         | I was wondering if this could be a contributing factor. There
         | is almost unlimited demand for training models - and a lot of
         | value to be derived from doing so - and it seems like a perfect
         | workload for spot.
        
         | Zenst wrote:
         | Seems like such jobs would be suited to a batch submit type
         | service? Does anybody offer that? If not, it does seem a viable
         | area to offer.
        
       | paulddraper wrote:
       | > Spot instances are underpriced, as most tenants underestimate
       | their tolerance for preemption or overestimate its likelihood.
       | 
       | > In reality, instance preemption is rare in most instance
       | families
       | 
       | I have often seen the following:
       | 
       | 1. The demand for spot instances increases for a period of time.
       | 
       | 2. Supply is inelastic so prices quickly rise to the full bid
       | amounts.
       | 
       | 3. Most users are unwilling to go hours without any spot
       | instances running, so they use very high bids.
       | 
       | You either have to tolerate hours of "down time" every few
       | months, or pay exorbitant prices once in a while.
       | 
       | On multiple occasions, I've seen the spot instances priced
       | significantly higher than on-demand.
       | 
       | EDIT: Apparently AWS changed the Spot pricing a few years ago
       | though. My anecdotes are 5 years old.
        
         | ericpauley wrote:
         | Interesting. While my data does show this being the case in a
         | handful of pools (1-5 out of 13k pools at any given time), it
         | appears to occur only exceedingly rarely.
        
         | jacurtis wrote:
         | To be fair, it's also easier than ever before to use spot
         | instances.
         | 
         | If you set up an EKS/K8s cluster and install Karpenter, you can
         | configure it very easily to use spot instances any time prices
         | are available for less than on-demand and to use on-demand when
         | spot instances are unavailable or too expensive.
         | 
         | You end up never thinking about it but having full
         | availability.
         | 
         | In practice it means the ceiling on your bill is the on-demand
         | price, but you usually average out much lower.
         | 
         | I suspect as more and more people switch over to this model
         | then the use of spot instances will stay closer to full
         | saturation, with those discounts becoming negligible.
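         | 
         | The fallback rule itself is simple enough to sketch. This is
         | not Karpenter's actual algorithm or API, just the decision
         | described above under the assumption that you have a current
         | price per spot pool (None meaning no capacity). The function
         | name, pool names and prices are made up for illustration.

```python
from typing import Optional


def choose_capacity(
    on_demand_price: float,
    spot_prices: dict[str, Optional[float]],
) -> tuple[str, float]:
    """Pick the cheapest spot pool that undercuts on-demand, else on-demand.

    spot_prices maps a pool name to its current hourly price, or None
    when that pool has no capacity available.
    """
    usable = {
        pool: price
        for pool, price in spot_prices.items()
        if price is not None and price < on_demand_price
    }
    if not usable:
        # Spot is unavailable or too expensive: the bill is capped at
        # the on-demand rate.
        return ("on-demand", on_demand_price)
    cheapest = min(usable, key=usable.get)
    return (f"spot:{cheapest}", usable[cheapest])


# Hypothetical prices: one cheap pool, one empty pool, one pool that has
# drifted above on-demand.
print(choose_capacity(1.00, {"pool-a": 0.35, "pool-b": None, "pool-c": 1.20}))
print(choose_capacity(1.00, {"pool-b": None, "pool-c": 1.20}))
```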
        
         | vasco wrote:
         | > I've seen the spot instances priced significantly higher than
         | on-demand
         | 
         | How is the cap price not on-demand cost? Why would you not just
         | swap to on-demand at the turning point?
        
           | paulddraper wrote:
           | Because you don't want to be pre-empted.
        
           | idunno246 wrote:
           | There's a cost to engineering something that automatically
           | switches, or to someone going in and manually changing it. So
           | the spot price has to be higher than on-demand + switching
           | costs. The new pricing models (a couple of years old) have
           | mostly alleviated this, though.
        
           | jacurtis wrote:
           | This shocks me too. When I first had to design a system
           | around spot instances I assumed it caps out at the on-demand
           | price. But in practice that is very much not the case. The
           | spot price can routinely go above on-demand, with all the
           | downsides of spot still being applied to the instance.
           | 
           | It is because many systems are only built to support spot.
           | When running a workload, they might not want to give up
           | availability so they just pay the higher price. To be fair,
           | this is a game that AWS engineers. They essentially want to
           | kick people off the infrastructure to free it up for other
           | uses. So this is a way to signal how badly you don't want to
           | be kicked off. A lot of times after a price goes above on-
           | demand pricing, it will dip down below fairly quickly as
           | workloads all re-adjust. Some companies are willing to play
           | that gamble, that a short period above on-demand is still
           | cheaper in the long run when you average out the invoice at
           | the end of the month.
        
         | psanford wrote:
         | What is the reason anyone would pay more for a spot instance
         | than an on-demand instance?
        
           | great_psy wrote:
           | Maybe poorly architected software, where it's cheaper to
           | increase the value of a variable than have a bunch of API
           | calls that check the current onDemand price and call
           | different APIs depending on which number is smaller.
        
           | sokoloff wrote:
           | If an instance on-demand is, for the sake of easy discussion,
           | $1.00/hr and the spot rate is _usually_ $0.30/hr, it might
           | make sense to bid $1.25/hr (or $1.26/hr) if you didn't want
           | to get pre-empted.
           | 
           | _Most_ hours, you'd pay something close to $0.30/hr. _Some_
           | hours, you'd pay over $1.00/hr, but you'd save money
           | overall against on-demand.
           | 
           | (This ignores Reserved Instances and Savings Plans.)
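           | 
           | As a rough worked version of that arithmetic: assume, as a
           | worst case, that every spike hour bills at the full $1.25
           | bid, and (purely for illustration) that spikes hit 5% of
           | hours. The function names and the 5% figure are assumptions;
           | the three prices are the ones above.

```python
def blended_hourly_cost(usual_spot, spike_price, spike_fraction):
    """Average hourly cost when a fraction of hours bill at the spike price."""
    return (1 - spike_fraction) * usual_spot + spike_fraction * spike_price


def break_even_spike_fraction(on_demand, usual_spot, spike_price):
    """Fraction of spike hours at which the blended cost equals on-demand."""
    return (on_demand - usual_spot) / (spike_price - usual_spot)


ON_DEMAND = 1.00    # $/hr, from the example above
USUAL_SPOT = 0.30   # $/hr, from the example above
BID = 1.25          # $/hr, worst-case price paid during a spike

cost = blended_hourly_cost(USUAL_SPOT, BID, spike_fraction=0.05)
limit = break_even_spike_fraction(ON_DEMAND, USUAL_SPOT, BID)
print(f"blended cost: ${cost:.4f}/hr")
print(f"break-even spike share: {limit:.1%}")
```

           | Under those assumptions the blended rate is about $0.35/hr,
           | and the high bid only loses to on-demand if roughly 74% of
           | hours or more bill at the full bid.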
        
       | milesward wrote:
       | The analysis ignores the basics: AWS controls the supply. At some
       | point there's a limit to that control via supply chain (meaning
       | they wouldn't be able to make it cheaper when they want to) but
       | if the number is going up, it's pretty likely that's totally
       | intentional.
        
       | moomoo11 wrote:
       | I've stopped using cloud hosting unless _it is actually necessary_.
       | 
       | I have two home workstations that are collecting dust. They have
       | 6- and 16-core CPUs with HT, hooked up to 32 GB and 128 GB of RAM.
       | 
       | I figured out how to make my dynamic IP still serve requests
       | accounting for IP changes using a background script. If someone
       | has a better solution I'd appreciate that.
       | 
       | Both these machines are more than enough to do everything I could
       | dream of building and they host dozens of my little apps and
       | services that I use for myself.
       | 
       | I will only use cloud when necessary for commercial products and
       | services.
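       | 
       | For reference, the background script for the dynamic IP can be
       | sketched roughly like this. It is only an outline under stated
       | assumptions: get_public_ip and update_record are hypothetical
       | callables you would supply yourself (an IP-echo lookup and your
       | DNS provider's update call), and the 5-minute interval is
       | arbitrary.

```python
import time
from typing import Callable, Optional


def sync_dns_once(
    get_public_ip: Callable[[], str],
    update_record: Callable[[str], None],
    last_ip: Optional[str],
) -> str:
    """Update the DNS record only if the public IP has changed.

    Both callables are placeholders supplied by the caller; nothing here
    talks to a real service. Returns the IP that is now current.
    """
    current_ip = get_public_ip()
    if current_ip != last_ip:
        update_record(current_ip)
    return current_ip


def run_forever(
    get_public_ip: Callable[[], str],
    update_record: Callable[[str], None],
    interval_seconds: float = 300,
) -> None:
    """Poll on a fixed interval, retrying on the next tick after errors."""
    last_ip: Optional[str] = None
    while True:
        try:
            last_ip = sync_dns_once(get_public_ip, update_record, last_ip)
        except OSError:
            # Transient network failure: keep the old value and try again
            # on the next tick.
            pass
        time.sleep(interval_seconds)
```

       | Keeping the lookup and the update as injected callables means
       | the change-detection logic can be tested without any network
       | access.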
        
       | austinshea wrote:
       | This seems to ignore the fact that the prices are consistently
       | low if you are tracking the lesser used instance types/regions,
       | even when talking explicitly within AWS.
       | 
       | Why would that change?
        
         | ericpauley wrote:
         | It's tough to know for sure. Many of the instance types in
         | these regions are at the price floor, so there could be
         | increases in aggregate demand that just don't show up because
         | the DCs are overbuilt. These markets are thinner so if demand
         | shifts to them (big if because moving regions is hard) we might
         | quickly see price spikes.
        
         | cyberax wrote:
         | Spot prices don't directly reflect capacity. EC2 Spot used to
         | be a real spot market, with actual real auctions where higher
         | bids displaced lower bids.
         | 
         | But it was changed a while ago, so prices are now set
         | algorithmically based on predicted demand/supply.
         | 
         | We run stateless calculations on EC2 across regions, and we
         | definitely see that instances are harder to come by. Especially
         | instances with GPUs. And for many instance types, the price
         | advantage of EC2 Spot compared to committed spending is not
         | significant anymore.
        
           | MichaelZuo wrote:
           | Huh, do you remember what the rationale for the change was?
        
       ___________________________________________________________________
       (page generated 2023-05-05 23:00 UTC)