[HN Gopher] Mistakes I've Made in AWS
       ___________________________________________________________________
        
       Mistakes I've Made in AWS
        
       Author : aoms
       Score  : 306 points
       Date   : 2021-09-11 08:12 UTC (14 hours ago)
        
 (HTM) web link (laravel-news.com)
 (TXT) w3m dump (laravel-news.com)
        
       | defaultname wrote:
       | On a price sensitive project I almost exclusively used spot
       | instances at a _dramatically_ reduced price over on-demand. It
       | forced me to build high availability elements into the design at
       | the outset, though ultimately spot instances got shut down no
       | less frequently than my experience with on demand maintenance and
       | individual machine outages.
       | 
       | Obviously mileage will vary, but going in I was under the
       | impression that spot instances were on the knife's edge, when
       | with a decent pricing strategy they're as robust as on demand at
       | a fraction of the cost.
        
         | doomslice wrote:
         | We use GCP's equivalent of spot instances (preemptibles) to
         | great effect as well. It actually works better at larger scale
         | since a smaller % of your machines get preempted at a given
         | time.
        
         | noogle wrote:
         | Spot instances for GPU are shut down within hours. As frequently
         | touted in favor of AWS, engineer time is the most important
         | thing. The time to adapt the code to frequent failure, and the
         | delays in getting the results, costs money as well, negating
         | the financial saving from spot instances.
        
           | defaultname wrote:
           | Designing to remain robust in the face of failures is
           | compulsory for any project of any significance. Or at least
           | it should be, though a lot of projects go on a wing and a
           | prayer that nothing will go awry and "save" those engineering
           | hours until a catastrophe at some future point. It basically
           | just prioritized what already should be a priority.
           | 
           | I have no doubt that fringe/niche instances have more
           | competitive spot behaviors, though how you set your bid range
           | dramatically impacts how you survive through competition, but
           | I had vanilla instances last for literal _years_ (note that
           | by default the spot requisition has a lifespan of one year so
           | you have to modify that) at per hour pricing somewhere in the
           | range of 1/5th on demand.
           | 
           | But mileage will vary.
           | 
           | I don't use those spot instances anymore as my projects are
           | much better financed now, and I have significant compute on
           | other platforms including bare metal in colocation
           | facilities. However when I did I stayed silent about it,
           | feeling almost like it was a secret that would be ruined if
           | others knew about it.
        
             | noogle wrote:
             | There it is - the hidden cost of AWS. For bare-metal the
             | risk of hardware failure is so low that it's faster to just
             | handle the interruption when it happens (e.g. just restart
             | the process) than to implement interruption tolerance. The
             | hardware fails only once in many years. The chances of that
             | happening during the 24 hours we train a model are almost
             | zero. On a spot instance, the risk of the same is almost
             | 100%, requiring investment up-front.
             | 
             | For the price of a spot instance we can get an always-on
             | bare-metal server without having to worry for how long it
             | will remain available.
        
       | danjac wrote:
       | I've made it a habit to absolutely avoid any and all AWS services
       | for any side projects, unless it's on the employer's dime. I'd
       | rather pay a bit more per month for a flat-fee Digital Ocean
       | droplet. Maybe I'll end up paying a few dollars more than I would
       | with the equivalent AWS setup, but I'll rest easy knowing I won't
       | get a surprise bill thanks to the opaque and byzantine billing. I
       | mean, there are consultancies whose entire premise is expertise
       | on AWS billing, so the chance of AWS newbie-me running up many
       | thousands because I forgot to switch off service A or had the
       | wrong setting for service B is non-zero.
       | 
       | And the general advice is "don't worry, call their customer
       | support and they'll refund you". Um, seriously? If I want to
       | spend a morning on hold to deal with a huge unplanned bill I'll
       | call my local tax office, thank you.
       | 
       | Which sucks as I learn best by building things in my spare time,
       | but AWS makes that learning process a bit more stressful than I'd
       | prefer.
        
         | tsss wrote:
         | Who told you to call their customer support for a refund? AWS
         | (and other cloud vendors) practically never refund. They will
         | give out credits for their platform but that won't help you
         | much as a private individual who just lost hundreds of dollars.
        
           | nucleardog wrote:
           | > Who told you to call their customer support for a refund?
           | AWS (and other cloud vendors) practically never refund.
           | 
           | I've gotten refunds about a half a dozen times now. Every
           | time I've asked. One was for over a hundred thousand dollars.
           | 
           | I've never paid for support, I have no contacts within the
           | company (like are often necessary at places like Google), I
           | literally just put in a support ticket asking for a refund
           | and got a refund.
           | 
           | They usually require some documentation/explanation of how
           | you're going to avoid making the same mistake again (which is
           | fair), but otherwise have been very cooperative.
        
           | ratww wrote:
           | That's kind of a meme in HN and Reddit: there were a few
           | public occasions where users were refunded and people now
           | just assume AWS will also refund for every instance.
        
             | scrollaway wrote:
             | I've never had a refund request rejected myself, and I've
             | made multiple mistakes over multiple accounts. Even things
             | such as "Hey, I forgot to turn off this ec2, i wanted to
             | destroy it, any chance for a refund?"
        
           | sofixa wrote:
           | They _always_ refund mistakes, of private individuals or
           | companies.
        
             | isbvhodnvemrwvn wrote:
             | The first ones anyway. After that, not really.
        
         | mathnmusic wrote:
         | I was recently forced to migrate my hobby FOSS project the
         | other way: from DigitalOcean to AWS. The primary reason being:
         | a generous quota of 60,000 emails per month to send via SES.
         | Most mail providers give only up to 3,000 to 6,000 emails per
         | month.
        
           | lamnk wrote:
           | You can host your project on DO and connect to SES to send
           | emails. Why do you have to move the complete project over?
        
             | mhitza wrote:
             | Those 60k free emails only apply when SES is invoked from
             | an EC2 instance or Elastic Beanstalk
        
               | basmango wrote:
               | I think they mean having a separate small service just
               | for mailing on beanstalk.
        
           | BackBlast wrote:
           | You can use ses without moving your digital ocean server.
        
             | mhitza wrote:
             | From the AWS SES free tier fine print
             | 
             | > 62,000 Outbound Messages per month to any recipient when
             | you call Amazon SES from an Amazon EC2 instance directly or
             | through AWS Elastic Beanstalk.
        
               | TriNetra wrote:
               | wow, they are really tracking from where you're calling
               | the API to give credit, first of its kind I've heard :)
        
               | BackBlast wrote:
               | Good eye.
        
         | mattmanser wrote:
         | In terms of bang for your buck, DO is much cheaper.
         | 
         | Yeah you can get a cheaper AWS server, but it's a much lower
         | performance one.
        
           | marcosdumay wrote:
           | From what I can see, it's not a matter of bang for your buck,
           | what matters is that AWS scales lower than DO, so if you
           | are not fully using your VPS, AWS is cheaper.
           | 
           | Of course, I side with the GP here, it's just not worth the
           | risk. I could save a bit by switching my VPS too, but I
           | won't.
        
         | [deleted]
        
         | bsd44 wrote:
         | I would avoid DO and similar McDonald's-type cloud providers
         | for anything production. Commercial or private.
         | 
         | AWS might make it difficult to figure out the cost (most common
         | complaint) but the services are professional grade and their
         | support is as well. DO on the other hand provided me an
         | instance with an IP that was on a public blacklist and banned
         | my account within 5min of spawning an instance with the
         | explanation that "it was compromised and hacking" failing to
         | accept that they provided me with the OS image and the public
         | IP. Took me two months of arguing to get the account unblocked
         | and balance withdrawn. Lesson learned; you get what you pay
         | for. Back to AWS for me.
        
         | tomxor wrote:
         | Pretty much summarises my decision to use Linode, at a small
         | company AWS presents a bigger monetary risk and drain on
         | precious developer time and mental overhead than relatively
         | small savings it might return at smaller scales...
         | 
         | I also actually like Linode as a company and enjoy using their
         | services and management interface; Amazon is challenging to be
         | positive about.
        
           | wly_cdgr wrote:
           | Also use Linode, they're great. Their docs are a treasure
           | 
           | Seems absolutely insane to use AWS for small
           | personal/learning projects (unless the goal is to learn AWS
           | for career purposes, I guess). It'd be like using Unreal
           | Engine to make your 2d indie game
           | 
           | Always use the smallest and simplest solution that'll do the
           | job. Simple solutions are not just as good for simple
           | jobs...they're better
        
             | remram wrote:
             | > It'd be like using Unreal Engine to make your 2d indie
             | game
             | 
             | Except that would be free.
        
               | wly_cdgr wrote:
               | Heh, true. So, even worse!
        
         | VadimPR wrote:
         | You can set a global budget cap to avoid this kind of concern.
         | That said, I've also blown the budget cap by accident - so
         | agree with you on the DO.
        
           | triska wrote:
           | What good is a budget cap that can be "blown by accident"?
           | From a budget cap, I expect the key invariant that it
           | reliably _caps the budget_.
        
         | id5j1ynz wrote:
         | > I mean, there are consultancies whose entire premise is
         | expertise on AWS billing, so the chance of AWS newbie-me
         | running up many thousands because I forgot to switch off
         | service A or had the wrong setting for service B is non-zero.
         | 
         | That line of reasoning is wrong. I'm sure there are
         | consultancies that specialize in office stationery procurement;
         | doesn't mean anything for your small use case of buying a few
         | pens for your home office.
        
           | imadethis wrote:
           | There's no chance I accidentally buy $10,000 worth of
           | staplers when I walk into Office Depot though, while the
           | opposite is extremely easy in AWS. Plus, when I checkout I'll
           | get an itemized total of what I owe before I pay, I won't be
           | charged an unclear amount at a future date.
        
         | [deleted]
        
         | [deleted]
        
         | mrtksn wrote:
         | That must be some kind of "trick of the trade" because Firebase
         | originally had a feature for limiting the bill with a hard cap
         | and they even have videos explaining how to use it however it
         | was removed later on. Now they suggest building a script that
         | monitors your bill and nukes the project if something happens.
         | The catch? Billing is not real time.
        
         | sokoloff wrote:
         | IMO, AWS isn't competing on cheaper-in-dollars-per-byte but
         | rather in faster and cheaper for your engineering team. If your
         | engineering team is free (as you might decide your time is on a
         | hobby/side project), it's harder to make the case (I still run
         | my side projects there though), but when they can make the ops
         | team half wearing AWS badges, that offsets a lot of line-item
         | markups.
        
           | noogle wrote:
           | But it doesn't save much engineering time. You now need to
           | manage those services, and the high mark-up means you need to
           | invest effort in scaling things up/down.
           | 
           | Yes, it took me about a week to learn to set up a PostgreSQL
           | high availability cluster. But now it saves me $4,000 per
           | MONTH for each of our 10 databases.
           | 
           | And if you are using EC2 instances, AWS saves very little
           | effort compared to bare metal.
        
             | Daishiman wrote:
             | So did you send your database logs to a centralized logging
             | system? Did you set up roles and keys to access those
             | systems? Are your roles integrated with the rest of your
             | permissions system? Do you have a perf dashboard where you
             | can see the real-time usage of your DBs? Have you already
             | rehearsed updating your database version?
             | 
             | The thing is that it's totally possible that you learned
             | everything needed to set up the cluster, in my experience
             | most database systems that aren't set up by a professional
             | DBA will sooner or later hit a configuration or maintenance
             | snag. Once that happens, you're pretty much totally on your
             | own and for critical systems that downtime is going to cost
             | you more than the costs of your infra.
             | 
             | So you either need to have a DBA on retainer if you're
             | serious about data integrity, or you pay management costs
             | which means your system was set up by literally the most
             | expert people on the planet in the area.
             | 
             | If you're running a cluster where performance is the
             | highest priority and downtime and maintenance isn't a huge
             | issue because you have a nice decent maintenance window and
             | enough dev cycles to spend on staying up to date, for sure,
             | go for it.
             | 
             | But in my experience, if you care more about a system
             | staying up, good managed infra is so much more reliable
             | that it's not even a question.
        
           | rualca wrote:
           | > If your engineering team is free (..), it's harder to make
           | the case (...), but when they can make the ops team half
           | wearing AWS badges, that offsets a lot of line-item markups.
           | 
           | I have to call bullshit on this claim.
           | 
           | Let's look at the facts. With AWS there are only two
           | scenarios: either you go with the classic "VMs provided by a
           | cloud provider" which is represented by EC2, or you go with
           | hosted services and higher level abstractions like AWS's
           | serverless offerings.
           | 
           | Regarding the EC2, AWS offers absolutely no operational
           | advantage over any other cloud provider, at the expense of
           | being far more expensive. Also, CloudFormation/CDK is
           | arguably far worse and outright developer-hostile than any
           | configuration-as-code alternative. This comparison makes even
           | less sense if we look into AWS' containerization offering,
           | which is either half-baked (ECS) or an afterthought that lags
           | behind alternatives (EKS).
           | 
           | Then we have the higher level abstractions of AWS' managed
           | services and serverless options. Price-gouging runs rampant
           | on this domain, and arguably demands much more training and
           | man-hours to become effective at running production services
           | when compared with just running your own services. This
           | scenario entails higher costs and the only arguments that any
           | ops team can muster revolve around sunk cost and vendor lock-
           | in.
        
             | sokoloff wrote:
             | The price gouging services make sense if they avoid you
             | having to hire additional employees. That's the benefit of
             | any managed service provider: they can run it more
             | efficiently than you can (once you add all the people,
             | supervisors of those people, people to cover when those
             | people are on vacation, etc).
             | 
             | It's a way to shift people costs to IT operational
             | expenses. If you don't do that, it's more expensive. If you
             | do, it can easily be less expensive. I'm pretty sure we're
             | at the point where it's less expensive because developers
             | are waiting minutes rather than weeks [or more] for TechOps
             | actions to happen (we were on-prem previously [and I ran
             | TechOps]). That saves time and changes the way you think
             | about TechOps changes. If they're lengthy, you make choices
             | that avoid changes in TechOps. If they're fast, you make
             | choices that make the most sense for the product and
             | customer.
        
               | rualca wrote:
               | > The price gouging services make sense if they avoid you
               | having to hire additional employees.
               | 
               | But the fact is that it doesn't. It's another service
               | that needs training/experience to develop and operate.
               | Arguing about these hypothetical savings is just a veiled
               | appeal to the sunk cost fallacy and vendor lock-in I've
               | mentioned.
               | 
               | > I'm pretty sure we're at the point where it's less
               | expensive because developers are waiting minutes rather
               | than weeks [or more] for TechOps actions to happen (we
               | were on-prem previously [and I ran TechOps]).
               | 
               | I'm not sure this scenario is remotely realistic for the
               | past decade or so, especially after the inception of
               | containerization. Even in bare metal deployments anyone
               | can get multiple databases configured and going in a
               | matter of minutes.
        
               | sokoloff wrote:
               | Containers don't get you more host machines racked or
               | more disk shelves added to the SAN. On-prem, it solves
               | configuration within a (nearly) fixed scale which, if
               | that's your only problem, is great. If you're in your own
               | DC/colo, there's more advantages to moving out than
               | containers can provide alone.
               | 
               | I literally can't afford to invest to the level that AWS
               | can to run operations. AWS bandwidth is incredibly
               | expensive right up until the point where you or a
               | neighbor is getting a DDoS attack that Amazon just
               | "handles" for you. My customers don't care where we're
               | hosted and won't pay extra for either on-prem or cloud-
               | hosted. They just want it to be up and transparent. For
               | us, AWS is cheaper/faster all-in. That's not true for
               | everyone and, if it's not, please don't use it.
        
               | rualca wrote:
               | > Containers don't get you more host machines racked or
               | more disk shelves added to the SAN.
               | 
               | That's immaterial to the discussion, and reads like a
               | non-sequitur. You want a service. You deploy the service
               | in your infrastructure. If necessary, you scale your
               | infrastructure to meet demand. That's it. If you want to
               | spin up a database instance, just do it. With
               | containerization that takes between minutes and seconds.
               | 
               | And to drive the point home, in case you're not aware,
               | AWS is not the only cloud provider that offers horizontal
               | autoscaling. Some small providers even sell it out of the
               | box, both through their Kubernetes offerings and/or
               | through their own APIs.
               | 
               | Also, the sales brochure for managed services mentions
               | scalability and reliability, and in the case of AWS also
               | global deployments, but the truth of the matter is that
               | it costs a hefty premium and in most cases it's totally
               | irrelevant.
               | 
               | So, pointing out databases in practical terms means close
               | to nothing.
               | 
               | > I literally can't afford to invest to the level that
               | AWS can to run operations.
               | 
               | And that's perfectly fine because a) AWS really is not
               | foolproof (see the latest outage of AWS's US-WEST-2
               | region which might have single-handedly dropped AWS's
               | reliability to only 99.5% this year), b) operating your
               | own infra already gets you plenty of 9s easily, c) the
               | theoretical difference in the 9s you get and the 9s that
               | are advertised by AWS is more often than not totally
               | irrelevant to the usecases you need to meet.
               | 
               | To sum it up, you may argue all you want about how AWS's
               | Rolls Royce is far superior to any car in the market,
               | but the truth of the matter is that the vast majority has
               | all their needs decisively met and even surpassed by
               | running any other cloud provider's Ford hatchback.
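
For a sense of scale on the availability figures in the comment above: an availability percentage maps to a yearly downtime budget by simple arithmetic. This is a minimal shell sketch, assuming a 365-day year (8,760 hours). `downtime_hours` is a hypothetical helper name, and the 99.5 figure is the commenter's own estimate rather than a published number.

```shell
#!/bin/bash
# downtime_hours AVAILABILITY_PERCENT
# Prints the hours of downtime per 365-day year implied by an availability
# percentage. Hypothetical helper, for illustration only.
downtime_hours() {
    # (100 - availability) percent of the 8760 hours in a year
    awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 8760 }'
}
```

Under these assumptions, `downtime_hours 99.5` prints 43.8, which is just under two days of downtime per year.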
        
           | ratww wrote:
           | _> faster and cheaper for your engineering team._
           | 
           | I really wonder how true that is. Sure, for things like S3 or
           | RDS it's indeed easier, but for most other things I find AWS
           | either very limiting or extremely arcane.
           | 
           | Even "simple" things like Lambda underdeliver, just this week
           | we ran into problems using it with VPC, for example.
           | ElasticBeanstalk was another one, fine-ish for simplistic
           | things but problematic with the smallest customization, also
           | lots of undocumented and undebuggable quirks, like breaking
           | if you use UTF-8 characters in your commit messages, for
           | example.
           | 
           | Of course, we now have the problem where some people, both
           | seniors and juniors, _only_ know or only ever worked with
           | AWS, which makes the assertion that it is "faster and
           | cheaper" correct, but is worrisome, as lots of people are not
           | being taught what used to be the basics 10 or 20 years ago.
        
             | zrail wrote:
             | Curious what your issue was with Lambda and VPCs. We use
             | that combo all the time where I work and it's fine as long
             | as you have the IAM roles correct.
        
               | selfhoster11 wrote:
               | Lambda had some gotchas around the warm-up time the last
               | time I used it to implement something. We had to have
               | some extra workarounds to prevent the functions from
               | going "cold".
        
               | Ancapistani wrote:
               | Ah, that's the issue, at least for me - IAM roles are
               | pretty nuanced, and it's difficult to understand all
               | that's happening.
               | 
               | I'm working extensively with EventBridge now, and their
               | "security" docs mix "what can access EventBridge" with
               | "what EventBridge can access". Also, different AWS
               | services all seem to have different requirements - e.g,
               | some are role-based, some are service-based, and some are
               | resource-based. It gets complicated very quickly.
        
               | ratww wrote:
               | Don't get me wrong, I'm a Lambda fan. But unlike
               | EC2/EBS/etc, talking to some AWS services from a Lambda
               | that's inside a VPC requires additional infra and you
               | have to pay for the egress. In the end it just wasn't
               | worth the price for us. It was a bad surprise money-wise.
        
             | thinkharderdev wrote:
             | I recently switched jobs. At my old company the dev team
             | basically had carte blanche to set up their own AWS infra
             | without any real restrictions (there were some but very
             | few). It was nice, I almost never had to ask anybody
             | outside my team to do anything to unblock us (at least from
             | an infra standpoint).
             | 
             | In my new job we also use AWS for everything BUT I haven't
             | used the AWS cli once. I don't even have credentials.
             | Basically, there is a platform team which is responsible
             | for running a k8s cluster (or clusters really) and some
             | other common infrastructure which is all running on bare
             | EC2 (no RDS, no EKS, we do use S3 for blob storage but
             | that's about it). And I have to say it is pretty amazing.
             | No more dealing with arcane IAM rules, or trying to figure
             | out how to string together a chain of lambdas to do some
             | sort of complex orchestration task.
             | 
             | I've really come to appreciate the model of using k8s as
             | your "cloud platform." Having a dedicated team that manages
             | the k8s clusters and makes sure they are elastically
             | scalable and reliable. Everyone else is just deploying
             | stuff to kubernetes. They could decide to move everything
             | to a colo tomorrow and I would have to change exactly
             | nothing about how I do my job.
        
         | Galanwe wrote:
         | I don't get why you're so stressed out by AWS billing.
         | 
         | From my experience, once you've worked a bit seriously with
         | AWS, billing is not a blackbox anymore and you're able to plan
         | ahead without too much surprise.
         | 
         | If you're still worried, there's also the option of setting
         | alerts on budget spent and forecast of budget, which should
         | settle the debate. (these are also part of the API, so you can
         | deploy and configure these alerts through terraform)
        
           | GordonS wrote:
           | A bad actor could hammer any publicly available services,
           | and you could be hit with an enormous egress bandwidth bill.
           | 
           | Spend alerts can _help_ with this, but spend is only
           | calculated every 24h or so, so it's far from a panacea.
        
             | belter wrote:
             | You can mitigate that, for enterprise projects, using AWS
             | Shield Advanced as it comes with DDoS cost protection:
             | 
             | https://aws.amazon.com/shield/features/
             | 
             | I say for enterprise projects because although its cost is
             | reasonable for corporate projects, it's probably not something
             | you can justify for most personal/private deployments.
        
             | gonzo41 wrote:
             | Not if you're doing simple and sensible things like using
             | CloudFront, and setting limits with API Gateway.
        
               | arriu wrote:
               | What if you're using something like grpc?
        
             | triska wrote:
             | This is also one of the things I fear most when running a
             | service in the cloud: A huge bill due to excessive network
             | usage triggered for example by a search engine, web scraper
             | etc. I consider it very unfortunate that "capped cost" has
             | gone somewhat out of fashion, and nowadays many major cloud
             | providers bill excess usage rather than cutting off or
             | slowing down traffic etc.
             | 
             | Here is a simple Bash script that monitors outgoing eth0
             | traffic (once per second) and automatically shuts down the
             | instance once it is greater than 1 TB:
             | 
             |     #!/bin/bash
             | 
             |     # shut down instance if outgoing traffic > 1 TB
             | 
             |     # 1 MB
             |     limit=$((10**6))
             | 
             |     # 10 MB
             |     limit=$((10**7))
             | 
             |     # 1 GB
             |     limit=$((10**9))
             | 
             |     # 1 TB
             |     limit=$((10**12))
             | 
             |     while true
             |     do
             |         date
             |         tx=$(< /sys/class/net/eth0/statistics/tx_bytes)
             |         echo "$tx (limit: $limit)"
             |         if (( tx > limit ))
             |         then
             |             echo cutting
             |             systemctl poweroff
             |         fi
             |         sleep 1
             |     done
             | 
             | If you save it as /home/admin/cutnetwork.sh, you can run it
             | as a systemd service:
             | 
             |     [Unit]
             |     Description=Cut Network
             | 
             |     [Service]
             |     UMask=022
             |     Environment=LANG=en_US.utf8
             |     Restart=on-abort
             |     StartLimitInterval=60
             |     StartLimitBurst=5
             |     WorkingDirectory=/home/admin
             |     ExecStart=bash cutnetwork.sh
             | 
             |     [Install]
             |     WantedBy=multi-user.target
             | 
             | This simplistic approach may require adjustments depending
             | on network settings and operating environment, and will not
             | work for example if the instance is rebooted during the
             | billing period, since that resets the counter. I would much
             | prefer a hard-coded setting that reliably works on the
             | instance itself, or a hard billing limit that reliably
             | turns off the service if the accumulated cost exceeds the
             | set amount.
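
The reboot limitation mentioned above could be worked around by keeping a running total in a file. The following is only a minimal sketch under stated assumptions: `STATE_FILE`, `update_total`, and `sample_once` are hypothetical names that do not appear in the original script, and the state file would still have to be reset by hand (or by a timer) at the start of each billing period.

```shell
#!/bin/bash
# Sketch: accumulate outgoing bytes across reboots.
# STATE_FILE and both function names are hypothetical (not from the original script).

STATE_FILE="${STATE_FILE:-/var/lib/cutnetwork/state}"

# update_total CURRENT LAST TOTAL -> prints the new accumulated total.
# If CURRENT is smaller than LAST, the kernel counter was reset
# (for example by a reboot), so all of CURRENT counts as new traffic.
update_total() {
    local current=$1 last=$2 total=$3
    if [ "$current" -ge "$last" ]; then
        echo $(( total + current - last ))
    else
        echo $(( total + current ))
    fi
}

# sample_once IFACE: read the interface counter, fold it into the
# saved total, store the new state, and print the total.
sample_once() {
    local iface=$1 last=0 total=0 current
    if [ -r "$STATE_FILE" ]; then
        read -r last total < "$STATE_FILE"
    fi
    current=$(cat "/sys/class/net/$iface/statistics/tx_bytes")
    total=$(update_total "$current" "$last" "$total")
    mkdir -p "$(dirname "$STATE_FILE")"
    echo "$current $total" > "$STATE_FILE"
    echo "$total"
}
```

In the monitoring loop, `tx=$(sample_once eth0)` would replace the direct read of `tx_bytes`. The reset detection is a heuristic: if the counter has already climbed past the last saved reading before the first sample after a reboot, that traffic is undercounted.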
        
               | arriu wrote:
               | Thanks for sharing this.
        
           | ghaff wrote:
           | I know experienced people who have woken up to a
           | several-thousand-dollar AWS bill they didn't expect. And the
           | large cloud providers have clearly indicated by their actions
           | that they're simply not interested in implementing hard cost
           | circuit breakers.
           | 
           | I use AWS very lightly but I totally understand why someone
           | wouldn't.
        
             | geoduck14 wrote:
             | Yup. I used AWS at my last job. We had teams of people
             | using AWS, we had fancy 3rd party tools and extraction
             | metrics to track and report on costs. There were still
             | PLENTY of times when I would just scratch my head: "well, it
             | looks like EC2s cost an extra $1000 this month. I wonder
             | what happened."
        
             | user3939382 wrote:
             | > the large cloud providers have clearly indicated by their
             | actions that they're simply not interested in implementing
             | hard cost circuit breakers.
             | 
             | I agree, my term for this is "bad faith".
             | 
             | I recently had a free $200 credit for Azure. I set up their
             | default MariaDB instance for a side project, figuring I'd
             | get my feet wet with Azure. I didn't spend time evaluating
             | the cost because I figured, how much could the default be
             | if I haven't cranked up the instance resources at all? Turns out
             | the answer is more than $10/day which I discovered when
             | authentication failed to my test DB. Back to Digital Ocean.
        
               | TriNetra wrote:
               | Yes, in some cases the default is quite expensive. The
               | same was true of SQL Azure (though they have changed that
               | recently), and it ran up a sizable bill for us (though to
               | their credit, Azure refunded us on every such occasion
               | because we hadn't used the capacity at all). However, I
               | don't know why the alert system doesn't have an option to
               | say "here's my budget, alert me as soon as my daily pace
               | is set to exceed the monthly budget". Instead, you get
               | alerts based on the percentage of the budget consumed:
               | for example, an email once 50% of the budget is consumed,
               | which happens every month and so rather defeats the
               | purpose of an alert.
               | 
               | We ended up creating a simple solution (cloudalarm.in,
               | in beta) that provides such budget-pace-based alerts,
               | plus more ways to get an instant alert, which isn't
               | possible with usage-based alerts.
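
The "daily pace" alert described above comes down to a linear projection of month-to-date spend. This is a minimal sketch of that arithmetic only, not of cloudalarm.in or any provider's API: the function names are hypothetical, and amounts are assumed to be integer cents so that plain shell arithmetic suffices.

```shell
#!/bin/bash
# Sketch: pace-based budget check. Function names are hypothetical.
# All money values are integer cents.

# projected_spend SPENT_CENTS DAY_OF_MONTH DAYS_IN_MONTH
# Linear projection of the month-end bill from spend so far.
projected_spend() {
    echo $(( $1 * $3 / $2 ))
}

# pace_exceeds_budget SPENT_CENTS DAY_OF_MONTH DAYS_IN_MONTH BUDGET_CENTS
# Succeeds (exit status 0) when the projection is above the budget.
pace_exceeds_budget() {
    [ "$(projected_spend "$1" "$2" "$3")" -gt "$4" ]
}
```

With a $100 budget, $25 spent by day 5 of a 30-day month projects to $150, so `pace_exceeds_budget 2500 5 30 10000` succeeds even though only 25% of the budget is consumed and a 50%-of-budget alert would not have fired yet.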
        
               | whoknew1122 wrote:
               | It's not bad faith. It's 'providing the resources you
               | signed up for'.
               | 
               | Does it mean you have to go into your planning with more
               | consideration as to cost? Yeah.
               | 
               | But how would you feel if your start-up finally goes
               | viral, you're having your best day ever, and then your
               | app just stops working because someone forgot to remove a
               | hard spend limit?
               | 
               | Most people would rather see their app continue running.
               | 
               | And what does turning off the lights look like? If your
               | database hits your cost limit, do you stop serving
               | requests? Delete the data? To what extent do you want
               | 'cost protection' for resources you signed up for?
        
               | imwillofficial wrote:
               | It's not unreasonable to ask for a mechanism to not be
               | billed thousands unexpectedly.
               | 
               | Cloud billing is not easy to understand.
               | 
               | I would know, I work for the part of AWS that calculates
               | people's bills.
        
               | whoknew1122 wrote:
               | It's not rocket science, either. I would know, I work for
               | premium support.
               | 
               | Every time I see an unexpected bill of thousands of
               | dollars, it's because the customer poorly architected
               | their infrastructure.
               | 
               | People seemingly want the freedom of complete control
               | without the responsibility that comes with having that
               | much control.
        
               | nlitened wrote:
               | > If your database hits your cost limit, do you stop
               | serving requests? Delete the data? To what extent do you
               | want 'cost protection' for resources you signed up for?
               | 
               | Sounds like a reasonable configurable option rather than
               | "you shouldn't be able to choose at all".
        
               | eropple wrote:
               | I am sympathetic to the concern about cost overages--I've
               | hit them in AWS before--but given the way that developers
               | and managers think about SaaS products (generally, not
               | just cloud stuff), I tend to think that even if you
               | required them to click three checkboxes and sign their
               | name in blood, the first time you vaporized somebody's
               | production database because they hit their overages and
               | didn't think it would ever happen would be apocalyptic.
               | And the second, and the third. And you're at fault, not
               | the customer, in the public square.
               | 
               | By comparison, chasing off "cost conscious" (read:
               | relentlessly cheap--and I note that in my personal life
               | I'm one of these, no shade being thrown here) users is
               | probably better for them overall.
        
               | whoknew1122 wrote:
               | Work in AWS Premium Support. This is 100% how it goes.
               | 
               | Take KMS keys for example. You can't outright delete a
               | KMS master key; you have to schedule it for deletion. The
               | shortest period you can schedule for deletion is 7 days
               | (default 30). Once the key is deleted, all encrypted data
               | is orphaned.
               | 
               | Guess who gets blamed for deleted keys?
               | 
               | HINT: It's not the customer.
        
               | nlitened wrote:
               | I am sorry, I might be missing something, but I call
               | bullshit. How much does it cost for Amazon to store
               | several bytes that make a key? 5 cents per decade?
               | 
               | "Yeah, so uhm, you hit zero, so we deleted all your keys
               | in an irrecoverable way, sorry not sorry" -- is not a
               | circuit breaker. Make all services inaccessible to public
               | and store the data safely until customer tops up their
               | balance. That's how VPSes have worked forever.
               | 
               | I don't argue that "cheapo" clients are worth retaining
               | for AWS, clearly they are not. But this kind of hypocrisy
               | really triggers me.
               | 
               | Edit: a helpful person below suggested I misunderstood
               | the parent, and now I think I did.
        
               | supaslide wrote:
               | I'm pretty sure they meant that the customer schedules it
               | for deletion and then blames AWS when they can't access
               | their encrypted data.
        
               | nlitened wrote:
               | Oh, in this case I misunderstood what the parent meant,
               | and I replied to a wrong interpretation of their words.
               | 
               | Thank you for the clarification.
        
               | eropple wrote:
               | AWS doesn't retain _anything_ for you unless you tell
               | them to, and when you tell them to delete something (as
               | in the example relayed by the person you are replying
               | to), they delete it as best as they are able. That's
               | part of the value proposition: when you delete the thing,
               | it goes away. Why would they start now for clients who
               | want their bills to be in the tens of dollars (when if
               | you really care you can do it yourself off of billing
               | alerts[0])?
               | 
               | Going to be real: you aren't "triggered", which is
               | actually a real thing out there that you demean with this
               | usage of the term. You're just not the target market and
               | you're salty that it's more complex than you think it is.
               | 
               | [0]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/
               | monitori...
        
               | [deleted]
        
               | eropple wrote:
               | I used to run an AWS consultancy, which is how _I_ know.
               | ;) More than once I had a customer go "well support
               | won't help me, how can I get my data back?". And I had to
               | tell them "well, support isn't just not helping you for
               | kicks, you know?".
        
               | user3939382 wrote:
               | Stop serving requests until the finances are rectified,
               | delete the data 30 days after it stops. Final migration
               | out/egress requires a small balance for that purpose.
               | 
               | The engineers designing and building these systems are
               | some of the best in the world, this is relatively
               | trivial.
        
               | Daishiman wrote:
               | There is absolutely nothing trivial about this.
        
               | user3939382 wrote:
               | Relatively trivial. In other words compared to the rest
               | of the infrastructure and billing system this is nothing.
        
               | ghaff wrote:
               | My term for it is "you're not their use case." For better
               | or worse, they've prioritized usages that would much
               | rather have an unexpected few thousand dollar bill than
               | have services paused or shut down unexpectedly.
        
               | civilized wrote:
               | But computers can behave differently based on user
               | choice. Right? So there could be a user option to cut
               | service beyond a fixed spend. It wouldn't be hard to
               | implement, and tons of people would use it. They don't do
               | it.
               | 
               | It's not a tragic case of priority and limited
               | engineering resources. They _like_ surprise bills, just
               | like hospitals do.
               | 
               | Businesspeople _love_ it when you come to their service
               | and click through their Russian novel of a service
               | agreement that would take a team of lawyers to parse.
               | Once you do that, your money belongs to them! It's their
               | court, their rules! They love it!
        
               | nucleardog wrote:
               | > It wouldn't be hard to implement, and tons of people
               | would use it. They don't do it.
               | 
               | Please describe to me, in detail, how this works.
               | 
               | Because every time this comes up everyone claims it's the
               | easiest thing in the world, but if you try and drill into
               | it what they end up actually wanting is generally "pay
               | what you want" cloud services.
               | 
               | There are a _ton_ of resources on AWS that accrue
               | ongoing costs with no way to turn them off. A "hard circuit
               | breaker" that brings your newly accruing charges to zero
               | needs to not just shut down your EC2 instances, but
               | delete your EBS volumes, empty your S3 buckets, delete
               | your encryption keys, delete your DNS zones, stop all
               | your DB instances and delete all snapshots and backups,
               | etc, etc.
               | 
               | The only people I see using a feature like this are some
               | individuals doing some basic proof-of-concept work and...
               | a bunch of people that are going to turn it on not
               | understanding the implications and then when they get a
               | burst of traffic that wipes out their AWS account they're
               | going to publish angry blog posts about how AWS killed
               | their startup.
               | 
               | If, like most people, you don't want literally everything
               | to disappear the first time your site gets an unexpected
               | traffic spike, you can already do this by setting up a
               | response tailored to your workload--run a lambda in
               | response to billing alerts that shuts down VMs, or stops
               | your RDS instance but leaves the storage, etc.
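
The tailored response described above (stop compute, keep storage) maps onto two AWS CLI calls, `aws ec2 stop-instances` and `aws rds stop-db-instance`. The following is a hedged sketch rather than a tested runbook: the wrapper functions and the `DRY_RUN` switch are hypothetical, the instance identifiers must be supplied by the caller, and an actual Lambda would normally make the equivalent SDK calls instead of shelling out.

```shell
#!/bin/bash
# Sketch: stop compute in response to a billing alert, leaving storage intact.
# DRY_RUN and the function names are hypothetical; only the aws subcommands
# and their flags are real CLI calls.

DRY_RUN="${DRY_RUN:-1}"   # default to printing commands instead of running them

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# stop_ec2_instances ID...: stop (not terminate) the given instances.
stop_ec2_instances() {
    [ "$#" -gt 0 ] || return 0
    run aws ec2 stop-instances --instance-ids "$@"
}

# stop_rds_instance NAME: stop one DB instance; its storage is kept.
stop_rds_instance() {
    run aws rds stop-db-instance --db-instance-identifier "$1"
}
```

Stopping is not free: EBS volumes and RDS storage continue to be billed, and, as noted elsewhere in this thread, a stopped RDS instance is started again automatically after 7 days.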
        
               | void_mint wrote:
               | > Because every time this comes up everyone claims it's
               | the easiest thing in the world, but if you try and drill
               | into it what they end up actually wanting is generally
               | "pay what you want" cloud services.
               | 
               | Why is it on any (usually a relatively new) user to
               | define how an entire cloud should behave?
               | 
               | Users are asking for a feature that helps them stop
               | accidentally spending more than they intended. This
               | feature request is totally fair. Implementing such a
               | feature would be an act of good faith towards
               | new/onboarding users (also obviously just any user with a
               | very specific budget use-case).
               | 
               | > The only people I see using a feature like this are
               | some individuals doing some basic proof-of-concept work
               | and...
               | 
               | Yes exactly. GCP offers sandboxed accounts for this exact
               | purpose. Why is this such a far reach?
               | 
               | > setting up a response tailored to your workload--run a
               | lambda in response to billing alerts that shuts down VMs,
               | or stops your RDS instance but leaves the storage, etc.
               | 
               | If you're telling every individual user that falls into a
               | specific category to build a specific set of
               | infrastructure, why is it not acceptable to you to just
               | ask AWS to build it?
        
               | thinkharderdev wrote:
               | I think the sandbox idea is a great one. They should just
               | do away with the free tier entirely except for sandbox
               | accounts in which everything just gets shut down the
               | second you go over the free allowance. If you want to
               | build something for real then you pay for whatever
               | resources you use, but if you just want to tinker around
               | and learn a few things then you can get a safe sandbox to
               | do it in.
               | 
               | BUT, I think the parent's point is that such a feature
               | would actually be quite complicated. It's not just a
               | matter of saying "I only want to spend $X in this account
               | per month/total" but defining exactly what you want to do
               | in the case where you hit that limit. Shut everything
               | down? My guess is almost nobody would want to do that. So
               | it ends up being some complicated configuration where you
               | have to deeply understand all of the services and their
               | billing models in order to configure it in the first
               | place. What are the odds that the student who
               | accidentally spins up 100 EC2s for a school project is
               | going to configure this tool correctly?
               | 
               | But I do think the sandbox would be great. Either you are
               | a professional in which case it is your responsibility to
               | manage your system and put in appropriate controls to
               | prevent huge unexpected bills or you are a student (in
               | the general sense of someone learning AWS, not
               | necessarily just someone in school) in which case they
               | provide a safe environment for you to experiment.
        
               | void_mint wrote:
               | > BUT, I think the parent's point is that such a feature
               | would actually be quite complicated.
               | 
               | Sure, but so is making a cloud. Putting the onus of
               | defining a feature like this on users, only after hearing
               | their request ("I want to control my spend"), is IMO
               | unfair.
        
               | thinkharderdev wrote:
               | Not complicated as in "too hard for AWS to build" but
               | complicated as in "really hard to use as someone trying
               | to limit your spend on AWS." So the people most at risk
               | of huge unexpected bills are also not going to be the
               | people knowledgeable enough to set up the billing cap
               | correctly. So it would mostly be a feature for
               | enterprises and most enterprises would rather just pay
               | the extra $ rather than potentially turn off a critical
               | system or accidentally delete some user data.
               | 
               | I worked at a company that spent ~$10m per month on AWS.
               | We had a whole "cloud governance" team who built tools to
               | identify both over and underutilized resources. But they
               | STILL never cut anything off automatically. The
               | risk/reward ratio just wasn't there. You make the right
               | call and shave $10k off a $10m bill every month, but the
               | one time you take down a mission critical service, you
               | give all of that back and then some.
        
               | void_mint wrote:
               | > So the people most at risk of huge unexpected bills are
               | also not going to be the people knowledgeable enough to
               | set up the billing cap correctly
               | 
               | Yes, which is why AWS builds it.
               | 
               | > So it would mostly be a feature for enterprises and most
               | enterprises would rather just pay the extra $ rather than
               | potentially turn off a critical system or accidentally
               | delete some user data.
               | 
               | It would mostly not be enterprises, IMO.
        
               | BackBlast wrote:
               | I've been there. I shut down a bunch of what looked like
               | idle instances doing nothing to reduce spend. 80% of
               | which were, in fact, doing nothing. I did drop off two
               | VMs that were supporting critical infrastructure.
               | 
               | Everyone who had done any work on them was long gone. I
               | had done my due diligence to identify what they could
               | possibly be.
               | 
               | Still, the day of reckoning came, and we got calls of
               | services down a week after I turned them off. I spun them
               | back up, and they were going again without any real
               | impact to the business.
               | 
               | This turned out to be a blessing as the very next week
               | the cert these same services depended on expired and if I
               | hadn't learned about the system by turning them off we
               | never would have known which boxes held up those
               | services.
               | 
               | Also a lesson in what happens when people leave without
               | any documentation on where the work they did lives and
               | how it works.
        
               | [deleted]
        
             | nucleardog wrote:
             | > And the large cloud providers have clearly indicated by
             | their actions that they're simply not interested in
             | implementing hard cost circuit breakers.
             | 
             | Since I enjoy tilting at windmills--how do you propose this
             | works? Like, in detail.
             | 
             | Because every time I try and drill into details of this
             | with someone, it turns out that what they really want is
             | "pay what you want" cloud services.
             | 
             | AWS is much, much more than just a place to run a virtual
             | server and many resources accrue on-going costs with no way
             | to "turn them off". When you hit your hard circuit breaker,
             | do they delete all your EBS volumes and data in S3? Your
             | private SSL root? Your user directory? Encryption keys? DNS
             | zones?
             | 
             | The number of people that would want all of that removed
             | when they hit their $X/mo limit is likely minuscule in
             | comparison to the number of people that would turn this on
             | not understanding what it really meant and then publishing
             | angry blog posts about how Amazon killed their startup
             | right when they got popular and traffic spiked.
        
               | alexeldeib wrote:
               | https://docs.microsoft.com/en-us/azure/cost-management-
               | billi...
               | 
               | e.g. "virtual machines are stopped and de-allocated. The
               | data in your storage accounts are available as read-
               | only."
               | 
               | Most control plane operations will also be blocked. It
               | gets complicated with more complex resource types, but it
               | gets the job done anyway.
               | 
               | Note that this functionality is sort of _required_ to
               | support pre-paid plans without allowing them to exceed
               | specific limits, which do exist on Azure. So there's a
               | business dependency on this functionality today, it's not
               | a hypothetical.
        
               | ghaff wrote:
               | That wasn't a value judgement on my part. I've made the
               | exact same comment as you previously.
        
           | igetspam wrote:
           | Until you're into EC2-Other and then you have to follow
           | various guides to figure out most of what that means. Even
           | then, it's black box billing that Teams even struggle to
           | explain. I spend a ridiculous amount on egress that's nearly
           | impossible to track.
        
           | the_jeremy wrote:
           | If I'm using my own money on a personal project, I do not
           | want "alerts". I want a maximum budget spend per X, where X
           | is a _small_ increment of time, like an hour or day.
           | 
           | Supporting hobbyist projects would absolutely lead to higher
           | AWS adoption, at least in smaller companies.
        
           | Tenoke wrote:
           | >I don't get why you're so stressed out by AWS billing.
           | 
           | Presumably because if you haven't used it and are digging
           | deeper there are so many services with different types of
           | billing that it can be hard to keep track.
           | 
           | And who among us has not left something running way after it
           | should've been shut down accidentally..
        
           | danjac wrote:
           | > once you've worked a bit seriously with AWS
           | 
           | Kind of a chicken-and-egg situation, no? Unless you're on the
           | company dime, learning to work with AWS entails that risk. A
           | beginner simply won't know how to configure all of these
           | things.
        
             | RobRivera wrote:
             | the risk is minimized by reading the docs and proper
             | planning.
             | 
             | EDIT: got to love the holy downvoters
        
               | danjac wrote:
               | That's the point though. A beginner is going to make
               | mistakes and should be able to learn in a safe
               | environment. Think of a student on a tight budget at
               | college or in a bootcamp, who has to learn AWS because
               | it's on the curriculum.
        
               | fleaaaa wrote:
               | Exactly, it's pretty common for them to shoot themselves
               | in the foot with a bill of a couple hundred dollars, just
               | for one tiny instance with additional options that 'you
               | have to do'.
               | 
               | IMO AWS deliberately lets these things happen and
               | reimburses them later with excuses. It seems like rather
               | a strategy at this point.
        
               | danjac wrote:
               | I don't think it's so much a money-grabbing strategy as
               | the problem that AWS is less a suite of unified services
               | and more a litter of puppies fighting in a sack. With
               | that kind of org chart it's difficult to have a unified,
               | simple billing experience with good beginner training
               | wheels and on-ramps.
        
               | geoduck14 wrote:
               | >AWS is less a suite of unified services and more a
               | litter of puppies fighting in a sack
               | 
               | Can I quote you on this? I'm going to quote you.
        
               | tenaciousDaniel wrote:
               | Yep. I'm a novice at AWS. Last year I heard about RDS,
               | and tried playing around with it.
               | 
               | I thought I had shut it down because I clicked some
               | button that _looked_ like it was the turn-off button. I
               | let it go on for a few months, only to discover that I
               | had been charged $1,500.
               | 
               | The one thing that really pissed me off was how easy it
               | was to set up vs how hard it was to take it back down. I
               | can't remember the details, but basically you cannot turn
               | off an RDS instance using only the UI (even
               | though you can turn it on in the UI). You have to install
               | the SDK and perform some (seemingly complex) commands.
               | 
               | I tried explaining that I was a beginner, and that I made
               | this mistake by accident, and that they could easily see
               | that I had not actually used this instance at all or even
               | put any data into the DB. But they wanted this huge list
               | of things from me in order to refund it, like a super in-
               | depth explanation of how it happened. Really shitty
               | experience overall. So I canceled my AWS account and
               | likely won't go back until I have a job that pays me to
               | learn and use it.
        
               | scrose wrote:
               | You can 'Stop' a DB and 'Terminate' a DB through the UI.
               | If you have deletion protection turned on, you can only
               | stop the DB until deletion protection is turned off,
               | which can also be done through the UI.
               | 
               | You most likely stopped the DB, but the problem there is
               | that AWS will automatically turn on the DB after 7 days.
               | You also still get charged for storage for the time your
               | DB is off.
               | 
               | Sorry to hear that though. I know it's a really sucky
               | situation to be in.
        
               | tenaciousDaniel wrote:
               | Ah yeah, I remember now. You could turn off the DB, but
               | there was some other kind of scaffolding thing that I was
               | still getting charged for. It seemed like an RDS-specific
               | thing. They had to point me to a tutorial for turning
               | that piece off.
        
               | geoduck14 wrote:
               | >You could turn off the DB, but there was some other kind
               | of scaffolding thing that I was still getting charged
               | for.
               | 
               | It is a feature! /s
        
               | Galanwe wrote:
               | > I let it go on for a few months, only to discover that
               | I had been charged $1,500.
               | 
               | You get billed monthly, so really, letting it go for a
               | few months is on you.
               | 
               | > The one thing that really pissed me off was how easy it
               | was to set up vs how hard it was to take it back down
               | 
               | > You have to install the SDK and perform some (seemingly
               | complex) commands.
               | 
               | Huh, no it's not. Really, it's literally one click to
               | shut down an RDS instance, and always has been.
               | 
               | > But they wanted this huge list of things from me in
               | order to refund it, like a super in-depth explanation of
               | how it happened.
               | 
               | I mean that makes sense to me. They did reserve and
               | partially use these resources for you, so it's only fair
               | that you have to go through the trouble of explaining why
               | they should waive the charge. If there's no downside,
               | everyone would just reserve a bunch of resources all the
               | time.
               | 
               | From your comment, it seems you didn't even bother to
               | answer their questions to get a refund. I would totally
               | not hesitate to charge you if I were in AWS's place.
        
               | leeoniya wrote:
               | > You get billed monthly, so really, letting it go for a
               | few months is on you.
               | 
               | right? what a moron, that bill could have been a surprise
               | of only $500!
               | 
               | /s
        
               | tenaciousDaniel wrote:
               | lol exactly. I've discussed this incident before on HN
               | and had the same kind of "well tough luck but you
               | deserved it" responses. They seem not to understand how
               | off-putting it is to newcomers or students, to basically
               | say "hey well YOU made a mistake, the trillion-dollar
               | corporation SHOULD take your money, idiot!"
        
               | geoduck14 wrote:
               | >I've discussed this incident before on HN and had the
               | same kind of "well tough luck but you deserved it"
               | responses.
               | 
               | Brush it off. I've worked with AWS pros who get lost in
               | the billing. In my last job, we had a big "hackathon"
               | where the objective was to reduce our AWS spend. Overall,
               | we reduced our annual bill by a couple million dollars.
        
               | tenaciousDaniel wrote:
               | > From your comment
               | 
               | Pretty stupid to just assume that I "didn't even bother"
               | to answer their questions. I'm not going to write a novel
               | explaining every minutia of my interactions with AWS.
               | 
               | > letting it go is on you
               | 
               | I never said it wasn't. If you pay attention, you'll see
               | that the context of this comment is discussing the "it's
               | on you" culture surrounding AWS and how hostile it is to
               | newcomers.
        
               | _wldu wrote:
               | Doesn't the free tier cover a lot of what a student may
               | want to do?
        
               | lmz wrote:
               | The free tier is just that, a tier (not a hard limit),
               | and will not cover any use above the tier.
        
               | triska wrote:
               | I did not downvote this, but I have a comment: The risk
               | is _not_ "minimalized by reading the docs and proper
               | planning". To minimize something means to reduce it to
               | the smallest possible amount, and I do not like to take
               | any chances whatsoever when a huge excess bill is a
               | possible outcome of a single misconfigured setting that
               | can only be ruled out by reading hundreds if not
               | thousands of pages of documentation and then following
               | the documentation without mistake to the letter.
               | 
               | There is a clear possible solution for reliably
               | preventing any amount of unintended overpayment, and that
               | would be to configure a hard billing limit that can
               | _never_ be exceeded, _no matter what else is being
               | configured_. All services that generate additional costs
               | would simply have to stop or be removed if the configured
               | limit is exceeded.
               | 
               | That would truly minimize the risk, because any
               | configuration error I make will then not lead to excess
               | payment if I configure such a limit and the cloud
               | provider respects it.
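               | 
               | A minimal sketch of what I mean, as a toy model only:
               | the Account class and its charge() method are
               | hypothetical names, not any provider's API. Every charge
               | is checked against the cap first, and a service that
               | would exceed it gets stopped instead of billed.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy model of an account with a hard spending cap (hypothetical)."""
    hard_limit: float
    spent: float = 0.0
    stopped: list = field(default_factory=list)

    def charge(self, service: str, amount: float) -> bool:
        """Apply a charge only if total spend stays within the cap."""
        if self.spent + amount > self.hard_limit:
            # Refuse the charge and stop the service instead of billing it.
            if service not in self.stopped:
                self.stopped.append(service)
            return False
        self.spent += amount
        return True


account = Account(hard_limit=50.0)
results = [
    account.charge("db", 30.0),
    account.charge("vm", 15.0),
    account.charge("translate", 1500.0),  # would blow past the cap
]
```

               | The point of the sketch is that the check happens before
               | the money is spent, so no misconfiguration elsewhere can
               | push the total over the limit.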
        
           | pastage wrote:
           | I do not have the time for alerts in my personal life,
           | billing PTSD is not fun.
        
         | Tenoke wrote:
         | With employers I almost always use AWS, for side projects I
         | almost always use hetzner for cheap servers. I don't even think
         | you need to worry much about learning AWS unless you need it
         | but if you do you can limit your budget, set alerts and hope
         | they go off.
        
           | danjac wrote:
           | I'd be happy to never touch AWS unless a) I'm not the one
           | paying for it and b) there is a genuine need. Unfortunately
           | it's increasingly a job requirement.
        
         | StratusBen wrote:
         | [Disclosure] I'm Co-Founder and CEO of http://vantage.sh/ and
         | was previously the lead PM on DigitalOcean's Droplet product as
         | well as on the product team at AWS for container services.
         | 
         | We try to help out a bit on this with Vantage which essentially
         | gives you a DigitalOcean-esque view of your AWS costs. The
         | first $2,500 in AWS costs are tracked for free which would
         | seemingly cover your side-projects.
         | 
         | It sounds like you've found your home on DigitalOcean, but I'd
         | be curious if something like Vantage would potentially change
         | your decision to build on AWS? In particular what you mention
         | about runaway bills is something that Vantage sends alerts on
         | in advance. We also show you a full inventory of your AWS
         | resources and what they cost you.
        
           | civilized wrote:
           | What happens if your tracking doesn't end up matching the
           | actual bill?
        
             | StratusBen wrote:
             | Vantage integrates at an AWS account level through a
             | mechanism called a Cross account IAM role which allows us
             | to ingest and process the raw data that AWS uses for its
             | own billing systems (Cost and Usage Reports, Service APIs
             | and Cost Explorer)
             | 
             | We haven't seen a single case where we don't end up
             | matching the actual AWS bill. In fact, with a release
             | currently in BETA and rolling out in a week or two, we'll
             | be providing _richer_ data _faster_ than AWS Cost Explorer
             | provides.
             | 
             | You can see a demo of that here (designs still need to be
             | implemented)
             | https://www.loom.com/share/dcb72a921f134e59b19a0dd3d3ab0e2f
        
               | nlitened wrote:
               | Yeah but do you guarantee that you will cover any real
               | billing differences, or you're not sure enough to put
               | your money where your mouth is?
        
               | NathanKP wrote:
               | Having seen firsthand the kind of devious folks who are
               | out there constantly trying to do fraudulent activity on
               | AWS, I don't think any small startup like Vantage would
               | ever want to offer a bill insurance scheme. It would be
               | ripe for exploitation, such as someone trying to spin up
               | 1000 instances in the last couple minutes of the month
               | and then say "Gotcha, your bill prediction didn't match
               | up with the real bill after all!"
               | 
               | At a more general level this may also be one of the most
               | entitled asks I've ever seen in the 12+ years I've been
               | on HN.
        
               | mping wrote:
               | This is crazy. You suggest that if AWS changes billing
               | somehow, his startup should shoulder the cost? If you
               | don't trust it don't use it.
        
               | jeswin wrote:
               | This is important, and will show confidence in your
               | product.
               | 
               | The requirement is insurance against not exceeding a hard
               | limit. We're talking about a rare event, so the offering
               | (vantage.sh) is not good for this specific usecase if it
               | isn't absolutely foolproof.
        
               | echelon wrote:
               | StratusBen, this is where you could charge an incredible
               | premium for your service.
               | 
               | If you could underwrite price guarantees and show an
               | insurance company or lender that your figures are right
               | in most cases and that you've never lost over a certain
               | amount, you could really hike the cost of your service
               | and provide an incredible utility across the board.
               | 
               | If you have spare bandwidth and can reliably do this, try
               | building this.
        
             | TriNetra wrote:
             | While building CloudAlarm [0] (supports Azure as of now),
             | we found that the usage data on Azure wasn't available for
             | at least a day. In fact, they keep adding data to it
             | during the next day, so technically two days will have
             | passed before the actual usage data is available. I haven't
             | gone deeper into AWS, but from what I've read they also
             | make the data available gradually. So we concluded that
             | instant alerts based on usage aren't possible, and chose a
             | novel route instead: a 'New Resource' alarm, wherein you
             | get alerted for all resources created, or for anything more
             | expensive than the tier you choose. The resources in your
             | Azure subscription are available almost instantly via the
             | API, so we thought this was a nice workaround.
             | 
             | 0: https://cloudalarm.in/
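             | 
             | Roughly, the 'New Resource' idea can be sketched like
             | this. It is an illustration only, not our actual code:
             | the tier names, the TIER_RANK table and the inventory
             | format are all made up for the example, and in practice
             | the inventory would come from the provider's resource
             | listing API.

```python
# Hypothetical tier ranking, cheapest first; real tier names will differ.
TIER_RANK = {"basic": 0, "standard": 1, "premium": 2}


def new_resource_alerts(known_ids, inventory, max_tier="basic"):
    """Return resources not seen before whose tier is pricier than max_tier."""
    limit = TIER_RANK[max_tier]
    return [
        res for res in inventory
        if res["id"] not in known_ids and TIER_RANK[res["tier"]] > limit
    ]


known = {"db-1"}
inventory = [
    {"id": "db-1", "tier": "premium"},   # already known, no alert
    {"id": "db-2", "tier": "standard"},  # new and above the chosen tier
    {"id": "vm-1", "tier": "basic"},     # new but within the chosen tier
]
alerts = new_resource_alerts(known, inventory)
```

             | Because this compares resource lists rather than usage
             | data, it does not have to wait for the billing data to
             | settle.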
        
         | Thaxll wrote:
         | DO is very amateurish. How exactly do you secure resources
         | with RBAC or anything? I mean, it took them 10 years to have a
         | load balancer service lol.
         | 
         | I would never use DO or Linode at work; those are for garage
         | projects over the weekend.
        
           | sethammons wrote:
           | DO does not have a good support channel. My droplet died. I
           | couldn't reach it, and neither could the world. There was no
           | emergency button for getting help. Just send in an email.
           | After a small time window, I had to just delete and rebuild
           | my droplet. Two days or so later, they got back to me and
           | because I deleted it, they couldn't debug it. Two days to get
           | back to me on a dead, non-reachable droplet. I don't think
           | that is acceptable for anything running in production.
        
             | mwcampbell wrote:
             | But the whole point of treating cloud servers as cattle
             | rather than pets is that when one dies, you can spin up
             | another one, right? Ironically, that's one reason I prefer
             | AWS over DO and the like, because AWS's EC2 auto-scaling
             | and availability zones are great for this kind of
             | resilience.
        
         | tester756 wrote:
         | I struggle to understand what's so interesting in
         | X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud that it appears
         | basically daily on HN
         | 
         | I see it like +-decade old (in mainstream) wrapper/apis over
         | managing VMs/Infra while being proprietary as hard as possible
         | 
         | What's the difference between this and javascript frameworks
         | posts? (except js frameworks being OSS)
         | 
         | How many years have to pass until "Cloud" stops being
         | $hot_topic?
         | 
         | Were "admin" topics (heh you know those guys that were
         | predecessors of "DevOps") 10-15 years ago hot too?
        
           | duiker101 wrote:
           | I don't think we are even near the peak "Cloud" before it can
           | slow down. We are just starting to see now entire dev envs in
           | the cloud, and I'm pretty sure we will continue to go in that
           | direction.
        
           | dehrmann wrote:
           | The original, main value proposition was that you don't need
           | to physically manage your server anymore. Some products are
           | just wrapped, managed versions of something you're already
           | familiar with. Then there are more "original" offerings like
           | DynamoDB, Bigquery, and Bigtable that bring a lot of value on
           | their own and are significantly easier to operate at large
           | scale than any open source equivalent.
        
           | sofixa wrote:
           | > I struggle to understand what's so interesting in
           | X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud that it appears
           | basically daily on HN
           | 
           | > I see it like +-decade old (in mainstream) wrapper/apis
           | over managing VMs/Infra while being proprietary as hard as
           | possible
           | 
           | Don't know if serious or not, but nevertheless let me try.
           | It's the global and near infinite scale, the enormous amounts
           | of managed services you get behind those APIs. You need a
           | database/message queue/object storage whatever at whatever
           | scale? Have at it, and pay as you go. If you can't see the
           | interest in that, I wonder what it is that you do.
           | 
           | And IMHO there's nothing inherently complex about the APIs of
           | AWS or GCP ( the only ones I've really used). They're as
           | complex as the things they manage.
        
           | throwdecro wrote:
           | > I struggle to understand what's so interesting in
           | X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud...
           | 
           | We're still afraid of the cloud spending all of our money.
        
           | theamk wrote:
           | We are talking about things we're using or want to use. And
           | the usage of cloud is not going to go down anytime soon.
           | 
           | For example, I think AWS spend is likely the biggest spend in
           | my company after the payroll/offices, so it is pretty
           | important topic, business-wise.
           | 
           | And unlike JS frameworks which only matter to a subset of JS
           | frontend developers, everyone can use cloud: JS or Java or
           | Rust or C++ or C; frontend, backend, data science, ML,
           | compilers, embedded.
        
         | ManuelKiessling wrote:
         | I feel you. I do take the risk - the leverage on automation and
         | manageability that Terraform e.a. give me are just too good to
         | pass, and only with a 100% API approach like AWS provides it
         | can I play the 100% infrastructure-as-code game, and I simply
         | won't play any other game anymore.
         | 
         | Through the very same means, the first thing I do with every
         | new AWS-based project is set up a cleanly organized Org with
         | centralized billing, centralized IAM & roles, centralized
         | billing alarms, centralized SCP limitations(!!!) (as in "I
         | will never run anything in Southeast Asia, so I disallow
         | anything in Southeast Asia for all Org accounts"), and very
         | not-centralized resources per
         | stage/subproject/vertical/whatever.
         | 
         | Plus sensible service limits on everything that has a service
         | limit (request on API gateway etc.).
         | 
         | But as someone here said: your risk will remain > 0, you just
         | have to accept that.
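         | 
         | For the SCP part, a rough sketch of the kind of policy
         | document I mean, built in Python just to keep it readable.
         | Assumptions: the helper name deny_regions_scp is made up,
         | ap-southeast-1 is only an example region, and the condition
         | key aws:RequestedRegion should be checked against the AWS
         | docs before you attach anything like this to a real Org.

```python
import json


def deny_regions_scp(regions):
    """Build an SCP-style policy document that denies all actions in the given regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnusedRegions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # Matches requests aimed at any of the listed regions.
                "Condition": {
                    "StringEquals": {"aws:RequestedRegion": sorted(regions)}
                },
            }
        ],
    }


policy = deny_regions_scp(["ap-southeast-1"])  # example region only
policy_json = json.dumps(policy, indent=2)
```

         | Keeping the policy as generated data means it can live in the
         | same repo as the rest of the infrastructure code.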
        
           | whichquestion wrote:
           | DigitalOcean has a terraform provider you can use for an IaC
           | approach
        
             | atmosx wrote:
             | DO doesn't offer accounting in services (who accessed what
             | and when) and has no IAM, which is a huge problem, and
             | their APIs and services are a bit low on reliability.
             | Spaces has an extremely low rate limit and their
             | communication API times out often. The k8s service works
             | overall but has some annoying hiccups that only support
             | tickets will fix, and the CDN returns random 503 errors.
             | All droplets are shared and resource contention is a thing.
             | 
             | That said, AWS support has its own issues, rarely solving
             | the problem even when we pay for a TAM. Services like
             | ElastiCache are hard to upgrade with zero downtime. Their
             | solutions always involve spending inordinate amounts of
             | money on open source clones with 1/10 of the features, and
             | their good services (DynamoDB) will cost an arm and a leg.
             | 
             | My 2 cents.
        
         | TriNetra wrote:
         | It's the same with Azure. On multiple occasions, I had
         | databases created in tiers several times more expensive than
         | the one I use with my subscription. This wasn't a manual
         | mistake; a sleeping app got awakened (maybe I hit the run
         | button by mistake) and it ended up creating the database via
         | the ORM framework. Since the ORM framework only executes
         | CREATE DATABASE on the SQL server, Azure goes with its default
         | tier, which they had chosen as one with a $250 or so monthly
         | price. I've set up the budget alerts on Azure, but these are
         | threshold based (% of budget consumed) and they come every
         | month, so technically they aren't alerts but rather
         | information that requires you to do the math on whether or
         | not you're within budget. So you tend to ignore them after a
         | while. Recently, we decided to build a simple solution
         | ourselves [0] which gives alerts based on budgeted pace and
         | not consumption.
         | 
         | 0: https://cloudalarm.in/
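         | 
         | To illustrate what "budgeted pace" means, here is a minimal
         | sketch. It is not our production code: the function name
         | over_pace and the 10% tolerance are assumptions made for the
         | example. It compares spend so far with the share of the
         | budget you would expect to have used by today's date.

```python
import calendar
from datetime import date


def over_pace(budget, spent, today, tolerance=1.10):
    """Return True if spend so far is ahead of a linear monthly budget pace."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    expected = budget * today.day / days_in_month
    # The tolerance leaves some headroom so small bumps do not alert.
    return spent > expected * tolerance


# A $300 budget on day 10 of a 30-day month should sit near $100.
ahead = over_pace(300.0, 120.0, date(2021, 9, 10))
on_track = over_pace(300.0, 105.0, date(2021, 9, 10))
```

         | In this example $120 is only 40% of the budget, so an alert
         | set at 50% consumed would stay silent, while the pace check
         | already flags it on day 10.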
        
       | helsinkiandrew wrote:
       | Nothing for me compares to the time I purchased 2 reserved EC2
       | instances for about $5K on my personal account rather than the
       | company's. I can still remember that sinking feeling as I
       | realized what I'd done.
       | 
       | Amazon refunded the next day.
        
         | stingraycharles wrote:
         | It's incredibly easy to spend a lot of money on the cloud,
         | indeed. I remember using Google Cloud's translate API on a
         | bunch of documents -- it took several hours for the bill to pop
         | up at $1500. This was a hobby / personal project of mine,
         | Google did not refund it, because of course I should have read
         | the pricing more carefully.
        
           | dotancohen wrote:
           | This is the advantage with AWS, they _will_ refund mistakes.
           | I've seen it happen twice, and both times were resolved
           | quickly with a rep on the phone.
           | 
           | I've also once had an issue with my own personal account.
           | Five minutes with a rep on the phone saved not my bank
           | account, but my website and hosted services, because my
           | credit card was cancelled and it would be another few months
           | before I could get another.
        
             | nobleach wrote:
             | Amazon as a company tends to side with the customer. Their
             | whole mantra is that it's not worth chasing after x* amount
             | of dollars. Now repeat offenses? No, you're not getting
             | away with using their services for free. (You mention
             | twice, but I imagine 4 or 5 times, and they're going to
             | fault you without escalating the issue)
             | 
             | *within reason... you're not going to serve up an app all
             | month long and skip out on a million dollar bill.
        
               | dehrmann wrote:
               | > Amazon as a company tends to side with the customer.
               | 
               | Completely agree. Google might be learning parts of this
               | with GCP, but historically, customer-obsessed isn't in
               | Google's DNA.
        
       | dncornholio wrote:
       | Mistakes? How about the flaws of AWS itself and their
       | terrible, terrible pricing system that rewards them for your
       | mistakes.
        
       | fukmbas wrote:
       | Mistake #1: using AWS
       | 
       | lol
        
       | zackmorris wrote:
       | I view AWS as a study in doing everything the "bare hands" way.
       | Here are some examples of the old sysadmin ways of doing things
       | vs the modern "web" way:
       | 
       | * regions -> self-balancing algorithms like RAFT
       | 
       | * roles/permissions -> tokens
       | 
       | * IP address filtering -> tokens
       | 
       | * CPU clusters -> multicore/containerization/Actor model
       | 
       | * S3 -> IPFS or similar content-addressable filesystems
       | 
       | It's not just AWS having to deal with this stuff either:
       | 
       | * CORS -> Subresource Integrity (SRI)
       | 
       | * server languages (CGI) -> Server-Side Includes (SSI)
       | 
       | * Javascript -> functional reactive, declarative and data-driven
       | components within static HTML
       | 
       | * async -> sandbox processes, fork/join, auto-parallelization
       | (seen mostly in vector languages but extendable to higher-level
       | functions)
       | 
       | * CSS -> a formal inheritance spec (analogous to knowing set
       | theory vs working around SQL errata)
       | 
       | I could go on forever but I'll stop there. We are living at a
       | very interesting time in the evolution of the web. I think that
       | web dev has reached the point where desktop dev was in the
       | mid-1990s and is ripe for disruption. No disruption will come
       | from the big companies though, so this is your chance to do it
       | from your parents' basement!
        
       | StratusBen wrote:
       | [Disclosure] I'm Co-Founder and CEO of http://vantage.sh/, a
       | cloud cost platform for AWS. Previously I was a product manager
       | at AWS and DigitalOcean.
       | 
       | Since the author and so many people are commenting about AWS
       | costs (and in particular, choosing cheaper EC2 instances and EBS
       | volumes) I thought I'd mention that Vantage has recommendations
       | that cover these exact things so you don't get tripped up /
       | spend more than you have to.
       | 
       | If you have "antiquated" EC2 instances or EBS volumes, Vantage
       | will give you a recommendation for which instance to switch to
       | and how much money you'll save.
       | 
       | The first $2,500/month in AWS costs are also tracked for free so
       | people get a lot of value out of the free tier and can save
       | significant parts of their bills when developing on AWS.
        
         | 7sidedmarble wrote:
         | Respect that you are all about that grindset for your product
         | in this thread, but it's also a little insane that you need a
         | third party tool to make sense of what's going on in AWS.
         | 
         | I'm a bit of a GCP fan, and while its billing is also arcane,
         | I think it is just a little bit easier to understand and
         | better laid out. For bread and butter stuff like regular VPSs
         | though, AWS is often a little cheaper. But GCP's other cloud
         | offerings are occasionally very respectably priced.
        
           | swyx wrote:
           | every $X00 billion dollar business is big enough that third
           | party tools will always be desired because the default
           | experience won't be good enough for some part of the market.
           | 
           | question is whether or not that part is big enough to warrant
           | its own venture scale business, as with Vantage :)
        
         | smoldesu wrote:
         | I'd frankly just prefer to use a VPS. The fact that I need to
         | have _a payment stack_ alongside my technology one is just
         | ridiculous to me.
        
       | hughrr wrote:
       | Biggest mistake I've made:
       | 
       | Shifting any non trivial infrastructure into AWS verbatim is
       | always more expensive than running it yourself. You need to
       | rearchitect it carefully around the PaaS services to make a cost
       | saving or even break even.
       | 
       | An extreme example of this is a cousin who works for a small dev
       | company doing LOB stuff. They moved their SQL box into EC2 and
       | it's costing more to run that single RDS instance than their
       | entire legacy infra cost was per year.
       | 
       | I'd still rather use AWS though. The biggest gain is not
       | technology but not having to argue with several vendor sales
       | teams or file a PO and wait for finance to approve it. All I do
       | is click a button and the thing's there.
        
         | Aeolun wrote:
         | > The biggest gain is not technology but not having to argue
         | with several vendor sales teams or file a PO and wait for
         | finance to approve it. All I do is click a button and the
         | thing's there.
         | 
         | This is _so_ ridiculous. I have to argue endlessly (and again
         | for every employee) with IT support and enterprise security to
         | give them the ability to upload attachments on Teams.
         | 
         | But giving that same person access to start a few $100/hour
         | instances on AWS? No problem.
         | 
         | The balance is completely out of whack once your infra is on
         | AWS.
        
           | thinkharderdev wrote:
           | But that's the thing. A senior engineer is also costing a
           | business ~$100/h (all in) so even if you accept that there
           | will be a fair amount of waste (misconfigurations, devs
           | spinning up boxes and then forgetting about them, PoC
           | projects never torn down, etc) it can still be a net-positive
           | proposition. People always want to compare the cost of
           | compute/storage/bandwidth but that isn't really the value
           | proposition. Of course you are going to pay more for cloud-
           | hosted infra than for equivalent infra in a colo or on-prem
           | DC. But I used to spend hundreds of hours a year doing random
           | busy work to deal with on-prem infrastructure. Need a new
           | server, put in a ticket and when the ticket gets no response
           | follow up with emails and finally setup a meeting to discuss
           | with the ops team. Need a new firewall rule, same deal.
        
           | sethammons wrote:
           | Have them build an s3 attachment service and just let finance
           | know you are spending $X/yr and you have ideas for
           | streamlining IT support that would eliminate the cost.
        
         | rualca wrote:
         | > Shifting any non trivial infrastructure into AWS verbatim is
         | always more expensive than running it yourself.
         | 
         | The free tier of AWS lambdas has enough room to do non-trivial
         | applications for free, and in EC2 we can get t2.micro and
         | t3.micro instances (2vCPU, 1GB RAM) with 750h/month for free,
         | which pretty much means you can have the instance running the
         | whole month for free.
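         | 
         | The arithmetic behind that, as a quick check. It assumes the
         | 750 hours are a single monthly allowance shared across your
         | instances, so it covers one always-on instance but not two.

```python
FREE_HOURS_PER_MONTH = 750      # free-tier allowance quoted above
LONGEST_MONTH_HOURS = 31 * 24   # 744 hours in a 31-day month

# One instance running nonstop stays inside the allowance.
one_instance_fits = LONGEST_MONTH_HOURS <= FREE_HOURS_PER_MONTH
# Two instances running nonstop would need 1488 hours.
two_instances_fit = 2 * LONGEST_MONTH_HOURS <= FREE_HOURS_PER_MONTH
```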
         | 
         | Depending on what you need to do, in the very least it's
         | possible to run a system (or parts of it ) for free, which is
         | hard to beat.
         | 
         | Having said this, allowing a system architect to go nuts with
         | AWS without being mindful of its cost is something that easily
         | gets far too expensive far too fast. If all anyone wants is EC2
         | and there's no need for global deployments then you'd be better
         | off going with cloud providers such as Hetzner. A couple of
         | minutes with a calculator and a napkin at hand is enough to
         | arrive at the conclusion that AWS makes absolutely no sense,
         | cost-wise.
        
         | macpete42 wrote:
         | I can confirm that: cloud helps to evade the incompetent sales
         | and infrastructure teams in many companies. Saving money never
         | works once your product scales out.
        
         | AndrewDucker wrote:
         | It's always more expensive to have someone else run your
         | infrastructure than to do it yourself unless it's something you
         | only use intermittently.
         | 
         | If you need 5 seconds of compute time per day then running that
         | as a Lambda makes perfect sense. If you need a database server
         | that's available 24/7 then I can't see how hosting that on
         | Amazon could be cheaper.
         | 
         | (Unless you're employing a full time ops person to look after
         | that one server, in which case you'll have to do your own
         | maths.)
        
           | helsinkiandrew wrote:
           | At a previous job, I ran a website on RDS for about 7 years
           | and only touched the control panel to restore from a backup
           | when we'd screwed up the data, and to tell it when to
           | upgrade. That was worth quite a lot in ops time and peace of
           | mind.
        
             | habibur wrote:
             | A database crashing on a hosted service is such a rare
             | event unless you mess it up yourself running rogue queries.
             | No surprise there. It doesn't have to be AWS though. Can be
             | Digital Ocean or anything else.
        
           | hughrr wrote:
           | Yep that. Lambda is a massive win for me personally. I have
           | some scraping and processing stuff that runs daily. Costs me
           | $0.60 a month to run it even outside of free tier which is
           | less than a cheap DO or linode box and I don't have to look
           | after the OS.
        
             | AndrewDucker wrote:
             | Same. Powershell script that collects links from Pinboard
             | and posts them to my blog. It would be massively overkill
             | to run a whole server for that. (And Microsoft charges me
             | about £0.15 per month)
        
           | l33tman wrote:
           | Well that 24/7 db server is not going to back up itself and
           | maintain itself when the HW fails or the power goes out. This
           | is not trivial/cheap to assure and maintain. Likewise with
           | hosting your own S3. The value is not in the HW really (and
           | the cloud providers know this when pricing).
           | 
           | I have this setup on AWS and look at "bringing it home" every
           | year but the nightmare of having to assure a good level of
           | availability is not worth the saving of a few extra $1000 per
           | year for us at least. Completely different issues not related
           | to your HW can happen, your office internet simply goes down
           | or power goes out while you're on vacation etc..
           | 
           | At least there is major competition between a lot of cloud
           | providers nowadays so nobody can get away with insane prices
           | anymore. Though, it would be cool to see some kind of
           | standardised price comparison metric for medium/high
           | complexity cloud setups. Sort of how you compare grocery
           | prices, you have a standard purchase list.
        
             | AnIdiotOnTheNet wrote:
             | > Well that 24/7 db server is not going to back up itself
             | and maintain itself when the HW fails or the power goes
             | out.
             | 
             | Uh, yeah, that's why you pay IT people to do that sort of
             | thing. Hiring your own will almost certainly be less
             | expensive than paying Amazon's people, and provides you
             | more control and more options in the event of any problems.
             | It's not like AWS never has problems, and when it does all
             | you can do is twiddle your thumbs until someone else fixes
             | it.
        
               | Daishiman wrote:
               | What world do you live in where competent IT support
               | for production infrastructure is cheap?
        
               | AnIdiotOnTheNet wrote:
               | The one that doesn't live in SV.
        
             | hughrr wrote:
             | It's cheaper for us to bring our own staff and licenses
             | than pay for RDS at our scale :)
        
               | WaxProlix wrote:
               | Once you hit that point, AWS will almost always cut you
               | some specific pricing deals to ensure that your pain
               | point is competitively priced, since they want you in the
               | ecosystem anyways.
        
         | tpetry wrote:
         | Wouldn't it make sense to book typical SaaS on AWS Marketplace?
         | I mean you wouldn't have to talk to the billing department,
         | just activate a SaaS within AWS and everything is put on your
         | normal AWS bill?
        
       | wly_cdgr wrote:
       | Heh, I like how Amazon literally took the boost mechanic from
       | arcade racing games for the CPU credits in T2/T3
        
       | physicles wrote:
       | Burst CPU and IOPS has bitten me a couple times over the years.
       | In fact, it's basically the sole cause of nearly all our downtime
       | in recent history. That's frustrating. I get that it's a
       | technical solution to the problem of resource utilization at
       | scale, but they could've spent some time making it easier to
       | observe -- for example, rescale the CPU or IOPS graphs so that
       | 100% is your max sustained budget, and anything over 100% eats
       | into your quota.
        
       | [deleted]
        
       | tedk-42 wrote:
       | Few easy ones as well:
       | 
       | 1) Terminating instances that had ephemeral disks with stuff you
       | needed while thinking the EBS volumes would remain
       | 
       | 2) Leaving NAT gateways lying around or ELBs that do nothing and
       | have no instances attached.
       | 
       | 3) Public S3 buckets - arguably the most common one that can lead
       | to security incidents
       | 
       | 4) Debugging security groups/Network ACLs and straight up break
       | networking for something without knowing it. Reverse of that
       | would be you want to fix something quickly and open 0.0.0.0/0 to
       | everyone and never get around to tightening up the firewall later
       | on.
        
         | jnieminen wrote:
         | I was playing with the Azure "free" tier. Even though I tried to
         | be extremely careful with it, after a while I noticed that I had
         | left a storage blob for a VM hanging around, along with an
         | external IPv4 address. I will continue to use Hetzner online for
         | my own stuff instead of running this on "public cloud".
        
       | projectramo wrote:
       | My biggest mistake: years ago I ended up pushing personal
       | credentials to GitHub at night and waking up to a several
       | thousand dollar bill in the morning.
       | 
       | Changed credentials and cancelled all the running instances only
       | to find that I'd missed some.
       | 
       | It was resolved by the afternoon.
        
         | judge2020 wrote:
         | Thankfully GitHub now runs secret scanning and AWS is a
         | partner. If you did this today AWS will revoke the key before
         | malicious scanners find it.
         | 
         | https://docs.github.com/en/code-security/secret-scanning/abo...
        
       | unglaublich wrote:
       | But what mistakes did he make? Did he screw up the bill? Did he
       | fail to keep services available? I only read facts about the ins
       | and outs of AWS' billing and credits system.
        
         | weird-eye-issue wrote:
         | If you run out of CPU or IOPS burst balance then your system
         | will suddenly slow to a crawl and it can easily cause downtime
         | or in the case of background jobs it will cause long queues or
         | never ending jobs. Learned that the hard way, a couple times.
         | 
         | One time I optimized DB access which fixed the IOPS usage, and
         | then that caused more CPU usage on the app servers which caused
         | them to run out of CPU burst... Fun times. Switched from one
         | burst issue to another.
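Editor's sketch of the burst-credit behaviour discussed above. It is a simplified model, not AWS's actual accounting: it assumes one CPU credit equals one vCPU running at 100% for one minute, ignores the maximum credit balance and "unlimited" mode, and the earn rate and starting balance in the usage note are illustrative values rather than figures from the thread.

```python
def simulate_credits(start_balance, earn_per_hour, vcpus, utilization_by_minute):
    """Return the credit balance after each minute; it never drops below 0."""
    balance = float(start_balance)
    history = []
    for utilization in utilization_by_minute:
        earned = earn_per_hour / 60.0
        # Assumed model: 1 credit = 1 vCPU at 100% utilization for 1 minute.
        spent = vcpus * utilization
        balance = max(0.0, balance + earned - spent)
        history.append(balance)
    return history


def minutes_until_empty(start_balance, earn_per_hour, vcpus, utilization):
    """Minutes of constant load before the balance hits zero.

    Returns None when the load is at or below the sustainable baseline.
    """
    net_per_minute = earn_per_hour / 60.0 - vcpus * utilization
    if net_per_minute >= 0:
        return None
    return start_balance / -net_per_minute
```

With a hypothetical 2-vCPU instance earning 12 credits per hour and holding 60 credits, `minutes_until_empty(60, 12, 2, 0.5)` gives roughly 75 minutes at 50% load, after which the instance would be throttled to its baseline. That cliff is what the comments above describe as the system suddenly slowing to a crawl.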
        
           | sethammons wrote:
           | And that is scaling systems. Open up one bottleneck to
           | discover the next. Rinse and repeat to gain experience.
        
       | mfrye0 wrote:
       | One of the biggest mistakes I made is not exploring spot
       | instances and reserved instances earlier.
       | 
       | I cut my bill by 70-80% after paying full price for years...
       | 
       | If you have an active web server or backend workers with fairly
       | short jobs, spot instances will work for you.
        
         | thanatos519 wrote:
         | Does the 'cpu credits' stuff apply to spot instances too? I
         | have been thinking of shortening my animation render time with
         | spot instances, but it only makes sense if I can run every core
         | at 100% for the entire life of the instance.
        
       | calmlynarczyk wrote:
       | This is more just "missed optimization opportunities in EC2" than
       | a statement about mistakes in AWS as a whole.
       | 
       | If you want to talk systemic AWS mistakes you can make, we
       | accidentally created an infinite event loop between two Lambdas.
       | Racked up a several-hundred-thousand dollar bill in a couple of
       | hours. You can accidentally create this issue across lots of
       | different AWS services if you don't verify you haven't created
       | any loops between resources and don't configure scaling
       | limitations where available. "Infinite" scaling is great until
       | you do it when you didn't mean to.
       | 
       | That being said, I think AWS (can't speak for other big
       | providers) does offer a lot of value compared to bare-metal and
       | self-hosting. Their paradigms for things like VPCs, load
       | balancing, and permissions management are something you end up
       | recreating in most every project anyways, so might as well
       | railroad that configuration process. I've experienced how painful
       | companies that tried to run their own infrastructure made things
       | like DB backups and upgrades that it would be hard to go back to
       | a non-managed DB service like RDS for anything other than a
       | personal project.
       | 
       | After so many years using AWS at work, I'd never consider
       | anything besides Fargate or Lambda for compute solutions, except
       | maybe Batch if you can't fit scheduled processes into Lambda's
       | time/resource limitations. If you're just going to run VMs on
       | EC2, you're better off with other providers that focus on simple
       | VM hosting.
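Editor's sketch of one possible guard against the kind of Lambda-to-Lambda event loop described above. It is not what the commenter's team did. The idea is to carry a hop counter inside the event and refuse to forward once a limit is reached. The `hops` field, `MAX_HOPS`, and `forward_event` are hypothetical names for illustration, and the sketch assumes you control the event payload passed between the functions.

```python
MAX_HOPS = 5  # hypothetical limit; pick one that fits the longest legitimate chain


class LoopDetected(Exception):
    """Raised when an event has been forwarded too many times."""


def forward_event(event, max_hops=MAX_HOPS):
    """Return a copy of the event with its hop counter incremented.

    Raises LoopDetected instead of forwarding once the counter reaches
    max_hops, so a cycle between two handlers stops after a bounded
    number of invocations.
    """
    hops = event.get("hops", 0)
    if hops >= max_hops:
        raise LoopDetected(f"event forwarded {hops} times, refusing to continue")
    forwarded = dict(event)
    forwarded["hops"] = hops + 1
    return forwarded
```

Each handler would call `forward_event` on the incoming event before publishing it onward. A guard like this bounds the damage from a cycle, but it does not replace concurrency limits or alarms.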
        
         | itisit wrote:
         | > Racked up a several-hundred-thousand dollar bill in a couple
         | of hours.
         | 
         | Not doubting you, but curious how you hit such a high figure.
         | Can you walk through the math? Are we talking trillions of
         | <10ms requests?
        
         | mfrye0 wrote:
         | > If you want to talk systemic AWS mistakes you can make, we
         | accidentally created an infinite event loop between two
         | Lambdas. Racked up a several-hundred-thousand dollar bill in a
         | couple of hours.
         | 
         | I did more or less the same thing, but with a 3rd party
         | webhook. The bill almost killed my company.
        
           | heurisko wrote:
           | If you are able to share the story, what went wrong with the
           | webhook?
        
           | cutemonster wrote:
           | > The bill almost killed my company.
           | 
           | You had to pay although it was a mistake?
        
             | vbezhenar wrote:
             | You spent resources. Of course you have to pay.
        
               | Uehreka wrote:
               | The resource usage required to tank a small startup (that
               | could've become a bigger customer later) is probably
               | peanuts to Amazon. I'm not sure how often they do this
               | (or whether they do it at all) but it would make business
               | sense for them to occasionally grant "billing
               | forgiveness" in serious situations.
        
         | FpUser wrote:
         | >"Racked up a several-hundred-thousand dollar bill in a couple
         | of hours."
         | 
         | This is enough to rent big server from Hetzner / OVH for like
         | forever and have person looking after it with plenty of money
         | left.
         | 
         | >"I've experienced how painful companies that tried to run
         | their own infrastructure made things like DB backups"
         | 
         | I run businesses on rented dedicated servers. It had taken me a
         | couple of days to create universal shell script that can create
         | new server from the scratch and / or restore the state from
         | backups / standby. I test this script every once in a while and
         | so far had zero problems. And frankly excluding cases when I
         | want to move stuff to a different server there was not a single
         | time in many years when I had to use it for real recovery.
         | 
         | I did deployments and managed some infrastructure on Azure /
         | AWS for some clients and contrary to your experience I would
         | never touch those with the wooden pole when I have a choice.
         | Way more expensive and actually requires way more attention
         | than dedicated servers.
         | 
         | Sure there are cases when someone needs "infinite scalability".
         | Personally I have yet to find a client where my C++ servers
         | deployed on real multicore CPU with plenty of RAM and array of
         | SSD came anywhere close to being strained. Zero problems
         | handling sustained rate of thousands of requests per second on
         | mixed read / write load.
        
           | calmlynarczyk wrote:
           | I'm not saying it can't be done cheaper or more efficiently
           | on simpler providers or even self-hosting, but you need the
           | expertise and time to stand up the foundation of a secure
           | platform yourself then. For example, AWS Secrets Manager is
           | just there and ready to code against, as opposed to standing
           | up a Vault service and working through all of the
           | configuration oddities before you can even start integrating
           | secrets management into an application. If you already have a
           | configuration-in-a-box that you can scale up, then more power
           | to you.
           | 
           | Your use-case of running a web service that is written in a
           | very efficient language like C++ is not something you see too
           | much these days. While it would be nice if most devs could
           | pump out services built on performant tech stacks, our
           | industry isn't doing things that way for a reason. Even high-
           | prestige companies with loads of talented engineers only
           | build select parts of their systems using low-level
           | languages.
        
           | talolard wrote:
           | I think your last paragraph is the sales pitch for AWS.
           | Hiring that level of expertise doesn't scale. Easier and
           | cheaper to hire 10x as many "developers" and pay the AWS bill
           | than headhunt performance gurus that understand hardware and
           | retain them.
        
             | FpUser wrote:
             | What expertise? My specialty is new product design. I am
             | very far from being performance hardware guru. I just
             | understand basics and do not swallow propaganda by loads.
        
             | Dylan16807 wrote:
             | Even if you're right, it's still cheaper to get a dozen
             | dedicated servers than to get a huge pile of AWS servers.
             | 
             | Bad performance means you need more servers, it doesn't
             | mean you need instant scaling.
        
         | rightbyte wrote:
         | AWS is the solution looking for a problem, which happened to be
         | modern web dev practices.
        
           | tomnipotent wrote:
           | Many companies want disaster recovery and multi-region
           | deployments without the capital expenses required to deploy
           | this themselves.
           | 
           | I don't want to have to buy hardware from a vendor, find
           | cabinet space, negotiate peering and power agreements, deal
           | with 3am alerts for failed NICs, or hear about someone
           | spending hours freeing up disk space while waiting on new
           | drives to arrive.
           | 
           | I want the benefit of all these things, but I'd rather pay a
           | premium for it over time than deal with the upfront capital
           | expenses.
        
           | Salgat wrote:
           | The problem is that not everyone wants to self-host, not
           | everyone wants to manage hardware, and not everyone's tech
           | scales in an extremely predictable and easy way. We launched
           | a new tenant that required a bunch of new EC2s, databases,
           | etc. Was trivial with AWS with terraform. If we did our own
           | homegrown solution we would have had to have that hardware
           | either ordered and waited on or have that hardware ready in
           | reserve just burning cash doing nothing.
        
           | Daishiman wrote:
           | Things AWS solves for me that I've always wanted to have
           | solved:
           | 
           | * Database administration
           | 
           | * Security best practices by default
           | 
           | * Updated infrastructure
           | 
           | * Automatic load balancing
           | 
           | * Trivial credentials management
           | 
           | * 2FA for all infra administration
           | 
           | * Container image repositories
           | 
           | * Distributed file systems
           | 
           | I was an old-school bare-metal UNIX systems admin 15 years
           | ago. Each of those things, in medium to large companies,
           | would take a full-time sysadmin to keep it all up to date.
        
         | billisonline wrote:
         | > we accidentally created an infinite event loop between two
         | Lambdas. Racked up a several-hundred-thousand dollar bill in a
         | couple of hours
         | 
         | May I ask how you dealt with this? Were you able to explain it
         | to Amazon support and get some of these charges forgiven? Also,
         | how would you recommend monitoring for this type of issue with
         | Lambda?
         | 
         | Btw, this reminds me a lot of one of my own early career screw-
         | ups, where I had a batch job uploading images that was set up
         | with unlimited retries. It failed halfway through, and the
         | unlimited retries caused it to upload the same three images
         | 100,000 times each. We emailed Cloudinary, the image CDN we
         | were using, and they graciously forgave the costs we had
         | incurred for my mistake.
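Editor's sketch of the alternative to the unlimited retries described above: cap the number of attempts and back off between them. `retry_with_backoff` and its parameters are hypothetical names. The `sleep` argument is injectable only so the behaviour can be checked without waiting.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... between attempts. The final
    failure is re-raised so the caller sees it instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A batch job wrapped this way fails loudly after three attempts per item. It cannot re-upload the same image 100,000 times.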
        
           | calmlynarczyk wrote:
           | > May I ask how you dealt with this? Were you able to explain
           | it to Amazon support and get some of these charges forgiven?
           | Also, how would you recommend monitoring for this type of
           | issue with Lambda?
           | 
           | AWS support caught it before we did, so they did something on
           | their end to throttle the Lambda invocations. We asked for
           | billing forgiveness from them; last I heard that negotiation
           | was still ongoing over a year after it occurred.
           | 
           | Part of the problem was we had temporarily disabled our
           | billing alarms at the time for some reason, which caused our
           | team to miss this spike. We've enabled alerts on both billing
           | and Lambda invocation counts to see if either go outside of
           | normal thresholds. It still doesn't hard-stop this from
           | occurring again, but we at least get proactively notified
           | about it before it gets as bad as it did. I don't think we've
           | ever found a solution to cut off resource usage if something
           | like this is detected.
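Editor's sketch of an alert on Lambda invocation counts like the one mentioned above. It only builds the parameters for a CloudWatch alarm on the `Invocations` metric in the `AWS/Lambda` namespace. The function name, threshold, and SNS topic ARN are placeholders, and the helper `lambda_invocation_alarm` is a hypothetical name. As the comment notes, an alarm like this notifies someone but does not hard-stop the spend.

```python
def lambda_invocation_alarm(function_name, threshold, sns_topic_arn, period_seconds=60):
    """Build keyword arguments for a CloudWatch alarm on Lambda invocations.

    The alarm fires when the summed invocation count within one period
    exceeds the threshold. Missing data (no invocations) is not a breach.
    """
    return {
        "AlarmName": f"{function_name}-invocation-spike",
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

Assuming boto3 is installed and credentials are configured, the result could be applied with `boto3.client("cloudwatch").put_metric_alarm(**params)`.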
        
           | BackBlast wrote:
           | We use memory safe languages, type safe languages. AWS is not
           | fundamentally billing safe.
           | 
           | Just to give you nightmares. There's been DDoS in the news
           | lately, I'm surprised nobody has yet leveraged those bot nets
           | to bankrupt orgs they don't like who use cloud autoscaling
           | services.
           | 
           | I don't know how you monitor it, part of the issue is the
           | sheer complexity. How do you know what to monitor? The
           | billing page is probably the place to start - but it is too
           | slow for many of these events.
           | 
           | I guess you could start with the common problems. Keep
           | watchdogs on the number of lambdas being invoked, or any
           | resource you spin up or that has autoscaling utilization.
           | Egress bandwidth is definitely another I'd watch.
           | 
           | Dunno, just seems to me you'd need to watch every metric and
           | report any spikes to someone who can eyeball the system.
           | 
           | For me? I limit my exposure to AWS as much as I reasonably
           | can. The possibilities combined with the known nightmare
           | scenarios, with a "recourse" that isn't always effective
           | doesn't make for good sleep at night.
        
             | rileymat2 wrote:
             | > There's been DDoS in the news lately, I'm surprised
             | nobody has yet leveraged those bot nets to bankrupt orgs
             | they don't like who use cloud autoscaling services.
             | 
             | That's interesting because it seems like it would happen,
             | but what is in it for the attacker, when under threat they
             | can implement caps?
        
               | sjtindell wrote:
               | Could only be an attack of spite, can't really hold a
               | ransom because the IPs of malicious traffic could be
               | blocked or limits set after initial overspend. Perhaps if
               | the botnet was big enough.
        
               | BackBlast wrote:
               | A severe enough bill can cause an organization to be
               | instantly bankrupt. No opportunity to try to do something
               | like caps.
               | 
               | Regardless, turning on spending caps isn't a final
               | solution to this particular attack. With caps the
               | site/resources will hit the cap and go offline.
               | Accomplishing what a DDoS generally tries to accomplish
               | anyway.
               | 
               | The only real solution is that you have to have a cheap
               | way to filter out the attacking requests.
        
               | ivanhoe wrote:
               | Some people get paid to destroy competition, others just
               | enjoy watching the world burn...
        
       | Kiro wrote:
       | Slightly OT: I love Forge but recently I've started using it for
       | my non-PHP projects which feels... wrong. Are there any similar
       | services that are more agnostic?
        
       | jcims wrote:
       | I feel like large enterprises primarily see AWS as a way to
       | outsource capital expenses.
        
         | StopHammoTime wrote:
         | This is literally the main reason a lot of companies use AWS.
         | In Australia, it is very hard for Government Departments to get
         | capital expenses approved for infrastructure as it requires a
         | lot of rigmarole.
         | 
         | However, once you're in AWS its OpEx, who cares as long as you
         | don't break the budget too soon before EOFY.
        
         | jnieminen wrote:
         | AWS and Azure are a permission to spend.
        
           | lloydatkinson wrote:
           | Are they really though? A serverless event driven
           | architecture system I'm working on literally costs less than
           | £10 a month on Azure. Running full blown VMs instead of
           | cheaper more appropriate technologies like containers or
           | functions will always cost more.
        
             | TriNetra wrote:
             | As per our calculation for CloudAlarm [0], as we reach a
             | few hundred users, it'd be cheaper to use a dedicated
             | instance than serverless (Azure Functions) design. So it
             | may vary from system to system depending on the amount of work
             | you perform for each user.
             | 
             | 0: https://cloudalarm.in/ - btw, you may wish to have daily
             | budgeted pace based alerts using it - to inform you when
             | the usage spikes up (much faster than Azure's consumption
             | threshold based alerts).
        
         | thanatos519 wrote:
         | So basically this ... <<Ah, I see you have the machine that
         | goes ping. This is my favorite. You see we lease it back from
         | the company we sold it to and that way it comes under the
         | monthly current budget and not the capital account.>>
         | 
         | https://www.youtube.com/watch?v=tKodtNFpzBA
        
       | igammarays wrote:
       | AWS is complexity-as-a-service. This is why, as a one-man
       | company, I went baremetal[1]. One flat price, screaming fast
       | performance, and massive scalability if you get a beefy enough
       | machine[2]. I don't have time to fiddle with k8s, try to figure
       | out AWS billing/performance tradeoffs, or deal with untraceable
       | performance issues due to noisy neighbours and VM overhead. My
       | disaster recovery plan is a simple DB dump script to S3, and I
       | know I can get another baremetal server up and running in less
       | than 20 minutes.
       | 
       | [1] with IBM Cloud 1 year free startup credits
       | 
       | [2] Let's Encrypt and StackOverflow run their entire databases on
       | a single beefy baremetal machine.
       | https://letsencrypt.org/2021/01/21/next-gen-database-servers...
        
         | StreamBright wrote:
         | What is blocking you from using just EC2?
        
           | count wrote:
           | A year of free startup credits, I'd guess.
        
         | ufmace wrote:
         | I tend to agree with this. AWS etc is nice if your scale is big
         | enough that you need to run a big cloud of dozens of servers
         | with complex interconnections for security etc. If a single
         | plain old server with database etc on it will do the job fine,
         | much better to stick with that.
        
           | Daishiman wrote:
           | No?
           | 
           | A bare metal Postgres install needs optimization, and a
           | _working_ backup and restore plan (you did test your backups,
           | right?).
           | 
           | That's half a day of work lost to get your system set up.
           | 
           | Now your app keeps serious data and you want a read replica.
           | How long does that take?
           | 
           | Now you need a separate development environment. Here you go
           | again, adding a few hours of work.
           | 
           | Then you need to update your database version. Gotta read the
           | changelog and make sure you did everything right, and do it
           | in a reasonable change window.
           | 
           | You just racked up several days' worth of work, and for a DB
           | instance with a similar amount of infra work done, the RDS
           | solution is way cheaper and easier to provision.
           | 
           | If your time is worth money, there's no reason to go bare
           | metal.
        
             | ufmace wrote:
             | I still disagree and say yes.
             | 
             | Why does my bare metal Postgres install need optimization?
             | My sites mostly don't get much traffic, and it runs fine
             | as-is. It'd be silly to try and optimize it without being
             | able to measure what's actually slow.
             | 
             | Backup systems should also be set up according to desired
             | reliability. I have a 10-line bash script that pulls a DB
             | dump, zips it, and sends it to S3. Under 5 minutes to
             | install, including setting up a new AWS role and keypair
             | for it, just have to add in some Ansible commands I already
             | have set up, and set a cron job to run once a day.
             | 
             | Read replicas are nice for some applications, but not
             | needed for any of my current ones. I probably wouldn't want
             | to set one up on bare-metal admittedly, but I'm not
             | worrying about it until I need it.
             | 
             | I don't see a need for a separate cloud deployment for a
             | development environment for my current application either.
             | Would be nice if I had multiple developers and testers
             | working on it, but I don't now.
             | 
             | Never needed to update the DB version, and the traffic is
             | low enough that I don't need to really care about keeping
             | reasonable change windows if I did.
             | 
             | So nope, 10 minutes of work for a low-traffic application.
             | Meanwhile, an AWS RDS setup is easy to start, but then you
             | have to muck with security groups, VPCs, permissions, etc
             | to get it working right. That's not necessarily easy if you
             | don't already make use of that stuff.
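Editor's sketch of a dump-and-upload backup like the one described above, written in Python and not the commenter's bash. It assumes PostgreSQL's `pg_dump` and the AWS CLI are installed and configured. The database name, bucket, `/tmp` path, and both function names are hypothetical. `pg_dump --format=custom` produces a compressed archive, which stands in here for the separate zip step.

```python
import subprocess
from datetime import datetime, timezone


def backup_commands(database, bucket, now=None):
    """Return the two commands for one backup run: dump, then upload."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"/tmp/{database}-{stamp}.dump"
    dump = ["pg_dump", "--format=custom", f"--file={dump_path}", database]
    upload = ["aws", "s3", "cp", dump_path, f"s3://{bucket}/{database}/{stamp}.dump"]
    return [dump, upload]


def run_backup(database, bucket):
    """Run the dump and the upload, stopping at the first failure."""
    for command in backup_commands(database, bucket):
        subprocess.run(command, check=True)
```

Calling `run_backup("app", "my-backups")` from a daily cron job would mirror the setup in the comment. A restore test with `pg_restore` is still needed before trusting it.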
        
         | tomerbd wrote:
         | Which scripting or which infra do you use for automatic
         | installation/configuration of your server?
        
           | chrisandchris wrote:
           | Not OP, but I did the same and I use
           | 
           | - Ansible for the low-level stuff (like network, mounts,
           | iSCSI, configuration files)
           | 
           | - Terraform for high-level stuff (like DB users)
           | 
           | In my case, as I have several services that use a lot of RAM
           | running, I couldn't afford The Cloud but can easily afford a
           | colocation. I don't mind the maintenance (it's a couple hours
           | each month) and I don't care much if services are down a few
           | hours.
           | 
           | If you need something running 24/7 with 99.9%, colocation
           | will be more expensive just because of the human you need.
        
             | candiddevmike wrote:
             | Why wouldn't you use Ansible for the high level stuff? It
             | can easily manage DBs and you wouldn't need another tool.
        
               | chrisandchris wrote:
               | As nijave said, the declarative style is the difference.
               | I can read a terraform file and already know the exact
               | state my system will have.
        
               | nijave wrote:
               | Not sure about op but you have to put forth quite a bit
               | of effort to get declarative infra with Ansible. Some of
               | it is declarative out of the box but a lot is imperative.
               | 
               | The main difference, if I revoke a DB privilege, I have
               | to add a line to Ansible with a REVOKE in most cases
               | versus Terraform you just delete the config line and the
               | tool realizes during its diff stage and performs the
               | removal change (it's stateful and declarative)
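Editor's sketch of the declarative diff described above. It is a toy illustration and not how Terraform is implemented. The desired privileges are compared with the actual ones, and the tool derives the GRANT and REVOKE statements itself. The tuple layout `(user, privilege, table)` and the `plan` function are hypothetical.

```python
def plan(desired, actual):
    """Return SQL statements that move `actual` privileges to `desired`.

    Both arguments are sets of (user, privilege, table) tuples. Deleting
    an entry from `desired` is enough to produce a REVOKE.
    """
    statements = []
    for user, privilege, table in sorted(desired - actual):
        statements.append(f"GRANT {privilege} ON TABLE {table} TO {user}")
    for user, privilege, table in sorted(actual - desired):
        statements.append(f"REVOKE {privilege} ON TABLE {table} FROM {user}")
    return statements
```

In the imperative style, the REVOKE line would have to be written by hand. Here it falls out of the set difference between the recorded state and the configuration.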
        
           | igammarays wrote:
           | Laravel Forge
        
         | FpUser wrote:
         | On bare metal as well. Not a trace of doubt.
        
         | icecap12 wrote:
         | The comment on "complexity-as-a-service" resonates. IMHO, it's
         | primarily because they want to make a product out of
         | everything, including stuff companies build to manage their own
         | AWS implementations. Instead of a simple list of products, it's
         | a complex list, with lots of nuances per each service offering.
         | The other day, I was giving a high level summary of cloud
         | technology to an intern; there was a point where I couldn't
         | even find the AWS service I was telling her about from the
         | product list, which annoyed me. Maybe that's more a comment
         | about the marketing site though, but still, when your product
         | catalog gets that big, it's hard to avoid ridiculous levels of
         | complexity.
        
         | arbuge wrote:
         | From that Let's Encrypt article: "We have a number of replicas
         | of the database active at any given time, and we direct some
         | read operations to replica database servers to reduce load on
         | the primary."
        
         | shreddit wrote:
         | Their config costs around 230,000$, which i think is impressive
         | for a single server
        
         | pibefision wrote:
         | +1 also it's easy to use Docker containers and Traefik as
         | reverse proxy to manage many services.
        
       | lysecret wrote:
       | Ok im going to admit to a mistake revolving around NAT gateways
       | and Lambdas. So, i basically wanted to connect a Lambda to a
       | Postgres / RDS database, for that I had to put it into a private
       | VPC, but the lambdas still had to talk to the world (a lot) so i
       | just put a nat gateway around it no biggy. Well, end of the story
       | on one day i produced 2000 Euro in cost for the Nat gateway haha
        
       | nickjj wrote:
       | My favorite billing mistake was forgetting to delete an unused
       | elastic IP address and then realizing I was being charged $34 /
       | month for 2 months just to have it exist while doing nothing.
       | 
       | Edit: It's exactly $33.62 and I was mistaken on what caused it.
       | It came from having a NAT Gateway just idling which is $0.045 per
       | hour x 747 hours = $33.62 on us-east-1.
       | 
       | I know it's not the biggest mistake ever, but these things creep
       | up on you when you use CloudFormation and it continuously fails
       | to delete resources so you're left having to manually trace
       | through a bunch of resources. It's easy to leave things hanging.
        
         | jrochkind1 wrote:
         | unused Elastic IP pricing looks to me like $3.60/month on their
         | pricing page. ($0.005 per hour). What am I missing to get to
         | $34/month? (Or did you have 10 of em?)
         | 
         | https://aws.amazon.com/ec2/pricing/on-demand/#Elastic_IP_Add...
        
           | nickjj wrote:
           | Thanks, I edited my post to correct it. It was a single NAT
           | gateway that's $33.62 / month.
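           | 
           | A minimal sketch of the arithmetic in this subthread, using
           | the hourly rates quoted above ($0.045 for an idle NAT gateway,
           | $0.005 for an unused Elastic IP). The helper name and the
           | 720-hour month used for the Elastic IP are assumptions made
           | for illustration; actual rates vary by region and over time.

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_cost(hourly_rate: str, hours: int) -> Decimal:
    """Multiply an hourly rate by billed hours and round to whole cents."""
    total = Decimal(hourly_rate) * hours
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Rates as quoted in the thread (us-east-1, 2021).
nat_gateway = monthly_cost("0.045", 747)  # idle NAT gateway, 747 billed hours
elastic_ip = monthly_cost("0.005", 720)   # unused Elastic IP, assumed 30-day month

print(f"NAT gateway: ${nat_gateway}")  # NAT gateway: $33.62
print(f"Elastic IP:  ${elastic_ip}")   # Elastic IP:  $3.60
```

           | Decimal is used instead of float so the half-cent in 33.615
           | rounds the way it does on the bill.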
        
       | jbverschoor wrote:
       | Most commonly made mistake: assuming that your data is safe on an
       | EC2 instance (ephemeral storage).
        
       | arno1 wrote:
       | Discover Akash Network!
       | 
       | Censorship-resistant, permissionless, and self-sovereign, Akash
       | Network is the world's first open source cloud.
       | 
       | It's in its early stages, and the number of deployments is
       | steadily growing!
       | 
       | Soon GPU compute and persistent storage!
       | 
       | You can also already become a provider and earn AKT tokens
       | (which are neat, driven by the Cosmos-based blockchain).
       | 
       | https://akash.network
       | 
       | https://akashlytics.com/price-compare
        
       | daneel_w wrote:
       | _"Technically they are a smidgen slower than Intel for certain
       | workloads."_
       | 
       | In my experience, after migrating several servers with quite
       | varying workloads, they're _faster_ than Intel - and more than a
       | smidgen. Just as is the general case with current AMD Ryzen vs
       | Intel.
        
       | sebazzz wrote:
       | In summary: Either overprovisioning, or not realising every extra
       | CPU cycle or I/O operation costs extra money.
       | 
       | This is, of course, the real way "the cloud" makes money.
       | Carefully tuned, it can no doubt be cheaper than do-it-yourself;
       | however, it is also quite easy to run up a lot of costs.
        
         | gizdan wrote:
         | Contrary to popular belief, the case for going to the Cloud
         | isn't cost saving, it is flexibility and value for money. It'll
         | likely cost you around the same if you run it in a DC, but you
         | won't have features like auto scaling, increased security, and
         | much more.
        
           | luckylion wrote:
           | About the same? Last I checked for our somewhat static work
           | load on a bunch of webservers, AWS would be 10x the price.
           | Not to mention that you need someone who has deep AWS
           | knowledge and experience to manage your system, just like you
           | need someone who manages your dedicated servers in a DC.
           | 
           | It's great for workloads that fluctuate extremely, or require
           | massive scaling in very short time. Not sure about the
           | increased security. If you run your images on EC2, it's still
           | up to you to not mess up the config.
        
             | jimmaswell wrote:
             | How many workloads actually fluctuate so extremely and
             | unpredictably?
        
             | sokoloff wrote:
             | Lightsail is more reasonably priced for a lot of simple web
             | serving use cases.
        
               | fermentation wrote:
               | It's also super easy to use. I have an instance hosting a
               | game server for friends. I might be wasting money since
               | the server sits idle about half the time though.
        
             | maccard wrote:
             | We're actively planning an AWS workload right now, and with
             | reserved instances for the baseline workload, the pricing
             | is closer to 1.5-2x, but the cost savings of only needing
             | to scale up for a couple of hours per week make up for
             | that. Yes, it would be cheaper to run our own infra for the
             | baseload and burst into AWS, but that adds operational load
             | onto the development team, which defeats the purpose of
             | going with AWS in the first place.
        
           | Viliam1234 wrote:
           | From the perspective of a developer, flexibility is a double-
           | edged weapon.
           | 
           | Before cloud: we have database quota of a few gigabytes, and
           | once in a few years we need to justify to management why the
           | quota should be doubled.
           | 
           | After cloud: whenever we add a new table, or a new column, or
           | import lots of data, the invoice slightly increases, the
           | management notices, and we need to justify the extra
           | megabytes.
        
       | steveBK123 wrote:
       | On billing.. they will never do it, but on smaller accounts they
       | could build trust by offering some sort of "prepaid" mode like
       | cell phone services do at the low end.
       | 
       | That is - you deposit $X in your account, and AWS nukes your live
       | services if you breach it. The worst that ever happens is you are
       | out the sunk cost of the $X you had already deposited.
        
       | noir_lord wrote:
       | I nearly made myself a very nice footgun not long since.
       | 
       | So: MediaConvert (video transcoding), direct upload to an S3
       | bucket, the bucket fires an event to my application, and my
       | application builds the job and submits it to MediaConvert with
       | the output bucket as the destination.
       | 
       | Straightforward enough, unless you happen to be copying a config
       | while tired and set your input and output buckets to the same
       | bucket...
       | 
       | Fortunately previous-me was paranoid enough to have put in an if
       | check that dies if they were the same, but otherwise that could
       | have cost a lot of money.
        
         | swyx wrote:
         | why would MediaConvert not build that if check in? perhaps a
         | good feature request for them.
        
           | noir_lord wrote:
           | Because you can write back to the same bucket at different
           | prefixes if you want to.
           | 
           | It's simply simpler to split the buckets in my case.
           | 
           | I added further checks to not only check the bucket made
           | sense but also that the inbound and outbound had the correct
           | prefixes.
           | 
           | So if another person does the same it'll catch both ways.
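           | 
           | A rough sketch of the kind of guard described in this
           | subthread: refuse to build a job when the input and output
           | buckets match (otherwise each output file could fire a new
           | upload event and trigger another job), and also check both
           | key prefixes. The function names, URIs, and prefixes are
           | hypothetical, not the commenter's actual code, and nothing
           | here calls the MediaConvert API.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_job_locations(input_uri: str, output_uri: str,
                        input_prefix: str, output_prefix: str) -> None:
    """Raise ValueError unless the job reads and writes where expected."""
    in_bucket, in_key = split_s3_uri(input_uri)
    out_bucket, out_key = split_s3_uri(output_uri)

    # This sketch insists on separate buckets, as in the setup described.
    if in_bucket == out_bucket:
        raise ValueError("input and output buckets must differ")
    # Second line of defence: both locations must sit under known prefixes.
    if not in_key.startswith(input_prefix):
        raise ValueError(f"input key {in_key!r} is outside {input_prefix!r}")
    if not out_key.startswith(output_prefix):
        raise ValueError(f"output key {out_key!r} is outside {output_prefix!r}")


# Passes silently: different buckets, expected prefixes.
check_job_locations("s3://uploads/incoming/a.mp4",
                    "s3://transcoded/outgoing/a/",
                    "incoming/", "outgoing/")
```

           | Running the check before the job is submitted means a bad
           | config fails loudly instead of quietly costing money.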
        
       ___________________________________________________________________
       (page generated 2021-09-11 23:00 UTC)