[HN Gopher] Mistakes I've Made in AWS
___________________________________________________________________
Mistakes I've Made in AWS
Author : aoms
Score : 306 points
Date : 2021-09-11 08:12 UTC (14 hours ago)
(HTM) web link (laravel-news.com)
(TXT) w3m dump (laravel-news.com)
| defaultname wrote:
| On a price sensitive project I almost exclusively used spot
| instances at a _dramatically_ reduced price over on-demand. It
| forced me to build high availability elements into the design at
| the outset, though ultimately spot instances got shut down no
| less frequently than my experience with on demand maintenance and
| individual machine outages.
|
| Obviously mileage will vary, but going in I was under the
| impression that spot instances were on the knife's edge, when
| with a decent pricing strategy they're as robust as on demand at
| a fraction of the cost.
| doomslice wrote:
| We use GCP's equivalent of spot instances (preemptibles) to
| great effect as well. It actually works better at larger scale
| since a smaller % of your machines get preempted at a given
| time.
| noogle wrote:
| Spot instances for GPU are shut down within hours. As frequently
| touted in favor of AWS, engineer time is the most important
| thing. The time to adapt the code to frequent failure, and the
| delays in getting the results, costs money as well, negating
| the financial saving from spot instances.
| defaultname wrote:
| Designing to remain robust in the face of failures is
| compulsory for any project of any significance. Or at least
| it should be, though a lot of projects go on a wing and a
| prayer that nothing will go awry and "save" those engineering
| hours until a catastrophe at some future point. It basically
| just prioritized what already should be a priority.
|
| I have no doubt that fringe/niche instances have more
| competitive spot behaviors, though how you set your bid range
| dramatically impacts how you survive through competition, but
| I had vanilla instances last for literal _years_ (note that
| by default the spot requisition has a lifespan of one year so
| you have to modify that) at per hour pricing somewhere in the
| range of 1/5th on demand.
|
| But mileage will vary.
|
| I don't use those spot instances anymore as my projects are
| much better financed now, and I have significant compute on
| other platforms including bare metal in colocation
| facilities. However when I did I stayed silent about it,
| feeling almost like it was a secret that would be ruined if
| others knew about it.
| noogle wrote:
| There it is - the hidden cost of AWS. For bare-metal the
| risk of hardware failure is so low that it's faster to just
| handle the interruption when it happens (e.g. just restart
| the process) than to implement interruption tolerance. The
| hardware fails only once in many years. The chances of that
| happening during the 24 hours we train a model are almost
| zero. On a spot instance, the risk of the same is almost
| 100%, requiring investment up-front.
|
| For the price of a spot instance we can get an always-on
| bare-metal server without having to worry for how long it
| will remain available.
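
To make the "investment up-front" concrete, here is a minimal sketch of what interruption tolerance for a long-running job can look like: progress is recorded in a checkpoint file after every step, and a restarted job resumes from the last recorded step. The names CHECKPOINT_FILE, TOTAL_STEPS, and run_step are hypothetical, chosen for illustration only; real training code would save model state rather than a step number, and nobody in the thread described their setup this way.

```shell
#!/bin/sh
# Hypothetical sketch: resume a long job from a checkpoint after an interruption.
# CHECKPOINT_FILE, TOTAL_STEPS, and run_step are illustrative names (assumptions).

CHECKPOINT_FILE="${CHECKPOINT_FILE:-/tmp/job.checkpoint}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

# Print the last completed step, or 0 when no checkpoint exists yet.
load_checkpoint() {
    if [ -f "$CHECKPOINT_FILE" ]; then
        cat "$CHECKPOINT_FILE"
    else
        echo 0
    fi
}

# Record a completed step: write a temp file first, then rename it,
# so an interruption mid-write cannot leave a half-written checkpoint.
save_checkpoint() {
    echo "$1" > "$CHECKPOINT_FILE.tmp"
    mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
}

# Placeholder for one unit of real work (for example, one training epoch).
run_step() {
    echo "running step $1"
}

# Run every step that has not been completed yet, checkpointing after each.
run_job() {
    step=$(load_checkpoint)
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        step=$(( step + 1 ))
        run_step "$step"
        save_checkpoint "$step"
    done
}
```

Calling run_job at the end of the script (or from a boot-time service) would restart the work wherever the previous instance stopped. The cost noogle describes is everything hidden behind run_step: the real job has to be able to save and restore its state at step boundaries.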
| danjac wrote:
| I've made it a habit to absolutely avoid any and all AWS services
| for any side projects, unless it's on the employer's dime. I'd
| rather pay a bit more per month for a flat-fee Digital Ocean
| droplet. Maybe I'll end up paying a few dollars more than I would
| with the equivalent AWS setup, but I'll rest easy knowing I won't
| get a surprise bill thanks to the opaque and byzantine billing. I
| mean, there are consultancies whose entire premise is expertise
| on AWS billing, so the chance of AWS newbie-me running up many
| thousands because I forgot to switch off service A or had the
| wrong setting for service B is non-zero.
|
| And the general advice is "don't worry, call their customer
| support and they'll refund you". Um, seriously? If I want to
| spend a morning on hold to deal with a huge unplanned bill I'll
| call my local tax office, thank you.
|
| Which sucks as I learn best by building things in my spare time,
| but AWS makes that learning process a bit more stressful than I'd
| prefer.
| tsss wrote:
| Who told you to call their customer support for a refund? AWS
| (and other cloud vendors) practically never refund. They will
| give out credits for their platform but that won't help you
| much as a private individual who just lost hundreds of dollars.
| nucleardog wrote:
| > Who told you to call their customer support for a refund?
| AWS (and other cloud vendors) practically never refund.
|
| I've gotten refunds about a half a dozen times now. Every
| time I've asked. One was for over a hundred thousand dollars.
|
| I've never paid for support, I have no contacts within the
| company (like are often necessary at places like Google), I
| literally just put in a support ticket asking for a refund
| and got a refund.
|
| They usually require some documentation/explanation of how
| you're going to avoid making the same mistake again (which is
| fair), but otherwise have been very cooperative.
| ratww wrote:
| That's kind of a meme in HN and Reddit: there were a few
| public occasions where users were refunded and people now
| just assume AWS will also refund for every instance.
| scrollaway wrote:
| I've never had a refund request rejected myself, and I've
| made multiple mistakes over multiple accounts. Even things
| such as "Hey, I forgot to turn off this ec2, i wanted to
| destroy it, any chance for a refund?"
| sofixa wrote:
| They _always_ refund mistakes, of private individuals or
| companies.
| isbvhodnvemrwvn wrote:
| The first ones anyway. After that, not really.
| mathnmusic wrote:
| I was recently forced to migrate my hobby FOSS project the
| other way: from DigitalOcean to AWS. The primary reason being:
| a generous quota of 60,000 emails per month to send via SES.
| Most mail providers give only up to 3,000 to 6,000 emails per
| month.
| lamnk wrote:
| You can host your project on DO and connect to SES to send
| emails. Why do you have to move the complete project over?
| mhitza wrote:
| Those 60k free emails only apply when SES is invoked from
| an EC2 instance or Elastic Beanstalk
| basmango wrote:
| I think they mean having a separate small service just
| for mailing on beanstalk.
| BackBlast wrote:
| You can use ses without moving your digital ocean server.
| mhitza wrote:
| From the AWS SES free tier fine print
|
| > 62,000 Outbound Messages per month to any recipient when
| you call Amazon SES from an Amazon EC2 instance directly or
| through AWS Elastic Beanstalk.
| TriNetra wrote:
| wow, they are really tracking from where you're calling
| the API to give credit, first of its kind I've heard :)
| BackBlast wrote:
| Good eye.
| mattmanser wrote:
| In terms of bang for your buck, DO is much cheaper.
|
| Yeah you can get a cheaper AWS server, but it's a much lower
| performance one.
| marcosdumay wrote:
| From what I can see, it's not a matter of bang for your buck,
| what matters is that AWS scales lower than DO, so if you
| are not fully using your VPS, AWS is cheaper.
|
| Of course, I side with the GP here, it's just not worth the
| risk. I could save a bit by switching my VPS too, but I
| won't.
| [deleted]
| bsd44 wrote:
| I would avoid DO and similar McDonald's-type cloud providers
| for anything production. Commercial or private.
|
| AWS might make it difficult to figure out the cost (most common
| complaint) but the services are professional grade and their
| support is as well. DO on the other hand provided me an
| instance with an IP that was on a public blacklist and banned
| my account within 5min of spawning an instance with the
| explanation that "it was compromised and hacking" failing to
| accept that they provided me with the OS image and the public
| IP. Took me two months of arguing to get the account unblocked
| and balance withdrawn. Lesson learned; you get what you pay
| for. Back to AWS for me.
| tomxor wrote:
| Pretty much summarises my decision to use Linode, at a small
| company AWS presents a bigger monetary risk and drain on
| precious developer time and mental overhead than relatively
| small savings it might return at smaller scales...
|
| I also actually like Linode as a company and enjoy using their
| services and management interface; Amazon is challenging to be
| positive about.
| wly_cdgr wrote:
| Also use Linode, they're great. Their docs are a treasure
|
| Seems absolutely insane to use AWS for small
| personal/learning projects (unless the goal is to learn AWS
| for career purposes, I guess). It'd be like using Unreal
| Engine to make your 2d indie game
|
| Always use the smallest and simplest solution that'll do the
| job. Simple solutions are not just as good for simple
| jobs...they're better
| remram wrote:
| > It'd be like using Unreal Engine to make your 2d indie
| game
|
| Except that would be free.
| wly_cdgr wrote:
| Heh, true. So, even worse!
| VadimPR wrote:
| You can set a global budget cap to avoid this kind of concern.
| That said, I've also blown the budget cap by accident - so
| agree with you on the DO.
| triska wrote:
| What good is a budget cap that can be "blown by accident"?
| From a budget cap, I expect the key invariant that it
| reliably _caps the budget_.
| id5j1ynz wrote:
| > I mean, there are consultancies whose entire premise is
| expertise on AWS billing, so the chance of AWS newbie-me
| running up many thousands because I forgot to switch off
| service A or had the wrong setting for service B is non-zero.
|
| That line of reasoning is wrong. I'm sure there are
| consultancies that specialize in office stationery procurement;
| doesn't mean anything for your small use case of buying a few
| pens for your home office.
| imadethis wrote:
| There's no chance I accidentally buy $10,000 worth of
| staplers when I walk into Office Depot though, while the
| opposite is extremely easy in AWS. Plus, when I checkout I'll
| get an itemized total of what I owe before I pay, I won't be
| charged an unclear amount at a future date.
| [deleted]
| [deleted]
| mrtksn wrote:
| That must be some kind of "trick of the trade" because Firebase
| originally had a feature for limiting the bill with a hard cap,
| and they even have videos explaining how to use it; however, it
| was removed later on. Now they suggest building a script that
| monitors your bill and nukes the project if something happens.
| The catch? Billing is not real time.
| sokoloff wrote:
| IMO, AWS isn't competing on cheaper-in-dollars-per-byte but
| rather in faster and cheaper for your engineering team. If your
| engineering team is free (as you might decide your time is on a
| hobby/side project), it's harder to make the case (I still run
| my side projects there though), but when they can make the ops
| team half wearing AWS badges, that offsets a lot of line-item
| markups.
| noogle wrote:
| But it doesn't save much engineering time. You now need to
| manage those services, and the high mark-up means you need to
| invest effort in scaling things up/down.
|
| Yes, it took me about a week to learn to set-up a Postgresql
| high availability cluster. But now it saves me $4,000 per
| MONTH for each of our 10 databases.
|
| And if you are using EC2 instances, AWS saves very little
| effort compared to bare metal.
| Daishiman wrote:
| So did you send your database logs to a centralized logging
| system? Did you set up roles and keys to access those
| systems? Are your roles integrated with the rest of your
| permissions system? Do you have a perf dashboard where you
| can see the real-time usage of your DBs? Have you already
| rehearsed updating your database version?
|
| The thing is that it's totally possible that you learned
| everything needed to set up the cluster, but in my experience
| most database systems that aren't set up by a professional
| DBA will sooner or later hit a configuration or maintenance
| snag. Once that happens, you're pretty much totally on your
| own and for critical systems that downtime is going to cost
| you more than the costs of your infra.
|
| So you either need to have a DBA on retainer if you're
| serious about data integrity, or you pay management costs
| which means your system was set up by literally the most
| expert people on the planet in the area.
|
| If you're running a cluster where performance is the
| highest priority and downtime and maintenance isn't a huge
| issue because you have a nice decent maintenance window and
| enough dev cycles to spend on staying up to date, for sure,
| go for it.
|
| But in my experience, if you care more about a system
| staying up, good managed infra is so much more reliable
| that it's not even a question.
| rualca wrote:
| > If your engineering team is free (..), it's harder to make
| the case (...), but when they can make the ops team half
| wearing AWS badges, that offsets a lot of line-item markups.
|
| I have to call bullshit on this claim.
|
| Let's look at the facts. With AWS there are only two
| scenarios: either you go with the classic "VMs provided by a
| cloud provider" which is represented by EC2, or you go with
| hosted services and higher level abstractions like AWS's
| serverless offerings.
|
| Regarding the EC2, AWS offers absolutely no operational
| advantage over any other cloud provider, at the expense of
| being far more expensive. Also, CloudFormation/CDK is
| arguably far worse and outright developer-hostile than any
| configuration-as-code alternative. This comparison makes even
| less sense if we look into AWS' containerization offering,
| which is either half-baked (ECS) or an afterthought that lags
| behind alternatives (EKS).
|
| Then we have the higher level abstractions of AWS' managed
| services and serverless options. Price-gouging runs rampant
| on this domain, and arguably demands much more training and
| man-hours to become effective at running production services
| when compared with just running your own services. This
| scenario entails higher costs and the only arguments that any
| ops team can muster revolve around sunk cost and vendor lock-
| in.
| sokoloff wrote:
| The price gouging services make sense if they avoid you
| having to hire additional employees. That's the benefit of
| any managed service provider: they can run it more
| efficiently than you can (once you add all the people,
| supervisors of those people, people to cover when those
| people are on vacation, etc).
|
| It's a way to shift people costs to IT operational
| expenses. If you don't do that, it's more expensive. If you
| do, it can easily be less expensive. I'm pretty sure we're
| at the point where it's less expensive because developers
| are waiting minutes rather than weeks [or more] for TechOps
| actions to happen (we were on-prem previously [and I ran
| TechOps]). That saves time and changes the way you think
| about TechOps changes. If they're lengthy, you make choices
| that avoid changes in TechOps. If they're fast, you make
| choices that make the most sense for the product and
| customer.
| rualca wrote:
| > The price gouging services make sense if they avoid you
| having to hire additional employees.
|
| But the fact is that it doesn't. It's another service
| that needs training/experience to develop and operate.
| Arguing about these hypothetical savings is just a veiled
| appeal to the sunk cost fallacy and vendor lock-in I've
| mentioned.
|
| > I'm pretty sure we're at the point where it's less
| expensive because developers are waiting minutes rather
| than weeks [or more] for TechOps actions to happen (we
| were on-prem previously [and I ran TechOps]).
|
| I'm not sure this scenario is remotely realistic for the
| past decade or so, especially after the inception of
| containerization. Even in bare metal deployments anyone
| can get multiple databases configured and going in a
| matter of minutes.
| sokoloff wrote:
| Containers don't get you more host machines racked or
| more disk shelves added to the SAN. On-prem, it solves
| configuration within a (nearly) fixed scale which, if
| that's your only problem, is great. If you're in your own
| DC/colo, there's more advantages to moving out than
| containers can provide alone.
|
| I literally can't afford to invest to the level that AWS
| can to run operations. AWS bandwidth is incredibly
| expensive right up until the point where you or a
| neighbor is getting a DDoS attack that Amazon just
| "handles" for you. My customers don't care where we're
| hosted and won't pay extra for either on-prem or cloud-
| hosted. They just want it to be up and transparent. For
| us, AWS is cheaper/faster all-in. That's not true for
| everyone and, if it's not, please don't use it.
| rualca wrote:
| > Containers don't get you more host machines racked or
| more disk shelves added to the SAN.
|
| That's immaterial to the discussion, and reads like a
| non-sequitur. You want a service. You deploy the service
| in your infrastructure. If necessary, you scale your
| infrastructure to meet demand. That's it. If you want to
| spin up a database instance, just do it. With
| containerization that takes between minutes and seconds.
|
| And to drive the point home, in case you're not aware,
| AWS is not the only cloud provider that offers horizontal
| autoscaling. Some small providers even sell it out of the
| box, both through their Kubernetes offerings and/or
| through their own APIs.
|
| Also, the sales brochure for managed services mentions
| scalability and reliability, and in the case of AWS also
| global deployments, but the truth of the matter is that
| it costs a hefty premium and in most cases it's totally
| irrelevant.
|
| So, pointing out databases in practical terms means close
| to nothing.
|
| > I literally can't afford to invest to the level that
| AWS can to run operations.
|
| And that's perfectly fine because a) AWS really is not
| foolproof (see the latest outage of AWS's US-WEST-2
| region which might have single-handedly dropped AWS's
| reliability to only 99.5% this year), b) operating your
| own infra already gets you plenty of 9s easily, c) the
| theoretical difference in the 9s you get and the 9s that
| are advertised by AWS is more often than not totally
| irrelevant to the usecases you need to meet.
|
| To sum it up, you may argue all you want about how AWS's
| Rolls Royce is far superior to any car on the market,
| but the truth of the matter is that the vast majority has
| all their needs decisively met and even surpassed by
| running any other cloud provider's Ford hatchback.
| ratww wrote:
| _> faster and cheaper for your engineering team._
|
| I really wonder how true that is. Sure, for things like S3 or
| RDS it's indeed easier, but for most other things I find AWS
| either very limiting or extremely arcane.
|
| Even "simple" things like Lambda underdeliver, just this week
| we ran into problems using it with VPC, for example.
| ElasticBeanstalk was another one, fine-ish for simplistic
| things but problematic with the smallest customization, also
| lots of undocumented and undebuggable quirks, like breaking
| if you use UTF-8 characters in your commit messages, for
| example.
|
| Of course, we now have the problem where some people, both
| seniors and juniors, _only_ know or only ever worked with
| AWS, which makes the assertion that it is "faster and
| cheaper" correct, but is worrisome, as lots of people are not
| being taught what used to be the basics 10 or 20 years ago.
| zrail wrote:
| Curious what your issue was with Lambda and VPCs. We use
| that combo all the time where I work and it's fine as long
| as you have the IAM roles correct.
| selfhoster11 wrote:
| Lambda had some gotchas around the warm-up time the last
| time I used it to implement something. We had to have
| some extra workarounds to prevent the functions from
| going "cold".
| Ancapistani wrote:
| Ah, that's the issue, at least for me - IAM roles are
| pretty nuanced, and it's difficult to understand all
| that's happening.
|
| I'm working extensively with EventBridge now, and their
| "security" docs mix "what can access EventBridge" with
| "what EventBridge can access". Also, different AWS
| services all seem to have different requirements - e.g,
| some are role-based, some are service-based, and some are
| resource-based. It gets complicated very quickly.
| ratww wrote:
| Don't get me wrong, I'm a Lambda fan. But unlike
| EC2/EBS/etc, talking to some AWS services from a Lambda
| that's inside a VPC requires additional infra and you
| have to pay for the egress. In the end it just wasn't
| worth the price for us. It was a bad surprise money-wise.
| thinkharderdev wrote:
| I recently switched jobs. At my old company the dev team
| basically had carte blanche to set up their own AWS infra
| without any real restrictions (there were some but very
| few). It was nice, I almost never had to ask anybody
| outside my team to do anything to unblock us (at least from
| an infra standpoint).
|
| In my new job we also use AWS for everything BUT I haven't
| used the AWS cli once. I don't even have credentials.
| Basically, there is a platform team which is responsible
| for running a k8s cluster (or clusters really) and some
| other common infrastructure which is all running on bare
| EC2 (no RDS, no EKS, we do use S3 for blob storage but
| that's about it). And I have to say it is pretty amazing.
| No more dealing with arcane IAM rules, or trying to figure
| out how to string together a chain of lambdas to do some
| sort of complex orchestration task.
|
| I've really come to appreciate the model of using k8s as
| your "cloud platform." Having a dedicated team that manages
| the k8s clusters and makes sure they are elastically
| scalable and reliable. Everyone else is just deploying
| stuff to kubernetes. They could decide to move everything
| to a colo tomorrow and I would have to change exactly
| nothing about how I do my job.
| Galanwe wrote:
| I don't get why you're so stressed out by AWS billing.
|
| From my experience, once you've worked a bit seriously with
| AWS, billing is not a blackbox anymore and you're able to plan
| ahead without too much surprise.
|
| If you're still worried, there's also the option of setting
| alerts on budget spent and forecast of budget, which should
| settle the debate. (these are also part of the API, so you can
| deploy and configure these alerts through terraform)
| GordonS wrote:
| A bad actor could hammer any publicly available services,
| and you could be hit with an enormous egress bandwidth bill.
|
| Spend alerts can _help_ with this, but spend is only
| calculated every 24h or so, so it's far from a panacea.
| belter wrote:
| You can mitigate that, for enterprise projects, using AWS
| Shield Advanced as it comes with DDoS cost protection:
|
| https://aws.amazon.com/shield/features/
|
| I say for enterprise projects because, although its cost is
| reasonable for corporate projects, it's probably not something
| you can justify for most personal/private deployments.
| gonzo41 wrote:
| Not if you're doing simple and sensible things like using
| Cloud Front, and setting limits with API Gateway.
| arriu wrote:
| What if you're using something like grpc?
| triska wrote:
| This is also one of the things I fear most when running a
| service in the cloud: A huge bill due to excessive network
| usage triggered for example by a search engine, web scraper
| etc. I consider it very unfortunate that "capped cost" has
| gone somewhat out of fashion, and nowadays many major cloud
| providers bill excess usage rather than cutting off or
| slowing down traffic etc.
|
| Here is a simple Bash script that monitors outgoing eth0
| traffic (once per second) and automatically shuts down the
| instance once it is greater than 1 TB:
|     #!/bin/bash
|
|     # shut down instance if outgoing traffic > 1 TB
|
|     # 1 MB
|     limit=$((10**6))
|
|     # 10 MB
|     limit=$((10**7))
|
|     # 1 GB
|     limit=$((10**9))
|
|     # 1 TB
|     limit=$((10**12))
|
|     while true
|     do
|         date
|         tx=$(< /sys/class/net/eth0/statistics/tx_bytes)
|         echo "$tx (limit: $limit)"
|         if (( tx > limit ))
|         then
|             echo cutting
|             systemctl poweroff
|         fi
|         sleep 1
|     done
|
| If you save it as cutnetwork.sh in
| /home/admin/cutnetwork.sh, you can run it as a systemd
| service:
|
|     [Unit]
|     Description=Cut Network
|
|     [Service]
|     UMask=022
|     Environment=LANG=en_US.utf8
|     Restart=on-abort
|     StartLimitInterval=60
|     StartLimitBurst=5
|     WorkingDirectory=/home/admin
|     ExecStart=bash cutnetwork.sh
|
|     [Install]
|     WantedBy=multi-user.target
|
| This simplistic approach may require adjustments depending
| on network settings and operating environment, and will not
| work for example if the instance is rebooted during the
| billing period, since that resets the counter. I would much
| prefer a hard-coded setting that reliably works on the
| instance itself, or a reliable hard billing limit that
| reliably turns off the service if the accumulated cost
| exceeds the set amount.
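
The reboot limitation mentioned above could be worked around by keeping a running total in a state file instead of trusting the kernel counter alone. The following is only a sketch under stated assumptions: the state file path and the update_total function are hypothetical names, and a reboot is detected by comparing the kernel's boot ID (readable from /proc/sys/kernel/random/boot_id on Linux) with the one stored on the previous run.

```shell
#!/bin/sh
# Hypothetical sketch: accumulate outgoing bytes across reboots.
# STATE_FILE and update_total are illustrative names (assumptions).
# State file format: "<accumulated total> <last counter value> <boot id>"

STATE_FILE="${STATE_FILE:-/home/admin/cutnetwork.state}"

# update_total CURRENT_TX BOOT_ID
# Adds the bytes sent since the previous call to the stored total
# and prints the new total.
update_total() {
    ut_current=$1
    ut_boot=$2
    ut_total=0
    ut_last=0
    ut_last_boot=""
    if [ -f "$STATE_FILE" ]; then
        read -r ut_total ut_last ut_last_boot < "$STATE_FILE"
    fi
    if [ "$ut_boot" = "$ut_last_boot" ] && [ "$ut_current" -ge "$ut_last" ]; then
        # Same boot, counter moved forward: count only the difference.
        ut_delta=$(( ut_current - ut_last ))
    else
        # New boot (or the counter went backwards): the counter restarted
        # from zero, so everything it shows is new traffic.
        ut_delta=$ut_current
    fi
    ut_total=$(( ut_total + ut_delta ))
    # Write to a temp file and rename, so a crash cannot truncate the state.
    echo "$ut_total $ut_current $ut_boot" > "$STATE_FILE.tmp"
    mv "$STATE_FILE.tmp" "$STATE_FILE"
    echo "$ut_total"
}
```

In the monitoring loop, the raw counter read would be replaced by something like tx=$(update_total "$(cat /sys/class/net/eth0/statistics/tx_bytes)" "$(cat /proc/sys/kernel/random/boot_id)"). Traffic sent between the last sample and a shutdown is still lost, and the state file has to be reset at the start of each billing period, so this narrows the gap rather than closing it.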
| arriu wrote:
| Thanks for sharing this.
| ghaff wrote:
| I know experienced people who have woken up to a
| several-thousand-dollar AWS bill they didn't expect. And the large
| cloud providers have clearly indicated by their actions that
| they're simply not interested in implementing hard cost
| circuit breakers.
|
| I use AWS very lightly but I totally understand why someone
| wouldn't.
| geoduck14 wrote:
| Yup. I used AWS at my last job. We had teams of people
| using AWS, we had fancy 3rd party tools and extraction
| metrics to track and report on costs. There were still
| PLENTY of times when I would just scratch my head "well it
| looks like EC2s cost an extra $1000 this month. I wonder
| what happened"
| user3939382 wrote:
| > the large cloud providers have clearly indicated by their
| actions that they're simply not interested in implementing
| hard cost circuit breakers.
|
| I agree, my term for this is "bad faith".
|
| I recently had a free $200 credit for Azure. I set up their
| default MariaDB instance for a side project, figuring I'd
| get my feet wet with Azure. I didn't spend time evaluating
| the cost bc I figured, how much could the default be if I
| haven't cranked up the instance resources at all? Turns out
| the answer is more than $10/day which I discovered when
| authentication failed to my test DB. Back to Digital Ocean.
| TriNetra wrote:
| Yes, in some cases the default is quite expensive. The
| same was true of SQL Azure (though they have changed
| that recently), and it ran up a sizable bill
| for us (though to their credit, Azure did refund on all
| such occasions because we didn't use the capacity at
| all). However, I don't know why the alert system doesn't
| have an option to say "here's my budget, alert me as soon
| as my daily pace is set to exceed the monthly
| budget". Instead, you get alerts based on the percentage of
| the budget consumed: for example, you can get an email when 50%
| of your budget is consumed, which happens every month and so
| kind of defeats the purpose of an alert.
|
| We ended up creating a simple solution (cloudalarm.in -
| in beta) that provides such budget-pace-based alerts and
| more ways to get instant alerts, which isn't possible with
| usage-based alerts.
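
The pace-based alert described in the comment above comes down to a linear projection: spend so far, divided by the days elapsed, multiplied by the days in the month. A rough sketch of that arithmetic follows; the function names are hypothetical, amounts are in whole cents because shell arithmetic is integer-only, and this is not a description of how cloudalarm.in or any provider's alerting actually works.

```shell
#!/bin/sh
# Hypothetical sketch of a pace-based budget check (all names are assumptions).
# Amounts are integer cents because POSIX shell arithmetic has no decimals.

# projected_spend SPENT_CENTS DAY_OF_MONTH DAYS_IN_MONTH
# Linear projection of month-end spend from the spend so far.
projected_spend() {
    echo $(( $1 * $3 / $2 ))
}

# pace_exceeds_budget SPENT_CENTS DAY_OF_MONTH DAYS_IN_MONTH BUDGET_CENTS
# Succeeds (exit status 0) when the projected spend is over the budget.
pace_exceeds_budget() {
    [ "$(projected_spend "$1" "$2" "$3")" -gt "$4" ]
}
```

For example, with $40.00 spent by day 10 of a 30-day month, projected_spend 4000 10 30 gives 12000 cents ($120.00), so pace_exceeds_budget 4000 10 30 10000 succeeds against a $100.00 budget, even though a "50% consumed" alert would not have fired yet.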
| whoknew1122 wrote:
| It's not bad faith. It's 'providing the resources you
| signed up for'.
|
| Does it mean you have to go into your planning with more
| consideration as to cost? Yeah.
|
| But how would you feel if your start-up finally goes
| viral, you're having your best day ever, and then your
| app just stops working because someone forgot to remove a
| hard spend limit?
|
| Most people would rather see their app continue running.
|
| And what does turning off the lights look like? If your
| database hits your cost limit, do you stop serving
| requests? Delete the data? To what extent do you want
| 'cost protection' for resources you signed up for?
| imwillofficial wrote:
| It's not unreasonable to ask for a mechanism to not be
| billed thousands unexpectedly.
|
| Cloud billing is not easy to understand.
|
| I would know, I work for the part of AWS that calculates
| people's bills.
| whoknew1122 wrote:
| It's not rocket science, either. I would know, I work for
| premium support.
|
| Every time I see an unexpected bill of thousands of
| dollars, it's because the customer poorly architected
| their infrastructure.
|
| People seemingly want the freedom of complete control
| without the responsibility that comes with having that
| much control.
| nlitened wrote:
| > If your database hits your cost limit, do you stop
| serving requests? Delete the data? To what extent do you
| want 'cost protection' for resources you signed up for?
|
| Sounds like a reasonable configurable option rather than
| "you shouldn't be able to choose at all".
| eropple wrote:
| I am sympathetic to the concern about cost overages--I've
| hit them in AWS before--but given the way that developers
| and managers think about SaaS products (generally, not
| just cloud stuff), I tend to think that even if you
| required them to click three checkboxes and sign their
| name in blood, the first time you vaporized somebody's
| production database because they hit their overages and
| didn't think it would ever happen would be apocalyptic.
| And the second, and the third. And you're at fault, not
| the customer, in the public square.
|
| By comparison, chasing off "cost conscious" (read:
| relentlessly cheap--and I note that in my personal life
| I'm one of these, no shade being thrown here) users is
| probably better for them overall.
| whoknew1122 wrote:
| Work in AWS Premium Support. This is 100% how it goes.
|
| Take KMS keys for example. You can't outright delete a
| KMS master key; you have to schedule it for deletion. The
| shortest period you can schedule for deletion is 7 days
| (default 30). Once the key is deleted, all encrypted data
| is orphaned.
|
| Guess who gets blamed for deleted keys?
|
| HINT: It's not the customer.
| nlitened wrote:
| I am sorry, I might be missing something, but I call
| bullshit. How much does it cost for Amazon to store
| several bytes that make a key? 5 cents per decade?
|
| "Yeah, so uhm, you hit zero, so we deleted all your keys
| in an irrecoverable way, sorry not sorry" -- is not a
| circuit breaker. Make all services inaccessible to public
| and store the data safely until customer tops up their
| balance. That's how VPSes have worked forever.
|
| I don't argue that "cheapo" clients are worth retaining
| for AWS, clearly they are not. But this kind of hypocrisy
| really triggers me.
|
| Edit: a helpful person below suggested I misunderstood
| the parent, and now I think I did.
| supaslide wrote:
| I'm pretty sure they meant that the customer schedules it
| for deletion and then blames AWS when they can't access
| their encrypted data.
| nlitened wrote:
| Oh, in this case I misunderstood what the parent meant,
| and I replied to a wrong interpretation of their words.
|
| Thank you for the clarification.
| eropple wrote:
| AWS doesn't retain _anything_ for you unless you tell
| them to, and when you tell them to delete something (as
| in the example relayed by the person you are replying
| to), they delete it as best as they are able. That's
| part of the value proposition: when you delete the thing,
| it goes away. Why would they start now for clients who
| want their bills to be in the tens of dollars (when if
| you really care you can do it yourself off of billing
| alerts[0])?
|
| Going to be real: you aren't "triggered", which is
| actually a real thing out there that you demean with this
| usage of the term. You're just not the target market and
| you're salty that it's more complex than you think it is.
|
| [0]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/
| monitori...
| [deleted]
| eropple wrote:
| I used to run an AWS consultancy, which is how _I_ know.
| ;) More than once I had a customer go "well support
| won't help me, how can I get my data back?". And I had to
| tell them "well, support isn't just not helping you for
| kicks, you know?".
| user3939382 wrote:
| Stop serving requests until the finances are rectified,
| delete the data 30 days after it stops. Final migration
| out/egress requires a small balance for that purpose.
|
| The engineers designing and building these systems are
| some of the best in the world, this is relatively
| trivial.
| Daishiman wrote:
| There is absolutely nothing trivial about this.
| user3939382 wrote:
| Relatively trivial. In other words compared to the rest
| of the infrastructure and billing system this is nothing.
| ghaff wrote:
| My term for it is "you're not their use case." For better
| or worse, they've prioritized usages that would much
| rather have an unexpected few thousand dollar bill than
| have services paused or shut down unexpectedly.
| civilized wrote:
| But computers can behave differently based on user
| choice. Right? So there could be a user option to cut
| service beyond a fixed spend. It wouldn't be hard to
| implement, and tons of people would use it. They don't do
| it.
|
| It's not a tragic case of priority and limited
| engineering resources. They _like_ surprise bills, just
| like hospitals do.
|
| Businesspeople _love_ it when you come to their service
| and click through their Russian novel of a service
| agreement that would take a team of lawyers to parse.
| Once you do that, your money belongs to them! It's their
| court, their rules! They love it!
| nucleardog wrote:
| > It wouldn't be hard to implement, and tons of people
| would use it. They don't do it.
|
| Please describe to me, in detail, how this works.
|
| Because every time this comes up everyone claims it's the
| easiest thing in the world, but if you try and drill into
| it what they end up actually wanting is generally "pay
| what you want" cloud services.
|
| There are a _ton_ of resources on AWS that accrue on-
| going costs with no way to turn them off. A "hard circuit
| breaker" that brings your newly accruing charges to zero
| needs to not just shut down your EC2 instances, but
| delete your EBS volumes, empty your S3 buckets, delete
| your encryption keys, delete your DNS zones, stop all
| your DB instances and delete all snapshots and backups,
| etc, etc.
|
| The only people I see using a feature like this are some
| individuals doing some basic proof-of-concept work and...
| a bunch of people that are going to turn it on not
| understanding the implications and then when they get a
| burst of traffic that wipes out their AWS account they're
| going to publish angry blog posts about how AWS killed
| their startup.
|
| If, like most people, you don't want literally everything
| to disappear the first time your site gets an unexpected
| traffic spike, you can already do this by setting up a
| response tailored to your workload--run a lambda in
| response to billing alerts that shuts down VMs, or stops
| your RDS instance but leaves the storage, etc.
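
A minimal sketch of the kind of tailored response described above, assuming Python and boto3-style clients. The names `stop_compute` and `handler`, the SNS wiring, and the resource IDs are hypothetical; `stop_instances` and `stop_db_instance` are the only real boto3 calls used. It stops compute and leaves storage in place, so EBS and RDS storage charges keep accruing.

```python
"""Sketch: stop compute when a billing alarm fires, but keep storage."""


def stop_compute(ec2, rds, instance_ids, db_identifiers):
    """Stop (not terminate) the listed resources and report what was stopped.

    `ec2` and `rds` are expected to behave like boto3 clients. Stopping
    leaves EBS volumes and RDS storage untouched, so those keep billing.
    """
    stopped = {"ec2": [], "rds": []}
    if instance_ids:
        ec2.stop_instances(InstanceIds=list(instance_ids))
        stopped["ec2"] = list(instance_ids)
    for db_id in db_identifiers:
        rds.stop_db_instance(DBInstanceIdentifier=db_id)
        stopped["rds"].append(db_id)
    return stopped


def handler(event, context):
    """Hypothetical Lambda entry point, subscribed to the alarm's SNS topic."""
    import boto3  # imported lazily so stop_compute can be tested without AWS

    # Placeholder IDs; a real setup might look resources up by tag instead.
    return stop_compute(
        boto3.client("ec2"),
        boto3.client("rds"),
        instance_ids=["i-0123456789abcdef0"],
        db_identifiers=["example-db"],
    )
```

Passing the clients in as arguments keeps the decision about what to stop separate from the AWS calls, which is where a workload-specific policy would go.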
| void_mint wrote:
| > Because every time this comes up everyone claims it's
| the easiest thing in the world, but if you try and drill
| into it what they end up actually wanting is generally
| "pay what you want" cloud services.
|
| Why is it on any (usually a relatively new) user to
| define how an entire cloud should behave?
|
| Users are asking for a feature that helps them stop
| accidentally spending more than they intended. This
| feature request is totally fair. Implementing such a
| feature would be an act of good faith towards
| new/onboarding users (also obviously just any user with a
| very specific budget use-case).
|
| > The only people I see using a feature like this are
| some individuals doing some basic proof-of-concept work
| and...
|
| Yes exactly. GCP offers sandboxed accounts for this exact
| purpose. Why is this such a far reach?
|
| > setting up a response tailored to your workload--run a
| lambda in response to billing alerts that shuts down VMs,
| or stops your RDS instance but leaves the storage, etc.
|
| If you're telling every individual user that falls into a
| specific category to build a specific set of
| infrastructure, why is it not acceptable to you to just
| ask AWS to build it?
| thinkharderdev wrote:
| I think the sandbox idea is a great one. They should just
| do away with the free tier entirely except for sandbox
| accounts in which everything just gets shut down the
| second you go over the free allowance. If you want to
| build something for real then you pay for whatever
| resources you use, but if you just want to tinker around
| and learn a few things then you can get a safe sandbox to
| do it in.
|
| BUT, I think the parent's point is that such a feature
| would actually be quite complicated. It's not just a
| matter of saying "I only want to spend $X in this account
| per month/total" but defining exactly what you want to do
| in the case where you hit that limit. Shut everything
| down? My guess is almost nobody would want to do that. So
| it ends up being some complicated configuration where you
| have to deeply understand all of the services and their
| billing models in order to configure it in the first
| place. What are the odds that the student who
| accidentally spins up 100 EC2s for a school project is
| going to configure this tool correctly?
|
| But I do think the sandbox would be great. Either you are
| a professional in which case it is your responsibility to
| manage your system and put in appropriate controls to
| prevent huge unexpected bills or you are a student (in
| the general sense of someone learning AWS, not
| necessarily just someone in school) in which case they
| provide a safe environment for you to experiment.
| void_mint wrote:
| > BUT, I think the parent's point is that such a feature
| would actually be quite complicated.
|
| Sure, but so is making a cloud. Putting the onus of
| defining a feature like this on users, only after hearing
| their request ("I want to control my spend"), is IMO
| unfair.
| thinkharderdev wrote:
| Not complicated as in "too hard for AWS to build" but
| complicated as in "really hard to use as someone trying
| to limit your spend on AWS." So the people most at risk
| of huge unexpected bills are also not going to be the
| people knowledgeable enough to set up the billing cap
| correctly. So it would mostly be a feature for
| enterprises and most enterprises would rather just pay
| the extra $ rather than potentially turn off a critical
| system or accidentally delete some user data.
|
| I worked at a company that spent ~$10m per month on AWS.
| We had a whole "cloud governance" team who built tools to
| identify both over and underutilized resources. But they
| STILL never cut anything off automatically. The
| risk/reward ratio just wasn't there. You make the right
| call and shave $10k off a $10m bill every month, but the
| one time you take down a mission critical service, you
| give all of that back and then some.
| void_mint wrote:
| > So the people most at risk of huge unexpected bills are
| also not going to be the people knowledgeable enough to
| set up the billing cap correctly
|
| Yes, which is why AWS builds it.
|
| > So it would mostly be a feature for enterprises and
| most enterprises would rather just pay the extra $ rather
| than potentially turn off a critical system or
| accidentally delete some user data.
|
| It would be mostly not Enterprises IMO
| BackBlast wrote:
| I've been there. I shut down a bunch of what looked like
| idle instances doing nothing to reduce spend. 80% of
| which were, in fact, doing nothing. I did drop off two
| vms that were supporting critical infrastructure.
|
| Everyone who had done any work on them was long gone. I
| had done my due diligence to identify what they could
| possibly be.
|
| Still, the day of reckoning came, and we got calls of
| services down a week after I turned them off. I spun them
| back up, and they were going again without any real
| impact to the business.
|
| This turned out to be a blessing as the very next week
| the cert these same services depended on expired and if I
| hadn't learned about the system by turning them off we
| never would have known which boxes held up those
| services.
|
| Also a lesson in what happens when people leave without
| any documentation on where the work they did lives and
| how it works.
| [deleted]
| nucleardog wrote:
| > And the large cloud providers have clearly indicated by
| their actions that they're simply not interested in
| implementing hard cost circuit breakers.
|
| Since I enjoy tilting at windmills--how do you propose this
| works? Like, in detail.
|
| Because every time I try and drill into details of this
| with someone, it winds up what they really want is "pay
| what you want" cloud services.
|
| AWS is much, much more than just a place to run a virtual
| server and many resources accrue on-going costs with no way
| to "turn them off". When you hit your hard circuit breaker,
| do they delete all your EBS volumes and data in S3? Your
| private SSL root? Your user directory? Encryption keys? DNS
| zones?
|
| The number of people that would want all of that removed
| when they hit their $X/mo limit is likely minuscule in
| comparison to the number of people that would turn this on
| not understanding what it really meant and then publishing
| angry blog posts about how Amazon killed their startup
| right when they got popular and traffic spiked.
| alexeldeib wrote:
| https://docs.microsoft.com/en-us/azure/cost-management-
| billi...
|
| e.g. "virtual machines are stopped and de-allocated. The
| data in your storage accounts are available as read-
| only."
|
| Most control plane operations will also be blocked. It
| gets complicated with more complex resource types, but it
| gets the job done anyway.
|
| Note that this functionality is sort of _required_ to
| support pre-paid plans without allowing them to exceed
| specific limits, which do exist on Azure. So there's a
| business dependency on this functionality today, it's not
| a hypothetical.
| ghaff wrote:
| That wasn't a value judgement on my part. I've made the
| exact same comment as you previously.
| igetspam wrote:
| Until you're into EC2-Other and then you have to follow
| various guides to figure out most of what that means. Even
| then, it's black box billing that Teams even struggle to
| explain. I spend a ridiculous amount on egress that's nearly
| impossible to track.
| the_jeremy wrote:
| If I'm using my own money on a personal project, I do not
| want "alerts". I want a maximum budget spend per X, where X
| is a _small_ increment of time, like an hour or day.
|
| Supporting hobbyist projects would absolutely lead to higher
| AWS adoption, at least in smaller companies.
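
A rough sketch of the "maximum spend per small increment of time" idea, in Python. The class `WindowBudget` and its fields are hypothetical, and it assumes cost figures arrive promptly, which other comments in this thread suggest is not how cloud billing data behaves. It only shows the bookkeeping, not an enforcement mechanism.

```python
from dataclasses import dataclass


@dataclass
class WindowBudget:
    """Spend cap per fixed time window (e.g. 3600 seconds for hourly)."""

    limit: float
    window_seconds: int
    _window: int = -1
    _spent: float = 0.0

    def record(self, timestamp: int, cost: float) -> bool:
        """Add a charge; return True while the current window is within its limit."""
        window = timestamp // self.window_seconds
        if window != self._window:
            # A new window started, so the running total resets.
            self._window = window
            self._spent = 0.0
        self._spent += cost
        return self._spent <= self.limit
```

A caller would treat a `False` return as the signal to stop whatever is generating the cost.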
| Tenoke wrote:
| >I don't get why you're so stressed out by AWS billing.
|
| Presumably because if you haven't used it and are digging
| deeper there's so many services with different types of
| billing that it can be hard to keep track.
|
| And who among us has not left something running way after it
| should've been shut down accidentally..
| danjac wrote:
| > once you've worked a bit seriously with AWS
|
| Kind of a chicken-and-egg situation, no? Unless you're on the
| company dime, learning to work with AWS entails that risk. A
| beginner simply won't know how to configure all of these
| things.
| RobRivera wrote:
| the risk is minimalized by reading the docs and proper
| planning.
|
| EDIT: got to love the holy downvoters
| danjac wrote:
| That's the point though. A beginner is going to make
| mistakes and should be able to learn in a safe
| environment. Think of a student on a tight budget at
| college or in a bootcamp, who has to learn AWS because
| it's on the curriculum.
| fleaaaa wrote:
| Exactly, it's pretty common for them to shoot themselves in
| the foot with a bill of a couple hundred dollars, just for
| one tiny instance with additional options that 'you have
| to do'.
|
| IMO AWS deliberately lets these things happen and
| reimburses them later with excuses. At this point it seems
| like a strategy.
| danjac wrote:
| I don't think it's so much a money-grabbing strategy as
| the problem that AWS is less a suite of unified services
| and more a litter of puppies fighting in a sack. With
| that kind of org chart it's difficult to have a unified,
| simple billing experience with good beginner training-
| wheels and on-ramps.
| geoduck14 wrote:
| >AWS is less a suite of unified services and more a
| litter of puppies fighting in a sack
|
| Can I quote you on this? I'm going to quote you.
| tenaciousDaniel wrote:
| Yep. I'm a novice at AWS. Last year I heard about RDS,
| and tried playing around with it.
|
| I thought I had shut it down because I clicked some
| button that _looked_ like it was the turn-off button. I
| let it go on for a few months, only to discover that I
| had been charged $1,500.
|
| The one thing that really pissed me off was how easy it
| was to set up vs how hard it was to take it back down. I
| can't remember the details, but basically you cannot
| simply turn off an RDS instance only in the UI (even
| though you can turn it on in the UI). You have to install
| the SDK and perform some (seemingly complex) commands.
|
| I tried explaining that I was a beginner, and that I made
| this mistake by accident, and that they could easily see
| that I had not actually used this instance at all or even
| put any data into the DB. But they wanted this huge list
| of things from me in order to refund it, like a super in-
| depth explanation of how it happened. Really shitty
| experience overall. So I canceled my AWS account and
| likely won't go back until I have a job that pays me to
| learn and use it.
| scrose wrote:
| You can 'Stop' a DB and 'Terminate' a DB through the UI.
| If you have deletion protection turned on, you can only
| stop the DB until it's turned off, which can also be done
| through the UI.
|
| You most likely stopped the DB, but the problem there is
| that AWS will automatically turn on the DB after 7 days.
| You also still get charged for storage for the time your
| DB is off.
|
| Sorry to hear that though. I know it's a really sucky
| situation to be in.
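
To make the seven-day behaviour above concrete, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and the seven-day figure is taken from the comment above rather than looked up per engine or region.

```python
from datetime import datetime, timedelta, timezone

# Per the comment above: a stopped RDS instance is started again after 7 days.
AUTO_START_AFTER = timedelta(days=7)


def auto_start_time(stopped_at: datetime) -> datetime:
    """When a DB stopped at `stopped_at` would be running (and billing) again."""
    return stopped_at + AUTO_START_AFTER


def days_running_again(stopped_at: datetime, now: datetime) -> int:
    """Whole days of instance charges accrued since the automatic restart."""
    restart = auto_start_time(stopped_at)
    if now <= restart:
        return 0
    return (now - restart).days
```

For a database stopped on 1 January and forgotten until 1 April, this gives 83 days of instance charges on top of the storage charges that never paused.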
| tenaciousDaniel wrote:
| Ah yeah, I remember now. You could turn off the DB, but
| there was some other kind of scaffolding thing that I was
| still getting charged for. It seemed like an RDS-specific
| thing. They had to point me to a tutorial for turning
| that piece off.
| geoduck14 wrote:
| >You could turn off the DB, but there was some other kind
| of scaffolding thing that I was still getting charged
| for.
|
| It is a feature! /s
| Galanwe wrote:
| > I let it go on for a few months, only to discover that
| I had been charged $1,500.
|
| You get billed monthly, so really, letting it go for a
| few months is on you.
|
| > The one thing that really pissed me off was how easy it
| was to set up vs how hard it was to take it back down
|
| > You have to install the SDK and perform some (seemingly
| complex) commands.
|
| Huh, no it's not. Really, it's literally one click to shut
| down an RDS instance, and always has been.
|
| > But they wanted this huge list of things from me in
| order to refund it, like a super in-depth explanation of
| how it happened.
|
| I mean that makes sense to me. They did reserve and
| partially use these resources for you, so it's only fair
| that you have to go through the trouble of explaining why
| they would let it go. If there's no downside everyone
| would just reserve a bunch of resources all the time.
|
| From your comment, it seems you didn't even bother to
| answer their questions to get a refund. I would totally
| not hesitate to charge you if I was in AWS's place.
| leeoniya wrote:
| > You get billed monthly, so really, letting it go for a
| few months is on you.
|
| right? what a moron, that bill could have been a surprise
| of only $500!
|
| /s
| tenaciousDaniel wrote:
| lol exactly. I've discussed this incident before on HN
| and had the same kind of "well tough luck but you
| deserved it" responses. They seem not to understand how
| off-putting it is to newcomers or students, to basically
| say "hey well YOU made a mistake, the trillion-dollar
| corporation SHOULD take your money, idiot!"
| geoduck14 wrote:
| >I've discussed this incident before on HN and had the
| same kind of "well tough luck but you deserved it"
| responses.
|
| Brush it off. I've worked with AWS pros who get lost in
| the billing. In my last job, we had a big "hackathon"
| where the objective was to reduce our AWS spend. Overall,
| we reduced our annual bill by a couple million dollars.
| tenaciousDaniel wrote:
| > From your comment
|
| Pretty stupid to just assume that I "didn't even bother"
| to answer their questions. I'm not going to write a novel
| explaining every minutia of my interactions with AWS.
|
| > letting it go is on you
|
| I never said it wasn't. If you pay attention, you'll see
| that the context of this comment is discussing the "it's
| on you" culture surrounding AWS and how hostile it is to
| newcomers.
| _wldu wrote:
| Doesn't the free tier cover a lot of what a student may
| want to do?
| lmz wrote:
| The free tier is just that, a tier (not a hard limit),
| and will not cover any use above the tier.
| triska wrote:
| I did not downvote this, but I have a comment: The risk
| is _not_ "minimalized by reading the docs and proper
| planning". To minimize something means to reduce it to
| the smallest possible amount, and I do not like to take
| any chances whatsoever when a huge excess bill is a
| possible outcome of a single misconfigured setting that
| can only be ruled out by reading hundreds if not
| thousands of pages of documentation and then following
| the documentation without mistake to the letter.
|
| There is a clear possible solution for reliably
| preventing any amount of unintended overpayment, and that
| would be to configure a hard billing limit that can
| _never_ be exceeded, _no matter what else is being
| configured_. All services that generate additional costs
| would simply have to stop or be removed if the configured
| limit is exceeded.
|
| That would truly minimize the risk, because any
| configuration error I make will then not lead to excess
| payment if I configure such a limit and the cloud
| provider respects it.
| pastage wrote:
| I do not have the time for alerts in my personal life,
| billing PTSD is not fun.
| Tenoke wrote:
| With employers I almost always use AWS, for side projects I
| almost always use hetzner for cheap servers. I don't even think
| you need to worry much about learning AWS unless you need it
| but if you do you can limit your budget, set alerts and hope
| they go off.
| danjac wrote:
| I'd be happy to never touch AWS unless a) I'm not the one
| paying for it and b) there is a genuine need. Unfortunately
| it's increasingly a job requirement.
| StratusBen wrote:
| [Disclosure] I'm Co-Founder and CEO of http://vantage.sh/ and
| was previously the lead PM on DigitalOcean's Droplet product as
| well as on the product team at AWS for container services.
|
| We try to help out a bit on this with Vantage which essentially
| gives you a DigitalOcean-esque view of your AWS costs. The
| first $2,500 in AWS costs are tracked for free which would
| seemingly cover your side-projects.
|
| It sounds like you've found your home on DigitalOcean, but I'd
| be curious if something like Vantage would potentially change
| your decision to build on AWS? In particular what you mention
| about runaway bills is something that Vantage sends alerts on
| in advance. We also show you a full inventory of your AWS
| resources and what they cost you.
| civilized wrote:
| What happens if your tracking doesn't end up matching the
| actual bill?
| StratusBen wrote:
| Vantage integrates at an AWS account level through a
| mechanism called a cross-account IAM role which allows us
| to ingest and process the raw data that AWS uses for its
| own billing systems (Cost and Usage Reports, Service APIs
| and Cost Explorer)
|
| We haven't seen a single case where we don't end up
| matching the actual AWS bill. In fact, with a release
| currently in BETA and rolling out in a week or two, we'll
| be providing _richer_ data _faster_ than AWS Cost Explorer
| provides.
|
| You can see a demo of that here (designs still need to be
| implemented)
| https://www.loom.com/share/dcb72a921f134e59b19a0dd3d3ab0e2f
| nlitened wrote:
| Yeah but do you guarantee that you will cover any real
| billing differences, or you're not sure enough to put
| your money where your mouth is?
| NathanKP wrote:
| Having seen firsthand the kind of devious folks who are
| out there constantly trying to do fraudulent activity on
| AWS, I don't think any small startup like Vantage would
| ever want to offer a bill insurance scheme. It would be
| ripe for exploitation, such as someone trying to spin up
| 1000 instances in the last couple minutes of the month
| and then say "Gotcha, your bill prediction didn't match
| up with the real bill after all!"
|
| At a more general level this may also be one of the most
| entitled asks I've ever seen in the 12+ years I've been
| on HN.
| mping wrote:
| This is crazy. You suggest that if AWS changes billing
| somehow, his startup should shoulder the cost? If you
| don't trust it don't use it.
| jeswin wrote:
| This is important, and will show confidence in your
| product.
|
| The requirement is insurance against exceeding a hard
| limit. We're talking about a rare event, so the offering
| (vantage.sh) is not good for this specific use case if it
| isn't absolutely foolproof.
| echelon wrote:
| StratusBen, this is where you could charge an incredible
| premium for your service.
|
| If you could underwrite price guarantees and show an
| insurance company or lender that your figures are right
| in most cases and that you've never lost over a certain
| amount, you could really hike the cost of your service
| and provide an incredible utility across the board.
|
| If you have spare bandwidth and can reliably do this, try
| building this.
| TriNetra wrote:
| While building CloudAlarm [0] (supports Azure as of now),
| we found that the usage data on Azure wasn't available for
| at least a day. In fact, they keep adding data to it during
| the next day, so technically two days will have passed by
| the time the actual usage data is available. I haven't
| gone deeper into AWS, but from what I've read they also
| make the data available gradually. So we concluded that
| instant alerts based on usage were not possible, and chose
| a novel route: a 'New Resource' alarm, which alerts you on
| every resource created, or only on anything more expensive
| than the tier you choose. The resources in your Azure
| subscription are available almost instantly via the API, so
| we thought this was a nice workaround.
|
| 0: https://cloudalarm.in/
| Thaxll wrote:
| DO is very amateurish. How exactly do you secure resources
| with RBAC or anything? I mean it took them 10 years to have
| a load balancer service lol.
|
| I would never use DO or Linode at work, those are for garage
| projects over the weekend.
| sethammons wrote:
| DO does not have a good support channel. My droplet died. I
| couldn't reach it, and neither could the world. There was no
| emergency button for getting help. Just send in an email.
| After a small time window, I had to just delete and rebuild
| my droplet. Two days or so later, they got back to me and
| because I deleted it, they couldn't debug it. Two days to get
| back to me on a dead, non-reachable droplet. I don't think
| that is acceptable for anything running in production.
| mwcampbell wrote:
| But the whole point of treating cloud servers as cattle
| rather than pets is that when one dies, you can spin up
| another one, right? Ironically, that's one reason I prefer
| AWS over DO and the like, because AWS's EC2 auto-scaling
| and availability zones are great for this kind of
| resilience.
| tester756 wrote:
| I struggle to understand what's so interesting in
| X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud that it appears
| basically daily on HN
|
| I see it like +-decade old (in mainstream) wrapper/apis over
| managing VMs/Infra while being proprietary as hard as possible
|
| What's the difference between this and javascript frameworks
| posts? (except js frameworks being OSS)
|
| How many years have to pass until "Cloud" stops being
| $hot_topic?
|
| Were "admin" topics (heh you know those guys that were
| predecessors of "DevOps") 10-15 years ago hot too?
| duiker101 wrote:
| I don't think we are even near the peak "Cloud" before it can
| slow down. We are just starting to see now entire dev envs in
| the cloud, and I'm pretty sure we will continue to go in that
| direction.
| dehrmann wrote:
| The original, main value proposition was that you don't need
| to physically manage your server anymore. Some products are
| just wrapped, managed versions of something you're already
| familiar with. Then there are more "original" offerings like
| DynamoDB, Bigquery, and Bigtable that bring a lot of value on
| their own and are significantly easier to operate at large
| scale than any open source equivalent.
| sofixa wrote:
| > I struggle to understand what's so interesting in
| X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud that it appears
| basically daily on HN
|
| > I see it like +-decade old (in mainstream) wrapper/apis
| over managing VMs/Infra while being proprietary as hard as
| possible
|
| Don't know if serious or not, but nevertheless let me try.
| It's the global and near infinite scale, the enormous amounts
| of managed services you get behind those APIs. You need a
| database/message queue/object storage whatever at whatever
| scale? Have at it, and pay as you go. If you can't see the
| interest in that, I wonder what it is that you do.
|
| And IMHO there's nothing inherently complex about the APIs of
| AWS or GCP (the only ones I've really used). They're as
| complex as the things they manage.
| throwdecro wrote:
| > I struggle to understand what's so interesting in
| X(AWS/Azure/GCP/Alibaba/SAP/Oracle...) Cloud...
|
| We're still afraid of the cloud spending all of our money.
| theamk wrote:
| We are talking about things we're using or want to use. And
| the usage of cloud is not going to go down anytime soon.
|
| For example, I think AWS spend is likely the biggest spend in
| my company after the payroll/offices, so it is pretty
| important topic, business-wise.
|
| And unlike JS frameworks which only matter to a subset of JS
| frontend developers, everyone can use cloud: JS or Java or
| Rust or C++ or C; frontend, backend, data science, ML,
| compilers, embedded.
| ManuelKiessling wrote:
| I feel you. I do take the risk - the leverage on automation and
| manageability that Terraform et al. give me is just too good to
| pass up, and only with a 100% API approach like the one AWS
| provides can I play the 100% infrastructure-as-code game, and I
| simply won't play any other game anymore.
|
| Through the very same means, first thing I do with every new
| AWS-based project is setting up a cleanly organized Org with
| centralized billing, centralized IAM&Roles, centralized billing
| alarms, centralized SCP limitations(!!!) (as in "I will never
| run anything in Southeast Asia, so I disallow anything in
| Southeast Asia for all Org accounts"), and very not-centralized
| resources per stage/subproject/vertical/whatever.
|
| Plus sensible service limits on everything that has a service
| limit (request on API gateway etc.).
|
| But as someone here said: your risk will remain > 0, you just
| have to accept that.
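
As an illustration of the region restriction mentioned above, here is a Python sketch that builds a service control policy document denying requests outside an allowed set of regions. The helper name `region_deny_policy` and the `Sid` are made up; `aws:RequestedRegion` is the real condition key. This is deliberately naive: real policies of this kind usually exempt global services (IAM and similar) through `NotAction`, which this sketch does not do.

```python
import json


def region_deny_policy(allowed_regions):
    """Build an SCP-style document that denies everything outside `allowed_regions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        # Sorted only to keep the generated document stable.
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example: allow only two European regions, denying all others.
    print(json.dumps(region_deny_policy(["eu-west-1", "eu-central-1"]), indent=2))
```

Generating the document from a list keeps the allowed regions in one place if the same restriction is applied to several accounts.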
| whichquestion wrote:
| DigitalOcean has a terraform provider you can use for an IaC
| approach
| atmosx wrote:
| DO doesn't offer accounting in services (who accessed what
| and when), no IAM which is a huge problem and their APIs
| and services are a bit low on reliability. Spaces has an
| extremely low rate limit and their communication API times
| out often. The k8s service works overall but has some
| annoying hiccups that only support tickets will fix, the
| CDN returns random 503 errors. All droplets are shared and
| resource contention is a thing.
|
| That said, AWS support has its own issues, rarely solving
| the problem even when we pay for a TAM. Services like
| ElastiCache are hard to upgrade with zero downtime. Their
| solutions always involved spending inordinate amounts of
| money on open source clones with 1/10 of features and their
| good services (DynamoDB) will cost an arm and a leg.
|
| My 2 cents.
| TriNetra wrote:
| It's the same with Azure. On multiple occasions, I had databases
| created in tiers several times more expensive than the one I use
| with my subscription. This wasn't a manual mistake; a sleeping
| app got awakened (maybe I hit the run button by mistake) and it
| ended up creating the database via the ORM framework. Since the
| ORM framework only executes a create database on the SQL server,
| Azure goes with the default tier, which they had chosen as one
| with a monthly price of $250 or something. I've set up budget
| alerts on Azure, but these are threshold based (% of budget
| consumed) and they come every month, so technically they aren't
| alerts but rather information that requires you to do the math
| on whether or not you're within budget. So you tend to ignore
| them after a while. Recently, we decided to build a simple
| solution ourselves [0] which gives alerts based on budgeted pace
| and not consumption.
|
| 0: https://cloudalarm.in/
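
One possible reading of "alerts based on budgeted pace", sketched in Python. This is a guess at the idea, not CloudAlarm's actual method: it assumes spend should grow linearly through the month, and the function names and the 10% tolerance are invented for the example.

```python
import calendar
from datetime import date


def expected_spend(budget: float, today: date) -> float:
    """Spend expected by `today` if the monthly budget were used at an even pace."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return budget * today.day / days_in_month


def over_pace(budget: float, spent: float, today: date, tolerance: float = 0.10) -> bool:
    """True when actual spend is ahead of the even pace by more than `tolerance`."""
    return spent > expected_spend(budget, today) * (1 + tolerance)
```

With a $300 budget, $150 spent by 10 September is flagged, while a plain "50% of budget consumed" threshold would say nothing about whether that is early or late in the month.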
| helsinkiandrew wrote:
| Nothing for me compares to the time I purchased 2 reserved EC2
| instances for about $5K on my personal account rather than
| the company's. I can still remember that sinking feeling as I
| realized what I'd done.
|
| Amazon refunded the next day.
| stingraycharles wrote:
| It's incredibly easy to spend a lot of money on the cloud,
| indeed. I remember using Google Cloud's translate API on a
| bunch of documents -- it took several hours for the bill to pop
| up at $1500. This was a hobby / personal project of mine,
| Google did not refund it, because of course I should have read
| the pricing more carefully.
| dotancohen wrote:
| This is the advantage with AWS, they _will_ refund mistakes.
| I've seen it happen twice, and both times were resolved
| quickly with a rep on the phone.
|
| I've also once had an issue with my own personal account.
| Five minutes with a rep on the phone saved not my bank
| account, but my website and hosted services, because my
| credit card was cancelled and it would be another few months
| before I could get another.
| nobleach wrote:
| Amazon as a company tends to side with the customer. Their
| whole mantra is that it's not worth chasing after x* amount
| of dollars. Now repeat offenses? No, you're not getting
| away with using their services for free. (You mention
| twice, but I imagine 4 or 5 times, and they're going to
| fault you without escalating the issue)
|
| *within reason... you're not going to serve up an app all
| month long and skip out on a million dollar bill.
| dehrmann wrote:
| > Amazon as a company tends to side with the customer.
|
| Completely agree. Google might be learning parts of this
| with GCP, but historically, customer-obsessed isn't in
| Google's DNA.
| dncornholio wrote:
| Mistakes? How about the flaws of AWS itself and their
| terrible, terrible pricing system that rewards them for your
| mistakes?
| fukmbas wrote:
| Mistake #1: using AWS
|
| lol
| zackmorris wrote:
| I view AWS as a study in doing everything the "bare hands" way.
| Here are some examples of the old sysadmin ways of doing things
| vs the modern "web" way:
|
| * regions -> self-balancing algorithms like RAFT
|
| * roles/permissions -> tokens
|
| * IP address filtering -> tokens
|
| * CPU clusters -> multicore/containerization/Actor model
|
| * S3 -> IPFS or similar content-addressable filesystems
|
| It's not just AWS having to deal with this stuff either:
|
| * CORS -> Subresource Integrity (SRI)
|
| * server languages (CGI) -> Server-Side Includes (SSI)
|
| * Javascript -> functional reactive, declarative and data-driven
| components within static HTML
|
| * async -> sandbox processes, fork/join, auto-parallelization
| (seen mostly in vector languages but extendable to higher-level
| functions)
|
| * CSS -> a formal inheritance spec (analogous to knowing set
| theory vs working around SQL errata)
|
| I could go on forever but I'll stop there. We are living at a
| very interesting time in the evolution of the web. I think that
| web dev has reached the point where desktop dev was in the
| mid-1990s and is ripe for disruption. No disruption will come
| from the big companies though, so this is your chance to do it
| from your parents' basement!
| StratusBen wrote:
| [Disclosure] I'm Co-Founder and CEO of http://vantage.sh/, a
| cloud cost platform for AWS. Previously I was a product manager
| at AWS and DigitalOcean.
|
| Since the author and so many people are commenting about AWS
| costs (and in particular, choosing cheaper EC2 instances and EBS
| volumes), I thought I'd mention that Vantage has recommendations
| for these exact things so you don't get tripped up or spend more
| than you have to.
|
| If you have "antiquated" EC2 instances or EBS volumes, Vantage
| will give you a recommendation for which instance to switch to
| and how much money you'll save.
|
| The first $2,500/month in AWS costs are also tracked for free so
| people get a lot of value out of the free tier and can save
| significant parts of their bills when developing on AWS.
| 7sidedmarble wrote:
| Respect that you are all about that grindset for your product
| in this thread, but it's also a little insane that you need a
| third party tool to make sense of what's going on in AWS.
|
| I'm a bit of a GCP fan, and while its billing is also arcane,
| I think it is just a little bit easier to understand and
| better laid out. For bread and butter stuff like regular VPSs
| though, AWS is often a little cheaper. But GCP's other cloud
| offerings are occasionally very respectably priced.
| swyx wrote:
| every $X00 billion dollar business is big enough that third
| party tools will always be desired because the default
| experience won't be good enough for some part of the market.
|
| question is whether or not that part is big enough to warrant
| its own venture scale business, as with Vantage :)
| smoldesu wrote:
| I'd frankly just prefer to use a VPS. The fact that I need to
| have _a payment stack_ alongside my technology one is just
| ridiculous to me.
| hughrr wrote:
| Biggest mistake I've made:
|
| Shifting any non trivial infrastructure into AWS verbatim is
| always more expensive than running it yourself. You need to
| rearchitect it carefully around the PaaS services to make a cost
| saving or even break even.
|
| An extreme example of this is a cousin who works for a small dev
| company doing LOB stuff. They moved their SQL box into EC2 and
| it's costing more to run that single RDS instance than their
| entire legacy infra cost was per year.
|
| I'd still rather use AWS though. The biggest gain is not
| technology but not having to argue with several vendor sales
| teams or file a PO and wait for finance to approve it. All I do
| is click a button and the thing's there.
| Aeolun wrote:
| > The biggest gain is not technology but not having to argue
| with several vendor sales teams or file a PO and wait for
| finance to approve it. All I do is click a button and the
| thing's there.
|
| This is _so_ ridiculous. I have to argue endlessly (and again
| for every employee) with IT support and enterprise security to
| give them the ability to upload attachments on Teams.
|
| But giving that same person access to start a few $100/hour
| instances on AWS? No problem.
|
| The balance is completely out of whack once your infra is on
| AWS.
| thinkharderdev wrote:
| But that's the thing. A senior engineer is also costing a
| business ~$100/h (all in) so even if you accept that there
| will be a fair amount of waste (misconfigurations, devs
| spinning up boxes and then forgetting about them, PoC
| projects never torn down, etc) it can still be a net-positive
| proposition. People always want to compare the cost of
| compute/storage/bandwidth but that isn't really the value
| proposition. Of course you are going to pay more for cloud-
| hosted infra than for equivalent infra in a colo or on-prem
| DC. But I used to spend hundreds of hours a year doing random
| busy work to deal with on-prem infrastructure. Need a new
| server, put in a ticket and when the ticket gets no response
| follow up with emails and finally setup a meeting to discuss
| with the ops team. Need a new firewall rule, same deal.
| sethammons wrote:
| Have them build an s3 attachment service and just let finance
| know you are spending $X/yr and you have ideas for
| streamlining IT support that would eliminate the cost.
| rualca wrote:
| > Shifting any non trivial infrastructure into AWS verbatim is
| always more expensive than running it yourself.
|
| The free tier of AWS lambdas has enough room to do non-trivial
| applications for free, and in EC2 we can get t2.micro and
| t3.micro instances (2vCPU, 1GB RAM) with 750h/month for free,
| which pretty much means you can have the instance running the
| whole month for free.
|
| Depending on what you need to do, at the very least it's
| possible to run a system (or parts of it) for free, which is
| hard to beat.
|
| Having said this, allowing a system architect to go nuts with
| AWS without being mindful of its cost is something that easily
| gets far too expensive far too fast. If all anyone wants is EC2
| and there's no need for global deployments then you'd be better
| off going with cloud providers such as Hetzner. A couple of
| minutes with a calculator and a napkin at hand is enough to
| arrive at the conclusion that AWS makes absolutely no sense,
| cost-wise.
| macpete42 wrote:
| I can confirm that: cloud helps to evade the incompetent sales
| and infrastructure teams in many companies. Saving money never
| works once your product scales out.
| AndrewDucker wrote:
| It's always more expensive to have someone else run your
| infrastructure than to do it yourself unless it's something you
| only use intermittently.
|
| If you need 5 seconds of compute time per day then running that
| as a Lambda makes perfect sense. If you need a database server
| that's available 24/7 then I can't see how hosting that on
| Amazon could be cheaper.
|
| (Unless you're employing a full time ops person to look after
| that one server, in which case you'll have to do your own
| maths.)
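|
| A back-of-the-envelope sketch of that comparison, in Python. The
| prices are assumptions for illustration only (roughly the published
| x86 Lambda rates around the time of this thread), and the hourly
| server rate is a made-up placeholder; check the current pricing
| pages before relying on any of it.

```python
# Rough monthly cost of an intermittent Lambda job vs. an always-on server.
# All prices below are ASSUMPTIONS for illustration; check current pricing.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667  # assumed USD per GB-second
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed USD per invocation


def lambda_monthly_cost(seconds_per_day, memory_gb, invocations_per_day, days=30):
    """Monthly Lambda cost, ignoring the free tier."""
    compute = seconds_per_day * days * memory_gb * LAMBDA_PRICE_PER_GB_SECOND
    requests = invocations_per_day * days * LAMBDA_PRICE_PER_REQUEST
    return compute + requests


def always_on_monthly_cost(hourly_rate, days=30):
    """Monthly cost of a server billed for every hour, used or not."""
    return hourly_rate * 24 * days


if __name__ == "__main__":
    # 5 seconds of compute per day at 1 GB, one invocation per day.
    job = lambda_monthly_cost(seconds_per_day=5, memory_gb=1.0, invocations_per_day=1)
    # Hypothetical $0.02/hour instance running the whole month.
    server = always_on_monthly_cost(hourly_rate=0.02)
    print(f"Lambda: ${job:.4f}/month, always-on: ${server:.2f}/month")
```

| The always-on cost does not depend on usage at all, which is why
| the comparison flips once the workload runs around the clock.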
| helsinkiandrew wrote:
| Previous job, I ran a website on RDS for about 7 years and only
| touched the control panel to restore from a backup when we'd
| screwed up the data, and to tell it when to upgrade. That was
| worth quite a lot in ops time and peace of mind.
| habibur wrote:
| Database crashing on a hosted service is such a rare event
| unless you mess it up yourself running rogue queries. No
| surprise there. It doesn't have to be AWS though. Can be
| Digital Ocean or anything else.
| hughrr wrote:
| Yep that. Lambda is a massive win for me personally. I have
| some scraping and processing stuff that runs daily. Costs me
| $0.60 a month to run it even outside of free tier which is
| less than a cheap DO or linode box and I don't have to look
| after the OS.
| AndrewDucker wrote:
| Same. Powershell script that collects links from Pinboard
| and posts them to my blog. It would be massively overkill
| to run a whole server for that. (And Microsoft charges me
| about £0.15 per month)
| l33tman wrote:
| Well that 24/7 db server is not going to back up itself and
| maintain itself when the HW fails or the power goes out. This
| is not trivial/cheap to assure and maintain. Likewise with
| hosting your own S3. The value is not in the HW really (and
| the cloud providers know this when pricing).
|
| I have this setup on AWS and look at "bringing it home" every
| year but the nightmare of having to assure a good level of
| availability is not worth the saving of a few extra $1000 per
| year for us at least. Completely different issues not related
| to your HW can happen, your office internet simply goes down
| or power goes out while you're on vacation etc..
|
| At least there is major competition between a lot of cloud
| providers nowadays so nobody can get away with insane prices
| anymore. Though, would be cool to see some kind of
| standardised price comparison metric for medium/high
| complexity cloud setups. Sort of how you compare grocery
| prices, you have a standard purchase list.
| AnIdiotOnTheNet wrote:
| > Well that 24/7 db server is not going to back up itself
| and maintain itself when the HW fails or the power goes
| out.
|
| Uh, yeah, that's why you pay IT people to do that sort of
| thing. Hiring your own will almost certainly be less
| expensive than paying Amazon's people, and provides you
| more control and more options in the event of any problems.
| It's not like AWS never has problems, and when it does all
| you can do is twiddle your thumbs until someone else fixes
| it.
| Daishiman wrote:
| In what world do you live in where competent IT support
| for production infrastructure is cheap?
| AnIdiotOnTheNet wrote:
| The one that doesn't live in SV.
| hughrr wrote:
| It's cheaper for us to bring our own staff and licenses
| than pay for RDS at our scale :)
| WaxProlix wrote:
| Once you hit that point, AWS will almost always cut you
| some specific pricing deals to ensure that your pain
| point is competitively priced, since they want you in the
| ecosystem anyways.
| tpetry wrote:
| Wouldn't it make sense to book typical SaaS through the AWS Marketplace?
| I mean you wouldn't have to talk to the billing department,
| just activate a SaaS within AWS and everything is put on your
| normal AWS bill?
| wly_cdgr wrote:
| Heh, I like how Amazon literally took the boost mechanic from
| arcade racing games for the CPU credits in T2/T3
| physicles wrote:
| Burst CPU and IOPS has bitten me a couple times over the years.
| In fact, it's basically the sole cause of nearly all our downtime
| in recent history. That's frustrating. I get that it's a
| technical solution to the problem of resource utilization at
| scale, but they could've spent some time making it easier to
| observe -- for example, rescale the CPU or IOPS graphs so that
| 100% is your max sustained budget, and anything over 100% eats
| into your quota.
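|
| The rescaling suggested here is a one-line calculation. A minimal
| sketch, assuming you know the sustained baseline for your instance
| type (the 10% below is only a placeholder, and rescale_to_budget is
| a hypothetical helper, not an AWS API):

```python
def rescale_to_budget(cpu_percent, baseline_percent):
    """Express CPU utilization as a percentage of the sustainable baseline.

    100 means "exactly at the sustained budget"; anything above 100 is
    spending burst credits, anything below is earning them back.
    """
    if baseline_percent <= 0:
        raise ValueError("baseline_percent must be positive")
    return cpu_percent / baseline_percent * 100.0


if __name__ == "__main__":
    baseline = 10.0  # ASSUMED baseline; look up the real value for your instance type
    for raw in (5.0, 10.0, 40.0):
        print(f"raw {raw:5.1f}% -> {rescale_to_budget(raw, baseline):6.1f}% of budget")
```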
| [deleted]
| tedk-42 wrote:
| Few easy ones as well:
|
| 1) Terminating instances that had ephemeral disks with stuff you
| needed while thinking the EBS volumes would remain
|
| 2) Leaving NAT gateways lying around or ELBs that do nothing and
| have no instances attached.
|
| 3) Public S3 buckets - arguably the most common one that can lead
| to security incidents
|
| 4) Debugging security groups/Network ACLs and straight up breaking
| networking for something without knowing it. Reverse of that
| would be you want to fix something quickly and open 0.0.0.0/0 to
| everyone and never get around to tightening up the firewall later
| on.
| jnieminen wrote:
| I was playing with the Azure "free" tier. Even though I tried to be
| extremely careful with it, after a while I noticed that I had
| left a storage blob for a VM hanging around and some external
| IPv4 address. I will continue to use Hetzner online for my own
| stuff instead of running this on "public cloud".
| projectramo wrote:
| My biggest mistake: years ago I ended up pushing personal
| credentials to GitHub at night and waking up to a several
| thousand dollar bill in the morning.
|
| Changed credentials and cancelled all the running instances only
| to find that I'd missed some.
|
| It was resolved by the afternoon.
| judge2020 wrote:
| Thankfully GitHub now runs secret scanning and AWS is a
| partner. If you did this today AWS will revoke the key before
| malicious scanners find it.
|
| https://docs.github.com/en/code-security/secret-scanning/abo...
| unglaublich wrote:
| But what mistakes did he make? Did he screw up the bill? Did he
| fail to keep services available? I only read facts about the ins
| and outs of AWS' billing and credits system.
| weird-eye-issue wrote:
| If you run out of CPU or IOPS burst balance then your system
| will suddenly slow to a crawl and it can easily cause downtime
| or in the case of background jobs it will cause long queues or
| never ending jobs. Learned that the hard way, a couple times.
|
| One time I optimized DB access which fixed the IOPS usage, and
| then that caused more CPU usage on the app servers which caused
| them to run out of CPU burst... Fun times. Switched from one
| burst issue to another.
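|
| To see how quickly a burst balance can run out, here is a small
| simulation sketch. It relies on the documented definition of a CPU
| credit (one vCPU at 100% for one minute); the starting balance and
| earn rate are inputs you would have to look up, and the model is
| simplified in that it ignores the cap on accrued credits.

```python
def minutes_until_throttled(balance, earn_per_hour, utilization_percent,
                            vcpus=1, max_minutes=24 * 60):
    """Simulate a burstable instance's CPU credit balance minute by minute.

    One credit is one vCPU at 100% for one minute. Returns the number of
    minutes until the balance reaches zero, or None if it survives max_minutes.
    """
    earned_per_minute = earn_per_hour / 60.0
    spent_per_minute = vcpus * utilization_percent / 100.0
    for minute in range(1, max_minutes + 1):
        balance += earned_per_minute - spent_per_minute
        if balance <= 0:
            return minute
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 30 credits banked, 15 earned per hour, 75% load.
    print(minutes_until_throttled(balance=30, earn_per_hour=15, utilization_percent=75))
```

| Running at or below the earn rate never drains the balance, which
| is why the failure only shows up under sustained load.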
| sethammons wrote:
| And that is scaling systems. Open up one bottleneck to
| discover the next. Rinse and repeat to gain experience.
| mfrye0 wrote:
| One of the biggest mistakes I made is not exploring spot
| instances and reserved instances earlier.
|
| I cut my bill by 70-80% after paying full price for years...
|
| If you have an active web server or backend workers with fairly
| short jobs, spot instances will work for you.
| thanatos519 wrote:
| Does the 'cpu credits' stuff apply to spot instances too? I
| have been thinking of shortening my animation render time with
| spot instances, but it only makes sense if I can run every core
| at 100% for the entire life of the instance.
| calmlynarczyk wrote:
| This is more just "missed optimization opportunities in EC2" than
| a statement about mistakes in AWS as a whole.
|
| If you want to talk systemic AWS mistakes you can make, we
| accidentally created an infinite event loop between two Lambdas.
| Racked up a several-hundred-thousand dollar bill in a couple of
| hours. You can accidentally create this issue across lots of
| different AWS services if you don't verify you haven't created
| any loops between resources and don't configure scaling
| limitations where available. "Infinite" scaling is great until
| you do it when you didn't mean to.
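|
| One common guard against this kind of loop is a hop counter carried
| in the event itself. A minimal sketch, not what the commenter's team
| did: the hop_count field, the MAX_HOPS limit and the next_event
| helper are all hypothetical names, and each function in the chain
| would have to build its outgoing events this way for it to work.

```python
MAX_HOPS = 5  # ASSUMED limit; pick one that fits your real call depth


class LoopDetected(Exception):
    """Raised when an event has been forwarded more times than MAX_HOPS."""


def next_event(event, payload):
    """Build the event to forward, carrying an incremented hop counter."""
    hops = event.get("hop_count", 0) + 1
    if hops > MAX_HOPS:
        raise LoopDetected(f"event forwarded {hops} times, refusing to continue")
    return {"hop_count": hops, "payload": payload}
```

| This bounds the damage from an accidental cycle but does not replace
| concurrency limits or billing alarms.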
|
| That being said, I think AWS (can't speak for other big
| providers) does offer a lot of value compared to bare-metal and
| self-hosting. Their paradigms for things like VPCs, load
| balancing, and permissions management are something you end up
| recreating in most every project anyways, so might as well
| railroad that configuration process. I've experienced how painful
| companies that tried to run their own infrastructure made things
| like DB backups and upgrades that it would be hard to go back to
| a non-managed DB service like RDS for anything other than a
| personal project.
|
| After so many years using AWS at work, I'd never consider
| anything besides Fargate or Lambda for compute solutions, except
| maybe Batch if you can't fit scheduled processes into Lambda's
| time/resource limitations. If you're just going to run VMs on
| EC2, you're better off with other providers that focus on simple
| VM hosting.
| itisit wrote:
| > Racked up a several-hundred-thousand dollar bill in a couple
| of hours.
|
| Not doubting you, but curious how you hit such a high figure.
| Can you walk through the math? Are we talking trillions of
| <10ms requests?
| mfrye0 wrote:
| > If you want to talk systemic AWS mistakes you can make, we
| accidentally created an infinite event loop between two
| Lambdas. Racked up a several-hundred-thousand dollar bill in a
| couple of hours.
|
| I did more or less the same thing, but with a 3rd party
| webhook. The bill almost killed my company.
| heurisko wrote:
| If you are able to share the story, what went wrong with the
| webhook?
| cutemonster wrote:
| > The bill almost killed my company.
|
| You had to pay although it was a mistake?
| vbezhenar wrote:
| You spent resources. Of course you have to pay.
| Uehreka wrote:
| The resource usage required to tank a small startup (that
| could've become a bigger customer later) is probably
| peanuts to Amazon. I'm not sure how often they do this
| (or whether they do it at all) but it would make business
| sense for them to occasionally grant "billing
| forgiveness" in serious situations.
| FpUser wrote:
| >"Racked up a several-hundred-thousand dollar bill in a couple
| of hours."
|
| This is enough to rent a big server from Hetzner / OVH for like
| forever and have a person looking after it with plenty of money
| left.
|
| >"I've experienced how painful companies that tried to run
| their own infrastructure made things like DB backups"
|
| I run businesses on rented dedicated servers. It took me a
| couple of days to create a universal shell script that can create
| a new server from scratch and / or restore the state from
| backups / standby. I test this script every once in a while and
| so far had zero problems. And frankly excluding cases when I
| want to move stuff to a different server there was not a single
| time in many years when I had to use it for real recovery.
|
| I did deployments and managed some infrastructure on Azure /
| AWS for some clients and contrary to your experience I would
| never touch those with the wooden pole when I have a choice.
| Way more expensive and actually requires way more attention
| than dedicated servers.
|
| Sure, there are cases when someone needs "infinite scalability".
| Personally I have yet to find a client where my C++ servers
| deployed on real multicore CPU with plenty of RAM and array of
| SSD came anywhere close to being strained. Zero problems
| handling sustained rate of thousands of requests per second on
| mixed read / write load.
| calmlynarczyk wrote:
| I'm not saying it can't be done cheaper or more efficiently
| on simpler providers or even self-hosting, but you need the
| expertise and time to stand up the foundation of a secure
| platform yourself then. For example, AWS Secrets Manager is
| just there and ready to code against, as opposed to standing
| up a Vault service and working through all of the
| configuration oddities before you can even start integrating
| secrets management into an application. If you already have a
| configuration-in-a-box that you can scale up, then more power
| to you.
|
| Your use-case of running a web service that is written in a
| very efficient language like C++ is not something you see too
| much these days. While it would be nice if most devs could
| pump out services built on performant tech stacks, our
| industry isn't doing things that way for a reason. Even high-
| prestige companies with loads of talented engineers only
| build select parts of their systems using low-level
| languages.
| talolard wrote:
| I think your last paragraph is the sales pitch for AWS.
| Hiring that level of expertise doesn't scale. Easier and
| cheaper to hire 10x as many "developers" and pay the AWS bill
| than headhunt performance gurus that understand hardware and
| retain them .
| FpUser wrote:
| What expertise? My specialty is new product design. I am
| very far from being performance hardware guru. I just
| understand basics and do not swallow propaganda by loads.
| Dylan16807 wrote:
| Even if you're right, it's still cheaper to get a dozen
| dedicated servers than to get a huge pile of AWS servers.
|
| Bad performance means you need more servers, it doesn't
| mean you need instant scaling.
| rightbyte wrote:
| AWS is the solution looking for a problem, which happened to be
| modern web dev practices.
| tomnipotent wrote:
| Many companies want disaster recovery and multi-region
| deployments without the capital expenses required to deploy
| this themselves.
|
| I don't want to have to buy hardware from a vendor, find
| cabinet space, negotiate peering and power agreements, deal
| with 3am alerts for failed NICs, or hear about someone
| spending hours freeing up disk space while waiting on new
| drives to arrive.
|
| I want the benefit of all these things, but I'd rather pay a
| premium for it over time than deal with the upfront capital
| expenses.
| Salgat wrote:
| The problem is that not everyone wants to self-host, not
| everyone wants to manage hardware, and not everyone's tech
| scales in an extremely predictable and easy way. We launched
| a new tenant that required a bunch of new EC2s, databases,
| etc. Was trivial with AWS with terraform. If we did our own
| homegrown solution we would have had to have that hardware
| either ordered and waited on or have that hardware ready in
| reserve just burning cash doing nothing.
| Daishiman wrote:
| Things AWS solves for me that I've always wanted to have
| solved:
|
| * Database administration
|
| * Security best practices by default
|
| * Updated infrastructure
|
| * Automatic load balancing
|
| * Trivial credentials management
|
| * 2FA for all infra administration
|
| * Container image repositories
|
| * Distributed file systems
|
| I was an old-school bare-metal UNIX systems admin 15 years
| ago. Each of those things, in medium to large companies,
| would take a full-time sysadmin to keep it all up to date.
| billisonline wrote:
| > we accidentally created an infinite event loop between two
| Lambdas. Racked up a several-hundred-thousand dollar bill in a
| couple of hours
|
| May I ask how you dealt with this? Were you able to explain it
| to Amazon support and get some of these charges forgiven? Also,
| how would you recommend monitoring for this type of issue with
| Lambda?
|
| Btw, this reminds me a lot of one of my own early career screw-
| ups, where I had a batch job uploading images that was set up
| with unlimited retries. It failed halfway through, and the
| unlimited retries caused it to upload the same three images
| 100,000 times each. We emailed Cloudinary, the image CDN we
| were using, and they graciously forgave the costs we had
| incurred for my mistake.
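|
| The usual fix for that class of bug is to cap the retries. A
| minimal sketch with exponential backoff; retry is a hypothetical
| helper and the default limits are placeholders, not values from the
| original job.

```python
import time


def retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Waits base_delay, then 2x, 4x, ... between attempts. The last failure
    is re-raised so the job stops instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

| Passing sleep as a parameter is only there so the delays can be
| stubbed out in tests.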
| calmlynarczyk wrote:
| > May I ask how you dealt with this? Were you able to explain
| it to Amazon support and get some of these charges forgiven?
| Also, how would you recommend monitoring for this type of
| issue with Lambda?
|
| AWS support caught it before we did, so they did something on
| their end to throttle the Lambda invocations. We asked for
| billing forgiveness from them; last I heard that negotiation
| was still ongoing over a year after it occurred.
|
| Part of the problem was we had temporarily disabled our
| billing alarms at the time for some reason, which caused our
| team to miss this spike. We've enabled alerts on both billing
| and Lambda invocation counts to see if either go outside of
| normal thresholds. It still doesn't hard-stop this from
| occurring again, but we at least get proactively notified
| about it before it gets as bad as it did. I don't think we've
| ever found a solution to cut off resource usage if something
| like this is detected.
| BackBlast wrote:
| We use memory safe languages, type safe languages. AWS is not
| fundamentally billing safe.
|
| Just to give you nightmares. There's been DDoS in the news
| lately, I'm surprised nobody has yet leveraged those bot nets
| to bankrupt orgs they don't like who use cloud autoscaling
| services.
|
| I don't know how you monitor it, part of the issue is the
| sheer complexity. How do you know what to monitor? The
| billing page is probably the place to start - but it is too
| slow for many of these events.
|
| I guess you could start with the common problems. Keep
| watchdogs on the number of lambdas being invoked, or any
| resource you spin up or that has autoscaling utilization.
| Egress bandwidth is definitely another I'd watch.
|
| Dunno, just seems to me you'd need to watch every metric and
| report any spikes to someone who can eyeball the system.
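|
| A crude version of "report any spikes" can be sketched as a
| comparison against a trailing average. The window and factor below
| are arbitrary assumptions, find_spikes is a hypothetical helper, and
| the samples could be any per-minute metric (invocations, egress
| bytes, and so on).

```python
from statistics import mean


def find_spikes(samples, window=5, factor=3.0):
    """Return indexes whose value exceeds factor x the mean of the previous window."""
    spikes = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

| It only flags candidates for a human to look at; it does nothing to
| stop the spend.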
|
| For me? I limit my exposure to AWS as much as I reasonably
| can. The possibilities combined with the known nightmare
| scenarios, with a "recourse" that isn't always effective
| doesn't make for good sleep at night.
| rileymat2 wrote:
| > There's been DDoS in the news lately, I'm surprised
| nobody has yet leveraged those bot nets to bankrupt orgs
| they don't like who use cloud autoscaling services.
|
| That's interesting because it seems like it would happen,
| but what is in it for the attacker, when under threat they
| can implement caps?
| sjtindell wrote:
| Could only be an attack of spite, can't really hold a
| ransom because the IPs of malicious traffic could be
| blocked or limits set after initial overspend. Perhaps if
| the botnet was big enough.
| BackBlast wrote:
| A severe enough bill can cause an organization to be
| instantly bankrupt. No opportunity to try to do something
| like caps.
|
| Regardless, turning on spending caps isn't a final
| solution to this particular attack. With caps the
| site/resources will hit the cap and go offline.
| Accomplishing what a DDoS generally tries to accomplish
| anyway.
|
| The only real solution is that you have to have a cheap
| way to filter out the attacking requests.
| ivanhoe wrote:
| Some people get paid to destroy competition, others just
| enjoy watching the world burn...
| Kiro wrote:
| Slightly OT: I love Forge but recently I've started using it for
| my non-PHP projects which feels... wrong. Are there any similar
| services that are more agnostic?
| jcims wrote:
| I feel like large enterprises primarily see AWS as a way to
| outsource capital expenses.
| StopHammoTime wrote:
| This is literally the main reason a lot of companies use AWS.
| In Australia, it is very hard for Government Departments to get
| capital expenses approved for infrastructure as it requires a
| lot of rigmarole.
|
| However, once you're in AWS it's OpEx, who cares as long as you
| don't break the budget too soon before EOFY.
| jnieminen wrote:
| AWS and Azure are a permission to spend.
| lloydatkinson wrote:
| Are they really though? A serverless event driven
| architecture system I'm working on literally costs less than
| £10 a month on Azure. Running full blown VMs instead of
| cheaper more appropriate technologies like containers or
| functions will always cost more.
| TriNetra wrote:
| As per our calculation for CloudAlarm [0], as we reach a
| few hundred users, it'd be cheaper to use a dedicated
| instance than serverless (Azure Functions) design. So it
| may vary from system to system depending on the amount of work
| you perform for each user.
|
| 0: https://cloudalarm.in/ - btw, you may wish to have daily
| budgeted pace based alerts using it - to inform you when
| the usage spikes up (much faster than Azure's consumption
| threshold based alerts).
| thanatos519 wrote:
| So basically this ... <<Ah, I see you have the machine that
| goes ping. This is my favorite. You see we lease it back from
| the company we sold it to and that way it comes under the
| monthly current budget and not the capital account.>>
|
| https://www.youtube.com/watch?v=tKodtNFpzBA
| igammarays wrote:
| AWS is complexity-as-a-service. This is why, as a one-man
| company, I went baremetal[1]. One flat price, screaming fast
| performance, and massive scalability if you get a beefy enough
| machine[2]. I don't have time to fiddle with k8s, try to figure
| out AWS billing/performance tradeoffs, or deal with untraceable
| performance issues due to noisy neighbours and VM overhead. My
| disaster recovery plan is a simple DB dump script to S3, and I
| know I can get another baremetal server up and running in less
| than 20 minutes.
|
| [1] with IBM Cloud 1 year free startup credits
|
| [2] Let's Encrypt and StackOverflow run their entire databases on
| a single beefy baremetal machine.
| https://letsencrypt.org/2021/01/21/next-gen-database-servers...
| StreamBright wrote:
| What is blocking you from using just EC2?
| count wrote:
| A year of free startup credits, I'd guess.
| ufmace wrote:
| I tend to agree with this. AWS etc is nice if your scale is big
| enough that you need to run a big cloud of dozens of servers
| with complex interconnections for security etc. If a single
| plain old server with database etc on it will do the job fine,
| much better to stick with that.
| Daishiman wrote:
| No?
|
| A bare metal Postgres install needs optimization, and a
| _working_ backup and restore plan (you did test your backups,
| right?).
|
| That's half a day of work lost to get your system set up.
|
| Now your app keeps serious data and you want a read replica.
| How long does that take?
|
| Now you need a separate development environment. Here you go
| again, adding a few hours of work.
|
| Then you need to update your database version. Gotta read the
| changelog and make sure you did everything right, and do it
| in a reasonable change window.
|
| You just racked up several days' worth of work, and for a DB
| instance with a similar amount of infra work done, the RDS
| solution is way cheaper and easier to provision.
|
| If your time is worth money, there's no reason to go bare
| metal.
| ufmace wrote:
| I still disagree and say yes.
|
| Why does my bare metal Postgres install need optimization?
| My sites mostly don't get much traffic, and it runs fine
| as-is. It'd be silly to try and optimize it without being
| able to measure what's actually slow.
|
| Backup systems should also be set up according to desired
| reliability. I have a 10-line bash script that pulls a DB
| dump, zips it, and sends it to S3. Under 5 minutes to
| install, including setting up a new AWS role and keypair
| for it, just have to add in some Ansible commands I already
| have set up, and set a cron job to run once a day.
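|
| For illustration, a sketch of that kind of backup job, written in
| Python rather than bash. It is not the commenter's script: the
| bucket name, key layout and /tmp path are assumptions, it shells
| out to the standard pg_dump and "aws s3 cp" commands, and it holds
| the whole dump in memory, so it only suits small databases.

```python
import datetime
import gzip
import subprocess


def backup_key(db_name, when):
    """Object key for a dump, e.g. backups/mydb/2021-09-11.sql.gz."""
    return f"backups/{db_name}/{when:%Y-%m-%d}.sql.gz"


def run_backup(db_name, bucket):
    """Dump the database, gzip it, and copy it to S3 with the AWS CLI."""
    when = datetime.date.today()
    local_path = f"/tmp/{db_name}-{when:%Y-%m-%d}.sql.gz"
    # pg_dump writes a plain SQL dump to stdout when no output file is given.
    dump = subprocess.run(["pg_dump", db_name], check=True, capture_output=True)
    with gzip.open(local_path, "wb") as archive:
        archive.write(dump.stdout)
    subprocess.run(
        ["aws", "s3", "cp", local_path, f"s3://{bucket}/{backup_key(db_name, when)}"],
        check=True,
    )
```

| A daily cron entry calling run_backup would complete the setup; a
| periodic restore test is still needed to know the dumps are usable.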
|
| Read replicas are nice for some applications, but not
| needed for any of my current ones. I probably wouldn't want
| to set one up on bare-metal admittedly, but I'm not
| worrying about it until I need it.
|
| I don't see a need for a separate cloud deployment for a
| development environment for my current application either.
| Would be nice if I had multiple developers and testers
| working on it, but I don't now.
|
| Never needed to update the DB version, and the traffic is
| low enough that I don't need to really care about keeping
| reasonable change windows if I did.
|
| So nope, 10 minutes of work for a low-traffic application.
| Meanwhile, an AWS RDS setup is easy to start, but then you
| have to muck with security groups, VPCs, permissions, etc
| to get it working right. That's not necessarily easy if you
| don't already make use of that stuff.
| tomerbd wrote:
| Which scripting or which infra do you use for automatic
| installation/configuration of your server?
| chrisandchris wrote:
| Not OP, but I did the same and I use
|
| - Ansible for the low-level stuff (like network, mounts,
| iSCSI, configuration files)
|
| - Terraform for high-level stuff (like DB users)
|
| In my case, as I have several services that use a lot of RAM
| running, I couldn't afford The Cloud but can easily afford a
| colocation. I don't mind the maintenance (it's a couple hours
| each month) and I don't care much if services are down a few
| hours.
|
| If you need something running 24/7 with 99.9%, colocation
| will be more expensive just because of the human you need.
| candiddevmike wrote:
| Why wouldn't you use Ansible for the high level stuff? It
| can easily manage DBs and you wouldn't need another tool.
| chrisandchris wrote:
| As nijave said, the declarative style is the difference.
| I can read a terraform file and already know the exact
| state my system will have.
| nijave wrote:
| Not sure about op but you have to put forth quite a bit
| of effort to get declarative infra with Ansible. Some of
| it is declarative out of the box but a lot is imperative.
|
| The main difference: if I revoke a DB privilege, in most cases
| I have to add a line with a REVOKE to Ansible, whereas with
| Terraform you just delete the config line and the tool notices
| the change during its diff stage and performs the removal
| (it's stateful and declarative).
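|
| A minimal sketch of the diff stage described above, written in
| Python and not taken from either tool's implementation. The Grant
| type and the plan() function are hypothetical names for
| illustration. The idea is that you only edit the desired state,
| and the REVOKE falls out of comparing it with the recorded state:

```python
from typing import NamedTuple


class Grant(NamedTuple):
    """One database privilege (hypothetical model, for illustration only)."""
    user: str
    privilege: str
    table: str


def plan(desired: set[Grant], current: set[Grant]) -> list[str]:
    """Return the SQL needed to move the `current` state to the `desired` state."""
    statements = []
    # Grants present in the config but not yet applied.
    for g in sorted(desired - current):
        statements.append(f"GRANT {g.privilege} ON {g.table} TO {g.user};")
    # Grants that were applied earlier but have been deleted from the config.
    for g in sorted(current - desired):
        statements.append(f"REVOKE {g.privilege} ON {g.table} FROM {g.user};")
    return statements


# Recorded state: what was applied last time.
current = {Grant("app", "SELECT", "orders"), Grant("app", "DELETE", "orders")}
# Desired state: the config after the DELETE line was removed.
desired = {Grant("app", "SELECT", "orders")}

print(plan(desired, current))
```

| Nobody writes the REVOKE by hand here; it is derived from the
| line that is no longer in the desired set.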
| igammarays wrote:
| Laravel Forge
| FpUser wrote:
| On bare metal as well. Not a trace of doubt.
| icecap12 wrote:
| The comment on "complexity-as-a-service" resonates. IMHO, it's
| primarily because they want to make a product out of
| everything, including stuff companies build to manage their own
| AWS implementations. Instead of a simple list of products, it's
| a complex list, with lots of nuances for each service offering.
| The other day, I was giving a high level summary of cloud
| technology to an intern; there was a point where I couldn't
| even find the AWS service I was telling her about from the
| product list, which annoyed me. Maybe that's more a comment
| about the marketing site though, but still, when your product
| catalog gets that big, it's hard to avoid ridiculous levels of
| complexity.
| arbuge wrote:
| From that Let's Encrypt article: "We have a number of replicas
| of the database active at any given time, and we direct some
| read operations to replica database servers to reduce load on
| the primary."
| shreddit wrote:
| Their config costs around $230,000, which I think is impressive
| for a single server
| pibefision wrote:
| +1, also it's easy to use Docker containers and Traefik as
| reverse proxy to manage many services.
| lysecret wrote:
| OK, I'm going to admit to a mistake revolving around NAT gateways
| and Lambdas. I basically wanted to connect a Lambda to a
| Postgres / RDS database, and for that I had to put it into a
| private VPC. But the Lambdas still had to talk to the world (a
| lot), so I just put a NAT gateway in front of it, no biggie. Well,
| end of the story: in one day I ran up 2000 euros in NAT gateway
| costs, haha.
| nickjj wrote:
| My favorite billing mistake was forgetting to delete an unused
| elastic IP address and then realizing I was being charged $34 /
| month for 2 months just to have it exist while doing nothing.
|
| Edit: It's exactly $33.62 and I was mistaken on what caused it.
| It came from having a NAT Gateway just idling which is $0.045 per
| hour x 747 hours = $33.62 on us-east-1.
|
| I know it's not the biggest mistake ever, but these things creep
| up on you when you use CloudFormation and it continuously fails
| to delete resources so you're left having to manually trace
| through a bunch of resources. It's easy to leave things hanging.
| jrochkind1 wrote:
| unused Elastic IP pricing looks to me like $3.60/month on their
| pricing page. ($0.005 per hour). What am I missing to get to
| $34/month? (Or did you have 10 of em?)
|
| https://aws.amazon.com/ec2/pricing/on-demand/#Elastic_IP_Add...
| nickjj wrote:
| Thanks, I edited my post to correct it. It was a single NAT
| gateway that's $33.62 / month.
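|
| A small sketch that reproduces the two figures quoted in this
| subthread. The hourly rates and the 747 hours come from the
| comments above; the 720-hour (30-day) month for the Elastic IP is
| an assumption implied by the $3.60 figure, and monthly_cost() is a
| hypothetical helper. Decimal is used so the cents round the way a
| bill would:

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_cost(hourly_rate: str, hours: int) -> Decimal:
    """Hourly rate times hours billed, rounded half-up to whole cents."""
    total = Decimal(hourly_rate) * hours
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Idle NAT gateway: $0.045 per hour for 747 hours (figures from the thread).
print(monthly_cost("0.045", 747))

# Unused Elastic IP: $0.005 per hour, assuming a 720-hour month.
print(monthly_cost("0.005", 720))
```

| The two results differ by roughly a factor of ten, which is why
| the original $34 figure did not fit a single Elastic IP.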
| jbverschoor wrote:
| Most commonly made mistake: assuming that your data is safe on an
| EC2 instance (ephemeral storage)
| arno1 wrote:
| Discover Akash Network!
|
| Censorship-resistant, permissionless, and self-sovereign, Akash
| Network is the world's first open source cloud.
|
| It's in the early stages, and the number of deployments is
| steadily growing!
|
| Soon GPU compute and persistent storage!
|
| You can also already become a provider and earn AKT tokens
| (which are neat, driven by the Cosmos based blockchain)
|
| https://akash.network
|
| https://akashlytics.com/price-compare
| daneel_w wrote:
| _" Technically they are a smidgen slower than Intel for certain
| workloads."_
|
| In my experience, after migrating several servers with quite
| varying workloads, they're _faster_ than Intel - and more than a
| smidgen. Just as is the general case with current AMD Ryzen vs
| Intel.
| sebazzz wrote:
| In summary: Either overprovisioning, or not realising every extra
| CPU cycle or I/O operation costs extra money.
|
| This is, of course, the real way "the cloud" makes money.
| Carefully tuned, it can no doubt be cheaper than do-it-yourself;
| however, it is also quite easy to run up a lot of costs.
| gizdan wrote:
| Contrary to popular belief, the case for going to the Cloud
| isn't cost saving, it is flexibility and value for money. It'll
| likely cost you around the same if you run it in a DC, but you
| won't have features like auto scaling, increased security, and
| much more.
| luckylion wrote:
| About the same? Last I checked for our somewhat static work
| load on a bunch of webservers, AWS would be 10x the price.
| Not to mention that you need someone who has deep AWS
| knowledge and experience to manage your system, just like you
| need someone who manages your dedicated servers in a DC.
|
| It's great for workloads that fluctuate extremely, or require
| massive scaling in very short time. Not sure about the
| increased security. If you run your images on EC2, it's still
| up to you to not mess up the config.
| jimmaswell wrote:
| How many workloads actually fluctuate so extremely and
| unpredictably?
| sokoloff wrote:
| Lightsail is more reasonably priced for a lot of simple web
| serving use cases.
| fermentation wrote:
| It's also super easy to use. I have an instance hosting a
| game server for friends. I might be wasting money since
| the server sits idle about half the time though.
| maccard wrote:
| We're actively planning an aws workload right now, and with
| reserved instances for the baseline workload, the pricing
| is closer to 1.5-2x, but the cost savings of only needing
| to scale up for a couple of hours per week make up for
| that. Yes, it would be cheaper to run our own infra for the
| baseload and burst into aws, but that adds operational load
| onto the development team, which defeats the purpose of
| going with AWS in the first place
| Viliam1234 wrote:
| From the perspective of a developer, flexibility is a double-
| edged weapon.
|
| Before cloud: we have database quota of a few gigabytes, and
| once in a few years we need to justify to management why the
| quota should be doubled.
|
| After cloud: whenever we add a new table, or a new column, or
| import lots of data, the invoice slightly increases, the
| management notices, and we need to justify the extra
| megabytes.
| steveBK123 wrote:
| On billing.. they will never do it, but on smaller accounts they
| could build trust by offering some sort of "prepaid" mode like
| cell phone services do at the low end.
|
| That is - you deposit $X in your account, and AWS nukes your live
| services if you breach it. The worst that ever happens is you are
| out sunk cost of the $X you had already deposited.
| noir_lord wrote:
| I nearly made myself a very nice footgun not long since.
|
| So MediaConvert (video transcoding): direct upload to an S3
| bucket, the bucket fires an event to my application, and my
| application builds the job and submits it to MediaConvert with
| the output bucket as the destination.
|
| Straightforward enough, unless you happen to be copying a config
| while tired and set your input and output buckets to the same
| bucket...
|
| Fortunately previous-me was paranoid enough to have put in an if
| check and die if they were the same, but otherwise that could
| have cost a lot of money.
| swyx wrote:
| why would MediaConvert not build that if check in? perhaps a
| good feature request for them.
| noir_lord wrote:
| Because you can write back to the same bucket at different
| prefixes if you want to.
|
| It's simply simpler to split the buckets in my case.
|
| I added further checks to not only check the bucket made
| sense but also that the inbound and outbound had the correct
| prefixes.
|
| So if another person does the same it'll catch both ways.
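|
| A rough sketch of that kind of guard, not the commenter's actual
| code. The function names, bucket names, and the "uploads/" and
| "transcoded/" prefixes are all hypothetical. It covers the
| split-bucket setup described above: refuse to build a job when
| input and output share a bucket, and also when either key sits
| under the wrong prefix:

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_job_paths(
    input_uri: str,
    output_uri: str,
    input_prefix: str = "uploads/",      # hypothetical prefix
    output_prefix: str = "transcoded/",  # hypothetical prefix
) -> None:
    """Raise ValueError if a transcode job could feed its own output back in."""
    in_bucket, in_key = split_s3_uri(input_uri)
    out_bucket, out_key = split_s3_uri(output_uri)

    # First check: output landing in the upload bucket would re-trigger the event.
    if in_bucket == out_bucket:
        raise ValueError("input and output buckets must differ")

    # Further checks: both sides must sit under the prefixes we expect.
    if not in_key.startswith(input_prefix):
        raise ValueError(f"input key must start with {input_prefix!r}")
    if not out_key.startswith(output_prefix):
        raise ValueError(f"output key must start with {output_prefix!r}")


# Passes silently: different buckets, expected prefixes.
check_job_paths("s3://media-in/uploads/a.mp4", "s3://media-out/transcoded/a/")
```

| Calling check_job_paths() before submitting the job means a
| copy-pasted config fails fast instead of looping.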
___________________________________________________________________
(page generated 2021-09-11 23:00 UTC)