[HN Gopher] Understanding AWS End of Service Life Is a Key FinOp...
___________________________________________________________________
Understanding AWS End of Service Life Is a Key FinOps
Responsibility
Author : noctarius
Score : 43 points
Date : 2024-04-18 12:05 UTC (10 hours ago)
(HTM) web link (www.fairwinds.com)
(TXT) w3m dump (www.fairwinds.com)
| noctarius wrote:
| Article by Mary Henry. I was shocked to see how much more
| extended support costs (per hour) for Kubernetes on AWS.
|
| Haven't had that situation myself on AWS yet, but ran into it a
| few times on Azure.
|
| I can't remember having paid extra on Azure, though maybe we
| did. Certainly not 6x the price.
|
| PS: not sure why it got flagged the first time, but I think
| it's because I used a different title. Sorry.
| qqtt wrote:
| AWS also recently ended support for MySQL 5.7, so if you had an
| RDS instance with that version running past the cutoff, your
| support costs ballooned exorbitantly.
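|
| If you're not sure whether you're exposed, a quick inventory
| sketch with the AWS CLI (the --query expression just picks out
| each instance's engine and version):
|
|     # list every RDS instance with its engine and version
|     aws rds describe-db-instances \
|       --query 'DBInstances[].[DBInstanceIdentifier,Engine,EngineVersion]' \
|       --output table
|
| Anything still reporting a 5.7.x EngineVersion is (or was)
| accruing the extended support charge.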
| noctarius wrote:
| Seems like I'm one of the lucky ones - using neither RDS nor
| MySQL. But seriously, ouch. I mean, I get why they want people
| to migrate to supported versions, but ...
| SteveNuts wrote:
| I wish we could implement this internally via chargebacks.
| The teams that refuse to upgrade their stuff _should_ be
| forced to pay for the externalities they cause.
| VectorLock wrote:
| Yup this one hit me hard.
| USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7 sent my bill up 70%.
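|
| For anyone trying to track down where their own increase came
| from, a sketch using Cost Explorer's get-cost-and-usage API,
| grouped by usage type (the dates are placeholders):
|
|     # surface line items like USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7
|     aws ce get-cost-and-usage \
|       --time-period Start=2024-03-01,End=2024-04-01 \
|       --granularity MONTHLY \
|       --metrics UnblendedCost \
|       --group-by Type=DIMENSION,Key=USAGE_TYPE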
| hughesjj wrote:
| How long was it between the notice and you getting charged
| extra?
| res0nat0r wrote:
| We just got emails yesterday about the EKS price increase. It's
| another reason we're trying to move the main app to the vendor's
| SaaS: I don't have the time or resources to be a full-time k8s
| admin. The ecosystem moves too fast, and upgrades and
| deprecations happen too quickly to keep up and still have time
| to test, plan, and roll out proper upgrades without breaking
| our critical production workloads.
| chrisjj wrote:
| > running unsupported versions makes it harder to get help from a
| community that's currently focused on the latest version
|
| Great example of misuse of that simple word 'that'.
|
| Should be 'which'.
| TecoAndJix wrote:
| Always learning something new[1]:
|
| "The difference between which and that depends on whether the
| clause is restrictive or nonrestrictive.
|
| In a restrictive clause, use that.
|
| In a nonrestrictive clause, use which.
|
| Remember, which is as disposable as a sandwich wrapper. If you
| can remove the clause without destroying the meaning of the
| sentence, the clause is nonessential (another word for
| nonrestrictive), and you can use which."
|
| [1] https://www.grammarly.com/blog/which-vs-
| that/#:~:text=Which%....
| pas wrote:
| can you please explain the difference in semantics? what does
| the sentence mean with 'that', and why is that
| inconsistent/incorrect/illogical compared to the meaning with
| 'which'? thanks!
| chrisjj wrote:
| https://writeanything.wordpress.com/2008/09/20/grammar-
| girl-...
| htrp wrote:
| This is also the right way to deprecate. Charge people an arm and
| a leg to keep things running (and eventually force them to
| migrate).
| noctarius wrote:
| True, but I guess it'll be a surprise to many. And,
| unfortunately, upgrading isn't always the easiest thing with
| deprecations and such.
| solatic wrote:
| 100%. People are responsible for an ever-increasing number of
| things; people will focus on business priorities, and stuff that
| is working will be left the hell alone. As long as the bills
| are manageable and the business pays - the lights will be kept
| on _forever_. Passing increasing support costs to customers
| realigns interests between customer and provider without danger
| of user impact.
|
| And for Kubernetes, honestly, charging 6x for extended support
| is probably a bargain, considering the pace of change and
| difficulty of hiring engineers for unsexy maintenance work.
| mdaniel wrote:
| I do appreciate that the devil is always in the details, but
| I'll be straight: their new(?) "Upgrade insights" tab/api
| <https://docs.aws.amazon.com/eks/latest/userguide/cluster-
| ins...> goes a long way toward driving down the upgrade risk
| of "well, what are we using that's going to get cut in the
| new version?"
|
| We just rolled off of their extended version and it was about
| 19 minutes to upgrade the control plane, no downtime, and
| then varying between 10 minutes and over an hour to upgrade
| the vpc-cni add-on. It seemed just completely random, and
| without any cancel button. We also had to manually patch
| kube-proxy container version, which OT1H, they did document,
| but OTOH, well, I didn't put those DaemonSets _on_ the Nodes
| so why do I suddenly have to manage its version? Weird
|
| Touching the CNI is always a potentially downtime-inducing
| event, but for the most part it was manageable.
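|
| For reference, the same findings are queryable from the CLI; a
| sketch assuming the list-insights/describe-insight operations
| from the linked docs and a hypothetical cluster name:
|
|     # list upgrade-readiness findings for a cluster
|     aws eks list-insights --cluster-name my-cluster
|
|     # drill into one finding by its id
|     aws eks describe-insight --cluster-name my-cluster --id <insight-id>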
| TheP1000 wrote:
| Agreed. I would imagine the previous approach of forced
| upgrades ended up burning lots of customers in worse ways than
| just their pocketbook.
| VectorLock wrote:
| Had this bite me for my small-scale personal AWS setup. I have
| an AWS account I run some personal sites on, a Mastodon
| instance, etc. Some Billing Alarms I set up alerted me that my
| bill went from the normal $100 to $180: a $75 charge for
| USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7. I mean, I'm very used to
| Amazon's ridiculous fee structure, but even this one threw me
| for a loop.
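|
| For anyone who wants the same safety net, a minimal sketch of
| such a Billing Alarm (the threshold, alarm name, and SNS topic
| ARN are placeholders; billing metrics only exist in us-east-1):
|
|     aws cloudwatch put-metric-alarm \
|       --region us-east-1 \
|       --alarm-name billing-over-150 \
|       --namespace AWS/Billing \
|       --metric-name EstimatedCharges \
|       --dimensions Name=Currency,Value=USD \
|       --statistic Maximum \
|       --period 21600 \
|       --evaluation-periods 1 \
|       --threshold 150 \
|       --comparison-operator GreaterThanThreshold \
|       --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts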
| noctarius wrote:
| Ouch. Glad you had the alarm (and that it reacted "early
| enough"). Anyhow, I think you may not be alone with that
| surprise.
| steelaz wrote:
| To be fair to AWS, they announced the deprecation of MySQL 5.7
| in January 2021, and many emails warned of this change
| throughout 2024.
| neilv wrote:
| Sounds like an entropy problem.
|
| https://www.youtube.com/watch?v=y8OnoxKotPQ
|
| > _Which we get from EKS -- our entropy chaos service._
| JohnMakin wrote:
| I'm fine with forcing upgrades this way - however, from an
| operations standpoint, it is an absolute nightmare.
|
| For one, depending on your situation/CRDs/automation, doing
| these upgrades in-place can be next to impossible. Updating an
| EKS minor version can only be done one version at a time - e.g.,
| if you want to go from 1.24 -> 1.28, you need to do 1.25, then
| 1.26, then 1.27, then 1.28. So teams without a lot of resources
| are probably in a tough spot depending on how far behind they
| are. Often, it's _far_ more efficient to build an entirely new
| cluster from scratch and then cut over - which seems ridiculous.
|
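| The one-version-at-a-time march looks roughly like this (a
| sketch with a hypothetical cluster name; node groups and
| add-ons still need their own upgrades between steps):
|
|     # walk the control plane up one minor version at a time
|     for v in 1.25 1.26 1.27 1.28; do
|       aws eks update-cluster-version --name my-cluster \
|         --kubernetes-version "$v"
|       aws eks wait cluster-active --name my-cluster
|     done
|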
| Why is upgrading EKS versions such a pain? Well, if you're using
| any cluster add-ons, for one, all of those need to be upgraded
| to the correct versions, and the compatibility matrix there can
| be rough. Stuff often breaks at this stage. Care needs to be
| taken around PVs and the CNI, and god help you if you have some
| helm charts or CRDs that rely on a deprecated Kubernetes API -
| even if the upstream repository has a fix for it, you will often
| find yourself in a yak-shaving nightmare of fixing all the stuff
| that breaks when you upgrade that, and then whatever downstream
| services THAT service breaks - etc.
|
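| Two ways to spot that deprecated-API usage before it bites you
| (the apiserver metric is real; Pluto is Fairwinds' scanner, so
| treat the exact flags as a sketch):
|
|     # kube-apiserver counts requests against deprecated APIs
|     kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis
|
|     # scan helm releases for removed/deprecated apiVersions
|     pluto detect-helm -o wide
|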
| What is the solution? I don't know. I'm not a kubernetes
| architect, but I work with it a lot. I understand there are
| security patches and improvements constantly, but the release
| cycle, at least from an infrastructure/operations perspective,
| IME places considerable strain on teams, to the point where I
| have literally seen a role in a company whose primary
| responsibility was upgrading EKS cluster versions.
|
| I have a sneaking suspicion this is to try to encourage people to
| migrate to more expensive managed container orchestration
| services.
| noctarius wrote:
| Hadn't thought of that suspicion beforehand, but it doesn't
| sound like a total miss.
| rho138 wrote:
| I recently did the upgrade from 1.24 -> 1.28 on a neglected
| cluster after testing the upgrade in a dev environment, and it
| was honestly not that terrible. It really comes down to having
| the capability and man-hours to manage the procedure. In
| reality the longest part was waiting for cluster nodes to
| upgrade to each version of k8s, but the complete upgrade only
| took 3 weeks of testing and a single 4-hour outage with no loss
| in processing over the period.
|
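| Waiting on the nodes is its own step; a sketch with hypothetical
| cluster/nodegroup names (update-nodegroup-version and the
| nodegroup-active waiter do the waiting for you):
|
|     # roll a managed node group to the cluster's k8s version
|     aws eks update-nodegroup-version \
|       --cluster-name my-cluster --nodegroup-name my-nodegroup
|     aws eks wait nodegroup-active \
|       --cluster-name my-cluster --nodegroup-name my-nodegroup
|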
| Realistically those workloads would have been better suited to
| a horizontally scaling EC2 deployment, but that was a future
| goal that never came to fruition.
| JohnMakin wrote:
| Like I said, it depends on your situation. Sometimes a v1beta1
| API gets deprecated and causes complete chaos for a deployment.
| Sometimes your IaC is resistant to these kinds of frequent
| changes. There are really a billion scenarios.
|
| For reference, I have done upgrades from 1.12 -> 1.28, and most
| of the time, if I get into a messy project and can get away
| with it, I will just rebuild the cluster from scratch.
| cjk2 wrote:
| Yeah, this. My average day when I go near EKS upgrades: waltz
| in, fuck up the ALB ingress controller in some new and
| interesting way, spend all day bouncing AWS support tickets
| around, find out it was AWS's fault, find half the manifest
| YAML schema in the universe is now deprecated, sob into my now
| soaking wet trousers, and wonder why the fuck I ended up doing
| this for a living.
|
| Yesterday I spent 3 hours trying to fix something, only to find
| it was an indentation error somewhere.
| watermelon0 wrote:
| The EKS release cycle follows the Kubernetes release cycle. I'm
| not sure it's fair to expect AWS to freely support outdated K8s
| versions that no longer have upstream support.
|
| If K8s were backwards compatible, upgrading would be a lot
| easier, and if it supported LTS releases, like other projects
| do, manual upgrades would be needed only every X years.
|
| For example, the reason you can use PostgreSQL with the same
| major version for 5 years on RDS is that PostgreSQL actively
| supports it, and minor versions are non-breaking and can be
| seamlessly applied (a restart or failover to a standby replica
| is still needed during the upgrade).
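|
| Those seamless minor versions can even be applied for you
| during maintenance windows; a sketch with a hypothetical
| instance identifier:
|
|     # opt an RDS instance into automatic minor version upgrades
|     aws rds modify-db-instance \
|       --db-instance-identifier my-db \
|       --auto-minor-version-upgrade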
| JohnMakin wrote:
| Completely understand why it is this way, and like I said, I
| don't know the solution - unless AWS were able or willing to
| fork Kubernetes the way they did Elasticsearch, though it's
| understandable why they may not want to do that. Was mostly
| just griping that this process is a complete pain in the ass
| for tons of people (IME).
| pid-1 wrote:
| As K8s matures, it's likely we will get some kind of LTS
| versioning scheme.
|
| Having new releases so often for such a core infrastructure
| component is kinda insane unless it was explicitly architected
| to allow seamless upgrades.
| noctarius wrote:
| I hope you're right. Apart from that, yes, I think it's
| necessary.
| mdaniel wrote:
| There's a tiny bit of nuance there about "allow seamless
| upgrades": they do what I think is a fantastic job of version
| skew toleration between all the parts that interact (kubectl,
| kubelet, apiserver, etc). So that part, I think, is not the
| long pole in any such tent, especially because if the
| control-plane gets wiped out, kubelet will continue to manage
| the last state of affairs it knew about, and traffic will
| continue to flow to those pods.
|
| The hairy bit is the rando junk that gets shoved _into_
| clusters, without any sane packaging scheme to roll it up or
| back. I even recently had to learn the deep guts of the
| sh.helm.release.v1.foo Secret because we accidentally left an
| old release in a cluster which no longer supported its
| apiVersion. No problem, says I, $(helm uninstall && helm
| install --version new-thing), but har-de-har-har: helm uses
| that Secret to fully rehydrate the whole manifest of the
| release _before deleting it_, so when helm tries (effectively)
| $(kubectl delete thing /v1beta1/oldthing) and pukes, well, no
| uninstall for you, even if those objects are already gone.
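|
| The blunt workaround, if anyone hits the same wall: helm v3
| keeps release state in Secrets labeled owner=helm, and deleting
| them makes helm forget the release so a fresh install can
| proceed (a sketch with hypothetical namespace/release names):
|
|     # inspect helm's release-state Secrets
|     kubectl get secrets -n my-namespace -l owner=helm
|
|     # make helm forget the stuck release entirely
|     kubectl delete secrets -n my-namespace -l "owner=helm,name=old-release"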
| thebeardisred wrote:
| This is something most people don't realize is an aspect of Red
| Hat's value. Extended Lifecycle Support (ELS) + Extended Update
| Support (EUS) are available _just in case_ you _really_ can't
| figure out how to migrate off of those Red Hat Enterprise Linux
| 6 systems running on x86 (32-bit).
| https://access.redhat.com/support/policy/updates/errata
___________________________________________________________________
(page generated 2024-04-18 23:01 UTC)