[HN Gopher] Understanding AWS End of Service Life Is a Key FinOp...
       ___________________________________________________________________
        
       Understanding AWS End of Service Life Is a Key FinOps
       Responsibility
        
       Author : noctarius
       Score  : 43 points
       Date   : 2024-04-18 12:05 UTC (10 hours ago)
        
 (HTM) web link (www.fairwinds.com)
 (TXT) w3m dump (www.fairwinds.com)
        
       | noctarius wrote:
        | Article by Mary Henry. I was shocked to see how much higher
        | the per-hour cost of extended support is for Kubernetes on
        | AWS.
        | 
        | I haven't run into that situation on AWS myself yet, but I did
        | a few times on Azure.
        | 
        | I can't remember having paid extra on Azure, though maybe we
        | did. Certainly not 6x the price.
        | 
        | PS: not sure why it got flagged the first time, but I think
        | it's because I used a different title. Sorry.
        
         | qqtt wrote:
          | AWS also recently ended support for MySQL 5, so if you had
          | an RDS instance with that version running past the cutoff,
          | your support costs ballooned.
        
           | noctarius wrote:
            | Seems like I'm a lucky one - using neither RDS nor MySQL.
            | But seriously, ouch. I mean, I get why they want people to
            | migrate to supported versions but ...
        
             | SteveNuts wrote:
             | I wish we could implement this internally via chargebacks.
             | The teams that refuse to upgrade their stuff _should_ be
             | forced to pay for the externalities they cause.
        
           | VectorLock wrote:
            | Yup, this one hit me hard.
            | USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7 sent my bill up 70%.
        
             | hughesjj wrote:
              | How long was it between the notice and you getting
              | charged extra?
        
         | res0nat0r wrote:
          | We just got emails yesterday about the EKS price increase.
          | It's another reason we're trying to move the main app to the
          | vendor's SaaS: I don't have enough time and resources to be
          | a full-time k8s admin. The ecosystem moves way too fast, and
          | upgrades/deprecations happen way too quickly to keep up and
          | to have time to test / plan / roll out proper upgrades
          | without breaking our critical production workloads.
        
       | chrisjj wrote:
       | > running unsupported versions makes it harder to get help from a
       | community that's currently focused on the latest version
       | 
       | Great example of misuse of that simple word 'that'.
       | 
       | Should be 'which'.
        
         | TecoAndJix wrote:
         | Always learning something new[1]:
         | 
         | "The difference between which and that depends on whether the
         | clause is restrictive or nonrestrictive.
         | 
         | In a restrictive clause, use that.
         | 
         | In a nonrestrictive clause, use which.
         | 
          | Remember, which is as disposable as a sandwich wrapper. If
          | you can remove the clause without destroying the meaning of
          | the sentence, the clause is nonessential (another word for
          | nonrestrictive), and you can use which."
         | 
         | [1] https://www.grammarly.com/blog/which-vs-
         | that/#:~:text=Which%....
        
         | pas wrote:
          | Can you please explain the difference in semantics? What
          | does this mean with 'that', and why is that
          | inconsistent/incorrect/illogical compared to the meaning
          | with 'which'? Thanks!
        
           | chrisjj wrote:
           | https://writeanything.wordpress.com/2008/09/20/grammar-
           | girl-...
        
       | htrp wrote:
       | This is also the right way to deprecate. Charge people an arm and
       | a leg to keep things running (and eventually force them to
       | migrate).
        
         | noctarius wrote:
          | True, but I guess it'll be a surprise to many. And,
          | unfortunately, upgrading isn't always the easiest thing with
          | deprecations and stuff.
        
         | solatic wrote:
          | 100%. People are responsible for an ever-increasing number
          | of things; people will focus on business priorities, and
          | stuff that is working will be left the hell alone. As long
          | as the bills are manageable and the business pays, the
          | lights will be kept on _forever_. Passing increasing support
          | costs to customers realigns interests between customer and
          | provider without danger of user impact.
         | 
         | And for Kubernetes, honestly, charging 6x for extended support
         | is probably a bargain, considering the pace of change and
         | difficulty of hiring engineers for unsexy maintenance work.
        
           | mdaniel wrote:
           | I do appreciate that the devil is always in the details, but
           | I'll be straight that their new(?) "Upgrade insights" tab/api
           | <https://docs.aws.amazon.com/eks/latest/userguide/cluster-
           | ins...> goes a long way toward driving down the upgrade risk
           | from a "well, what are we using that's going to get cut in
           | the new version?"
           | 
            | We just rolled off of their extended version, and it took
            | about 19 minutes to upgrade the control plane, no
            | downtime, and then anywhere between 10 minutes and over an
            | hour to upgrade the vpc-cni add-on. It seemed completely
            | random, and without any cancel button. We also had to
            | manually patch the kube-proxy container version, which,
            | OT1H, they did document, but OTOH, well, I didn't put
            | those DaemonSets _on_ the Nodes, so why do I suddenly have
            | to manage their version? Weird.
            | 
            | Touching the CNI is always a potential downtime-inducing
            | event, but for the most part it was manageable.
        
         | TheP1000 wrote:
         | Agreed. I would imagine the previous approach of forced
         | upgrades ended up burning lots of customers in worse ways than
         | just their pocketbook.
        
       | VectorLock wrote:
        | Had this bite me on my small-scale personal AWS setup. I have
        | an AWS account I run some personal sites on, a Mastodon
        | instance, etc. Some Billing Alarms I set up told me my bill
        | went from the usual $100 to $180: a $75 charge for
        | USE2-ExtendedSupport:Yr1-Yr2:MySQL5.7. I mean, I'm very used
        | to Amazon's ridiculous fee structure, but even this one threw
        | me for a loop.
        
         | noctarius wrote:
          | Ouch. Glad you had the alarm (and that it reacted "early
          | enough"). Anyhow, I think you may not be alone in that
          | surprise.
        
         | steelaz wrote:
         | To be fair to AWS, they announced the deprecation of MySQL 5.7
         | in January 2021, and many emails warned of this change
         | throughout 2024.
        
       | neilv wrote:
       | Sounds like an entropy problem.
       | 
       | https://www.youtube.com/watch?v=y8OnoxKotPQ
       | 
       | > _Which we get from EKS -- our entropy chaos service._
        
       | JohnMakin wrote:
       | I'm fine with forcing upgrades this way - however, from an
       | operations standpoint, it is an absolute nightmare.
       | 
        | For one, depending on your situation/CRDs/automation, doing
        | these upgrades in-place can be next to impossible. An EKS
        | minor version can only be updated one version at a time -
        | e.g., if you want to go from 1.24 -> 1.28, you need to do
        | 1.25, then 1.26, then 1.27, then 1.28. So teams without a lot
        | of resources are probably in a tough spot depending on how far
        | behind they are. Often, it's _far_ more efficient to build an
        | entirely new cluster from scratch and then cut over - which
        | seems ridiculous.
       | 
        | Why is upgrading EKS versions such a pain? Well, if you're
        | using any cluster add-ons, for one, all of those need to be
        | upgraded to the correct versions, and the compatibility matrix
        | there can be rough. Stuff often breaks at this stage. Care
        | needs to be taken around PVs and the CNI, and god help you if
        | you have some helm charts or CRDs that rely on a deprecated
        | API - even if the upstream repository has a fix, you will
        | often find yourself in a yak-shaving nightmare of fixing all
        | the stuff that breaks when you upgrade that, and then whatever
        | downstream services THAT service breaks - etc.
       | 
       | What is the solution? I don't know. I'm not a kubernetes
       | architect, but I work with it a lot. I understand there are
       | security patches and improvements constantly, but the release
       | cycle, at least from an infrastructure/operations perspective,
       | IME places considerable strain on teams, to the point where I
       | have literally seen a role in a company whose primary
       | responsibility was upgrading EKS cluster versions.
       | 
       | I have a sneaking suspicion this is to try to encourage people to
       | migrate to more expensive managed container orchestration
       | services.
        
         | noctarius wrote:
          | Hadn't thought of that suspicion beforehand, but it doesn't
          | sound like a total miss.
        
         | rho138 wrote:
          | I recently did the upgrade from 1.24 -> 1.28 on a neglected
          | cluster after testing the upgrade in a dev environment, and
          | it was honestly not that terrible. It really comes down to
          | having the capability and man-hours to manage the procedure.
          | In reality the longest part was waiting for cluster nodes to
          | upgrade to each version of k8s, but the complete upgrade
          | only took 3 weeks of testing and a single 4-hour outage with
          | no loss in processing over the period.
          | 
          | Realistically, those workloads would have been better suited
          | to a horizontally scaling EC2 deployment, but that was a
          | future goal that never came to fruition.
        
           | JohnMakin wrote:
            | Like I said, it depends on your situation. Sometimes a
            | /v1beta1 API gets deprecated and causes complete chaos for
            | a deployment. Sometimes your IaC is resistant to these
            | kinds of frequent changes. There are really a billion
            | scenarios.
            | 
            | For reference, I have done upgrades from 1.12 -> 1.28, and
            | most of the time, if I get into a messy project and I can
            | get away with it, I will just rebuild the cluster from
            | scratch.
        
         | cjk2 wrote:
          | Yeah, this. My average day when I go near EKS upgrades:
          | waltz in, fuck up the ALB ingress controller in some new and
          | interesting way, spend all day bouncing AWS support tickets
          | around, find out it was AWS's fault, find half the manifest
          | YAML schema in the universe is now deprecated, sob into my
          | now soaking wet trousers, and wonder why the fuck I ended up
          | doing this for a living.
          | 
          | Yesterday I spent 3 hours trying to fix something, only to
          | find it was an indent error somewhere.
        
         | watermelon0 wrote:
          | The EKS release cycle follows the Kubernetes release cycle.
          | I'm not sure it's fair to expect AWS to freely support
          | outdated K8s versions that don't have upstream support.
          | 
          | If K8s were backwards compatible, upgrading would be a lot
          | easier, and if it supported LTS releases, like other
          | projects do, manual upgrades would be needed only every X
          | years.
          | 
          | For example, the reason you can use PostgreSQL with the same
          | major version for 5 years on RDS is that PostgreSQL actively
          | supports it, and minor versions are non-breaking and can be
          | seamlessly applied (a restart or failover to a standby
          | replica is still needed during the upgrade).
        
           | JohnMakin wrote:
            | Completely understand why it is this way, and like I said,
            | I don't know the solution - unless AWS were able or
            | willing to fork Kubernetes the same way they did
            | Elasticsearch, though it's understandable why they may not
            | want to do that. I was mostly just griping that this
            | process is a complete pain in the ass for tons of people
            | (IME).
        
         | pid-1 wrote:
          | As K8s matures, it's likely we will get some kind of LTS
          | versioning scheme.
          | 
          | Having new releases so often for such a core infrastructure
          | component is kinda insane, unless it was explicitly
          | architected to allow seamless upgrades.
        
           | noctarius wrote:
            | I hope you're right. Apart from that, yes, I think it's
            | necessary.
        
           | mdaniel wrote:
            | There's a tiny bit of nuance there about "allow seamless
            | upgrades" in that they do what I think is a fantastic job
            | of version skew toleration between all the parts that
            | interact (kubectl, kubelet, apiserver, etc). So that part,
            | I think, is not the long pole in any such tent, especially
            | because if the control plane gets wiped out, kubelet will
            | continue to manage the last state of affairs it knew
            | about, and traffic will continue to flow to those pods.
           | 
            | The hairy bit is the rando junk that gets shoved _into_
            | clusters, without any sane packaging scheme to roll it up
            | or back. I even recently had to learn the deep guts of the
            | sh.helm.v1.foo secret because we accidentally left an old
            | release in a cluster which no longer supported its
            | apiVersion. No problem, says I, $(helm uninstall  && helm
            | install --version new-thing), but har-de-har-har, helm
            | uses that Secret to fully rehydrate the whole manifest of
            | the release _before deleting it_, so when helm tries
            | (effectively) kubectl delete thing /v1beta1/oldthing and
            | pukes, well, no uninstall for you, even if those objects
            | are already gone.
        
       | thebeardisred wrote:
        | This is something most people don't realize is an aspect of
        | Red Hat's value. Extended Lifecycle Support (ELS) + Extended
        | Update Support (EUS) are available _just in case_ you _really_
        | can't figure out how to migrate off of those Red Hat
        | Enterprise Linux 6 systems running on x86 (32-bit).
        | https://access.redhat.com/support/policy/updates/errata
        
       ___________________________________________________________________
       (page generated 2024-04-18 23:01 UTC)