[HN Gopher] Kubernetes Needs an LTS
       ___________________________________________________________________
        
       Kubernetes Needs an LTS
        
       Author : todsacerdoti
       Score  : 110 points
       Date   : 2023-12-04 13:01 UTC (9 hours ago)
        
 (HTM) web link (matduggan.com)
 (TXT) w3m dump (matduggan.com)
        
       | sofixa wrote:
       | This, like a recent LTS discussion I saw for a different tool,
       | ignores one tiny little detail that makes the whole discussion
       | kind of moot.
       | 
       | LTS doesn't mean it's immune to bugs or security vulnerabilities.
       | It just means that the major release is updated and supported
       | longer - but you still need to be able to apply patches and
       | security fixes to that major release. Yes, it's easier to go from
       | 1.20.1 to 1.20.5 than to 1.21, because there's less chance of
       | breakage and less things that will change, but the process is
       | pretty much the same - check for breaking changes, read
       | changelogs, apply everything. The risk is less, might be slightly
       | faster, but fundamentally, it's the same process. If the process
       | is too heavy and takes you too long, having it be _slightly_
       | faster won't be a gamechanger.
       | 
       | So LTS brings slight advantages to the operator, while adding
       | potentially significant complexity to the developer (generally
       | backporting fixes into years old versions isn't fun).
       | 
       | The specific proposed LTS flavour is also hardcore, with no
       | upgrade path to the next LTS. The exact type of org that needs
       | an LTS will be extremely reluctant to redo everything 2 years
       | later, with potentially drastic breaking changes making that
       | move very hard.
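       | 
       | To make that concrete, a rough sketch of that "same process" on
       | a kubeadm-managed control plane node (package names and
       | versions are illustrative):
       | 
       |     # read the release notes / changelog first, then:
       |     kubeadm upgrade plan                  # list available targets
       |     apt-get install -y kubeadm=1.20.5-00  # patch: 1.20.1 -> 1.20.5
       |     kubeadm upgrade apply v1.20.5
       |     apt-get install -y kubelet=1.20.5-00 kubectl=1.20.5-00
       |     systemctl restart kubelet
       | 
       |     # a minor upgrade (1.20 -> 1.21) is the same commands with
       |     # v1.21.x, plus a pass over deprecated/removed APIs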
        
         | x86x87 wrote:
         | That's not how LTS is supposed to work. You should be able to
         | upgrade effortlessly with minimum risk.
         | 
         | If you're at a point where a patch for LTS looks like an
         | upgrade to the new version you've screwed up LTS.
         | 
         | Also, getting to the point of having an LTS and actually
         | providing the support is expensive. You need experts that can
         | backport security fixes and know the product inside out.
        
           | sofixa wrote:
           | > That's not how LTS is supposed to work. You should be able
           | to upgrade effortlessly with minimum risk.
           | 
           | How do you do that on something as complex and with as many
           | moving parts as Kubernetes? And how do you as an operator
           | update that many things without checking there's no breaking
           | changes in the patch?
        
             | x86x87 wrote:
             | bingo! how do you do it? and do you want that kind of
             | complexity to begin with?
        
               | sofixa wrote:
               | Don't ask me, I'm firmly in the HashiCorp Nomad camp:
               | https://atodorov.me/2021/02/27/why-you-should-take-a-
               | look-at... (Note: quite old, some things are no longer
               | true most notably around Nomad downsides)
        
               | rigrassm wrote:
               | I'm with you, Nomad is highly underrated!
        
             | Rantenki wrote:
             | We upgrade our distros pretty much fearlessly, all the
             | time. While I have had breakage from Kernel upgrades,
             | they've been very rare (and generally related to third
             | party closed drivers). Kubernetes is _not_ more complicated
             | than the Linux kernel, but it is much more dangerous to
             | upgrade in place.
        
               | eddythompson80 wrote:
               | > Kubernetes is _not_ more complicated than the Linux
               | kernel, but it is much more dangerous to upgrade in
               | place.
               | 
               | eh, the kernel is an incredibly mature project with 1
               | machine scope. The kernel also has decades of operating
               | systems research and literature to build on. Kubernetes
               | in comparison is new, distributed and exploring uncharted
               | territory in terms of feature set and implementation.
               | Sometimes bad decisions are made, and it's fair to not
               | want to live with them forever.
               | 
               | The kernel project looks very different today than it did
               | in 1999.
               | 
               | There is a happy medium though, even if Kubernetes is
               | kinda far from it.
        
             | jen20 wrote:
             | My answer is simple: don't. Use something far simpler and
             | with fewer moving parts than Kubernetes, and something
             | where crucial parts of the ecosystem required to make
             | things even basically work are not outsourced to third
             | party projects.
             | 
             | Nomad is a good solution.
        
           | freedomben wrote:
           | I don't see anywhere that GP said an LTS patch would take
           | effort. They said the upgrade path to the next LTS would.
           | 
           | If you are talking about upgrade from LTS to LTS, can you
           | give an example project where that is effortless? And if so,
           | how do they manage to innovate and modernize without ever
           | breaking backwards compatibility?
        
             | x86x87 wrote:
             | Here: "it's easier to go from 1.20.1 to 1.20.5 than to
             | 1.21, because there's less chance of breakage and less
             | things that will change, but the process is pretty much the
             | same"
             | 
             | LTS to LTS is another story. But the point is that
             | L=LongTerm so in theory you're only going to do this
             | exercise twice in a decade.
             | 
             | > manage to innovate and modernize without ever breaking
             | backwards compatibility
             | 
             | yeah. fuck backwards compatibility. that is for suckers.
             | how about stopping the madness for a second and thinking
             | about what you are building when you build it?
        
               | pixl97 wrote:
               | > in theory you're only going to do this exercise twice
               | in a decade.
               | 
               | So I've seen things like this in corporations many times
               | and it typically works like this...
               | 
               | Well trained team sets up environment. Over time team
               | members leave and only less senior members remain. They
               | are capable of patching the system and keeping it
               | running. Eventually the number of staff even capable of
               | patching the system diminishes. System reaches end of
               | life and vendor demands upgrading. System falls out of
               | security compliance and everything around it is an
               | organizational exception in one way or another.
               | Eventually at massive cost from outside contractors the
               | system gets upgraded and the cycle begins all over again.
               | 
               | Not being able to upgrade these systems is about the lack
               | of and loss of capable internal staff.
        
         | Karellen wrote:
         | > but the process is pretty much the same - check for breaking
         | changes,
         | 
         | Unless you're relying on buggy behaviour, there should be no
         | breaking changes in an LTS update.
         | 
         | (...of course, there's no guarantee that you're not relying on
         | buggy (or, at least, accidental) behaviour. People relying on
         | `memcpy(3)` working as expected when the ranges overlap, simply
         | because it happened to do so with historic versions of the
         | `libc` implementation they most commonly happened to test with,
         | is one example. But see also obxkcd https://xkcd.com/1172/ and
         | https://www.hyrumslaw.com/ )
        
           | sofixa wrote:
           | > Unless you're relying on buggy behaviour, there should be
           | no breaking changes in an LTS update.
           | 
           | Or a security vulnerability has forced a breaking change. Or
           | any other issue, which is why you _have_ to check.
        
             | Karellen wrote:
             | > Or a security vulnerability has forced a breaking change.
             | 
             | Theoretically, I suppose?
             | 
             | Do you have a historic example in mind?
             | 
             | I've been running Debian "stable" in its various
             | incarnations on servers for over a decade, and I can't
             | remember any time any service on any installation I've run
             | had such an issue. But my memory is pretty bad, so I might
             | have missed one. (Or even a dozen!) But I have `unattended-
             | upgrades` installed on all my live servers right now, and
             | don't lose a wink of sleep over it.
        
               | sofixa wrote:
               | Yes, I have an example in mind -
               | https://askubuntu.com/questions/1376118/ubuntu-20-04-lts-
               | una...
               | 
               | Yes, it's Ubuntu, but it doesn't matter - sometimes security
               | fixes require a breaking change and there's nothing that
               | can be done to avoid it.
        
               | natbennett wrote:
               | This happens _all the time_ on systems that are running
               | hundreds of thousands of apps across hundreds of
               | customers.
               | 
               | The worst one I know: for a while basically all Cloud
               | Foundry installations were stuck behind a patch release
               | because the routing component upgraded their Go version
               | and that Go version included an allegedly non-breaking-
               | change that caused it to reject requests with certain
               | kinds of malformed headers.
               | 
               | The Spring example app has a header with the specific
               | problem, so it was impacted. And the vast majority of
               | Cloud Foundry apps are Spring apps, many of which got
               | started by
               | copying the Spring example app.
               | 
               | So upgrading CF past this patch release required a code
               | change to the apps running on the platform. Which the
               | people running Cloud Foundry generally can't get --
               | there's usually a team of like 12 people running them and
               | then 1000s of app devs.
        
               | toast0 wrote:
               | OpenSSL isn't necessarily the best at LTS, but 1.0.1
               | released a series of changes to how they handled
               | ephemeral diffie hellman generation, which could be
               | hooked in earlier releases, but not in later releases.
               | 
               | For the things I was doing on the hooks, it became clear
               | that I needed to make changes and get them added
               | upstream, rather than doing it in hooks, but that meant
               | we were running OpenSSL with local patches in the interim
               | of upstream accepting and releasing my changes. If you're
               | not willing to run a locally patched security critical
               | dependency, it puts you between a rock and a hard place.
        
           | natbennett wrote:
           | It's impossible to avoid the occasional breaking change in an
           | LTS, especially for software like this. Security fixes are
           | inherently breaking changes-- just for users we don't like.
        
           | natbobc wrote:
           | Comparing a single function to an entire ecosystem is crazy.
           | Making an LTS imposes a skew of compatibility and support to
           | all downstream vendors as well as the core team. The core
           | team has done a great job on keeping GAed resources stable
           | across releases. Understand there's more to it than that but
           | you should be regularly upgrading your dependencies as par-
           | four the course not swallowing an elephant every 2 years or
           | whenever a CVE forces your hand. The book Accelerate
           | highlights this quite succinctly.
        
             | Karellen wrote:
             | * https://en.wiktionary.org/wiki/par_for_the_course
        
       | barryrandall wrote:
       | No open source package that's given away for free needs to or
       | should pursue LTS releases. People who want LTS need a
       | commercially-supported distribution so that they can pay people
       | to maintain LTS versions of the product they're using.
        
         | waynesonfire wrote:
         | Maybe my team of 15 engineers that manage the k8s stack can do
         | it.
        
         | watermelon0 wrote:
         | Not saying that companies shouldn't pay for extended support,
         | but many other open source projects have LTS releases with
         | multi-year support (e.g. Ubuntu/Debian 5 years for LTS
         | releases, and Node.js for 2.5 years.)
         | 
         | Additionally, I think one of the major reasons for LTS is that
         | K8s (and related software) regularly introduces breaking
         | changes. Out of all the software that we use at work, K8s
         | probably takes the most development time to upgrade.
        
           | ses1984 wrote:
           | People pay for longer versions of that, called extended
           | support, ubuntu provides a cut down version for free.
        
         | airocker wrote:
         | Maybe GKE and EKS should make LTS versions.
        
       | master_crab wrote:
       | Yes K8s needs LTS.
       | 
       | AWS released an LTS version of EKS last month. 1.23 is on Extended
       | support until next year for free. But later versions will cost
       | money.
       | 
       | https://aws.amazon.com/blogs/containers/amazon-eks-extended-...
        
         | oneplane wrote:
         | So in other words: no it doesn't. The CNCF does its thing, and
         | if you want something else, you can give money to AWS or Azure
         | or GCP and have your cake and eat it too.
         | 
         | I'd rather not see the resources in the Kubernetes project
         | being re-directed to users who are in a situation where they
         | aren't able to do a well-known action at planned intervals two
         | or three times per year.
        
       | JohnMakin wrote:
       | Not on the rough checklist -
       | 
       | - pray to god whatever helm chart dependencies you have didn't
       | break horribly, and if they did, that there's a patch available
        
         | starttoaster wrote:
         | There are tools to tell you if any of your deployed
         | infrastructure uses a deprecated API. I mean, even if the tool
         | didn't exist you could view the deprecation guide, scroll
         | through the kubernetes versions you'll be upgrading through,
         | and inspect your cluster for any objects using a Kind defined
         | by any of those APIs. It's a burden but when is maintaining
         | infrastructure not a burden?
         | https://kubernetes.io/docs/reference/using-api/deprecation-g...
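         | 
         | For example (assuming you can read API server metrics), the
         | built-in deprecated-API metric is a quick first pass before
         | reaching for tools like pluto or kubent:
         | 
         |     # lists deprecated group/version/resource combinations
         |     # that clients have actually requested on this cluster
         |     kubectl get --raw /metrics |
         |       grep apiserver_requested_deprecated_apis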
        
           | JohnMakin wrote:
           | CRDs can break in weird ways that aren't always picked up by
           | a tool like that.
        
       | yrro wrote:
       | For comparison, Red Hat denote every other major release of
       | OpenShift (4.8, 4.10, 4.12) as Extended Update Support releases,
       | with 2 years of support.
        
         | blcknight wrote:
         | 2 years is still not very long for large enterprises. They'll
         | be conservative on the uptake - adopting 3-6 months after it
         | gets released - so they only have maybe 12-18 months before
         | they're planning the next upgrade. It's better than vanilla
         | kube, but compare that to RHEL, where you get 10 years of
         | maintenance and can often extend that even further.
        
       | gtirloni wrote:
       | Kubernetes LTS goes by different names: AWS EKS, Azure AKS,
       | Google GKE, SUSE Rancher, etc.
        
         | watermelon0 wrote:
         | Not sure about the others, but AWS EKS support closely follows
         | upstream.
         | 
         | Only recently they released paid extended support in preview,
         | which extends support for an additional year.
        
           | Arnavion wrote:
           | EKS still supports 1.24+ as part of "standard support".
           | Upstream only supports 1.26+ (upstream's policy is "latest 3
           | versions").
        
             | sciurus wrote:
             | EKS also added support for 1.24 seven months after the
             | upstream release date.
             | 
             | https://endoflife.date/amazon-eks
        
         | mardifoufs wrote:
         | AKS is usually like one version behind, and they deprecate
         | older versions every 3 months IIRC. In a one year window of
         | versions, that's not necessarily long term at all. The upgrade
         | process is surprisingly kind of okay though, which is already a
         | lot better than what I usually expect from Azure.
        
         | Kelkonosemmel wrote:
         | GKE does auto upgrades and enforces them.
         | 
         | You can't get a really old k8s version on GKE.
        
         | emb3dded wrote:
         | AKS has LTS: https://azure.microsoft.com/en-
         | us/updates/generally-availabl...
        
       | superkuh wrote:
       | Why don't you just put your Kubernetes in Kubernetes? Containers
       | solve everything re: dependency management, right?
        
         | dilyevsky wrote:
         | We actually did just that with just apiserver+sqlite-based
         | storage and it works pretty well. Ofc you have the same problem
         | with an outer cluster now but if you're careful with locking
         | your entire state into the "inner" cluster you can escape that
        
         | geodel wrote:
         | Fully agree. They can follow Cloud Native deployment best
         | practices for guidance.
        
         | nonameiguess wrote:
         | I think you're saying this sarcastically, but every Kubernetes
         | component except the kubelet already can, and usually does in
         | most distros, run as a Pod managed by Kubernetes itself. Edit
         | the manifest, systemctl restart kubelet, and there you go, you
         | just upgraded. Other than kube-proxy, which has a dependency on
         | iptables, I don't even think any other component has any
         | external dependencies, and the distroless containers they run
         | in are pretty close to scratch. Running them in Pods is more
         | for convenience than dependency management.
         | 
         | The actual issue that needs to be addressed when upgrading is
         | API deprecations and incompatibility. For better or worse, when
         | people try to analogize this to the way Red Hat or Debian
         | works, Kubernetes is not Linux. It may not always perfectly
         | achieve the goal, but Linux development operates on the strict
         | rule that no changes can ever break userspace. Kubernetes, in
         | contrast, flat out tells you that generally available APIs will
         | stay available, but if you use beta or alpha APIs, those may or
         | may not exist in future releases. Use them at your own peril.
         | In practice, this has meant virtually every cluster in
         | existence and virtually every cluster management tool created
         | by third parties, uses at least beta APIs. Nobody heeded the
         | warning. So upgrades break stuff all the time.
         | 
         | Should they have not ever released beta APIs at all? Maybe.
         | They actually did start turning them off by default a few
         | releases back, but the cultural norms are already there now.
         | Everybody just turns them back on so Prometheus and cert-
         | manager and Istio and what not will continue working.
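         | 
         | For anyone who hasn't seen it, the "edit the manifest" step is
         | literally bumping an image tag in a static Pod file (paths
         | assume a kubeadm-style layout, versions illustrative):
         | 
         |     # control plane components run as static Pods defined on
         |     # each control plane node
         |     sudo sed -i 's/v1.27.3/v1.28.4/' \
         |       /etc/kubernetes/manifests/kube-apiserver.yaml
         |     sudo systemctl restart kubelet  # kubelet re-reads the
         |                                     # manifest and recreates
         |                                     # the Pod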
        
           | airocker wrote:
           | minikube does exactly that! Runs new versions of kubernetes
           | inside a single Docker container.
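             | 
             | e.g., pinning a version (assuming a recent minikube):
             | 
             |     minikube start --driver=docker --kubernetes-version=v1.28.4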
        
       | ckdarby wrote:
       | It forces keeping up to date in a reasonable period of time. It
       | is a feature, not a bug.
       | 
       | I agree with comments in this thread that LTS should be a
       | commercial bought option only.
        
       | edude03 wrote:
       | I know it rarely happens in practice, but I disagree that
       | Kubernetes needs an LTS, since the Kubernetes cluster itself
       | should be "cattle, not a pet" and thus you should "just" spin
       | up a new cluster on the new version as your upgrade strategy.
        
         | dilyevsky wrote:
         | Works great for clusters that don't have any storage, load
         | balancers, any configuration drift...
        
         | simiones wrote:
         | The whole point of Kubernetes is to move all of the machine
         | management work to one place - the Kubernetes cluster nodes -
         | so that your services are cattle, not pets.
         | 
         | But there is no equivalent system for whole clusters. To
         | transition from one cluster to another, you have to handle
         | everything that k8s gives you without relying on k8s for it
         | (restarting services, rerouting traffic, persistent storage
         | etc). If you can do all that without k8s, why use k8s in the
         | first place?
         | 
         | So no, in practice clusters, just like any other stateful
         | system, are not really cattle, they are pets you care for and
         | carefully manage. Individual cluster nodes can come and go, but
         | the whole cluster is pretty sacred.
        
           | Kelkonosemmel wrote:
           | Global load balancer and traffic shifting is a real thing.
           | 
           | But my preferred way is in-place, as it's easy: upgrade one
           | control plane node after the other, then upgrade every node
           | or create a new one and move your workload.
           | 
           | The power of k8s is of course shit when you only move legacy
           | garbage into the cloud, as those apps are not HA at all.
           | 
           | But lucky enough people slowly learn how to write apps cloud
           | native.
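           | 
           | Roughly, per node (node name made up, flags are the usual
           | kubectl ones):
           | 
           |     kubectl drain node-1 --ignore-daemonsets \
           |       --delete-emptydir-data
           |     # ...upgrade kubelet/runtime on node-1, or replace it...
           |     kubectl uncordon node-1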
        
         | oneplane wrote:
         | That's what we do and it's great. It doesn't even apply to just
         | Kubernetes, you should be able to swap out most components of
         | your infrastructure without manual work, that's why we even
         | invented all this modern software, or even hardware for that
         | matter: redundant power supplies! If we can do that, why is
         | everyone so scared of doing the same for other systems?
        
       | mschuster91 wrote:
       | As someone working in managing a _bunch_ of various Kubernetes
       | clusters - on-prem and EKS - I agree a bit. Managing Kubernetes
       | versions can be an utter PITA, especially keeping all of the
       | various addons and integrations one needs to keep in sync with
       | the current Kubernetes version.
       | 
       | But: most of that can be mitigated by keeping a common structure
       | and baseline templates. You only need to validate your common
       | structure against a QA cluster and then roll out necessary
       | changes onto the production cluster... but most organizations
       | don't bother and let every team roll their own k8s, pipelines and
       | whatnot. This _will_ lead to tons of issues inevitably.
       | 
       | Asking for a Kubernetes LTS is in many cases just papering over
       | organizational deficiencies.
        
       | HPsquared wrote:
       | To borrow from reliability engineering, software failures in
       | practice can approximate a "bathtub curve".
       | 
       | That is: an initial high failure rate (teething problems), a low
       | failure rate for most of the lifespan (when it's actively
       | maintained), then gradually increasing failure rate (in hardware
       | this is called wear-out).
       | 
       | Unlike hardware, software doesn't wear out but the interfaces
       | gradually shift and become obsolete. It's a kind of "gradually
       | increasing risk of fatal incompatibility". Something like that.
       | 
       | I wonder if anyone has done large-scale analysis of this type.
       | Could maybe count CVEs, but that's just one type of failure.
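       | 
       | (For anyone who wants to model it: the usual trick in
       | reliability engineering is to write the bathtub hazard rate
       | h(t) as the sum of a decreasing "infant mortality" term and an
       | increasing "wear-out" term - e.g. two Weibull hazards
       | h(t) = (k/b)*(t/b)^(k-1) with shape k < 1 and k > 1
       | respectively - plus a constant term for the flat middle.)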
        
       | kevin_nisbet wrote:
       | From my perspective as a former developer on a kubernetes
       | distribution that no longer exists.
       | 
       | The model seems to largely be: the CNCF/Kubernetes authors have
       | done a good job of writing clear expectations for the lifetime
       | of their releases. But there are customers who for various
       | reasons want extended support windows.
       | 
       | This doesn't prevent the distribution from offering or selling
       | extended support windows, so the customers of those distributions
       | can put the pressure on those distribution authors. This is
       | something we offered as a reason to use our distribution, that we
       | can backport security fixes or other significant fixes to older
       | versions of kubernetes. This was especially relevant for the
       | customers we focused on, which were lots of clusters installed
       | in places without remote access.
       | 
       | This created a lot of work for us though, as whenever a big
       | security announcement came out, I'd need to triage on whether we
       | needed a backport. Even our extended support windows were in
       | tension with customers, who wanted even longer windows, or would
       | open support cases on releases out of support for more than a
       | year.
       | 
       | So I think the question really should be: should LTS be left to
       | the distributions, many of which will choose not to offer longer
       | support than upstream but which allow for more commercial or
       | narrow offerings where it's important enough to a customer to
       | pay for it? Or should it be the responsibility of the Kubernetes
       | authors, and in that case what do you give up in project
       | velocity with the extra work of offering and supporting LTS?
       | 
       | I personally resonate with the argument that this can be left
       | with the distributors, and if it's important enough for
       | customers to seek out, they can pay for it through their
       | selected distribution, or by switching distributions.
       | 
       | But many customers lose out, because they're selecting
       | distributions that don't offer this service, because it is time
       | consuming and difficult to do.
        
         | edude03 wrote:
         | Maybe more importantly, you could get a distribution to support
         | you but what about upstream projects? It'd be a big lift (if
         | not impossible) to get projects like cert-manager cilium
         | whatever to adopt the longer release cycle as well.
         | 
         | Is it normal for a distribution to also package upstream
         | projects that customers want?
        
           | kevin_nisbet wrote:
           | > It'd be a big lift (if not impossible) to get projects like
           | cert-manager cilium whatever to adopt the longer release
           | cycle as well.
           | 
           | It's a great point which probably should be part of the
           | discussion. Say even if Kubernetes project offered LTS, how
           | would that play into every other project that is pulled
           | together.
           | 
           | > Is it normal for a distribution to also package upstream
           | projects that customers want?
           | 
           | I suspect it differs by distribution. The distribution I
           | worked on included a bunch of other projects, but it was also
           | pretty niche.
        
           | baby_souffle wrote:
           | > Maybe more importantly, you could get a distribution to
           | support you but what about upstream projects? It'd be a big
           | lift (if not impossible) to get projects like cert-manager
           | cilium whatever to adopt the longer release cycle as well.
           | 
           | Exactly this. I see a lot of parallels between k8s releases
           | and OS releases. Even if you're paying microsoft for patches
           | to windows XP, I'm not seeing any of that and the python
           | runtime that most of my software relies on also isn't seeing
           | their cut so... I guess upgrade to at least python 3.10 and
           | then call me back?
           | 
           | I would prefer to see the conversation turn more to "what can
           | be done to reduce reluctance to upgrading? How can we make
           | k8s upgrades painless so there's minimal incentive to stick
           | with a long out of date release?"
        
         | sgift wrote:
         | > But many customers lose out, because they're selecting
         | distributions that don't offer this service, because it is time
         | consuming and difficult to do.
         | 
         | Sure, but if they really need that service they will gravitate
         | to distributions that do provide it, so, I think, no harm done
         | here. It's to me like JDK distributions. Some give you six
         | months, some give free LTS and others give you LTS with a
         | support contract. LTS with backports is work, someone has to
         | pay for it, so let those who really need it pay. Everyone else
         | can enjoy the new features.
         | 
         | tl;dr: I'm with you in the camp that you can leave it to the
         | distributors.
        
         | JohnFen wrote:
         | As an industry, we need to get back to having security releases
         | separate from other sorts of releases. There are tons of people
         | who don't want to, or can't, take every feature release that
         | comes down the pike (particularly since feature updates happen
         | so insanely often these days), and this would be a huge win for
         | them.
        
       | simiones wrote:
       | Honestly, rather than an LTS, I think k8s needs a much better
       | upgrade process. Right now it is really poorly supported,
       | without even the ability to skip versions when upgrading.
       | 
       | If you want to migrate from 1.24 to 1.28, you need to upgrade to
       | 1.25, then 1.26, then 1.27, and only then can you go to 1.28.
       | This alone is a significant impediment to the normal way a
       | project upgrades (lag behind, then jump to latest), and would
       | need to be fixed before any discussion of an actual LTS process.
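       | 
       | In practice that means repeating the whole upgrade dance once
       | per minor, something like (kubeadm shown, patch versions
       | illustrative):
       | 
       |     for v in v1.25.16 v1.26.11 v1.27.8 v1.28.4; do
       |       # upgrade the kubeadm package to $v first, then:
       |       kubeadm upgrade plan
       |       kubeadm upgrade apply "$v"
       |       # ...then kubelets/nodes, add-ons and deprecation checks,
       |       # every single time
       |     done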
        
       | Kelkonosemmel wrote:
       | That's the big advantage of kubernetes:
       | 
       | You need to keep updating it!
       | 
       | No legacy 10 year old garbage in the corner just because managers
       | don't understand that this is shit.
       | 
       | And for me it's the perfect excuse NOT having those garbage
       | discussions.
       | 
       | And honestly k8s doesn't change that much anyway
        
       | FridgeSeal wrote:
       | I disagree.
       | 
       | Software is a garden that needs to be tended. LTS (and to a
       | lesser extent requirements for large amounts of backwards
       | compatibility) arguments are the path to ossification and orgs
       | running 10+ year out of date, unsupported legacy garbage that
       | nobody wants to touch, and nobody can migrate off because it's so
       | out of whack.
       | 
       | Don't do this. Tend your garden. Do your upgrades and releases
       | frequently, ensure that everything in your stack is well
       | understood and don't let any part of your stack ossify and "crust
       | over".
       | 
       | Upgrades (even breaking ones) are easier to handle when you do
       | them early and often. If you let them pile up, and then have to
       | upgrade all at once because something finally gave way, then
       | you're simply inflicting unnecessary pain on yourself.
        
         | MuffinFlavored wrote:
         | > long-term support
         | 
         | > out of date, unsupported legacy garbage
        
           | jen20 wrote:
           | You can probably remove the "unsupported" part without
           | changing the overall correctness of the statement.
        
         | iwontberude wrote:
         | Kubernetes doesn't improve sufficiently to justify the broken
         | compatibility anymore. Projects slow down and become mature.
         | This isn't a bad thing.
        
           | FridgeSeal wrote:
           | > Projects slow down and become mature. This isn't a bad
           | thing
           | 
           | I completely agree, but that's also not-incompatible with my
           | argument.
        
         | baby_souffle wrote:
         | > Don't do this. Tend your garden. Do your upgrades and
         | releases frequently, ensure that everything in your stack is
         | well understood and don't let any part of your stack ossify and
         | "crust over".
         | 
         | You can't see me, but I'm violently nodding in agreement.
         | Faithfully adhering to these best practices isn't always
         | possible, though; management gonna manage how they manage and
         | not how your ops team wants them to manage.
         | 
         | > Upgrades (even breaking ones) are easier to handle when you
         | do them early and often. If you let them pile up, and then have
         | to upgrade all at once because something finally gave way, then
         | you're simply inflicting unnecessary pain on yourself.
         | 
         | How different could things be if k8s had a "you don't break
         | userland!" policy akin to the way the Linux kernel operates? Is
         | there a better balance between new stuff replacing the old and
         | never shipping new stuff that would make more cluster operators
         | more comfortable with upgrades?
        
         | mfer wrote:
         | Consider the places where people use something for a long
         | time and want to keep it relatively stable. Planes,
         | trains, and automobiles are just a few of the examples. How
         | should over the air updates for these kinds of things work?
         | Where are all the places that k8s is being used?
         | 
         | If we only think of open source in datacenters we limit our
         | thinking to just one of the many places it's used.
        
           | oneplane wrote:
           | Or, they could use something that is not Kubernetes. If you
           | are working with a system that is super static, Podman comes
           | to mind.
        
           | samus wrote:
           | You don't need Kubernetes for over-the-air updates.
           | Kubernetes is also not suitable for that. It scales up
           | alright, but not down. As TA expounds, there are simply too
           | many moving parts that require expertise to operate. And
           | that's fine. Not every software has to be able to accommodate
           | all use cases.
        
             | Palomides wrote:
             | they run kubernetes on fighter jets now
        
         | incahoots wrote:
         | I agree with the very principle of what you're laying out
         | here, but reality is rarely, if ever, in tandem with
         | principles and "best practices".
         | 
         | Manufacturing comes to mind: shutting a machine down to apply
         | patches monthly is going to piss off the graph babysitters,
         | especially if the business is a 24/7 operation, and most are
         | currently.
         | 
         | In an ideal world there would be times every month to do proper
         | machine maintenance, but that doesn't translate to big money
         | gains for the glutton of shareholders who don't understand
         | anything, let alone understand that maintenance prolongs
         | processes as opposed to running everything ragged.
        
           | dwattttt wrote:
           | You can also run your car for longer if you never stop to
           | take it to a mechanic too.
        
             | Volundr wrote:
             | I don't think the comparison to taking your car to a
             | mechanic is a good one. When I take my car to a mechanic
             | they are returning it to a stock configuration. They don't
             | need to update to a new, slightly different power steering
             | this month, new brakes next, then injectors....
        
               | dwattttt wrote:
               | Certainly. But it's being compared to not taking systems
               | down to do maintenance at all.
               | 
               | > if the business is a 24/7 operation
               | 
               | > In an ideal world there would be times every month to
               | do proper machine maintenance
        
         | wouldbecouldbe wrote:
         | Kubernetes feels like JavaScript has reached the sysadmins:
         | new updates, libraries, build tools every week.
         | 
         | It's mainly good for keeping high paid people employed, not
         | keeping your servers stable.
         | 
         | I ran a cluster in production for 1.5 years and it took me
         | so much energy. Especially that one night where the
         | DigitalOcean managed cluster forced an update that crashed
         | all servers, and there was no sane way to fix it.
         | 
         | I'm back to stability with old school VPS; it just works. Every
         | now and then you run a few patches. Simple & fast deploys; what
         | a blessing.
        
         | jl6 wrote:
         | That's great if you have control over all the moving parts, but
         | a lot of real-world (i.e. not exclusively software-based) orgs
         | have interfaces to components and external entities that aren't
         | so amenable to change. Maybe you can upgrade your cluster
         | without anybody noticing. Maybe you're a wizard and upgrades
         | never go wrong for you.
         | 
         | More likely, you will be constrained by a patchwork quilt of
         | contracts with customers and suppliers, or by regulatory and
         | statutory requirements, or technology risk policies, and to get
         | approval you'll need to schedule end-to-end testing with a
         | bunch of stakeholders whose incentives aren't necessarily
         | aligned to yours.
         | 
         | That all adds up to $$$, and that's why there's a demand for
         | stability and LTS editions.
        
         | Waterluvian wrote:
         | Sometimes I get this feeling that a lot of developers kind of
         | want to be in their code all the time, tending to it. But it's
         | just not a good use of my time. I want it to work so I can
         | ignore it and move on to the next best use of my time.
         | 
         | I trial upgraded a years old Django project today to 5.0 and it
         | took almost zero work. I hadn't touched versions (other than
         | patch) in over a year. That's the way I want it. Admittedly this
         | was less about an LTS and more about sensible design with
         | upgrading in mind.
        
         | maximinus_thrax wrote:
         | To paraphrase someone else's reaction, I'm violently shaking my
         | head in disagreement. What you're saying only works when you
         | have 100% full control of everything (including the customer
         | data). As someone who spent years in the enterprise space, what
         | you're saying is 'Ivory Tower Architecture'.
         | 
         | LTS is a commitment. That is all. If someone is uncomfortable
         | with such a commitment, then that's fine. But what LTS does is
         | that it tells everyone (including paying customers) that the
         | formats/schemas/APIs/etc.. in that version will be supported
         | for a very long time and I don't have to think about it or
         | budget too much for its maintenance. I would go the extra mile
         | here and say that offline format should be supported FOREVER.
         | None of that LTS bs for offline data, ever.
         | 
         | Re-reading your comment gives me chills and reinforces my
         | belief that I will never pay money to Google (they have a
         | similar gung ho attitude against 'legacy garbage') or have any
         | parts of my business depend on stuff which reserves the liberty
         | of breaking shit early and often.
        
       | airocker wrote:
       | This is sorely needed. It causes service disruption for weeks
       | because of weird upgrade schemes in GKE (or any other vendor)
       | which works on top of Kubernetes. There are too many options with
       | non-intuitive defaults related to how the control plane and the
       | node pools will be repaired or upgraded. Clusters get upgraded
       | without our knowledge and break service arbitrarily. Plus, if
       | you are doing some infrastructure level changes, you have to put
       | in extreme effort to keep upgrading.
       | 
       | IMHO, infrastructure is too low level for frequent updates.
       | Older versions need LTS.
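       | 
       | For reference, these are the kind of knobs you end up having to
       | learn just to keep auto-upgrades from landing at random (gcloud
       | flags as of writing - treat as a sketch, cluster name made up):
       | 
       |     # pin the upgrade cadence to a channel
       |     gcloud container clusters update my-cluster \
       |       --release-channel=stable
       |     # confine automatic upgrades to a weekend window
       |     gcloud container clusters update my-cluster \
       |       --maintenance-window-start=2024-01-06T04:00:00Z \
       |       --maintenance-window-end=2024-01-06T08:00:00Z \
       |       --maintenance-window-recurrence='FREQ=WEEKLY;BYDAY=SA,SU'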
        
         | MuffinFlavored wrote:
         | how much of what you said is part of the complexity that comes
         | with Kubernetes in general?
         | 
         | at the end of the day it's about cramming a ton of
         | functionality into something like YAML is it not?
        
           | airocker wrote:
           | Our cluster has reasonable complexity and yamls work well so
           | far. Maybe we will see what you are saying if we scale
           | further. It allows us to maintain a much more complex cluster
           | with fewer developers. And the software is well
           | tested/reliable.
           | 
           | But not having LTS is really difficult. It is impossible to
           | keep rewriting things. And all related components: terraform,
           | helm, GKE etc also update too quickly for us to manage with a
           | small team. We do some lower level infrastructure work so
           | these problems bite us harder.
        
       | tomjen3 wrote:
       | Let's Encrypt will only give you a cert that is good for 3
       | months, because they want you to automate the renewal.
       | 
       | I don't think K8S should create an LTS, I think they should make
       | it dirt simple to update.
        
       | rafaelturk wrote:
       | In my opinion, if you need a stable production environment,
       | MicroK8s stands out as the Kubernetes LTS choice. Canonical
       | managed to create a successful, robust and stable Kubernetes
       | ecosystem. Despite its name, MicroK8s isn't exactly 'micro'--
       | it's more accurately a highly stable and somewhat opinionated
       | version of Kubernetes.
        
       | nezirus wrote:
       | The cynical voice inside me says it works as intended. The
       | purpose of k8s is not to help you run your
       | business/project/whatever, but a way to ascend to DevOps Nirvana.
       | That means never-ending cycle of upgrades for the purpose of
       | upgrading.
       | 
       | I guess too many people are using k8s where they should have used
       | something simpler. It's fashionable to follow the "best
       | practices" of FAANGs, but I'm not sure that's healthy for vast
       | majority of other companies, which are simply not on the same
       | scale and don't have armies of engineers (guardians of the holy
       | "Platform")
        
       | oneplane wrote:
       | No it doesn't.
       | 
       | If you can't keep up you have two options:
       | 
       | 1. Pay someone else to do it for you (effectively an LTS)
       | 
       | 2. Don't use it
       | 
       | Software is imperfect, processes are imperfect. An LTS doesn't
       | fix that, it just pushes problems forward. If you are in a
       | situation where you need a frozen software product, Kubernetes
       | simply doesn't fit the use case and that's okay.
       | 
       | I suppose it's pretty much all about expectations and managing
       | those instead of trying to hide mis-matches, bad choices and
       | ineptitude. (most LTS use cases) It's essentially x509
       | certificate management all over again; if you can't do it right
       | automatically, that's not the certificate lifetime's fault, it's
       | the implementor's fault.
       | 
       | As for option 1: that can take many shapes, including abstracting
       | away K8S entirely, replacing entire clusters instead of
       | 'upgrading' them, or having someone do the actual manual upgrade.
       | But in a world with control loops and automated reconciliation,
       | adding a manual process seems a bit like missing the forest for
       | the trees. I for one have not seen a successful use of K8S where
       | it was treated like an application that you periodically manually
       | patch. Not because it's not possible to do, but because it's a
       | symptom of a certain company culture.
        
       | mountainriver wrote:
       | There have been numerous proposals for LTS releases within the
       | k8s community. I'm curious if someone could chime in as to why
       | those haven't succeeded
        
       | b7we5b7a wrote:
       | Perhaps it's because I work in a small software shop and we do
       | only B2B, but 99% of our applications consist of a frontend (JS
       | served by an nginx image), a middleware (RoR, C#, Rust), nginx
       | ingress and cert-manager. Sometimes we have PersistentVolumes,
       | for 1 project we have CronJobs. SQL DBs are provisioned via the
       | cloud provider. We monitor via Grafana Cloud, and haven't felt
       | the need for more complex tools yet (yes, we're about to deploy
       | NetworkPolicies and perform other small changes to harden the
       | setup a bit).
       | 
       | In my experience:
       | 
       | - AKS is the simplest to update: select "update cluster and
       | nodes", click ok, wait ~15m (though I will always remember
       | vividly the health probe path change for LBs in 1.24 - perhaps a
       | giant red banner would have been a good idea in this case)
       | 
       | - EKS requires you to manually perform all the steps AKS does for
       | you, but it's still reasonably easy
       | 
       | - All of this can be easily scripted (rough example below)
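       | 
       | For instance, with the Azure CLI (resource group and cluster
       | names made up, version illustrative):
       | 
       |     az aks get-upgrades -g my-rg -n my-cluster -o table
       |     az aks upgrade -g my-rg -n my-cluster \
       |       --kubernetes-version 1.28.3 --yes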
       | 
       | I totally agree with the other comments here: LTS releases would
       | doom the project to support >10y-old releases just because
       | managers want to "create value", but don't want to spend a couple
       | weeks a year to care for the stuff _they use in production_.
       | Having reasonably up-to-date, maintainable infrastructure IS
       | value to the business.
        
       | cyrnel wrote:
       | Good overview! I'd personally rather have better tooling for
       | upgrades. Recently the API changes have been minimal, but the
       | real problem is the mandatory node draining that causes
       | downtime/disruption.
       | 
       | In theory, there's nothing stopping you from just updating the
       | kubelet binary on every node. It will generally inherit the
       | existing pods. Nomad even supports this[1]. But apparently there
       | are no guarantees about this working between versions. And in
       | fact some past upgrades have broken the way kubelet stores its
       | own state, preventing this trick.
       | 
       | All I ask is for this informal trick to be formalized in the e2e
       | tests. I'd write a KEP but I'm too busy draining nodes!
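       | 
       | For the record, the informal trick looks roughly like this per
       | node (official binary URL, version and path illustrative - and
       | as said above, no guarantee the new kubelet accepts the old
       | one's on-disk state):
       | 
       |     curl -LO https://dl.k8s.io/release/v1.28.4/bin/linux/amd64/kubelet
       |     sudo systemctl stop kubelet
       |     sudo install -m 0755 kubelet /usr/bin/kubelet
       |     sudo systemctl start kubelet  # running pods are generally
       |                                   # inherited, no drain needed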
       | 
       | [1]: https://developer.hashicorp.com/nomad/docs/upgrade
        
         | op00to wrote:
         | 100% as someone who used to support Kubernetes commercially,
         | long term support is an engineering nightmare with kube. My
         | customers who could upgrade easily were more stable and easily
         | supported. The customers that couldn't handle an upgrade were
         | the exact opposite - long support cases, complex request
         | processes for troubleshooting information, the deck was
         | probably stacked against them from the start.
         | 
         | Anyway, make upgrades less scary and more routine and the risk
         | bubbles away.
        
       | voytec wrote:
       | > The Kubernetes project currently lacks enough active
       | contributors to adequately respond to all issues and PRs.
       | 
       | > This bot triages issues according to the following rules: ...
       | 
       | > /close
        
       | rdsubhas wrote:
       | Many comments here disagreeing with LTS just because it won't
       | be up-to-date are missing a critical point.
       | 
       | When people run a rolling release on their server, their original
       | intent is "Yay I'll force myself to be up-to-date". Reality is,
       | they get conflicts on installed 3rd party software in each
       | upgrade. What ends up happening is, they get frozen on some point
       | in time, without even security patches for a long time.
       | 
       | k8s is like an OS, it's not just the core components, there is
       | <ingress/ gateway/ mesh/ overlays/ operators/ admission
       | controllers/ 3rd party integrations like vault, autoscalers,
       | etc>. Something or the other breaks with each rolling release.
       | I've grown really tired of the way GKE v1.25 pretends to be a
       | "minor" automated upgrade, when it removes or changes god knows
       | how many APIs.
       | 
       | This is what is happening in kubernetes land. The broken upgrade
       | fatigue is real, but it's hampered by wishful thinking.
        
       ___________________________________________________________________
       (page generated 2023-12-04 23:00 UTC)