[HN Gopher] Kubernetes Needs an LTS
___________________________________________________________________
Kubernetes Needs an LTS
Author : todsacerdoti
Score : 110 points
Date : 2023-12-04 13:01 UTC (9 hours ago)
(HTM) web link (matduggan.com)
(TXT) w3m dump (matduggan.com)
| sofixa wrote:
| This, like a recent LTS discussion I saw for a different tool,
| ignores one tiny little detail that makes the whole discussion
| kind of moot.
|
| LTS doesn't mean it's immune to bugs or security vulnerabilities.
| It just means that the major release is updated and supported
| longer - but you still need to be able to apply patches and
| security fixes to that major release. Yes, it's easier to go from
| 1.20.1 to 1.20.5 than to 1.21, because there's less chance of
| breakage and fewer things that will change, but the process is
| pretty much the same - check for breaking changes, read
| changelogs, apply everything. The risk is lower and it might be
| slightly faster, but fundamentally it's the same process. If the
| process is too heavy and takes you too long, having it be
| _slightly_ faster won't be a gamechanger.
|
| So LTS brings slight advantages to the operator, while adding
| potentially significant complexity to the developer (generally
| backporting fixes into years old versions isn't fun).
|
| The specific proposed LTS flavour is also hardcore, without an
| upgrade path to the next LTS. The exact type of org that needs an
| LTS will be extremely reluctant to redo everything 2 years later,
| with potentially drastic breaking changes making that change very
| hard.
| x86x87 wrote:
| That's not how LTS is supposed to work. You should be able to
| upgrade effortlessly with minimum risk.
|
| If you're at a point where a patch for LTS looks like an
| upgrade to the new version you've screwed up LTS.
|
| Also, getting to the point of having an LTS and actually
| providing the support is expensive. You need experts that can
| backport security fixes and know the product inside out.
| sofixa wrote:
| > That's not how LTS is supposed to work. You should be able
| to upgrade effortlessly with minimum risk.
|
| How do you do that on something as complex and with as many
| moving parts as Kubernetes? And how do you as an operator
| update that many things without checking that there are no
| breaking changes in the patch?
| x86x87 wrote:
| bingo! how do you do it? and do you want that kind of
| complexity to begin with?
| sofixa wrote:
| Don't ask me, I'm firmly in the HashiCorp Nomad camp:
| https://atodorov.me/2021/02/27/why-you-should-take-a-
| look-at... (Note: quite old, some things are no longer
| true, most notably around Nomad downsides)
| rigrassm wrote:
| I'm with you, Nomad is highly underrated!
| Rantenki wrote:
| We upgrade our distros pretty much fearlessly, all the
| time. While I have had breakage from Kernel upgrades,
| they've been very rare (and generally related to third
| party closed drivers). Kubernetes is _not_ more complicated
| than the Linux kernel, but it is much more dangerous to
| upgrade in place.
| eddythompson80 wrote:
| > Kubernetes is _not_ more complicated than the Linux
| kernel, but it is much more dangerous to upgrade in
| place.
|
| eh, the kernel is an incredibly mature project with 1
| machine scope. The kernel also has decades of operating
| systems research and literature to build on. Kubernetes
| in comparison is new, distributed and exploring uncharted
| territory in terms of feature set and implementation.
| Sometimes bad decisions are made, and it's fair to not
| want to live with them forever.
|
| The kernel project looks very different today than it did
| in 1999.
|
| There is a happy medium though, and Kubernetes is kinda
| far from it.
| jen20 wrote:
| My answer is simple: don't. Use something far simpler and
| with fewer moving parts than Kubernetes, and something
| where crucial parts of the ecosystem required to make
| things even basically work are not outsourced to third
| party projects.
|
| Nomad is a good solution.
| freedomben wrote:
| I don't see anywhere that GP said an LTS patch would take
| effort. They said the upgrade path to the next LTS would.
|
| If you are talking about upgrade from LTS to LTS, can you
| give an example project where that is effortless? And if so,
| how do they manage to innovate and modernize without ever
| breaking backwards compatibility?
| x86x87 wrote:
| Here: "it's easier to go from 1.20.1 to 1.20.5 than to
| 1.21, because there's less chance of breakage and fewer
| things that will change, but the process is pretty much the
| same"
|
| LTS to LTS is another story. But the point is that
| L=LongTerm so in theory you're only going to do this
| exercise twice in a decade.
|
| > manage to innovate and modernize without ever breaking
| backwards compatibility
|
| Yeah, fuck backwards compatibility. That is for suckers. How
| about stopping the madness for a second and thinking about
| what you are building when you build it?
| pixl97 wrote:
| > in theory you're only going to do this exercise twice
| in a decade.
|
| So I've seen things like this in corporations many times
| and it typically works like this...
|
| Well trained team sets up environment. Over time team
| members leave and only less senior members remain. They
| are capable of patching the system and keeping it
| running. Eventually the number of staff even capable of
| patching the system diminishes. System reaches end of
| life and vendor demands upgrading. System falls out of
| security compliance and everything around it is an
| organizational exception in one way or another.
| Eventually at massive cost from outside contractors the
| system gets upgraded and the cycle begins all over again.
|
| Not being able to upgrade these systems comes down to the
| lack, or loss, of capable internal staff.
| Karellen wrote:
| > but the process is pretty much the same - check for breaking
| changes,
|
| Unless you're relying on buggy behaviour, there should be no
| breaking changes in an LTS update.
|
| (...of course, there's no guarantee that you're not relying on
| buggy (or, at least, accidental) behaviour. People relying on
| `memcpy(3)` working as expected when the ranges overlap, simply
| because it happened to do so with historic versions of the
| `libc` implementation they most commonly happened to test with,
| is one example. But see also the obligatory xkcd
| https://xkcd.com/1172/ and
| https://www.hyrumslaw.com/ )
| sofixa wrote:
| > Unless you're relying on buggy behaviour, there should be
| no breaking changes in an LTS update.
|
| Or a security vulnerability has forced a breaking change. Or
| any other issue, which is why you _have_ to check.
| Karellen wrote:
| > Or a security vulnerability has forced a breaking change.
|
| Theoretically, I suppose?
|
| Do you have a historic example in mind?
|
| I've been running Debian "stable" in its various
| incarnations on servers for over a decade, and I can't
| remember any time any service on any installation I've run
| had such an issue. But my memory is pretty bad, so I might
| have missed one. (Or even a dozen!) But I have `unattended-
| upgrades` installed on all my live servers right now, and
| don't lose a wink of sleep over it.
| sofixa wrote:
| Yes, I have an example in mind -
| https://askubuntu.com/questions/1376118/ubuntu-20-04-lts-
| una...
|
| Yes, it's Ubuntu, but it doesn't matter - sometimes security
| fixes require a breaking change and there's nothing that
| can be done to avoid it.
| natbennett wrote:
| This happens _all the time_ on systems that are running
| hundreds of thousands of apps across hundreds of
| customers.
|
| The worst one I know: for a while basically all Cloud
| Foundry installations were stuck behind a patch release
| because the routing component upgraded its Go version,
| and that Go version included an allegedly non-breaking
| change that caused it to reject requests with certain
| kinds of malformed headers.
|
| The Spring example app sets a header with exactly that
| problem. And the vast majority of Cloud Foundry
| apps are Spring apps, many of which got started by
| copying the Spring example app.
|
| So upgrading CF past this patch release required a code
| change to the apps running on the platform, which the
| people running Cloud Foundry generally can't get --
| there's usually a team of like 12 people running them and
| then 1000s of app devs.
| toast0 wrote:
| OpenSSL isn't necessarily the best at LTS, but 1.0.1
| introduced a series of changes to how they handled
| ephemeral Diffie-Hellman key generation, which could be
| hooked in earlier releases, but not in later releases.
|
| For the things I was doing on the hooks, it became clear
| that I needed to make changes and get them added
| upstream, rather than doing it in hooks, but that meant
| we were running OpenSSL with local patches in the interim
| of upstream accepting and releasing my changes. If you're
| not willing to run a locally patched security critical
| dependency, it puts you between a rock and a hard place.
| natbennett wrote:
| It's impossible to avoid the occasional breaking change in an
| LTS, especially for software like this. Security fixes are
| inherently breaking changes-- just for users we don't like.
| natbobc wrote:
| Comparing a single function to an entire ecosystem is crazy.
| Making an LTS imposes a compatibility and support skew on
| all downstream vendors as well as the core team. The core
| team has done a great job of keeping GAed resources stable
| across releases. I understand there's more to it than that,
| but you should be regularly upgrading your dependencies as
| par-four the course, not swallowing an elephant every 2 years
| or whenever a CVE forces your hand. The book Accelerate
| highlights this quite succinctly.
| Karellen wrote:
| * https://en.wiktionary.org/wiki/par_for_the_course
| barryrandall wrote:
| No open source package that's given away for free needs to or
| should pursue LTS releases. People who want LTS need a
| commercially-supported distribution so that they can pay people
| to maintain LTS versions of the product they're using.
| waynesonfire wrote:
| Maybe my team of 15 engineers that manage the k8s stack can do
| it.
| watermelon0 wrote:
| Not saying that companies shouldn't pay for extended support,
| but many other open source software have LTS releases with
| multi-year support (e.g. Ubuntu/Debian 5 years for LTS
| releases, and Node.js for 2.5 years.)
|
| Additionally, I think one of the major reasons for LTS is that
| K8s (and related software) regularly introduces breaking
| changes. Out of all the software that we use at work, K8s
| probably takes the most development time to upgrade.
| ses1984 wrote:
| People pay for longer versions of that, called extended
| support; Ubuntu provides a cut-down version for free.
| airocker wrote:
| Maybe GKE and EKS should make LTS versions.
| master_crab wrote:
| Yes K8s needs LTS.
|
| AWS released an LTS version of EKS last month. 1.23 is on
| extended support until next year for free, but later versions
| will cost
| money.
|
| https://aws.amazon.com/blogs/containers/amazon-eks-extended-...
| oneplane wrote:
| So in other words: no it doesn't. The CNCF does its thing, and
| if you want something else, you can give money to AWS or Azure
| or GCP and have your cake and eat it too.
|
| I'd rather not see the resources in the Kubernetes project
| being re-directed to users who are in a situation where they
| aren't able to do a well-known action at planned intervals two
| or three times per year.
| JohnMakin wrote:
| Not on the rough checklist -
|
| - pray to god whatever helm chart dependencies you have didn't
| break horribly, and if they did, that there's a patch available
| starttoaster wrote:
| There are tools to tell you if any of your deployed
| infrastructure uses a deprecated API. I mean, even if the tool
| didn't exist you could view the deprecation guide, scroll
| through the kubernetes versions you'll be upgrading through,
| and inspect your cluster for any objects using a Kind defined
| by any of those APIs. It's a burden but when is maintaining
| infrastructure not a burden?
| https://kubernetes.io/docs/reference/using-api/deprecation-g...
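|
| As a rough illustration, something like this sketch with the
| Python kubernetes client can walk a cluster for objects still
| served from removed group/versions (the REMOVED list here is a
| small illustrative subset, not the full deprecation guide;
| tools like Pluto or kubent do this far more thoroughly):
|
|     from kubernetes import config, dynamic
|     from kubernetes.client import api_client
|     from kubernetes.dynamic.exceptions import ResourceNotFoundError
|
|     # Illustrative subset of group/versions removed in recent
|     # releases; the authoritative list is the deprecation guide.
|     REMOVED = [
|         ("policy/v1beta1", "PodSecurityPolicy"),  # removed in 1.25
|         ("batch/v1beta1", "CronJob"),             # removed in 1.25
|         ("autoscaling/v2beta2", "HorizontalPodAutoscaler"),  # 1.26
|     ]
|
|     def main():
|         client = dynamic.DynamicClient(
|             api_client.ApiClient(configuration=config.load_kube_config())
|         )
|         for api_version, kind in REMOVED:
|             try:
|                 api = client.resources.get(api_version=api_version,
|                                            kind=kind)
|             except ResourceNotFoundError:
|                 continue  # version not served on this cluster
|             for item in api.get().items:
|                 ns = item.metadata.namespace or "<cluster>"
|                 print(f"{ns}/{item.metadata.name} still uses "
|                       f"{api_version} {kind}")
|
|     if __name__ == "__main__":
|         main()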
| JohnMakin wrote:
| CRDs can break in weird ways that aren't always picked up by
| a tool like that.
| yrro wrote:
| For comparison, Red Hat designates every other minor release of
| OpenShift (4.8, 4.10, 4.12) as an Extended Update Support release,
| with 2 years of support.
| blcknight wrote:
| 2 years is still not very long for large enterprises. They'll
| be conservative on the uptake - adopting 3-6 months after a
| release - so they only have maybe 12-18 months before they're
| planning the next upgrade. It's better than vanilla kube, but
| compare that to RHEL where you get 10 years of maintenance, and
| can often extend that even further.
| gtirloni wrote:
| Kubernetes LTS goes by different names: AWS EKS, Azure AKS,
| Google GKE, SUSE Rancher, etc.
| watermelon0 wrote:
| Not sure about the others, but AWS EKS support closely follows
| upstream.
|
| Only recently did they release paid extended support in preview,
| which extends support for an additional year.
| Arnavion wrote:
| EKS still supports 1.24+ as part of "standard support".
| Upstream only supports 1.26+ (upstream's policy is "latest 3
| versions").
| sciurus wrote:
| EKS also added support for 1.24 seven months after the
| upstream release date.
|
| https://endoflife.date/amazon-eks
| mardifoufs wrote:
| AKS is usually like one version behind, and they deprecate
| older versions every 3 months IIRC. With a one-year window of
| supported versions, that's not exactly long-term at all. The
| upgrade
| process is surprisingly kind of okay though, which is already a
| lot better than what I usually expect from Azure.
| Kelkonosemmel wrote:
| GKE does auto-upgrades and enforces them.
|
| You can't get a really old k8s version on GKE.
| emb3dded wrote:
| AKS has LTS: https://azure.microsoft.com/en-
| us/updates/generally-availabl...
| superkuh wrote:
| Why don't you just put your Kubernetes in Kubernetes? Containers
| solve everything re: dependency management, right?
| dilyevsky wrote:
| We actually did just that with just apiserver+sqlite-based
| storage and it works pretty well. Of course you have the same
| problem with the outer cluster now, but if you're careful to
| keep your entire state in the "inner" cluster you can escape
| that.
| geodel wrote:
| Fully agree. They can follow Cloud Native deployment best
| practices for guidance.
| nonameiguess wrote:
| I think you're saying this sarcastically, but every Kubernetes
| component except the kubelet already can, and usually does in
| most distros, run as a Pod managed by Kubernetes itself. Edit
| the manifest, systemctl restart kubelet, and there you go, you
| just upgraded. Other than kube-proxy, which has a dependency on
| iptables, I don't even think any other component has any
| external dependencies, and the distroless containers they run
| in are pretty close to scratch. Running them in Pods is more
| for convenience than dependency management.
|
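| As a rough illustration of that "edit the manifest" step (the
| path and target version here are just examples; a real kubeadm
| install would normally go through kubeadm upgrade instead):
|
|     # Hypothetical helper: bump the image tag in a control-plane
|     # static pod manifest. Path and version are illustrative.
|     import yaml  # PyYAML
|
|     MANIFEST = "/etc/kubernetes/manifests/kube-apiserver.yaml"
|     NEW_IMAGE = "registry.k8s.io/kube-apiserver:v1.28.4"  # example
|
|     def bump_image(path: str, new_image: str) -> None:
|         with open(path) as f:
|             pod = yaml.safe_load(f)
|         for container in pod["spec"]["containers"]:
|             if container["name"] == "kube-apiserver":
|                 print(f'{container["image"]} -> {new_image}')
|                 container["image"] = new_image
|         with open(path, "w") as f:
|             yaml.safe_dump(pod, f, default_flow_style=False)
|
|     if __name__ == "__main__":
|         bump_image(MANIFEST, NEW_IMAGE)
|         # The kubelet watches the manifests directory and recreates
|         # the pod; restarting the kubelet forces it immediately.
|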
| The actual issue that needs to be addressed when upgrading is
| API deprecations and incompatibility. For better or worse, when
| people try to analogize this to the way Red Hat or Debian
| works, Kubernetes is not Linux. It may not always perfectly
| achieve the goal, but Linux development operates on the strict
| rule that no changes can ever break userspace. Kubernetes, in
| contrast, flat out tells you that generally available APIs will
| stay available, but if you use beta or alpha APIs, those may or
| may not exist in future releases. Use them at your own peril.
| In practice, this has meant virtually every cluster in
| existence and virtually every cluster management tool created
| by third parties uses at least beta APIs. Nobody heeded the
| warning. So upgrades break stuff all the time.
|
| Should they have not ever released beta APIs at all? Maybe.
| They actually did start turning them off by default a few
| releases back, but the cultural norms are already there now.
| Everybody just turns them back on so Prometheus and cert-
| manager and Istio and what not will continue working.
| airocker wrote:
| minikube does exactly that! Runs new versions of kubernetes
| inside a single Docker container.
| ckdarby wrote:
| It forces you to keep up to date within a reasonable period of
| time. That's a feature, not a bug.
|
| I agree with comments in this thread that LTS should be a
| commercially bought option only.
| edude03 wrote:
| I know it rarely happens in practice, but I disagree that
| Kubernetes needs an LTS, since the Kubernetes cluster itself
| should be "cattle, not a pet" and thus you should "just" spin
| up a new cluster with the same version as your upgrade strategy.
| dilyevsky wrote:
| Works great for clusters that don't have any storage,
| load balancers, or any configuration drift...
| simiones wrote:
| The whole point of Kubernetes is to move all of the machine
| management work to one place - the Kubernetes cluster nodes -
| so that your services are cattle, not pets.
|
| But there is no equivalent system for whole clusters. To
| transition from one cluster to another, you have to handle
| everything that k8s gives you without relying on k8s for it
| (restarting services, rerouting traffic, persistent storage
| etc). If you can do all that without k8s, why use k8s in the
| first place?
|
| So no, in practice clusters, just like any other stateful
| system, are not really cattle, they are pets you care for and
| carefully manage. Individual cluster nodes can come and go, but
| the whole cluster is pretty sacred.
| Kelkonosemmel wrote:
| Global load balancer and traffic shifting is a real thing.
|
| But my preferred way is in-place, as it's easy: upgrade one
| control plane node after the other, then upgrade every node or
| create a new one and move your workload.
|
| The power of k8s is of course shit when you only move legacy
| garbage into the cloud, as those apps are not HA at all.
|
| But luckily, people are slowly learning how to write cloud-
| native apps.
| oneplane wrote:
| That's what we do and it's great. It doesn't even apply to just
| Kubernetes, you should be able to swap out most components of
| your infrastructure without manual work, that's why we even
| invented all this modern software, or even hardware for that
| matter: redundant power supplies! If we can do that, why is
| everyone so scared of doing the same for other systems?
| mschuster91 wrote:
| As someone working in managing a _bunch_ of various Kubernetes
| clusters - on-prem and EKS - I agree a bit. Managing Kubernetes
| versions can be an utter PITA, especially keeping all of the
| various addons and integrations one needs to keep in sync with
| the current Kubernetes version.
|
| But: most of that can be mitigated by keeping a common structure
| and baseline templates. You only need to validate your common
| structure against a QA cluster and then roll out necessary
| changes onto the production cluster... but most organizations
| don't bother and let every team roll their own k8s, pipelines and
| whatnot. This _will_ lead to tons of issues inevitably.
|
| Asking for a Kubernetes LTS is in many cases just papering over
| organizational deficiencies.
| HPsquared wrote:
| To borrow from reliability engineering, software failures in
| practice can approximate a "bathtub curve".
|
| That is: an initial high failure rate (teething problems), a low
| failure rate for most of the lifespan (when it's actively
| maintained), then gradually increasing failure rate (in hardware
| this is called wear-out).
|
| Unlike hardware, software doesn't wear out but the interfaces
| gradually shift and become obsolete. It's a kind of "gradually
| increasing risk of fatal incompatibility". Something like that.
|
| I wonder if anyone has done large-scale analysis of this type.
| Could maybe count CVEs, but that's just one type of failure.
| kevin_nisbet wrote:
| From my perspective as a former developer on a Kubernetes
| distribution that no longer exists:
|
| The model seems to largely be: the CNCF/Kubernetes authors have
| done a good job of setting clear expectations for the lifetime
| of their releases. But there are customers who for various
| reasons want extended support windows.
|
| This doesn't prevent the distribution from offering or selling
| extended support windows, so the customers of those distributions
| can put the pressure on those distribution authors. This is
| something we offered as a reason to use our distribution: that we
| could backport security fixes or other significant fixes to older
| versions of Kubernetes. This was especially relevant for the
| customers we focused on, who ran lots of clusters installed
| in places without remote access.
|
| This created a lot of work for us though, as whenever a big
| security announcement came out, I'd need to triage on whether we
| needed a backport. Even our extended support windows were in
| tension with customers, who wanted even longer windows, or would
| open support cases on releases out of support for more than a
| year.
|
| So I think the question really should be: should LTS be left to
| the distributions, many of which will choose not to offer longer
| support than upstream but which allow for more commercial or
| narrow offerings where it's important enough to a customer to pay
| for it? Or should it be the responsibility of the Kubernetes
| authors, and in that case, what do you give up in project
| velocity with the extra work of offering and supporting an
| LTS?
|
| I personally resonate with the argument that this can be left
| with the distributors, and if it's important enough, customers
| can seek it out and pay for it through their selected
| distribution, or switch distributions.
|
| But many customers lose out, because they're selecting
| distributions that don't offer this service, because it is time
| consuming and difficult to do.
| edude03 wrote:
| Maybe more importantly, you could get a distribution to support
| you, but what about upstream projects? It'd be a big lift (if
| not impossible) to get projects like cert-manager, Cilium, or
| whatever to adopt the longer release cycle as well.
|
| Is it normal for a distribution to also package upstream
| projects that customers want?
| kevin_nisbet wrote:
| > It'd be a big lift (if not impossible) to get projects like
| cert-manager, Cilium, or whatever to adopt the longer release
| cycle as well.
|
| It's a great point which probably should be part of the
| discussion. Even if the Kubernetes project offered an LTS, how
| would that play into every other project that gets pulled
| in?
|
| > Is it normal for a distribution to also package upstream
| projects that customers want?
|
| I suspect it differs by distribution. The distribution I
| worked on included a bunch of other projects, but it was also
| pretty niche.
| baby_souffle wrote:
| > Maybe more importantly, you could get a distribution to
| support you, but what about upstream projects? It'd be a big
| lift (if not impossible) to get projects like cert-manager,
| Cilium, or whatever to adopt the longer release cycle as well.
|
| Exactly this. I see a lot of parallels between k8s releases
| and OS releases. Even if you're paying Microsoft for patches
| to Windows XP, I'm not seeing any of that, and the Python
| runtime that most of my software relies on also isn't seeing
| its cut, so... I guess upgrade to at least Python 3.10 and
| then call me back?
|
| I would prefer to see the conversation turn more to "what can
| be done to reduce reluctance to upgrading? How can we make
| k8s upgrades painless so there's minimal incentive to stick
| with a long out of date release?"
| sgift wrote:
| > But many customers lose out, because they're selecting
| distributions that don't offer this service, because it is time
| consuming and difficult to do.
|
| Sure, but if they really need that service they will gravitate
| to distributions that do provide it, so, I think, no harm done
| here. To me it's like JDK distributions. Some give you six
| months, some give free LTS, and others give you LTS with a
| support contract. LTS with backports is work, someone has to
| pay for it, so let those who really need it pay. Everyone else
| can enjoy the new features.
|
| tl;dr: I'm with you in the camp that you can leave it to the
| distributors.
| JohnFen wrote:
| As an industry, we need to get back to having security releases
| separate from other sorts of releases. There are tons of people
| who don't want to, or can't, take every feature release that
| comes down the pike (particularly since feature updates happen
| so insanely often these days), and this would be a huge win for
| them.
| simiones wrote:
| Honestly, rather than an LTS, I think k8s needs a much better
| upgrade process. Right now it is really poorly supported, without
| even the ability to skip minor versions.
|
| If you want to migrate from 1.24 to 1.28, you need to upgrade to
| 1.25, then 1.26, then 1.27, and only then can you go to 1.28.
| This alone is a significant impediment to the normal way a
| project upgrades (lag behind, then jump to latest), and would
| need to be fixed before any discussion of an actual LTS process.
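|
| To put numbers on it, the hop planning you end up doing by hand
| is essentially (a trivial sketch):
|
|     def upgrade_path(current: str, target: str) -> list[str]:
|         """Every control-plane hop when minors can't be skipped."""
|         major, cur_minor = (int(x) for x in current.split("."))
|         _, tgt_minor = (int(x) for x in target.split("."))
|         return [f"{major}.{m}"
|                 for m in range(cur_minor + 1, tgt_minor + 1)]
|
|     # ['1.25', '1.26', '1.27', '1.28'] - four full upgrade cycles
|     print(upgrade_path("1.24", "1.28"))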
| Kelkonosemmel wrote:
| That's the big advantage of kubernetes:
|
| You need to keep updating it!
|
| No legacy 10 year old garbage in the corner just because managers
| don't understand that this is shit.
|
| And for me it's the perfect excuse NOT having those garbage
| discussions.
|
| And honestly k8s doesn't change that much anyway
| FridgeSeal wrote:
| I disagree.
|
| Software is a garden that needs to be tended. LTS (and to a
| lesser extent requirements for large amounts of backwards
| compatibility) arguments are the path to ossification and to orgs
| running 10+ year out-of-date, unsupported legacy garbage that
| nobody wants to touch and nobody can migrate off because it's so
| out of whack.
|
| Don't do this. Tend your garden. Do your upgrades and releases
| frequently, ensure that everything in your stack is well
| understood and don't let any part of your stack ossify and "crust
| over".
|
| Upgrades (even breaking ones) are easier to handle when you do
| them early and often. If you let them pile up, and then have to
| upgrade all at once because something finally gave way, then
| you're simply inflicting unnecessary pain on yourself.
| MuffinFlavored wrote:
| > long-term support
|
| > out of date, unsupported legacy garbage
| jen20 wrote:
| You can probably remove the "unsupported" part without
| changing the overall correctness of the statement.
| iwontberude wrote:
| Kubernetes doesn't improve sufficiently to justify the broken
| compatibility anymore. Projects slow down and become mature.
| This isn't a bad thing.
| FridgeSeal wrote:
| > Projects slow down and become mature. This isn't a bad
| thing
|
| I completely agree, but that's also not-incompatible with my
| argument.
| baby_souffle wrote:
| > Don't do this. Tend your garden. Do your upgrades and
| releases frequently, ensure that everything in your stack is
| well understood and don't let any part of your stack ossify and
| "crust over".
|
| You can't see me, but I'm violently nodding in agreement.
| Faithfully adhering to these best practices isn't always
| possible, though; management gonna manage how they manage and
| not how your ops team wants them to manage.
|
| > Upgrades (even breaking ones) are easier to handle when you
| do them early and often. If you let them pile up, and then have
| to upgrade all at once because something finally gave way, then
| you're simply inflicting unnecessary pain on yourself.
|
| How different could things be if k8s had a "you don't break
| userland!" policy akin to the way the Linux kernel operates? Is
| there a better balance between new stuff replacing the old and
| never shipping new stuff that would make more cluster operators
| more comfortable with upgrades?
| mfer wrote:
| Consider where people would use something for a long time and
| want to keep it relatively stable for long periods: planes,
| trains, and automobiles are just a few of the examples. How
| should over the air updates for these kinds of things work?
| Where are all the places that k8s is being used?
|
| If we only think of open source in datacenters we limit our
| thinking to just one of the many places it's used.
| oneplane wrote:
| Or, they could use something that is not Kubernetes. If you
| are working with a system that is super static, Podman comes
| to mind.
| samus wrote:
| You don't need Kubernetes for over-the-air updates.
| Kubernetes is also not suitable for that. It scales up
| alright, but not down. As the article expounds, there are
| simply too many moving parts that require expertise to operate. And
| that's fine. Not every software has to be able to accommodate
| all use cases.
| Palomides wrote:
| they run kubernetes on fighter jets now
| incahoots wrote:
| I agree with the very principle of what you're laying out here,
| but the reality is rarely, if ever, in line with principles
| and "best practices".
|
| Manufacturing comes to mind: shutting a machine down to apply
| patches monthly is going to piss off the graph babysitters,
| especially if the business is a 24/7 operation, and most are
| currently.
|
| In an ideal world there would be times every month to do proper
| machine maintenance, but that doesn't translate to big money
| gains for the glut of shareholders who don't understand
| anything, let alone understand that maintenance prolongs the
| life of a process, as opposed to running everything ragged.
| dwattttt wrote:
| You can also run your car for longer if you never stop to
| take it to a mechanic too.
| Volundr wrote:
| I don't think the comparison to taking your car to a
| mechanic is a good one. When I take my car to a mechanic
| they are returning it to a stock configuration. They don't
| need to update to a new, slightly different power steering
| this month, new brakes next, then injectors....
| dwattttt wrote:
| Certainly. But it's being compared to not taking systems
| down to do maintenance at all.
|
| > if the business is a 24/7 operation
|
| > In an ideal world there would be times every month to
| do proper machine maintenance
| wouldbecouldbe wrote:
| Kubernetes feels like JavaScript has reached the sysadmins: new
| updates, libraries, and build tools every week.
|
| It's mainly good for keeping high paid people employed, not
| keeping your servers stable.
|
| I ran a cluster in production for 1.5 years; it took so much
| energy. Especially that one night when a DigitalOcean managed
| cluster forced an update that crashed all servers, and there
| was no sane way to fix it.
|
| I'm back to stability with old school VPS; it just works. Every
| now and then you run a few patches. Simple & fast deploys; what
| a blessing.
| jl6 wrote:
| That's great if you have control over all the moving parts, but
| a lot of real-world (i.e. not exclusively software-based) orgs
| have interfaces to components and external entities that aren't
| so amenable to change. Maybe you can upgrade your cluster
| without anybody noticing. Maybe you're a wizard and upgrades
| never go wrong for you.
|
| More likely, you will be constrained by a patchwork quilt of
| contracts with customers and suppliers, or by regulatory and
| statutory requirements, or technology risk policies, and to get
| approval you'll need to schedule end-to-end testing with a
| bunch of stakeholders whose incentives aren't necessarily
| aligned to yours.
|
| That all adds up to $$$, and that's why there's a demand for
| stability and LTS editions.
| Waterluvian wrote:
| Sometimes I get this feeling that a lot of developers kind of
| want to be in their code all the time, tending to it. But it's
| just not a good use of my time. I want it to work so I can
| ignore it and move on to the next best use of my time.
|
| I trial upgraded a years old Django project today to 5.0 and it
| took almost zero work. I hadn't touched versions (other than
| patch) in over a year. That's the way I want it. Admittedly this
| was less about an LTS and more about sensible design with
| upgrading in mind.
| maximinus_thrax wrote:
| To paraphrase someone else's reaction, I'm violently shaking my
| head in disagreement. What you're saying only works when you
| have 100% full control of everything (including the customer
| data). As someone who spent years in the enterprise space, what
| you're saying is 'Ivory Tower Architecture'.
|
| LTS is a commitment. That is all. If someone is uncomfortable
| with such a commitment, then that's fine. But what LTS does is
| that it tells everyone (including paying customers) that the
| formats/schemas/APIs/etc.. in that version will be supported
| for a very long time and I don't have to think about it or
| budget too much for its maintenance. I would go the extra mile
| here and say that offline formats should be supported FOREVER.
| None of that LTS bs for offline data, ever.
|
| Re-reading your comment gives me chills and reinforces my
| belief that I will never pay money to Google (they have a
| similar gung ho attitude against 'legacy garbage') or have any
| parts of my business depend on stuff which reserves the liberty
| of breaking shit early and often.
| airocker wrote:
| This is sorely needed. It causes service disruption for weeks
| because of the weird upgrade schemes in GKE (or any other vendor
| that works on top of Kubernetes). There are too many options with
| non-intuitive defaults related to how the control plane and the
| node pools will be repaired or upgraded. Clusters get upgraded
| without our knowledge and break service arbitrarily. Plus, if
| you are doing some infrastructure-level changes, you have to put
| in extreme effort to keep upgrading.
|
| IMHO, infrastructure is too low-level for frequent updates.
| Older versions need LTS.
| MuffinFlavored wrote:
| How much of what you said is part of the complexity that comes
| with Kubernetes in general?
|
| At the end of the day, it's about cramming a ton of
| functionality into something like YAML, is it not?
| airocker wrote:
| Our cluster has reasonable complexity and yamls work well so
| far. Maybe we will see what you are saying if we scale
| further. It allows us to maintain a much more complex cluster
| with fewer developers. And the software is well
| tested/reliable.
|
| But not having LTS is really difficult. It is impossible to
| keep rewriting things. And all related components - Terraform,
| Helm, GKE, etc. - also update too quickly for us to manage with
| a small team. We do some lower-level infrastructure work, so
| these problems bite us harder.
| tomjen3 wrote:
| Let's Encrypt will only give you a cert that is good for 3
| months, because they want you to automate renewal.
|
| I don't think K8S should create an LTS, I think they should make
| it dirt simple to update.
| rafaelturk wrote:
| In my opinion, if you need a stable production environment,
| MicroK8s stands out as the Kubernetes LTS choice. Canonical
| managed to create a successful, robust, and stable Kubernetes
| ecosystem. Despite its name, MicroK8s isn't exactly 'micro' - it's
| more accurately a highly stable and somewhat opinionated version
| of Kubernetes.
| nezirus wrote:
| The cynical voice inside me says it works as intended. The
| purpose of k8s is not to help you run your
| business/project/whatever, but to provide a way to ascend to
| DevOps Nirvana. That means a never-ending cycle of upgrades for
| the purpose of upgrading.
|
| I guess too many people are using k8s where they should have used
| something simpler. It's fashionable to follow the "best
| practices" of FAANGs, but I'm not sure that's healthy for vast
| majority of other companies, which are simply not on the same
| scale and don't have armies of engineers (guardians of the holy
| "Platform")
| oneplane wrote:
| No it doesn't.
|
| If you can't keep up you have two options:
|
| 1. Pay someone else to do it for you (effectively an LTS)
|
| 2. Don't use it
|
| Software is imperfect, processes are imperfect. An LTS doesn't
| fix that, it just pushes problems forward. If you are in a
| situation where you need a frozen software product, Kubernetes
| simply doesn't fit the use case and that's okay.
|
| I suppose it's pretty much all about expectations and managing
| those instead of trying to hide mismatches, bad choices and
| ineptitude (most LTS use cases). It's essentially x509
| certificate management all over again; if you can't do it right
| automatically, that's not the certificate lifetime's fault, it's
| the implementor's fault.
|
| As for option 1: that can take many shapes, including abstracting
| away K8S entirely, replacing entire clusters instead of
| 'upgrading' them, or having someone do the actual manual upgrade.
| But in a world with control loops and automated reconciliation,
| adding a manual process seems a bit like missing the forest for
| the trees. I for one have not seen a successful use of K8S where
| it was treated like an application that you periodically manually
| patch. Not because it's not possible to do, but because it's a
| symptom of a certain company culture.
| mountainriver wrote:
| There have been numerous proposals for LTS releases within the
| k8s community. I'm curious if someone could chime in as to why
| those haven't succeeded
| b7we5b7a wrote:
| Perhaps it's because I work in a small software shop and we do
| only B2B, but 99% of our applications consist of a frontend (JS
| served by an nginx image), a middleware (RoR, C#, Rust), nginx
| ingress and cert-manager. Sometimes we have PersistentVolumes,
| for 1 project we have CronJobs. SQL DBs are provisioned via the
| cloud provider. We monitor via Grafana Cloud, and haven't felt
| the need for more complex tools yet (yes, we're about to deploy
| NetworkPolicies and perform other small changes to harden the
| setup a bit).
|
| In my experience:
|
| - AKS is the simplest to update: select "update cluster and
| nodes", click ok, wait ~15m (though I will always remember
| vividly the health probe path change for LBs in 1.24 - perhaps a
| giant red banner would have been a good idea in this case)
|
| - EKS requires you to manually perform all the steps AKS does for
| you, but it's still reasonably easy
|
| - All of this can be easily scripted (rough sketch below)
|
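| For instance, a rough boto3 sketch of the EKS steps (cluster
| name, target version, and the polling are all illustrative, not
| production-ready):
|
|     import time
|     import boto3
|
|     eks = boto3.client("eks", region_name="eu-west-1")
|     CLUSTER = "my-cluster"  # example name
|     TARGET = "1.28"         # example version, one minor at a time
|
|     def wait_for_update(update_id, nodegroup=None):
|         while True:
|             kwargs = {"name": CLUSTER, "updateId": update_id}
|             if nodegroup:
|                 kwargs["nodegroupName"] = nodegroup
|             status = eks.describe_update(**kwargs)["update"]["status"]
|             if status != "InProgress":
|                 print(f"update {update_id}: {status}")
|                 return
|             time.sleep(30)
|
|     # 1. Upgrade the control plane.
|     up = eks.update_cluster_version(name=CLUSTER, version=TARGET)
|     wait_for_update(up["update"]["id"])
|
|     # 2. Roll each managed node group to the matching version.
|     for ng in eks.list_nodegroups(clusterName=CLUSTER)["nodegroups"]:
|         up = eks.update_nodegroup_version(
|             clusterName=CLUSTER, nodegroupName=ng, version=TARGET
|         )
|         wait_for_update(up["update"]["id"], nodegroup=ng)
|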
| I totally agree with the other comments here: LTS releases would
| doom the project to support >10y-old releases just because
| managers want to "create value", but don't want to spend a couple
| weeks a year to care for the stuff _they use in production_.
| Having reasonably up-to-date, maintainable infrastructure IS
| value to the business.
| cyrnel wrote:
| Good overview! I'd personally rather have better tooling for
| upgrades. Recently the API changes have been minimal, but the
| real problem is the mandatory node draining that causes
| downtime/disruption.
|
| In theory, there's nothing stopping you from just updating the
| kubelet binary on every node. It will generally inherit the
| existing pods. Nomad even supports this[1]. But apparently there
| are no guarantees about this working between versions. And in
| fact some past upgrades have broken the way kubelet stores its
| own state, preventing this trick.
|
| All I ask is for this informal trick to be formalized in the e2e
| tests. I'd write a KEP but I'm too busy draining nodes!
|
| [1]: https://developer.hashicorp.com/nomad/docs/upgrade
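|
| For reference, the dance looks roughly like this with the Python
| client - simplified: a real drain goes through the eviction API
| so PodDisruptionBudgets are honored, and node/pod names here are
| examples:
|
|     from kubernetes import client, config
|
|     def cordon_and_drain(node_name: str) -> None:
|         config.load_kube_config()
|         core = client.CoreV1Api()
|
|         # Cordon: mark the node unschedulable.
|         core.patch_node(node_name, {"spec": {"unschedulable": True}})
|
|         # "Drain": remove pods so their controllers reschedule them.
|         pods = core.list_pod_for_all_namespaces(
|             field_selector=f"spec.nodeName={node_name}"
|         )
|         for pod in pods.items:
|             owners = pod.metadata.owner_references or []
|             if any(o.kind == "DaemonSet" for o in owners):
|                 continue  # DaemonSet pods would just come back
|             core.delete_namespaced_pod(pod.metadata.name,
|                                        pod.metadata.namespace)
|
|     if __name__ == "__main__":
|         cordon_and_drain("worker-1")  # example node name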
| op00to wrote:
| 100%. As someone who used to support Kubernetes commercially,
| long-term support is an engineering nightmare with kube. My
| customers who could upgrade easily were more stable and easily
| supported. The customers that couldn't handle an upgrade were
| the exact opposite - long support cases, complex request
| processes for troubleshooting information, the deck was
| probably stacked against them from the start.
|
| Anyway, make upgrades less scary and more routine and the risk
| bubbles away.
| voytec wrote:
| > The Kubernetes project currently lacks enough active
| contributors to adequately respond to all issues and PRs.
|
| > This bot triages issues according to the following rules: ...
|
| > /close
| rdsubhas wrote:
| Many comments here disagreeing with LTS just because it won't be
| up-to-date are missing a critical point.
|
| When people run a rolling release on their server, their original
| intent is "Yay I'll force myself to be up-to-date". Reality is,
| they get conflicts on installed 3rd party software in each
| upgrade. What ends up happening is, they get frozen on some point
| in time, without even security patches for a long time.
|
| k8s is like an OS: it's not just the core components, there's
| also ingress, gateway, mesh, overlays, operators, admission
| controllers, and 3rd-party integrations like Vault, autoscalers,
| etc. Something or other breaks with each rolling release.
| I've grown really tired of the way GKE v1.25 pretends to be a
| "minor" automated upgrade, when it removes or changes god knows
| how many APIs.
|
| This is what is happening in Kubernetes land. The broken-upgrade
| fatigue is real, but it's met with wishful thinking.
___________________________________________________________________
(page generated 2023-12-04 23:00 UTC)