[HN Gopher] What would a Kubernetes 2.0 look like
       ___________________________________________________________________
        
       What would a Kubernetes 2.0 look like
        
       Author : Bogdanp
       Score  : 121 points
       Date   : 2025-06-19 12:00 UTC (10 hours ago)
        
 (HTM) web link (matduggan.com)
 (TXT) w3m dump (matduggan.com)
        
       | zdw wrote:
       | Related to this, a 2020 take on the topic from the MetalLB dev:
       | https://blog.dave.tf/post/new-kubernetes/
        
         | jauntywundrkind wrote:
         | 152 comments on _A Better Kubernetes, from the Ground Up,_
         | https://news.ycombinator.com/item?id=25243159
        
       | zug_zug wrote:
       | Meh, imo this is wrong.
       | 
       | What Kubernetes is missing most is a 10 year track record of
       | simplicity/stability. What it needs most to thrive is a better
       | reputation of being hard to foot-gun yourself with.
       | 
        | It's just not a compelling business case to say "Look at what you
        | can do with Kubernetes, and you only need a full-time team of 3
        | engineers dedicated to this technology at the cost of a million a
        | year to get bin-packing to the tune of $40k."
       | 
       | For the most part Kubernetes is becoming the common-tongue,
       | despite all the chaotic plugins and customizations that interact
       | with each other in a combinatoric explosion of
        | complexity/risk/overhead. A 2.0 would be what I'd propose if I
        | were trying to kill Kubernetes.
        
         | candiddevmike wrote:
         | Kubernetes is what happens when you need to support everyone's
         | wants and desires within the core platform. The abstraction
         | facade breaks and ends up exposing all of the underlying pieces
         | because someone needs feature X. So much of Kubernetes'
         | complexity is YAGNI (for most users).
         | 
         | Kubernetes 2.0 should be a boring pod scheduler with some RBAC
         | around it. Let folks swap out the abstractions if they need it
         | instead of having everything so tightly coupled within the core
         | platform.
        
           | echelon wrote:
           | Kubernetes doesn't need a flipping package manager or charts.
           | It needs to do one single job well: workload scheduling.
           | 
           | Kubernetes clusters shouldn't be bespoke and weird with
           | behaviors that change based on what flavor of plugins you
            | added. That is antithetical to the principle of the workloads
           | you're trying to manage. You should be able to headshot the
           | whole thing with ease.
           | 
           | Service discovery is just one of many things that should be a
           | different layer.
        
             | KaiserPro wrote:
             | > Service discovery is just one of many things that should
             | be a different layer.
             | 
              | Hard agree. It's like Jenkins: good idea, but it's not
              | portable.
        
               | 12_throw_away wrote:
                | > It's like Jenkins
               | 
               | Having regretfully operated both k8s and Jenkins, I fully
               | agree with this, they have some deep DNA in common.
        
           | sitkack wrote:
           | Kubernetes is when you want to sell complexity because
            | complexity makes money and naturally gets you vendor
            | lock-in even while being ostensibly vendor neutral. Never
           | interrupt the customer while they are foot gunning
           | themselves.
           | 
           | Swiss Army Buggy Whips for Everyone!
        
             | wredcoll wrote:
             | Not really. Kubernetes is still wildly simpler than what
             | came before, especially accounting for the increased
             | capabilities.
        
               | cogman10 wrote:
                | Yup. Having migrated from a Puppet + custom scripts
                | environment and Terraform + custom scripts, K8s is a
                | breath of fresh air.
               | 
               | I get that it's not for everyone, I'd not recommend it
               | for everyone. But once you start getting a pretty diverse
               | ecosystem of services, k8s solves a lot of problems while
               | being pretty cheap.
               | 
               | Storage is a mess, though, and something that really
               | needs to be addressed. I typically recommend people
               | wanting persistence to not use k8s.
        
               | mdaniel wrote:
               | > Storage is a mess, though, and something that really
               | needs to be addressed. I typically recommend people
               | wanting persistence to not use k8s.
               | 
                | I have come to wonder if this is actually an
                | _AWS_ problem, and not a _Kubernetes_ problem. I mention
                | this because the CSI controllers seem to behave sanely,
                | but they are only as good as the requests being fulfilled
                | by the IaaS control plane. I secretly suspect that EBS
                | just wasn't designed for such a hot-swap world.
               | 
                | Now, I posit this because I haven't had to run clusters
                | in Azure or GCP to know if my theory has legs.
               | 
               | I guess the counter-experiment would be to forego the AWS
               | storage layer and try Ceph or Longhorn but no company
               | I've ever worked at wants to blaze trails about that, so
               | they just build up institutional tribal knowledge about
               | treating PVCs with kid gloves
        
               | wredcoll wrote:
               | Honestly this just feels like kubernetes just solving the
               | easy problems and ignoring the hard bits. You notice the
               | pattern a lot after a certain amount of time watching new
               | software being built.
        
               | mdaniel wrote:
               | Apologies, but what influence does Kubernetes have over
               | the way AWS deals with attach and detach behavior of EBS?
               | 
               | Or is your assertion that Kubernetes should be its own
               | storage provider and EBS can eat dirt?
        
               | wredcoll wrote:
               | I was tangenting, but yes, kube providing no storage
               | systems has led to a lot of annoying 3rd party ones
        
               | oceanplexian wrote:
                | > Yup. Having migrated from a Puppet + custom scripts
                | environment and Terraform + custom scripts, K8s is a
                | breath of fresh air.
               | 
               | Having experience in both the former and the latter (in
               | big tech) and then going on to write my own controllers
               | and deal with fabric and overlay networking problems, not
               | sure I agree.
               | 
               | In 2025 engineers need to deal with persistence, they
               | need storage, they need high performance networking, they
               | need HVM isolation and they need GPUs. When a philosophy
               | starts to get in the way of solving real problems and
               | your business falls behind, that philosophy will be left
               | on the side of the road. IMHO it's destined to go the way
               | as OpenStack when someone builds a simpler, cleaner
               | abstraction, and it will take the egos of a lot of people
               | with it when it does.
        
               | wredcoll wrote:
               | > simpler, cleaner abstraction
               | 
               | My life experience so far is that "simpler and cleaner"
               | tends to be mostly achieved by ignoring the harder bits
               | of actually dealing with the real world.
               | 
                | I used Kubernetes' (lack of) support for storage as an
                | example elsewhere; it's the same sort of thing where you
               | can look really clever and elegant if you just ignore the
               | 10% of the problem space that's actually hard.
        
               | KaiserPro wrote:
               | the fuck it is.
               | 
                | The problem is k8s is both an orchestration system and a
               | service provider.
               | 
                | Grid/batch/tractor/cube are all much, much simpler to
                | run at scale. Moreover they can support complex
               | dependencies. (but mapping storage is harder)
               | 
               | but k8s fucks around with DNS and networking, disables
               | swap.
               | 
               | Making a simple deployment is fairly simple.
               | 
               | But if you want _any_ kind of ci/cd you need flux, any
               | kind of config management you need helm.
        
               | JohnMakin wrote:
               | > But if you want _any_ kind of ci/cd you need flux, any
               | kind of config management you need helm.
               | 
               | Absurdly wrong on both counts.
        
               | jitl wrote:
               | K8s has swap now. I am managing a fleet of nodes with
                | 12TB of NVMe swap each. Each container gets a swap limit
                | of (memory limit / node memory) * (total swap). There's no
                | way to specify swap demand in the pod spec yet, so it needs
                | to be managed "by hand" with taints or some other
                | correlation.
        
               | wredcoll wrote:
               | What does swap space get you? I always thought of it as a
                | bit of an anachronism, to be honest.
        
               | mdaniel wrote:
               | The comment you replied to cited 12TB of NVMe, and I can
               | assure you that 12TB of ECC RAM costs way more than NVMe
        
           | selcuka wrote:
           | > Let folks swap out the abstractions if they need it instead
           | of having everything so tightly coupled within the core
           | platform.
           | 
           | Sure, but then one of those third party products (say, X)
           | will catch up, and everyone will start using it. Then job ads
           | will start requiring "10 year of experience in X". Then X
           | will replace the core orchestrator (K8s) with their own
           | implementation. Then we'll start seeing comments like "X is a
           | horribly complex, bloated platform which should have been
           | just a boring orchestrator" on HN.
        
       | dijit wrote:
       | Honestly; make some blessed standards easier to use and maintain.
       | 
       | Right now running K8S on anything other than cloud providers and
        | toys (k3s/minikube) is a disaster waiting to happen unless you're a
       | really seasoned infrastructure engineer.
       | 
       | Storage/state is decidedly not a solved problem, debugging
       | performance issues in your longhorn/ceph deployment is just
       | _pain_.
       | 
       | Also, I don't think we should be removing YAML, we should instead
       | get better at using it as an ILR (intermediate language
       | representation) and generating the YAML that we want instead of
       | trying to do some weird in-place generation (Argo/Helm
       | templating) - Kubernetes sacrificed a lot of simplicity to be
       | eventually consistent with manifests, and our response was to
       | ensure we use manifests as little as possible, which feels
        | incredibly bizarre.
       | 
       | Also, the design of k8s networking feels like it fits ipv6 really
       | well, but it seems like nobody has noticed somehow.
        
         | lawn wrote:
         | k3s isn't a toy though.
        
           | dijit wrote:
           | * Uses Flannel bi-lateral NAT for SDN
           | 
           | * Uses local-only storage provider by default for PVC
           | 
           | * Requires entire cluster to be managed by k3s, meaning no
           | freebsd/macos/windows node support
           | 
           | * Master TLS/SSL Certs not rotated (and not talked about).
           | 
           | k3s is _very much_ a toy - a nice toy though, very fun to
           | play with.
        
         | zdc1 wrote:
         | I like YAML since anything can be used to read/write it. Using
         | Python / JS / yq to generate and patch YAML on-the-fly is quite
         | nifty as part of a pipeline.
         | 
         | My main pain point is, and always has been, helm templating.
         | It's not aware of YAML or k8s schemas and puts the onus of
         | managing whitespace and syntax onto the chart developer. It's
         | pure insanity.
         | 
         | At one point I used a local Ansible playbook for some
         | templating. It was great: it could load resource template YAMLs
         | into a dict, read separately defined resource configs, and then
         | set deeply nested keys in said templates and spit them out as
         | valid YAML. No helm `indent` required.
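          | 
          | (For illustration, a minimal Python sketch of that pattern,
          | assuming PyYAML is installed; the manifest file and key paths
          | are just examples, not the playbook described above.)
          | 
          |     import yaml
          | 
          |     # Load a resource template into a plain dict.
          |     with open("deployment.yaml") as f:
          |         manifest = yaml.safe_load(f)
          | 
          |     # Set deeply nested keys -- no text templating, no `indent`.
          |     containers = manifest["spec"]["template"]["spec"]["containers"]
          |     containers[0]["image"] = "registry.example.com/app:1.2.3"
          |     manifest["spec"]["replicas"] = 3
          | 
          |     # Spit it back out as valid YAML.
          |     print(yaml.safe_dump(manifest, sort_keys=False))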
        
           | pm90 wrote:
           | yaml is just not maintainable if you're managing lots of apps
            | for, e.g., a midsize company or larger. Upgrades become
           | manual/painful.
        
       | jcastro wrote:
       | For the confusion around verified publishing, this is something
       | the CNCF encourages artifact authors and their projects to set
       | up. Here are the instructions for verifying your artifact:
       | 
       | https://artifacthub.io/docs/topics/repositories/
       | 
       | You can do the same with just about any K8s related artifact. We
       | always encourage projects to go through the process but sometimes
       | they need help understanding that it exists in the first place.
       | 
       | Artifacthub is itself an incubating project in the CNCF, ideas
       | around making this easier for everyone are always welcome,
       | thanks!
       | 
       | (Disclaimer: CNCF Staff)
        
         | calcifer wrote:
         | > We always encourage projects to go through the process but
         | sometimes they need help understanding that it exists in the
         | first place.
         | 
         | Including ingress-nginx? Per OP, it's not marked as verified.
         | If even the official components don't bother, it's hard to
         | recommend it to third parties.
        
       | johngossman wrote:
       | Not a very ambitious wishlist for a 2.0 release. Everyone I talk
       | to complains about the complexity of k8s in production, so I
        | think the big question is whether you could do a 2.0 with
        | sufficient backward compatibility that it could be adopted
        | incrementally and still make it simpler. Back compat almost
        | always means complexity
       | increases as the new system does its new things _and_ all the old
       | ones.
        
         | herval wrote:
         | The question is always what part of that complexity can be
         | eliminated. Every "k8s abstraction" I've seen to date either
         | only works for a very small subset of stuff (eg the heroku-like
         | wrappers) or eventually develops a full blown dsl that's as
         | complex as k8s (and now you have to learn that job-specific
         | dsl)
        
           | mdaniel wrote:
           | Relevant: _Show HN: Canine - A Heroku alternative built on
           | Kubernetes_ - https://news.ycombinator.com/item?id=44292103 -
           | June, 2025 (125 comments)
        
             | herval wrote:
             | yep, that's the latest of a long lineage of such projects
              | (one of which I worked on myself). Others include kubero,
             | dokku, porter, kr0, etc. There was a moment back in 2019
             | where every big tech company was trying to roll out their
             | own K8s DSL (I know of Twitter, Airbnb, WeWork, etc).
             | 
             | For me, the only thing that really changed was LLMs -
             | chatgpt is exceptional at understanding and generating
             | valid k8s configs (much more accurately than it can do
              | coding). It's still complex, but it feels like I have a
              | second brain to look at it now.
        
       | mrweasel wrote:
       | What I would add is "sane defaults", as in unless you pick
       | something different, you get a good enough load
       | balancer/network/persistent storage/whatever.
       | 
        | I'd agree that YAML isn't a good choice, but neither is HCL. Ever
        | tried reading Terraform? Yeah, that's bad too. Inherently we need
       | a better way to configure Kubernetes clusters and changing out
       | the language only does so much.
       | 
        | IPv6, YES, absolutely. Everything Docker, containers, and
        | Kubernetes should have been IPv6-only internally from the start.
        | Want IPv4? That should be handled by a special-case ingress
        | controller.
        
         | zdw wrote:
         | Sane defaults is in conflict with "turning you into a customer
         | of cloud provider managed services".
         | 
          | The longer I look at k8s, the more I see it as "batteries not
          | included" around storage, networking, etc., with the result
          | being that the batteries come with a bill attached from AWS,
          | GCP, etc. K8s is less an open source project and more a way to
          | encourage dependency on these extremely lucrative gap-filler
          | services from the cloud providers.
        
           | JeffMcCune wrote:
           | Except you can easily install calico, istio, and ceph on used
           | hardware in your garage and get an experience nearly
           | identical to every hyper scaler using entirely free open
           | source software.
        
             | zdw wrote:
             | Having worked on on-prem K8s deployments, yes, you can do
             | this. But getting it to production grade is very different
             | than a garage-quality proof of concept.
        
               | mdaniel wrote:
               | I think OP's point was: but how much of that production
               | grade woe is the fault of _Kubernetes_ versus, sure,
                | turns out booting up a PaaS from scratch is hard as
                | nails. I think that k8s's pluggable design also blurs that
                | boundary in most people's heads. I can't think of the
                | last time the _control plane_ shit itself, versus
                | everyone and their cousin has a CLBO story for the
                | component controllers installed on top of k8s.
        
               | zdw wrote:
               | CLBO?
        
               | mdaniel wrote:
               | Crash Loop Back Off
        
         | ChocolateGod wrote:
         | I find it easier to read Terraform/HCL over YAML for the simple
          | fact that it doesn't rely on me trying to process invisible
         | characters.
        
       | tayo42 wrote:
       | > where k8s is basically the only etcd customer left.
       | 
        | Is that true? No one is really using it?
       | 
        | I think one thing k8s would need is some obvious answer for
        | stateful systems (at scale, not MySQL at a startup). I think there
        | are some ways to do it? Where I work, basically everything is on
        | k8s, but all the databases are on their own crazy special systems
        | to support; they insist it's impossible and costs too much. I work
        | in the worst of all worlds now supporting this.
       | 
        | re: comments that k8s should just schedule pods: Mesos with
        | Aurora or Marathon was basically that. If people wanted that,
        | those would have done better. The biggest users of Mesos switched
        | to k8s.
        
         | haiku2077 wrote:
         | I had to go deep down the etcd rabbit hole several years ago.
         | The problems I ran into:
         | 
         | 1. etcd did an fsync on every write and required all nodes to
         | complete a write to report a write as successful. This was not
         | configurable and far higher a guarantee than most use cases
         | actually need - most Kubernetes users are fine with snapshot +
         | restore an older version of the data. But it really severely
         | impacts performance.
         | 
         | 2. At the time, etcd had a hard limit of 8GB. Not sure if this
         | is still there.
         | 
         | 3. Vanilla etcd was overly cautious about what to do if a
         | majority of nodes went down. I ended up writing a wrapper
         | program to automatically recover from this in most cases, which
         | worked well in practice.
         | 
         | In conclusion there was no situation where I saw etcd used that
         | I wouldn't have preferred a highly available SQL DB. Indeed,
         | k3s got it right using sqlite for small deployments.
        
           | nh2 wrote:
           | For (1), I definitely want my production HA databases to
           | fsync every write.
           | 
            | Of course configurability is good (e.g. for automated fast
           | tests you don't need it), but safe is a good default here,
           | and if somebody sets up a Kubernetes cluster, they can and
           | should afford enterprise SSDs where fsync of small data is
           | fast and reliable (e.g. 1000 fsyncs/second).
        
             | haiku2077 wrote:
             | > I definitely want my production HA databases to fsync
             | every write.
             | 
             | I didn't! Our business DR plan only called for us to
             | restore to an older version with short downtime, so fsync
             | on every write on every node was a reduction in performance
             | for no actual business purpose or benefit. IIRC we modified
             | our database to run off ramdisk and snapshot every few
             | minutes which ran way better and had no impact on our
             | production recovery strategy.
             | 
             | > if somebody sets up a Kubernetes cluster, they can and
             | should afford enterprise SSDs where fsync of small data is
             | fast and reliable
             | 
             | At the time one of the problems I ran into was that public
             | cloud regions in southeast asia had significantly worse
             | SSDs that couldn't keep up. This was on one of the big
             | three cloud providers.
             | 
             | 1000 fsyncs/second is a tiny fraction of the real world
             | production load we required. An API that only accepts 1000
             | writes a second is very slow!
             | 
             | Also, plenty of people run k8s clusters on commodity
             | hardware. I ran one on an old gaming PC with a budget SSD
             | for a while in my basement. Great use case for k3s.
        
           | dilyevsky wrote:
           | 1 and 2 can be overridden via flag. 3 is practically the
           | whole point of the software
        
             | haiku2077 wrote:
             | With 3 I mean that in cases where there was an
             | unambiguously correct way to recover from the situation,
             | etcd did not automatically recover. My wrapper program
                | would always recover from those situations. (It's been a
             | number of years and the exact details are hazy now,
             | though.)
        
               | dilyevsky wrote:
               | If the majority of quorum is truly down, then you're
               | down. That is by design. There's no good way to recover
               | from this without potentially losing state so the system
               | correctly does nothing at this point. Sure you can force
               | it into working state with external intervention but
               | that's up to you
        
               | haiku2077 wrote:
               | Like I said I'm hazy on the details, this was a small
               | thing I did a long time ago. But I do remember our on-
               | call having to deal with a lot of manual repair of etcd
               | quorum, and I noticed the runbook to fix it had no steps
               | that needed any human decision making, so I made that
               | wrapper program to automate the recovery. It wasn't
               | complex either, IIRC it was about one or two pages of
               | code, mostly logging.
        
         | dilyevsky wrote:
          | That is decidedly not true. A number of very large companies
         | use etcd directly for various needs
        
       | rwmj wrote:
       | Make there be one, sane way to install it, and make that method
       | work if you just want to try it on a single node or single VM
       | running on a laptop.
        
         | mdaniel wrote:
         | My day job makes this request of my team right now, and yet
         | when trying to apply this logic to a container _and_ cloud-
         | native control plane, there are a lot more devils hiding in
         | those details. Use MetalLB for everything, even if NLBs are
         | available? Use Ceph for storage even if EBS is available?
         | Definitely don 't use Ceph on someone's 8GB laptop. I can keep
         | listing "yes, but" items that make doing such a thing
         | impossible to troubleshoot because there's not one consumer
         | 
         | So, to circle back to your original point: rke2 (Apache 2) is a
         | fantastic, airgap-friendly, intelligence community approved
          | distribution, and pairs fantastically with Rancher Desktop (also
         | Apache 2). It's not the _kubernetes_ part of that story which
         | is hard, it 's the "yes, but" part of the lego build
         | 
         | -
         | https://github.com/rancher/rke2/tree/v1.33.1%2Brke2r1#quick-...
         | 
         | - https://github.com/rancher-sandbox/rancher-desktop/releases
        
       | fatbird wrote:
       | How many places are running k8s without OpenShift to wrap it and
       | manage a lot of the complexity?
        
         | jitl wrote:
         | I've never used OpenShift nor do I know anyone irl who uses it.
         | Sample from SF where most people I know are on AWS or GCP.
        
           | coredog64 wrote:
           | You can always go for the double whammy and run ROSA: RedHat
           | OpenShift on AWS
        
         | raincom wrote:
          | OpenShift, if IBM and Red Hat want to milk the license and
          | support contracts. There are other vendors that sell k8s:
          | Rancher, for instance. SUSE bought Rancher.
        
       | Melatonic wrote:
        | MicroVMs
        
       | geoctl wrote:
        | I would say k8s 2.0 needs:
        | 
        | 1. gRPC/proto3-based APIs, to make controlling k8s clusters
        | easier from any programming language, not just (practically)
        | Golang as is the case currently. This can also make dealing with
        | k8s controllers easier and more manageable, even though it
        | admittedly might complicate things on the API server side when
        | it comes to CRDs.
        | 
        | 2. PostgreSQL, or a pluggable storage backend, by default instead
        | of etcd.
        | 
        | 3. A clear identity-based, L7-aware, ABAC-based access control
        | interface that can be implemented by CNIs, for example.
        | 
        | 4. userns applied by default.
        | 
        | 5. An easier pluggable per-pod CRI system where microVM- and
        | container-based runtimes can easily co-exist based on the
        | workload type.
        
         | jitl wrote:
         | All the APIs, including CRDs, already have a well described
         | public & introspectable OpenAPI schema you can use to generate
         | clients. I use the TypeScript client generated and maintained
         | by Kubernetes organization. I don't see what advantage adding a
         | binary serialization wire format has. I think gRPC makes sense
         | when there's some savings to be had with latency, multiplexing,
          | streams, etc., but for control-plane things like Kubernetes it
          | doesn't seem necessary to me.
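          | 
          | (The official Python client is likewise generated from that
          | OpenAPI schema; a minimal sketch, assuming a working
          | kubeconfig and `pip install kubernetes`.)
          | 
          |     from kubernetes import client, config
          | 
          |     # Loads ~/.kube/config -- the same credentials kubectl uses.
          |     config.load_kube_config()
          | 
          |     v1 = client.CoreV1Api()
          |     for pod in v1.list_namespaced_pod(namespace="default").items:
          |         print(pod.metadata.name, pod.status.phase)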
        
           | geoctl wrote:
           | I haven't used CRDs myself for a few years now (probably
           | since 2021), but I still remember developing CRDs was an ugly
           | and hairy experience to say the least, partly due to the
           | flaws of Golang itself (e.g. no traits like in Rust, no
            | macros, no enums, etc.). With protobuf you can easily
            | compile your definitions to any language with clear enum and
            | oneof implementations; you can use the standard protobuf
            | libraries to do deepCopy, merge, etc. for you; and you can
            | also add basic validations in the protobuf definitions and so
            | on. gRPC/protobuf would basically allow you to develop k8s
           | controllers very easily in any language.
        
             | mdaniel wrote:
             | CRDs are not tied to golang in any way whatsoever;
             | <https://www.redhat.com/en/blog/writing-custom-controller-
             | pyt...> and <https://metacontroller.github.io/metacontrolle
             | r/guide/create...> are two concrete counter-examples, with
             | the latter being the most "microservices" extreme. You can
             | almost certainly implement them in bash if you're trying to
             | make the front page of HN
        
               | geoctl wrote:
               | I never said that CRDs are tied to Golang, I said that
               | the experience of compiling CRDs, back then gen-
               | controller or whatever is being used these days, to
               | Golang types was simply ugly partly due to the flaws of
               | the language itself. What I mean is that gRPC can
               | standardize the process of compiling both k8s own
               | resource definitions as well as CRDs to make the process
               | of developing k8s controllers in any language simply much
               | easier. However this will probably complicate the logic
               | of the API server trying to understand and decode the
               | binary-based protobuf resource serialized representations
               | compared to the current text-based JSON representations.
        
           | znpy wrote:
           | > have a well described public & introspectable OpenAPI
           | schema you can use to generate clients.
           | 
            | Last time I tried loading the OpenAPI schema in the Swagger
            | UI on my work laptop (this was ~3-4 years ago, and I had an
            | 8th gen Core i7 with 16GB RAM) it hung my browser, leading to
            | a tab crash.
        
             | mdaniel wrote:
             | Loading it in what? I just slurped the 1.8MB openapi.json
              | for v1.31 into Mockoon and it fired right up instantly.
        
           | ofrzeta wrote:
           | I think the HTTP API with OpenAPI schema is part of what's so
           | great about Kubernetes and also a reason for its success.
        
         | dilyevsky wrote:
          | 1. The built-in types are already protos. Imo gRPC wouldn't be
          | a good fit - it would actually make the system harder to use.
          | 
          | 2. Already can be achieved today via kine[0].
          | 
          | 3. Couldn't you build this today via regular CNI? Cilium
          | NetworkPolicies and others basically do this already.
          | 
          | 4, 5 probably don't require 2.0 - they can be easily added
          | within the existing API via KEP (cri-o already does userns
          | configuration based on annotations).
         | 
         | [0] - https://github.com/k3s-io/kine
        
           | geoctl wrote:
           | Apart from 1 and 3, probably everything else can be added
           | today if the people in charge have the will to do that, and
            | that's assuming I am right and these points are actually
            | important enough to be standardized. However, the big
            | enterprise-tier money in Kubernetes is made from dumbing down
            | the official k8s interfaces, especially those related to
            | access control (e.g. k8s's own NetworkPolicy compared to
           | Istio's access control related resources).
        
       | pm90 wrote:
       | Hard disagree with replacing yaml with HCL. Developers find HCL
       | very confusing. It can be hard to read. Does it support imports
       | now? Errors can be confusing to debug.
       | 
       | Why not use protobuf, or similar interface definition languages?
       | Then let users specify the config in whatever language they are
       | comfortable with.
        
         | geoctl wrote:
         | You can very easily build and serialize/deserialize HCL, JSON,
         | YAML or whatever you can come up with outside Kubernetes from
          | the client side itself (e.g. kubectl). This actually has
          | nothing to do with Kubernetes itself at all.
        
           | Kwpolska wrote:
           | There aren't that many HCL serialization/deserialization
           | tools. Especially if you aren't using Go.
        
         | dilyevsky wrote:
         | Maybe you know this but Kubernetes interface definitions are
         | already protobufs (except for crds)
        
           | cmckn wrote:
           | Sort of. The hand-written go types are the source of truth
           | and the proto definitions are generated from there, solely
           | for the purpose of generating protobuf serializers for the
           | hand-written go types. The proto definition is used more as
           | an intermediate representation than an "API spec". Still
           | useful, but the ecosystem remains centered on the go types
           | and their associated machinery.
        
             | dilyevsky wrote:
              | Given that I can just take generated.proto and ingest it in
              | my software, then marshal any built-in type and apply it via
              | the standard k8s API, why would I even need all the
              | boilerplate crap from apimachinery? Perfectly happy with the
              | existing REST-y semantics - full gRPC would be going too far.
        
         | dangus wrote:
         | Confusing? Here I am working on the infrastructure side
          | thinking that I'm working with a baby configuration language
         | for dummies who can't code when I use HCL/Terraform.
         | 
         | The idea that someone who works with JavaScript all day might
         | find HCL confusing seems hard to imagine to me.
         | 
         | To be clear, I am talking about the syntax and data types in
         | HCL, not necessarily the way Terraform processes it, which I
         | admit can be confusing/frustrating. But Kubernetes wouldn't
         | have those pitfalls.
        
           | mdaniel wrote:
           | orly, what structure does this represent?
            | 
            |     outer {
            |       example {
            |         foo = "bar"
            |       }
            |       example {
            |         foo = "baz"
            |       }
            |     }
            | 
            | it reminds me of the insanity of toml
            | 
            |     [lol]
            |     [[whut]]
            |     foo = "bar"
            |     [[whut]]
            |     foo = "baz"
           | 
           | only at least with toml I can $(python3.13 -c 'import
           | tomllib, sys; print(tomllib.loads(sys.stdin.read()))') to
           | find out, but with hcl too bad
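            | 
            | (For the record, a sketch of what that decodes to; Python
            | 3.11+ for the stdlib tomllib.)
            | 
            |     import tomllib
            | 
            |     doc = """
            |     [lol]
            |     [[whut]]
            |     foo = "bar"
            |     [[whut]]
            |     foo = "baz"
            |     """
            |     print(tomllib.loads(doc))
            |     # {'lol': {}, 'whut': [{'foo': 'bar'}, {'foo': 'baz'}]}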
        
           | icedchai wrote:
           | Paradoxically, the simplicity itself can be part of the
           | confusion: the anemic "for loop" syntax, crazy conditional
            | expressions to work around the lack of "if" statements -
           | combine this with "count" and you can get some weird stuff.
           | It becomes a flavor all its own.
        
         | znpy wrote:
         | > Hard disagree with replacing yaml with HCL.
         | 
         | I see some value instead. Lately I've been working on Terraform
         | code to bring up a whole platform in half a day (aws sub-
         | account, eks cluster, a managed nodegroup for karpenter,
         | karpenter deployment, ingress controllers, LGTM stack,
         | public/private dns zones, cert-manager and a lot more) and I
         | did everything in Terraform, including Kubernetes resources.
         | 
         | What I appreciated about creating Kubernetes resources (and
         | helm deployments) in HCL is that it's typed and has a schema,
         | so any ide capable of talking to an LSP (language server
         | protocol - I'm using GNU Emacs with terraform-ls) can provide
         | meaningful auto-completion as well proper syntax checking (I
         | don't need to apply something to see it fail, emacs (via the
         | language server) can already tell me what I'm writing is
         | wrong).
         | 
         | I really don't miss having to switch between my ide and the
         | Kubernetes API reference to make sure I'm filling each field
         | correctly.
        
           | wredcoll wrote:
           | But... so do yaml and json documents?
        
           | NewJazz wrote:
           | I do something similar except with pulumi, and as a result I
           | don't need to learn HCL, and I can rely on the excellent
           | language servers for e.g. Typescript or python.
        
         | vanillax wrote:
         | Agree HCL is terrible. K8s YAML is fine. I have yet to hit a
            | use case that can't be solved with its types. If you are doing
         | too much perhaps a config map is the wrong choice.
        
           | ofrzeta wrote:
           | It's just easier to shoot yourself in the foot with no proper
           | type support (or enforcement) in YAML. I've seen Kubernetes
           | updates fail when the version field was set to 1.30 and it
            | got interpreted as a float 1.3. Sure, someone made a mistake,
           | but the config language should/could stop you from making
           | them.
        
         | dochne wrote:
         | My main beef with HCL is a hatred for how it implemented for
         | loops.
         | 
         | Absolutely loathsome syntax IMO
        
           | mdaniel wrote:
            | It was mentioned pretty deep in another thread, but this is
            | just straight up user hostile:
            | 
            |     variable "my_list" { default = ["a", "b"] }
            |     resource whatever something {
            |       for_each = var.my_list
            |     }
            | 
            |     The given "for_each" argument value is unsuitable: the
            |     "for_each" argument must be a map, or set of strings, and
            |     you have provided a value of type tuple.
        
         | acdha wrote:
         | > Developers find HCL very confusing. It can be hard to read.
         | Does it support imports now? Errors can be confusing to debug.
         | 
         | This sounds a lot more like "I resented learning something new"
          | than anything about HCL - or possibly a case of learning HCL
          | simultaneously with something complex and blaming the config
          | language rather than the problem domain being configured.
        
           | aduwah wrote:
            | The issue is that you do not want a dev learning HCL, the same
            | as you don't want your SRE team learning Next.js and React out
            | of necessity.
            | 
            | The ideal solution would be an abstraction that is easy to use
            | and does not require learning a whole new concept (especially
            | one as ugly as HCL). Also, learning HCL is just the tip of the
            | iceberg, before sinking into the dependencies between
            | components, outputs read from a bunch of workspaces, etc. It
            | is simply wasted time to have the devs keeping up with the
            | whole Terraform heap that SREs manage and keep evolving under
            | the hood. That dev time is better spent creating features.
        
             | acdha wrote:
             | Why? If they can't learn HCL, they're not going to be a
             | successful developer.
             | 
             | If your argument is instead that they shouldn't learn
             | infrastructure, then the point is moot because that applies
             | equally to every choice (knowing how to indent YAML doesn't
             | mean they know what to write). That's also wrong as an
             | absolute position but for different reasons.
        
               | aduwah wrote:
               | You don't see my point. The ideal solution would be
               | something that can be learned easily by both the dev and
               | infra side without having to boil the ocean on one side
                | or the other. Something like protobuf was mentioned above,
                | which is a better idea than HCL imho.
        
       | darkwater wrote:
       | I totally dig the HCL request. To be honest I'm still mad at
        | GitHub, which initially used HCL for GitHub Actions and then
        | ditched it for YAML when it went stable.
        
         | carlhjerpe wrote:
         | I detest HCL, the module system is pathetic. It's not
         | composable at all and you keep doing gymnastics to make sure
         | everything is known at plan time (like using lists where you
         | should use dictionaries) and other anti-patterns.
         | 
         | I use Terranix to make config.tf.json which means I have the
         | NixOS module system that's composable enough to build a Linux
         | distro at my fingertips to compose a great Terraform
         | "state"/project/whatever.
         | 
         | It's great to be able to run some Python to fetch some data,
         | dump it in JSON, read it with Terranix, generate config.tf.json
         | and then apply :)
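          | 
          | (The underlying trick is just that Terraform also reads
          | *.tf.json; a minimal Python sketch of generating one without
          | Nix/Terranix -- the null_resource here is only an example.)
          | 
          |     import json
          | 
          |     # Data fetched from anywhere -- an API call, a script, a file.
          |     tags = {"team": "platform", "env": "staging"}
          | 
          |     # Terraform's JSON syntax: the same shape as HCL, as JSON.
          |     config = {
          |         "resource": {
          |             "null_resource": {
          |                 "example": {"triggers": tags},
          |             },
          |         },
          |     }
          | 
          |     with open("config.tf.json", "w") as f:
          |         json.dump(config, f, indent=2)
          |     # `terraform plan` / `apply` picks it up alongside any *.tf.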
        
           | jitl wrote:
           | What's the list vs dictionary issue in Terraform? I use a lot
            | of dictionaries (maps in tf speak); terraform things like
           | for_each expect a map and throw if handed a list.
        
             | carlhjerpe wrote:
             | Internally a lot of modules cast dictionaries to lists of
             | the same length because the keys of the dict might not be
              | known at plan time or something. The Terraform AWS VPC
              | module does this internally for many things.
             | 
             | I couldn't tell you exactly, but modules always end up
             | either not exposing enough or exposing too much. If I were
             | to write my module with Terranix I can easily replace any
             | value in any resource from the module I'm importing using
             | "resource.type.name.parameter = lib.mkForce
             | "overridenValue";" without having to expose that parameter
             | in the module "API".
             | 
             | The nice thing is that it generates
             | "Terraform"(config.tf.json) so the supremely awesome state
             | engine and all API domain knowledge bound in providers work
             | just the same and I don't have to reach for something as
             | involved as Pulumi.
             | 
             | You can even mix Terranix with normal HCL since
             | config.tf.json is valid in the same project as HCL. A great
             | way to get started is to generate your provider config and
              | other things where you'd otherwise reach for Terragrunt and
              | friends. Then you can start making options that make
              | resources at your own pace.
             | 
             | The terraform LSP sadly doesn't read config.tf.json yet so
             | you'll get warnings regarding undeclared locals and such
              | but for me it's worth it. I generally write tf/tfnix with
              | the provider docs open, and the languages (Nix and HCL) are
              | easy enough to write without a full LSP.
             | 
             | https://terranix.org/ says it better than me, but by doing
              | it with Nix you get programmatic access to the biggest
              | package library in the world to use at your discretion
              | (build scripts to fetch values from weird places, run
              | impure scripts with null_resource or its replacements) and
              | an expressive functional programming language where you can
              | do recursion and stuff; you can use derivations to run any
              | language to transform strings with ANY tool.
             | 
             | It's like Terraform "unleashed" :) Forget "dynamic" blocks,
             | bad module APIs and hacks (While still being able to use
             | existing modules too if you feel the urge).
        
               | mdaniel wrote:
               | Sounds like the kustomize mental model: take code you
               | potentially don't control, apply patches to it until it
               | behaves like you wish, apply
               | 
               | If the documentation and IDE story for kustomize was
               | better, I'd be its biggest champion
        
               | carlhjerpe wrote:
               | You can run Kustomize in a Nix derivation with inputs
               | from Nix and apply the output using Terranix and the
                | kubectl provider, which gives you a very nice reproducible
                | way to apply Kubernetes resources with the Terraform state
                | engine. I like how Terraform manages the CRUD lifecycle,
                | with cascading changes and replacements, which is often
                | pretty optimal-ish at least.
               | 
                | And since it's Terraform, you can use any provider in the
                | registry to create resources according to your Kubernetes
                | objects too; it can
               | technically replace things like external-dns and similar
               | controllers that create stuff in other clouds, but in a
               | more "static configuration" way.
               | 
               | Edit: This works nicely with Gitlab Terraform state
               | hosting thingy as well.
        
               | jitl wrote:
               | I think Pulumi is in a similar spot, you get a real
               | programming language (of your choice) and it gets to use
               | the existing provider ecosystem. You can use the
               | programming language composition facilities to work
               | around the plan system if necessary, although their plans
               | allow more dynamic stuff than Terraform.
               | 
               | The setup with Terranix sounds cool! I am pretty
               | interested in build system type things myself, I recently
               | wrote a plan/apply system too that I use to manage SQL
               | migrations.
               | 
                | I want to learn Nix, but I think that, like Rust, it's just
                | a bit too wide/deep for me to approach on my own time
                | without a tutor/co-worker or a forcing function like a work
               | project to push me through the initial barrier.
        
               | carlhjerpe wrote:
               | Yep it's similar, but you bring all your dependencies
               | with you through Nix rather than a language specific
               | package manager.
               | 
               | Try using something like devenv.sh initially just to
               | bring tools into $PATH in a distro agnostic & mostly-ish
               | MacOS compatible way (so you can guarantee everyone has
               | the same versions of EVERYTHING you need to build your
               | thing).
               | 
               | Learn the language basics after it brings you value
               | already, then learn about derivations and then the module
               | system which is this crazy composable multilayer
               | recursive magic merging type system implemented on top of
                | Nix. Don't be afraid to clone nixpkgs and look inside.
               | 
                | Nix derivations are essentially Dockerfiles on steroids,
                | but the Nix language brings /nix/store paths into the
                | container, sets environment variables for you and runs
                | some scripts. All these things are hashed, so if any
                | input changes it triggers automatic cascading rebuilds,
                | which also means you can use a binary cache as a kind of
                | "memoization" caching thingy, which is nice.
               | 
               | It's a very useful tool, it's very non-invasive on your
               | system (other than disk space if you're not managing
               | garbage collection) and you can use it in combination
               | with other tools.
               | 
                | Makes it very easy to guarantee your DevOps scripts run
               | exactly your versions of all CLI tools and build systems
               | and whatever even if the final piece isn't through Nix.
               | 
               | Look at "pgroll" for Postgres migrations :)
        
               | jitl wrote:
               | pgroll seems neat but I ended up writing my own tools for
               | this one because I need to do somewhat unique shenanigans
               | like testing different sharding and resource allocation
               | schemes in Materialize.com (self hosted). I have 480
               | source input schemas (postgres input schemas described
               | here if you're curious, the materialize stuff is brand
               | new https://www.notion.com/blog/the-great-re-shard) and
               | manage a bunch of different views & indexes built on top
               | of those; create a bunch of different copies of the
               | views/indexes striped across compute nodes, like right
               | now I'm testing 20 schemas per whole-aws-instance node,
               | versus 4 schemas per quarter-aws-node, M/N*Y with
               | different permutations of N and Y. With the plan/apply
               | model I just need to change a few lines in TypeScript and
               | get the minimal changes to all downstream dependencies
               | needed to roll it out.
        
               | Groxx wrote:
               | Internally... in what? Not HCL itself, I assume? Also I'm
               | not seeing much that implies HCL has a "plan time"...
               | 
               | I'm not familiar with HCL so I'm struggling to find much
               | here that would be conclusive, but a lot of this thread
               | sounds like "HCL's features that YAML does not have are
               | sub-par and not sufficient to let me only use HCL" and...
               | yeah, you usually can't use YAML that way either, so I'm
               | not sure why that's all that much of a downside?
               | 
               | I've been idly exploring config langs for a while now,
               | and personally I tend to just lean towards JSON5 because
               | comments are absolutely required... but support isn't
               | anywhere near as good or automatic as YAML :/ HCL has
               | been on my interest-list for a while, but I haven't gone
               | deep enough into it to figure out any real opinion.
        
       | mdaniel wrote:
       | > Allow etcd swap-out
       | 
       | From your lips to God's ears. And, as they correctly pointed out,
       | this work is already done, so I just do not understand the
       | holdup. Folks can continue using etcd if it's their favorite, but
       | _mandating_ it is weird. And I can already hear the
        | butwhataboutism, yet there is already a CNCF certification process
       | and a whole subproject just for testing Kubernetes itself, so do
       | they believe in the tests or not?
       | 
       | > The Go templates are tricky to debug, often containing complex
       | logic that results in really confusing error scenarios. The error
       | messages you get from those scenarios are often gibberish
       | 
       | And they left off that it is crazypants to use a _textual_
        | templating language for a _whitespace-sensitive, structured_ file
        | format. But, just like the rest of the complaints, it's not like
       | we don't already have replacements, but the network effect is
       | very real and very hard to overcome
       | 
       | That barrier of "we have nicer things, but inertia is real"
       | applies to so many domains, it just so happens that helm impacts
       | a much larger audience
        
       | jonenst wrote:
        | What about kustomize and kpt? I'm using them (instead of helm),
        | but:
       | 
       | * kpt is still not 1.0
       | 
       | * both kustomize and kpt require complex setups to
        | programmatically generate configs (even for simple things like
       | replicas = replicasx2)
        
       | jitl wrote:
       | I feel like I'm already living in the Kubernetes 2.0 world
       | because I manage my clusters & its applications with Terraform.
       | 
       | - I get HCL, types, resource dependencies, data structure
       | manipulation for free
       | 
       | - I use a single `tf apply` to create the cluster, its underlying
       | compute nodes, related cloud stuff like S3 buckets, etc; as well
       | as all the stuff running on the cluster
       | 
       | - We use terraform modules for re-use and de-duplication,
       | including integration with non-K8s infrastructure. For example,
       | we have a module that sets up a Cloudflare ZeroTrust tunnel to a
       | K8s service, so with 5 lines of code I can get a unique public
       | HTTPS endpoint protected by SSO for _whatever_ running in K8s.
       | The module creates a Deployment running cloudflared as well as
       | configures the tunnel in the Cloudflare API.
       | 
       | - Many infrastructure providers ship signed well documented
       | Terraform modules, and Terraform does reasonable dependency
       | management for the modules & providers themselves with lockfiles.
       | 
       | - I can compose Helm charts just fine via the Helm terraform
       | provider if necessary. Many times I see Helm charts that are just
       | "create namespace, create foo-operator deployment, create custom
       | resource from chart values" (like Datadog). For these I opt to
       | just install the operator & manage the CRD from terraform
        | directly or via a thin Helm pass-through chart that just echoes
       | whatever HCL/YAML I put in from Terraform values.
       | 
       | Terraform's main weakness is orchestrating the apply process
       | itself, similar to k8s with YAML or whatever else. We use
       | Spacelift for this.
        
         | ofrzeta wrote:
         | In a way it's redundant to have the state twice: once in
         | Kubernetes itself and once in the Terraform state. This can
         | lead to problems when resources are modified through mutating
         | webhooks or similar. Then you need to mark your properties as
         | "computed fields" or something like that. So I am not a fan of
         | managing applications through TF. Managing clusters might be
         | fine, though.
        
       | moomin wrote:
       | Let me add one more: give controllers/operators a defined
       | execution order. Don't let changes flow both ways. Provide better
       | ways for building things that don't step on everyone else's toes.
       | Make whatever replaces helm actually maintain stuff rather than
       | just splatting it out.
        
         | clvx wrote:
          | This is a hard no for me. This is the whole point of the
          | reconciliation loop. You can just push something to the
          | API/etcd and eventually it will become ready when all the
          | dependencies exist. Now, rejecting manifests because CRDs
          | don't exist yet is a different discussion. I'm down to have a
          | cache of manifests to be deployed waiting for a CRD, but if the
          | CRD isn't deployed, then a garbage-collection-like tool
          | removes them from the cache. This is what fluxcd and argocd
          | already do in a way, but I would like to have it natively.
        
       | recursivedoubts wrote:
       | please make it look like old heroku for us normies
        
       | dzonga wrote:
       | I thought this would be written along the lines of an LLM going
       | through your code - spinning up a Railway file - then say have
       | tf for a few of the manual dependencies etc. that can't be
       | easily inferred.
       | 
       | & get automatic scaling out of the box etc. - a more simplified
       | flow rather than wrangling yaml or hcl.
       | 
       | In short, imagine if k8s was a 2-3 (max 5) line docker-compose-
       | like file.
        
       | singularity2001 wrote:
       | More like wasm?
        
         | mdaniel wrote:
         | As far as I know one can do that right now, since wasmedge
         | (Apache 2) exposes a CRI interface
         | https://wasmedge.org/docs/develop/deploy/oci-runtime/crun#pr...
         | (et al)
        
       | nunez wrote:
       | I _still_ think Kubernetes is insanely complex, despite all that
       | it does. It seems less complex these days because it's so
       | pervasive, but complex it remains.
       | 
       | I'd like to see more emphasis on UX for v2 for the most common
       | operations, like deploying an app and exposing it, then doing
       | things like changing service accounts or images without having to
       | drop into kubectl edit.
       | 
       | Given that LLMs are it right now, this probably won't happen, but
       | no harm in dreaming, right?
        
         | Pet_Ant wrote:
         | Kubernetes itself contains so many layers of abstraction. There
         | are pods, which is the core new idea, and it's great. But now
          | there are deployments, and replica sets, and namespaces... and it
         | makes me wish we could just use Docker Swarm.
         | 
         | Even Terraform seems to live on just a single-layer and was
         | relatively straight-forward to learn.
         | 
         | Yes, I am in the middle of learning K8s so I know exactly how
         | steep the curve is.
        
           | jakewins wrote:
           | The core idea isn't pods. The core idea is reconciliation
           | loops: you have some desired state - a picture of how you'd
           | like a resource to look or be - and little controller loops
           | that indefinitely compare that to the world, and update the
           | world.
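            | 
            | A toy version of that loop, in throwaway Python rather than
            | anything resembling the real controller machinery:
            | 
            |     import time
            | 
            |     def reconcile(desired, observe, apply_diff):
            |         while True:
            |             actual = observe()               # look at the world
            |             if actual != desired:
            |                 apply_diff(desired, actual)  # nudge it closer
            |             time.sleep(5)                    # resync and repeat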
           | 
           | Much of the complexity then comes from the enormous amount of
           | resource types - including all the custom ones. But the basic
           | idea is really pretty small.
           | 
           | I find terraform much more confusing - there's a spec, and
           | the real world.. and then an opaque blob of something I don't
           | understand that terraform sticks in S3 or your file system
           | and then.. presumably something similar to a one-shot
           | reconciler that wires that all together each time you plan
           | and apply?
        
             | vrosas wrote:
             | Someone saying "This is complex but I think I have the core
              | idea" and someone responding "That's not the core idea
             | at all" is hilarious and sad. BUT ironically what you just
             | laid out about TF is exactly the same - you just manually
             | trigger the loop (via CI/CD) instead of a thing waiting for
             | new configs to be loaded. The state file you're referencing
             | is just a cache of the current state and TF reconciles the
             | old and new state.
        
               | jauco wrote:
               | Always had the conceptual model that terraform executes
               | something that resembles a merge using a three way diff.
               | 
               | There's the state file (base commit, what the system
                | looked like the last time terraform successfully
               | executed). The current system (the main branch, which
               | might have changed since you "branched off") and the
               | terraform files (your branch)
               | 
               | Running terraform then merges your branch into main.
               | 
               | Now that I'm writing this down, I realize I never really
               | checked if this is accurate, tf apply works regardless of
               | course.
        
               | mdaniel wrote:
               | and then the rest of the owl is working out the merge
               | conflicts :-D
               | 
               | I don't know how to have a cute git analogy for "but
               | first, git deletes your production database, and then
               | recreates it, because some attribute changed that made
               | the provider angry"
        
             | mdaniel wrote:
             | > a one-shot reconciler that wires that all together each
             | time you plan and apply?
             | 
             | You skipped the { while true; do tofu plan; tofu apply;
             | echo "well shit"; patch; done; } part since the providers
             | do fuck-all about actually, no kidding, saying whether the
             | plan could succeed
        
             | jonenst wrote:
             | To me the core of k8s is pod scheduling on nodes,
             | networking ingress (e.g. nodeport service), networking
             | between pods (everything addressable directly), and
             | colocated containers inside pods.
             | 
             | Declarative reconciliation is (very) nice but not
             | irreplaceable (and actually not mandatory, e.g. kubectl run
             | xyz)
        
         | throwaway5752 wrote:
         | I've come to think that it is a case of "the distinctions
         | between types of computer programs are a human construct"
         | problem.
         | 
         | I agree with you on a human level. Operators and controllers
          | remind me of COM and CORBA, in a sense. They are highly
         | abstract things, that are intrinsically so flexible that they
         | allow judgement (and misjudgement) in design.
         | 
         | For simple implementations, I'd want k8s-lite, that was more
         | opinionated and less flexible. Something which doesn't allow
         | for as much shooting ones' self in the foot. For very complex
         | implementations, though, I've felt existing abstractions to be
         | limiting. There is a reason why a single cluster is sometimes
         | the basis for cell boundaries in cellular architectures.
         | 
         | I sometimes wonder if one single system - kubernetes 2.0 or
         | anything else - can encompass the full complexity of the
         | problem space while being tractable to work with by human
         | architects and programmers.
        
           | nine_k wrote:
           | > _I 'd want k8s-lite, that was more opinionated and less
           | flexible_
           | 
           | You seem to want something like https://skateco.github.io/
           | (still compatible to k8s manifests).
           | 
           | Or maybe even something like https://uncloud.run/
           | 
           | Or if you still want real certified Kubernetes, but small,
           | there is https://k3s.io/
        
             | mdaniel wrote:
             | Ah, so that explains it: https://github.com/skateco/skate#:
             | ~:text=leverages%20podman%...
        
         | NathanFlurry wrote:
         | We're still missing a handful of these features, but this is
         | the end goal with what we're building over at Rivet:
         | https://github.com/rivet-gg/rivet
         | 
         | This whole thing started scratching my own itch of wanting an
          | orchestrator that I can confidently stand up, deploy to, then
         | forget about.
        
           | coderatlarge wrote:
           | where is that in the design space relative to where goog
           | internal cluster management has converged to after the many
           | years and the tens of thousands of engineers who have sanded
           | it down under heavy fire since the original borg?
        
           | mdaniel wrote:
           | I recognize that I'm biased, but you'll want to strongly
           | consider whether https://rivet.gg/docs/config is getting your
           | audience where they can be successful, as compared to (e.g.)
           | https://kubernetes.io/docs/reference/generated/kubernetes-
           | ap...
        
           | stackskipton wrote:
           | Ops type here, after looking at Rivet, I've started doing The
           | Office "Dear god no, PLEASE NO"
           | 
            | Most people are looking for a Container Management runtime
            | with an HTTP(S) frontend that will handle automatic
            | certificates from Let's Encrypt.
           | 
           | I don't want Functions/Actors or require this massive suite:
           | 
           | FoundationDB: Actor state
           | 
           | CockroachDB: OLTP
           | 
           | ClickHouse: Developer-facing monitoring
           | 
           | Valkey: Caching
           | 
           | NATS: Pub/sub
           | 
           | Traefik: Load balancers & tunnels
           | 
           | This is just switching Kubernetes cloud lock in with KEDA and
           | some other more esoteric operators to Rivet Cloud lock in. At
           | least Kubernetes is slightly more portable than this.
           | 
           | Oh yea, I don't know what Clickhouse is doing with monitoring
           | but Prometheus/Grafana suite called, said they would love for
           | you to come home.
        
       | mountainriver wrote:
       | We have started working on a sort of Kubernetes 2.0 with
       | https://github.com/agentsea/nebulous -- still pre-alpha
       | 
       | Things we are aiming to improve:
       | 
       | * Globally distributed
       | 
       | * Lightweight, can easily run as a single binary on your laptop
       | while still scaling to thousands of nodes in the cloud
       | 
       | * Tailnet as the default network stack
       | 
       | * Bittorrent as the default storage stack
       | 
       | * Multi-tenant from the ground up
       | 
       | * Live migration as a first-class citizen
       | 
       | Most of these needs were born out of building modern machine
       | learning products, and the subsequent GPU scarcity. With ML
       | taking over the world though this may be the norm soon.
        
         | hhh wrote:
         | Wow... Cool stuff, the live migration is very interesting. We
         | do autoscaling across clusters across clouds right now based on
         | pricing, but actual live migration is a different beast
        
         | znpy wrote:
         | > * Globally distributed
         | 
         | Non-requirement?
         | 
         | > * Tailnet as the default network stack
         | 
         | That would probably be the first thing I look to rip out if I
         | ever was to use that.
         | 
         | Kubernetes assuming the underlying host only has a single NIC
         | has been a plague for the industry, setting it back ~20 years
         | and penalizing everyone that's not running on the cloud. Thank
          | god there are multiple CNI implementations.
         | 
         | Only recently with Multus
         | (https://www.redhat.com/en/blog/demystifying-multus) some sense
         | seem to be coming back into that part of the infrastructure.
         | 
         | > * Multi-tenant from the ground up
         | 
         | How would this be any different from kubernetes?
         | 
         | > * Bittorrent as the default storage stack
         | 
         | Might be interesting, unless you also mean seeding public
         | container images. Egress traffic is crazy expensive.
        
           | nine_k wrote:
           | > _Non-requirement_
           | 
           | > _the first thing I look to rip out_
           | 
           | This only shows how varied the requirements are across the
           | industry. One size does not fit all, hence multiple
           | materially different solutions spring up. This is only good.
        
             | znpy wrote:
             | > One size does not fit all, hence multiple materially
             | different solutions spring up.
             | 
             | Sooo... like what kubernetes does today?
        
           | mountainriver wrote:
            | >> * Globally distributed
            | 
            | > Non-requirement?
           | 
           | It is a requirement because you can't find GPUs in a single
           | region reliably and Kubernetes doesn't run on multiple
           | regions.
           | 
           | >> * Tailnet as the default network stack
           | 
           | > That would probably be the first thing I look to rip out if
           | I ever was to use that.
           | 
           | This is fair, we find it very useful because it easily scales
           | cross clouds and even bridges them locally. It was the
           | simplest solution we could implement to get those properties,
           | but in no way would we need to be married to it.
           | 
           | >> * Multi-tenant from the ground up
           | 
           | > How would this be any different from kubernetes?
           | 
            | Kubernetes is deeply not multi-tenant; anyone who has tried
            | to make a multi-tenant solution over kube has dealt with
            | this. I've done it at multiple companies now; it's a mess.
           | 
           | >> * Bittorrent as the default storage stack
           | 
           | > Might be interesting, unless you also mean seeding public
           | container images. Egress traffic is crazy expensive.
           | 
            | Yeah, egress cost is a concern here, but it's lazy so you don't
           | pay for it unless you need it. This seemed like the lightest
           | solution to sync data when you do live migrations cross
           | cloud. For instance, I need to move my dataset and ML model
           | to another cloud, or just replicate it there.
        
           | stackskipton wrote:
           | What is use case for multiple NICs outside bonding for
           | hardware failure?
           | 
           | Every time I've had multiple NICs on a server with different
           | IPs, I've regretted it.
        
             | mdaniel wrote:
             | I'd guess management access, or the old school way of doing
             | vLANs. Kubernetes offers Network Policies to solve the risk
             | of untrusted workloads in the cluster accessing both pods
             | and ports on pods that they shouldn't
             | https://kubernetes.io/docs/concepts/services-
             | networking/netw...
             | 
             | Network Policies are also defense in depth, since another
             | Pod would need to know its sibling Pod's name or IP to
             | reach it directly, the correct boundary for such things is
             | not to expose management toys in the workload's Service,
             | rather create a separate Service that just exposes those
             | management ports
             | 
              | Akin to:
              | 
              |     interface Awesome { String getFavoriteColor(); }
              |     interface Management { void setFavoriteColor(String value); }
              |     class MyPod implements Awesome, Management {}
             | 
             | but then only make either Awesome, or Management, available
             | to the consumers of each behavior
        
             | znpy wrote:
             | A nic dedicated to SAN traffic, for example. People being
             | serious about networked storage don't run their storage
             | network i/o on the same nic where they serve traffic.
        
         | Thaxll wrote:
          | This is not Kubernetes; this is a custom-made solution to run GPUs.
        
           | nine_k wrote:
           | Since it still can consume Kubernetes manifests, it's of
           | interest for k8s practitioners.
           | 
           | Since k8s manifests are a _language_ , there can be multiple
           | implementations of it, and multiple dialects will necessarily
           | spring up.
        
           | mountainriver wrote:
            | Which is the future of everything, and Kubernetes does a
            | very bad job at it.
        
             | mdaniel wrote:
             | You stopped typing; what does Kubernetes do a bad job at
             | with relation to scheduling workloads that declare they
             | need at least 1 GPU resource but should be limited to no
             | more than 4 GPU resources on a given Node?
             | https://kubernetes.io/docs/tasks/manage-gpus/scheduling-
             | gpus...
        
         | mdaniel wrote:
         | heh, I think you didn't read the room given this directory
         | https://github.com/agentsea/nebulous/blob/v0.1.88/deploy/cha...
         | 
         | Also, ohgawd please never ever do this ever ohgawd
         | https://github.com/agentsea/nebulous/blob/v0.1.88/deploy/cha...
        
           | mountainriver wrote:
           | Why not? We can run on Kube and extend it to multi-region
           | when needed, or we can run on any VM as a single binary, or
           | just your laptop.
           | 
           | If you mean Helm, yeah I hate it but it is the most common
           | standard. Also not sure what you mean by the secret, that is
           | secure.
        
             | mdaniel wrote:
             | Secure from what, friend? It's a credential leak waiting to
             | happen, to say nothing of the need to now manage IAM Users
             | in AWS. That is the 2000s way of authenticating with AWS,
             | and reminds me of people who still use passwords for ssh.
             | Sure, it works fine, until some employee leaves and takes
             | the root password with them
        
       | Dedime wrote:
       | From someone who was recently tasked with "add service mesh" -
       | make service mesh obsolete. I don't want to install a service
       | mesh. mTLS or some other form of encryption between pods should
       | just happen automatically. I don't want some janky ass sidecar
       | being injected into my pod definition ala linkerd, and now I've
       | got people complaining that cilium's god mode is too permissive.
       | Just have something built-in, please.
        
         | mdaniel wrote:
         | For my curiosity, what threat model is mTLS and encryption
         | between pods driving down? Do you run untrusted workloads in
         | your cluster and you're afraid they're going to exfil your ...
         | I dunno, SQL login to the in-cluster Postgres?
         | 
         | As someone who has the same experience you described with janky
         | sidecars blowing up normal workloads, I'm violently anti
         | service-mesh. But, cert expiry and subjectAltName management is
         | already hard enough, and you would want that to happen for
         | _every pod_? To say nothing of the TLS handshake for every
         | connection?
        
         | ahmedtd wrote:
         | Various supporting pieces for pod-to-pod mTLS are slowly
         | being brought into the main Kubernetes project.
         | 
         | Take a look at https://github.com/kubernetes/enhancements/tree/
         | master/keps/..., which is hopefully landing as alpha in
         | Kubernetes 1.34. It lets you run a controller that issues
         | certificates, and the certificates get automatically plumbed
         | down into pod filesystems, and refresh is handled
         | automatically.
         | 
         | Together with ClusterTrustBundles (KEP 3257), these are all the
         | pieces that are needed for someone to put together a controller
         | that distributes certificates and trust anchors to every pod in
         | the cluster.
        
       | benced wrote:
       | I found Kubernetes insanely intuitive coming from the frontend
       | world. I was used to writing code that took in data and made the
       | UI react to it - now I write config that the control plane uses
       | to reconcile resources.
        
       | znpy wrote:
       | I'd like to add my points of view:
       | 
       | 1. Helm: make it official, ditch the text templating. The helm
       | workflow is okay, but templating text is cumbersome and error-
       | prone. What we should be doing instead is patching objects. I
       | don't know how, but I should be setting fields, not making sure
       | my values contain text that is correctly indented (how many
       | spaces? 8? 12? 16?)
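       | 
       | Something in the spirit of what I mean - setting fields instead
       | of indenting text - as a rough sketch (deployment and container
       | names made up):
       | 
       |     import json, subprocess
       | 
       |     patch = {"spec": {"template": {"spec": {"containers": [
       |         {"name": "web", "image": "nginx:1.27"}]}}}}
       |     subprocess.run(["kubectl", "patch", "deployment", "web",
       |                     "-p", json.dumps(patch)], check=True)
       | 
       | No indentation to get wrong - you hand the API server a data
       | structure, not text.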
       | 
       | 2. Can we get a rootless kubernetes already, as a first-class
       | citizen? This opens a whole world of possibilities. I'd love to
       | have a physical machine at home where I'm dedicating only an
       | unprivileged user to it. It would have limitations, but I'd be
       | okay with it. Maybe some setuid-binaries could be used to handle
       | some limited privileged things.
        
       | d4mi3n wrote:
       | I agree with the author that YAML as a configuration format
       | leaves room for error, but please, for the love of whatever god
       | or ideals you hold dear, do not adopt HCL as the configuration
       | language of choice for k8s.
       | 
       | While I agree type safety in HCL beats that of YAML (a low bar),
       | it still leaves a LOT to be desired. If you're going to go
       | through the trouble of considering a different configuration
       | language anyway, let's do ourselves a favor and consider things
       | like CUE[1] or Starlark[2] that offer either better type safety
       | or much richer methods of composition.
       | 
       | 1. https://cuelang.org/docs/introduction/#philosophy-and-
       | princi...
       | 
       | 2. https://github.com/bazelbuild/starlark?tab=readme-ov-
       | file#de...
        
         | mdaniel wrote:
         | I repeatedly see this "yaml isn't typesafe" claim but have no
         | idea where it's coming from since all the Kubernetes APIs are
         | OpenAPI, and thus JSON Schema, and since YAML is a superset of
         | JSON it is necessarily typesafe
         | 
          | Every JSON Schema aware tool in the universe will instantly
          | know this PodSpec is wrong:
          | 
          |     kind: 123
          |     metadata: [ {you: wish} ]
         | 
         | I think what is very likely happening is that folks are --
         | rightfully! -- angry about using a _text_ templating language
         | to try and produce structured files. If they picked jinja2 they
         | 'd have the same problem -- it does not consider _any_ textual
          | output as "invalid", so jinja2 thinks this is a-ok:
          | 
          |     jinja2.Template("kind: {{ youbet }}").render(youbet=True)
         | 
         | I am aware that helm does *YAML* sanity checking, so one cannot
         | just emit whatever crazypants yaml they wish, but it does not
         | then go one step further to say "uh, your json schema is fubar
         | friend"
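          | 
          | To make that concrete: a toy sliver of a schema plus any off-
          | the-shelf validator (Python's jsonschema here) flags the bogus
          | PodSpec above immediately:
          | 
          |     import jsonschema, yaml  # pip install jsonschema pyyaml
          | 
          |     # toy fragment of a schema, just for illustration
          |     schema = {"type": "object",
          |               "properties": {"kind": {"type": "string"},
          |                              "metadata": {"type": "object"}}}
          |     doc = yaml.safe_load("kind: 123\nmetadata: [ {you: wish} ]")
          |     jsonschema.validate(doc, schema)  # raises ValidationError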
        
       | fragmede wrote:
       | Instead of yaml, json, or HCL, how about starlark? It's a
       | stripped down Python, used in production by bazel, so it's
       | already got the go libraries.
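       | 
       | Roughly what that could look like (this sketch happens to be
       | valid Python as well as Starlark; the names are made up):
       | 
       |     def deployment(name, image, replicas = 1):
       |         return {
       |             "apiVersion": "apps/v1",
       |             "kind": "Deployment",
       |             "metadata": {"name": name},
       |             "spec": {
       |                 "replicas": replicas,
       |                 "selector": {"matchLabels": {"app": name}},
       |                 "template": {
       |                     "metadata": {"labels": {"app": name}},
       |                     "spec": {"containers": [
       |                         {"name": name, "image": image}]}}}}
       | 
       |     manifests = [deployment("web", "nginx:1.27", replicas = 3)]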
        
         | fjasdfwa wrote:
         | kube-apiserver uses a JSON REST API. You can use whatever
         | serializes to JSON. YAML is the most common and already works
         | directly with kubectl.
         | 
         | I personally use TypeScript since it has unions and structural
         | typing with native JSON support but really anything can work.
        
           | mdaniel wrote:
           | Fun fact, while digging into the sibling comment's complaint
           | about the OpenAPI spec, I learned that it actually advertises
            | multiple content-types:
            | 
            |     application/json
            |     application/json;stream=watch
            |     application/vnd.kubernetes.protobuf
            |     application/vnd.kubernetes.protobuf;stream=watch
            |     application/yaml
           | 
           | which I _presume_ all get coerced into protobuf before being
           | actually interpreted
        
         | mdaniel wrote:
         | As the sibling comment points out, I think that would be a
         | perfectly fine _helm_ replacement, but I would never ever want
         | to feed starlark into k8s apis directly
        
       | NathanFlurry wrote:
       | The #1 problem with Kubernetes is it's not something that "Just
       | Works." There's a very small subset of engineers who can stand up
       | services on Kubernetes without having it fall over in production
       | - not to mention actually running & maintaining a Kubernetes
       | cluster on your own VMs.
       | 
       | In response, there's been a wave of "serverless" startups because
       | the idea of running anything yourself has become understood as
       | (a) a time sink, (b) incredibly error prone, and (c) very likely
       | to fail in production.
       | 
       | I think a Kubernetes 2.0 should consider what it would look like
       | to have a deployment platform that engineers can easily adopt and
       | feel confident running themselves - while still maintaining
       | itself as a small-ish core orchestrator with strong primitives.
       | 
       | I've been spending a lot of time building Rivet to scratch my own
       | itch of an orchestrator & deployment platform that I can self-
       | host and scale trivially: https://github.com/rivet-gg/rivet
       | 
       | We currently advertise as the "open-source serverless platform,"
       | but I often think of the problem as "what does Kubernetes 2.0
       | look like." People are already adopting it to push the limits
       | into things that Kubernetes would traditionally be good at. We've
       | found the biggest strong point is that you're able to build
       | roughly the equivalent of a Kubernetes controller trivially. This
       | unlocks features like more complex workload orchestration (game
       | servers, per-tenant deploys), multitenancy (vibe coding per-
       | tenant backends, LLM code interpreters), metered billing per-
       | tenant, more powerful operators, etc.
        
         | stuff4ben wrote:
         | I really dislike this take and I see it all the time. Also I'm
         | old and I'm jaded, so it is what it is...
         | 
         | Someone decides X technology is too heavy-weight and wants to
         | just run things simply on their laptop because "I don't need
         | all that cruft". They spend time and resources inventing
         | technology Y to suit their needs. Technology Y gets popular and
         | people add to it so it can scale, because no one runs shit in
         | production off their laptops. Someone else comes along and
         | says, "damn, technology Y is too heavyweight, I don't need all
         | this cruft..."
         | 
         | "There are neither beginnings nor endings to the Wheel of Time.
         | But it was a beginning."
        
           | adrianmsmith wrote:
           | It's also possible for things to just be too complex.
           | 
           | Just because something's complex doesn't necessarily mean it
           | has to be that complex.
        
             | mdaniel wrote:
             | IMHO, the rest of that sentence is "be too complex for some
             | metric within some audience"
             | 
             | I can assure you that trying to reproduce kubernetes with a
             | shitload of shell scripts, autoscaling groups, cloudwatch
             | metrics, and hopes-and-prayers is too complex for my metric
             | within the audience of people who know Kubernetes
        
             | wongarsu wrote:
              | Or too generic. A lot of the complexity is from trying to
              | support all use cases. For each new feature there is a
              | clear case of "we have X happy users, and Y people who
              | would start using it if we just added Z". But repeat that
              | often enough and the whole thing becomes so complex and
              | abstract that you lose those happy users.
             | 
             | The tools I've most enjoyed (including deployment tools)
             | are those with a clear target group and vision, along with
             | leadership that rejects anything that falls too far outside
             | of it. Yes, it usually doesn't have _all_ the features I
             | want, but it also doesn 't have a myriad of features I
             | don't need
        
             | supportengineer wrote:
             | Because of promo-driven, resume-driven culture, engineers
             | are constantly creating complexity. No one EVER got a
             | promotion for creating LESS.
        
           | NathanFlurry wrote:
           | I hope this isn't the case here with Rivet. I genuinely
           | believe that Kubernetes does a good job for what's on the tin
           | (i.e. container orchestration at scale), but there's an
           | evolution that needs to happen.
           | 
           | If you'll entertain my argument for a second:
           | 
           | The job of someone designing systems like this is to decide
           | what are the correct primitives and invest in building a
           | simple + flexible platform around those.
           | 
           | The original cloud primitives were VMs, block devices, LBs,
           | and VPCs.
           | 
           | Kubernetes became popular because it standardized primitives
           | (pods, PVCs, services, RBAC) that containerized applications
           | needed.
           | 
            | Rivet's taking a different approach of investing in three
            | different primitives based on how most organizations deploy their
           | applications today:
           | 
           | - Stateless Functions (a la Fluid Compute)
           | 
           | - Stateful Workers (a la Cloudflare Durable Objects)
           | 
           | - Containers (a la Fly.io)
           | 
           | I fully expect to raise a few hackles claiming these are the
           | "new primitives" for modern applications, but our experience
           | shows it's solving real problems for real applications today.
           | 
           | Edit: Clarified "original _cloud_ primitives "
        
           | RattlesnakeJake wrote:
           | See also: JavaScript frameworks
        
         | themgt wrote:
         | The problem Kubernetes solves is "how do I deploy this" ... so
         | I go to Rivet (which does look cool) docs, and the options are:
         | 
         | * single container
         | 
         | * docker compose
         | 
         | * manual deployment (with docker run commands)
         | 
         | But erm, realistically how is this a viable way to deploy a
         | "serverless infrastructure platform" at any real scale?
         | 
         | My gut response would be ... how can I deploy Rivet on
         | Kubernetes, either in containers or something like kube-virt to
         | run this serverless platform across a bunch of physical/virtual
         | machines? How is docker compose a better more reliable/scalable
         | alternative to Kubernetes? So alternately then you sell a cloud
         | service, but ... that's not a Kubernetes 2.0. If I was going to
         | self-host Rivet I'd convert your docs so I could run it on
         | Kubernetes.
        
           | NathanFlurry wrote:
           | Our self-hosting docs are very rough right now - I'm fully
           | aware of the irony given my comment. It's on our roadmap to
           | get them up to snuff within the next few weeks.
           | 
           | If you're curious on the details, we've put a lot of work to
           | make sure that there's as few moving parts as possible:
           | 
           | We have our own cloud VM-level autoscaler that's integrated
            | with the core Rivet platform - no k8s or other orchestrators
           | in between. You can see the meat of it here:
           | https://github.com/rivet-
           | gg/rivet/blob/335088d0e7b38be5d029d...
           | 
           | For example, Rivet has an API to dynamically spin up a
           | cluster on demand: https://github.com/rivet-
           | gg/rivet/blob/335088d0e7b38be5d029d...
           | 
           | Once you start the Rivet "seed" process with your API key,
           | everything from there is automatic.
           | 
           | Therefore, self-hosted deployments usually look like one of:
           | 
           | - Plugging in your cloud API token in to Rivet for
           | autoscaling (recommended)
           | 
           | - Fixed # of servers (hobbyist deployments that were manually
           | set up, simple Terraform deployments, or bare metal)
           | 
           | - Running within Kubernetes (usually because it depends on
           | existing services)
        
         | hosh wrote:
         | It's been my experience that nothing in infra and ops will ever
         | "just work". Even something like Heroku will run into scaling
         | issues, and how much you are willing to pay for it.
         | 
         | If people's concerns is that they want a deployment platform
         | that can be easily adopted and used, it's better to understand
         | Kubernetes as the primitives on which the PaaS that people want
         | can be built on top of it.
         | 
         | Having said all that, Rivet looks interesting. I recognize some
         | of the ideas from the BEAM ecosystem. Some of the appeal to me
         | has less to do with deploying at scale, and more to do with
         | resiliency and local-first.
        
       | nikisweeting wrote:
       | It should natively support running docker-compose.yml configs,
       | essentially treating them like swarm configurations and
       | "automagically" deploying them with sane defaults for storage and
       | network. Right now the gap between compose and full-blown k8s is
       | too big.
        
         | mdaniel wrote:
         | So, what I'm hearing is that it should tie itself to a
         | commercial company, who now have a private equity master to
         | answer to, versus an open source technology run by a foundation
         | 
         | Besides, easily half of this thread is whining about helm for
         | which docker-compose has _no_ answer whatsoever. There is no
          | $(docker compose run oci://example.com/awesome --version 1.2.3
          | --set-string root-user=admin)
        
         | ChocolateGod wrote:
         | > Right now the gap between compose and full-blown k8s is too
         | big.
         | 
          | It's Hashicorp so you have to be wary, but Nomad fills this
         | niche
         | 
         | https://developer.hashicorp.com/nomad
        
       | woile wrote:
       | What bothers me:
       | 
       | - it requires too much RAM to run in small machines (1GB RAM). I
       | want to start small but not have to worry about scalability.
       | docker swarm was nice in this regard.
       | 
       | - use KCL lang or CUE lang to manage templates
        
       | otterley wrote:
       | First, K8S doesn't force anyone to use YAML. It might be
       | idiomatic, but it's certainly not required. `kubectl apply` has
       | supported JSON since the beginning, IIRC. The endpoints
       | themselves speak JSON and protobuf. And you can produce JSON or YAML
       | from whatever language you prefer. Jsonnet is quite nice, for
       | example.
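       | 
       | For example, a quick sketch (hypothetical namespace, plain
       | Python) that never touches YAML at all:
       | 
       |     import json, subprocess
       | 
       |     ns = {"apiVersion": "v1", "kind": "Namespace",
       |           "metadata": {"name": "demo"}}
       |     subprocess.run(["kubectl", "apply", "-f", "-"],
       |                    input=json.dumps(ns).encode(), check=True)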
       | 
       | Second, I'm curious as to why dependencies are a thing in Helm
       | charts and why dependency ordering is being advocated, as though
       | we're still living in a world of dependency ordering and service-
       | start blocking on Linux or Windows. One of the primary idioms in
       | Kubernetes is looping: if the dependency's not available, your
       | app is supposed to treat that is a recoverable error and try
       | again until the dependency becomes available. Or, crash, in which
       | case, the ReplicaSet controller will restart the app for you.
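       | 
       | The retry idiom is a few lines in the app itself; a crude sketch
       | (assuming a `postgres` Service the app depends on):
       | 
       |     import socket, time
       | 
       |     # keep trying until the dependency's Service answers
       |     while True:
       |         try:
       |             socket.create_connection(("postgres", 5432),
       |                                       timeout=2).close()
       |             break
       |         except OSError:
       |             time.sleep(2)  # not ready yet; recoverable, retry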
       | 
       | You can't have dependency conflicts in charts if you don't have
       | dependencies (cue "think about it" meme here), and you install
       | each chart separately. Helm does let you install multiple
       | versions of a chart if you must, but woe be unto those who do
       | that in a single namespace.
       | 
       | If an app _truly_ depends on another app, one option is to
       | include the dependency in the same Helm chart! Helm charts have
       | always allowed you to have multiple application and service
       | resources.
        
         | Arrowmaster wrote:
          | You say "supposed to". That's great when building your own
          | software stack in house, but how much software is available
          | that can run on Kubernetes but was created before it existed?
          | Somebody figured out it could run in docker, and then later
          | someone realized it's not that hard to make it run in
          | Kubernetes because it already runs in docker.
         | 
         | You can make an opinionated platform that does things how you
         | think is the best way to do them, and people will do it how
         | they want anyway with bad results. Or you can add the features
         | to make it work multiple ways and let people choose how to use
         | it.
        
         | delusional wrote:
         | > One of the primary idioms in Kubernetes is looping
         | 
         | Indeed, working with kubernetes I would argue that the primary
         | architectural feature of kubernetes is the "reconciliation
         | loop". Observe the current state, diff a desired state, apply
         | the diff. Over and over again. There is no "fail" or "success"
         | state, only what we can observe and what we wish to observe.
         | Any difference between the two is iterated away.
         | 
         | I think it's interesting that the dominant "good enough
         | technology" of mechanical control, the PID feedback loop, is
         | quite analogous to this core component of kubernetes.
        
           | tguvot wrote:
            | I developed a system like this (with a reconciliation loop,
            | as you call it) some years ago. There is most definitely a
            | failed state (for multiple reasons), but as part of the
            | "loop" you can have logic to fix it up in order to bring it
            | to the desired state.
           | 
           | we had integrated monitoring/log analysis to correlate
           | failures with "things that happen"
        
       | cyberax wrote:
       | I would love:
       | 
       | 1. Instead of recreating the "gooey internal network" anti-
       | pattern with CNI, provide strong zero-trust authentication for
       | service-to-service calls.
       | 
       | 2. Integrate with public networks. With IPv6, there's no _need_
       | for an overlay network.
       | 
       | 3. Interoperability between several K8s clusters. I want to run a
       | local k3s controller on my machine to develop a service, but this
       | service still needs to call a production endpoint for a dependent
       | service.
        
         | mdaniel wrote:
         | To the best of my knowledge, nothing is stopping you from doing
         | any of those things right now. Including, ironically,
         | authentication for pod-to-pod calls, since that's how Service
         | Accounts work today. That even crosses the Kubernetes API
         | boundary thanks to IRSA and, if one were really advanced, any
         | OIDC compliant provider that would trust the OIDC issuer in
         | Kubernetes. The eks-anywhere distribution even shows how to
         | pull off this stunt _from your workstation_ via publishing the
         | JWKS to S3 or some other publicly resolvable https endpoint
         | 
         | I am not aware of any reason why you couldn't connect directly
         | to any Pod, which necessarily includes the kube-apiserver's
         | Pod, from your workstation except for your own company's
         | networking policies
        
       | solatic wrote:
       | I don't get the etcd hate. You can run single-node etcd in simple
       | setups. You can't easily replace it because so much of the
       | Kubernetes API is a thin wrapper around etcd APIs like watch that
       | are quite essential to writing controllers and don't map cleanly
       | to most other databases, certainly not sqlite or frictionless
       | hosted databases like DynamoDB.
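       | 
       | For example, the sort of watch a controller leans on, here via
       | the official Python client:
       | 
       |     from kubernetes import client, config, watch
       | 
       |     config.load_kube_config()  # or load_incluster_config()
       |     v1 = client.CoreV1Api()
       |     for event in watch.Watch().stream(v1.list_namespaced_pod,
       |                                       namespace="default"):
       |         print(event["type"], event["object"].metadata.name)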
       | 
       | What actually makes Kubernetes hard to set up by yourself are a)
       | CNIs, in particular if you both intend to avoid cloud-provider
       | specific CNIs, support all networking (and security!) features,
       | and still have high performance; b) all the cluster PKI with all
       | the certificates for all the different components, which
       | Kubernetes made an absolute requirement because, well,
       | production-grade security.
       | 
       | So if you think you're going to make an "easier" Kubernetes, I
       | mean, you're avoiding all the lessons learned and why we got here
       | in the first place. CNI is hardly the naive approach to the
       | problem.
       | 
       | Complaining about YAML and Helm are dumb. Kubernetes doesn't
       | force you to use either. The API server anyway expects JSON at
       | the end. Use whatever you like.
        
         | mdaniel wrote:
         | > I don't get the etcd hate.
         | 
         | I'm going out on a limb to say you've only ever used hosted
         | Kubernetes, then. A sibling comment mentioned their need for
         | vanity tooling to babysit etcd and my experience was similar.
         | 
         | If you are running single node etcd, that would also explain
         | why you don't get it: you've been very, very, very, very lucky
         | never to have that one node fail, and you've never had to
         | resolve the very real problem of ending up with _just two_ etcd
         | nodes running
        
       | mootoday wrote:
       | Why containers when you can have Wasm components on wasmCloud
       | :-)?!
       | 
       | https://wasmcloud.com/
        
       | 0xbadcafebee wrote:
       | > Ditch YAML for HCL
       | 
       |  _Hard_ pass. One of the big downsides to a DSL is it's
       | linguistic rather than programmatic. It depends on a human
       | learning a language and figuring out how to apply it correctly.
       | 
       | I have written a metric shit-ton of terraform in HCL. Yet even I
       | struggle to contort my brain into the shape it needs to think of
       | _how the fuck_ I can get Terraform to do what I want with its
       | limiting logic and data structures. I have become almost
       | completely reliant on saved snippet examples, Stackoverflow, and
       | now ChatGPT, just to figure out how to deploy the right resources
       | with DRY configuration in a multi-dimensional datastructure.
       | 
       | YAML isn't a configuration format (it's a data encoding format)
       | but it does a decent job at _not being a DSL_ , which makes
       | things way easier. Rather than learn a language, you simply fill
       | out a data structure with attributes. Any human can easily follow
       | documentation to do that without learning a language, and any
       | program can generate or parse it easily. (Now, the specific
       | configuration schema of K8s does suck balls, but that's not
       | YAML's fault)
       | 
       | > I still remember not believing what I was seeing the first time
       | I saw the Norway Problem
       | 
       | It's not a "Norway Problem". It's a PEBKAC problem. The "problem"
       | is literally that the user did not read the YAML spec, so they
       | did not know what they were doing, then did the wrong thing, and
       | blamed YAML. It's wandering into the forest at night, tripping
       | over a stump, and then blaming the stump. Read the docs. YAML is
       | not crazy, it's a pretty simple data format.
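       | 
       | Concretely, with PyYAML (which implements the YAML 1.1 rules in
       | question):
       | 
       |     import yaml
       | 
       |     yaml.safe_load("country: no")     # -> {'country': False}
       |     yaml.safe_load("country: 'no'")   # -> {'country': 'no'}
       | 
       | Quote your strings like the spec tells you to and the "problem"
       | disappears.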
       | 
       | > Helm is a perfect example of a temporary hack that has grown to
       | be a permanent dependency
       | 
       | Nobody's permanently dependent on Helm. Plenty of huge-ass
       | companies don't use it at all. This is where you proved you
       | really don't know what you're talking about. (besides the fact
       | that helm is a _joy_ to use compared to straight YAML or HCL)
        
       | rcarmo wrote:
       | One word: Simpler.
        
       | aranw wrote:
       | YAML and Helm are my two biggest pain points with k8s and I would
       | love to see them replaced with something else. CUE for YAML would
       | be really nice. As for replacing Helm, I'm not too sure really.
       | Perhaps with YAML being replaced by CUE maybe something more
       | powerful and easy to understand could evolve from using CUE?
        
       | fideloper wrote:
       | "Low maintenance", welp.
       | 
       | I suppose that's true in one sense - in that I'm using EKS
       | heavily, and don't maintain cluster health myself (other than all
       | the creative ways I find to fuck up a node). And perhaps in
       | another sense: It'll try its hardest to run some containers so
       | matter how many times I make it OOMkill itself.
       | 
       | Buttttttttt Kubernetes is almost pure maintenance in reality.
       | Don't get me wrong, it's amazing to just submit some yaml and get
       | my software out into the world. But the trade off is pure
       | maintenance.
       | 
       | The workflows to setup a cluster, decide which chicken-egg trade-
       | off you want to get ArgoCD running, register other clusters if
       | you're doing a hub-and-spoke model ... is just, like, one single
       | act in the circus.
       | 
       | Then there's installing all the operators of choice from
       | https://landscape.cncf.io/. I mean that page is a meme, but how
       | many of us run k8s clusters without at least 30 pods running
       | "ancillary" tooling? (Is "ancillary" the right word? It's stuff
       | we need, but it's not our primary workloads).
       | 
       | A repeat circus is spending hours figuring out just the right
       | values.yaml (or, more likely, hours templating it, since we're
       | ArgoCD'ing it all, right?)
       | 
       | > As an aside, I once spent HOURS figuring out how to (incorrectly)
       | pass boolean values around from a Secrets Manager Secret, to a
       | k8s secret - via External Secrets, another operator! - to an
       | ArgoCD ApplicationSet definition, to another values.yaml file.
       | 
       | And then you have to operationalize updating your clusters - and
       | all the operators you installed/painstakingly configured. Given
       | the pace of releases, this is literally, pure maintenance that is
       | always present.
       | 
       | Finally, if you're autoscaling (Karpenter in our case), there's a
       | whole other act in the circus (wait, am I still using that
       | analogy?) of replacing your nodes "often" without downtime, which
       | gets fun in a myriad of interesting ways (running apps with state
       | is fun in kubernetes!)
       | 
       | So anyway, there's my rant. Low fucking maintenance!
        
         | ljm wrote:
         | I've been running k3s on hetzner for over 2 years now with 100%
         | uptime.
         | 
         | In fact, it was so low maintenance that I lost my SSH key for
         | the master node and I had to reprovision the entire cluster.
         | Took about 90 mins including the time spent updating my docs.
         | If it was critical I could have got that down to 15 mins tops.
         | 
         | 20EUR/mo for a k8s cluster using k3s, exclusively on ARM, 3
         | nodes 1 master, some storage, and a load balancer with
         | automatic dns on cloudflare.
        
           | Bombthecat wrote:
           | Yeah, as soon as you got your helm charts and node
           | installers.
           | 
           | Installing is super fast.
           | 
            | We don't back up the cluster, for example, for that reason
            | (except databases etc.); we just reprovision the whole
            | cluster.
        
           | verst wrote:
           | How often do you perform version upgrades? Patching of the
            | operating system of the nodes or control plane etc.? Things
           | quickly get complex if application uptime / availability is
           | critical.
        
         | aljgz wrote:
         | "Low Maintenance" is relative to alternatives. In my
         | experience, any time I was dealing with K8s I needed much lower
         | maintenance to get the same quality of service (everything from
         | [auto]scaling, to failover, deployment, rollback, disaster
         | recovery, DevOps, ease of spinning up a completely independent
         | cluster) compared to not using it. YMMV.
        
         | turtlebits wrote:
         | Sounds self inflicted. Stop installing so much shit. Everything
         | you add is just tech debt and has a cost associated, even if
         | the product is free.
         | 
         | If autoscaling doesn't save more $$ than the tech
         | debt/maintenance burden, turn it off.
        
           | ozim wrote:
           | I agree with your take.
           | 
           | But I think a lot of people are in state where they need to
           | run stuff the way it is because "just turn it off" won't
           | work.
           | 
            | Like a system that, after years on k8s, is coupled to its
            | quirks, and people not knowing how to set up and run stuff
            | without k8s.
        
         | pnathan wrote:
         | vis-a-vis running a roughly equivalent set of services cobbled
         | together, it's wildly low maintenance to the point of fire and
         | forget.
         | 
         | you do have to know what you're doing and not fall prey to the
         | "install the cool widget" trap.
        
       | hosh wrote:
       | While we're speculating:
       | 
       | I disagree that YAML is so bad. I don't particularly like HCL.
       | The tooling I use doesn't care though -- as long as I can still
       | specify things in JSON, I can generate (not template) what I
       | need. It would be more difficult to generate HCL.
       | 
       | I'm not a fan of Helm, but it is the de facto package manager.
       | The main reason I don't like Helm has more to do with its
       | templating system. Templated YAML is very limiting, when compared
       | to using a full-fledged language platform to generate a
       | datastructure that can be converted to JSON. There are some
       | interesting things you can do with that. (cdk8s is like this, but
       | it is not a good example of what you can do with a generator).
       | 
       | On the other hand, if HCL allows us to use modules, scoping, and
       | composition, then maybe it is not so bad after all.
        
       | mikeocool wrote:
       | How about release 2.0 and then don't release 2.1 for a LONG time.
       | 
       | I get that in the early days such a fast paced release/EOL
       | schedule made sense. But now something that operates at such a
       | low level shouldn't require non-security upgrades every 3 months
       | and have breaking API changes at least once a year.
        
       ___________________________________________________________________
       (page generated 2025-06-19 23:00 UTC)