[HN Gopher] So you wanna write Kubernetes controllers?
       ___________________________________________________________________
        
       So you wanna write Kubernetes controllers?
        
       Author : gokhan
       Score  : 77 points
       Date   : 2025-01-22 22:33 UTC (4 days ago)
        
 (HTM) web link (ahmet.im)
 (TXT) w3m dump (ahmet.im)
        
       | Vampiero wrote:
       | Why do devops keep piling abstractions on top of abstractions?
       | 
       | There's the machine. Then the VM. Then the container. Then the
       | orchestrator. Then the controller. And it's all so complex that
       | you need even more tools to generate the configuration files for
       | the former tools.
       | 
       | I don't want to write a Kubernetes controller. I don't even know
       | why it should exist.
        
         | GiorgioG wrote:
         | I don't want Kubernetes period. Best decision we've made at
         | work is to migrate away from k8s and onto AWS ECS. I just want
         | to deploy containers! DevOps went from something you did when
         | standing up or deploying an application, to an industry-wide
         | jobs program. It's the TSA of the software world.
        
           | mugsie wrote:
            | That's great if that works for you, and for a lot of
            | people and teams. You have just shifted the complexity
            | of networking, storage, firewalling, IP management, and
            | L7 proxying to AWS, but hey, you do have click-ops
            | there.
           | 
           | > DevOps went from something you did when standing up or
           | deploying an application, to an industry-wide jobs program.
           | It's the TSA of the software world.
           | 
            | DevOps was never a job title or a process; it was a way
            | of working that went beyond yeeting to prod and
            | ignoring it.
           | 
           | From that one line, you never did devops - you did dev, with
           | some deployment tools (that someone else wrote?)
        
             | ninjha wrote:
             | You can have Click-Ops on Kubernetes too! Everything has a
             | schema so it's possible to build a nice UI on top of it
             | (with some effort).
             | 
             | My current project is basically this, except it edits your
             | git-ops config repository, so you can click-ops while you
             | git-ops.
        
               | k8sToGo wrote:
               | You mean ArgoCD and Rancher? Both ready to do click ops!
        
               | ninjha wrote:
                | I mean you can edit a big YAML file inside ArgoCD,
                | but what I'm building is an actual web form (e.g.
                | `spec.rules[].http.paths[].pathType` is a dropdown
                | of `Prefix`, `ImplementationSpecific`, `Exact`),
                | and all your documentation is inline as you're
                | editing.
               | 
               | People have tried this before but usually the UI version
               | is not fully complete so you have to drop to YAML. Now
               | that the spec is good enough it's possible to build a
               | complete UI for this.
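The schema-driven approach described above can be sketched in a few lines of Python. Everything here is illustrative: the schema fragment is hand-inlined (a real tool would fetch the full document from the cluster's /openapi/v2 or /openapi/v3 endpoint), and `widget_for` is a hypothetical helper, not part of any existing library.

```python
# Sketch: derive a form widget from a Kubernetes-style OpenAPI schema node.
# The fragment below is hand-inlined for illustration only.

path_type_schema = {
    "description": "pathType determines interpretation of the path matching.",
    "type": "string",
    "enum": ["Exact", "ImplementationSpecific", "Prefix"],
}

def widget_for(field_schema):
    """Map a schema node to a UI widget: enums become dropdowns, booleans
    become checkboxes, everything else falls back to a text input."""
    help_text = field_schema.get("description", "")
    if "enum" in field_schema:
        return {"widget": "dropdown",
                "options": sorted(field_schema["enum"]),
                "help": help_text}
    if field_schema.get("type") == "boolean":
        return {"widget": "checkbox", "help": help_text}
    return {"widget": "text", "help": help_text}

print(widget_for(path_type_schema))
```

Because every Kubernetes resource field carries a schema node like this, the same walk covers the whole spec, which is what makes a complete UI feasible.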
        
               | mugsie wrote:
                | Yup, and it has the advantage of an easily
                | backed-up state store representing the actions of
                | the GUI.
               | 
                | I always liked the Octant UI's autogeneration for
                | CRDs and the way it just parsed things correctly
                | from the beginning; if it had an edit mode, that
                | would be perfect.
        
               | ninjha wrote:
               | Is there anything in particular you like about what
               | Octant does? I don't see anything that actually looks at
               | the object spec, just the status fields / etc.
        
               | k8sToGo wrote:
               | Sounds great. An interactive Spec builder, if I
               | understand correctly.
        
           | frazbin wrote:
           | If I may ask, just to educate myself
           | 
           | where do you keep the ECS service/task specs and how do you
           | mutate them across your stacks?
           | 
           | How long does it take to stand up/decomm a new instance of
           | your software stack?
           | 
           | How do you handle application lifecycle concerns like
           | database backup/restore, migrations/upgrades?
           | 
           | How have you supported developer stories like "I want to test
           | a commit against our infrastructure without interfering with
           | other development"?
           | 
           | I recognize these can all be solved for ECS but I'm curious
           | about the details and how it's going.
           | 
           | I have found Kubernetes most useful when maintaining lots of
           | isolated tenants within limited (cheap) infrastructure, esp
           | when velocity of software and deployments is high and has
           | many stakeholders (customer needs their demo!)
        
             | liveoneggs wrote:
             | https://docs.aws.amazon.com/AmazonECS/latest/developerguide
             | /...
             | 
             | https://docs.aws.amazon.com/AmazonECS/latest/developerguide
             | /...
             | 
             | https://docs.aws.amazon.com/AmazonECS/latest/developerguide
             | /...
             | 
             | etc
        
               | mugsie wrote:
                | Yeah, that doesn't really answer the question at
                | all... Do you just have a pile of CloudFormation on
                | your desktop? Point and click? Terraform? And then
                | none of the actual questions like
               | 
               | > How do you handle application lifecycle concerns like
               | database backup/restore, migrations/upgrades?
               | 
               | were even touched.
        
           | k8sToGo wrote:
            | It is always this holier-than-thou attitude of software
            | engineers towards DevOps that is annoying, especially
            | when it comes from ignorance.
            | 
            | These days DevOps is often done by former software
            | engineers rather than "old-fashioned" sysadmins.
            | 
            | Just because you are ignorant of how to use AKS
            | efficiently doesn't mean your alternative is better.
        
             | mugsie wrote:
              | Yeah, DevOps was a culture, not a job title, and then
              | we let software engineers in who just want to throw
              | something into prod and go home on Friday night. So
              | they decided it was a task, the lowest-importance
              | thing possible, but simultaneously the
              | devops/SRE/prod eng teams needed to be perfect,
              | because it's prod.
              | 
              | It is a weird dichotomy I have seen, and it is
              | getting worse. We let teams have access to Argo
              | manifests and Helm charts, and even let them do
              | custom in-repo charts.
              | 
              | Not one team in the last year has actually gone and
              | looked at the k8s docs to figure out how to do basic
              | shit; they just dump questions into channels and soak
              | up time from people explaining the basics of the
              | system their software runs on.
        
             | sgarland wrote:
             | > These days often DevOps is done by former Software
             | Engineers rather than "old fashioned" Sys admins.
             | 
             | Yes, and the world is a poorer place for it. Google's SRE
             | model works in part because they have _both_ Ops and SWE
             | backgrounds.
             | 
             | The thing about traditional Ops is, while it may not scale
             | to Google levels, it does scale quite well to the level
             | most companies need, _and_ along the way, it forces people
             | to learn how computers and systems work to a modicum of
             | depth. If you're having to ssh into a box to see why a
             | process is dying, you're going to learn something about
             | that process, systemd, etc. If you drag the dev along with
             | you to fix it, now two people have learned cross-areas.
             | 
             | If everything is in a container, and there's an
             | orchestrator silently replacing dying pods, that no longer
             | needs to exist.
             | 
             | To be clear, I _love_ K8s. I run it at home, and have used
             | it professionally at multiple jobs. What I don't like is
             | how it (and every other abstraction) have made it such that
             | "infra" people haven't the slightest clue how infra
             | actually operates, and if you sat them down in front of an
             | empty, physical server, they'd have no idea how to
             | bootstrap Linux on it.
        
           | blazing234 wrote:
           | Why don't you just deploy to cloud run on gcp and call it a
           | day
        
           | Spivak wrote:
           | I'm so confused about the jobs program thing. I'm an infra
           | engineer who has had the title devops for parts of my career.
           | I feel like I've always been _desperately_ needed by teams of
            | software devs that don't want to concern themselves with the
           | gritty reality of actually running software in production.
           | The job kinda sucks but for some reason jives with my brain.
           | I take a huge amount of work and responsibility off the
           | plates of my devs and my work scales well to multiple teams
           | and multiple products.
           | 
            | I've never seen an infra/devops/platform team that
            | wasn't swamped with work, or one just spinning its
            | tires on random unnecessary projects. We're more
            | expensive on average than devs, harder to hire, and two
            | degrees separated from revenue. We're not a typically
            | overstaffed role.
        
         | danielklnstn wrote:
         | CRDs and their controllers are perhaps _the_ reason Kubernetes
         | is as ubiquitous as it is today - the ability to extend
         | clusters effortlessly is amazing and opens up the door for so
         | many powerful capabilities.
         | 
         | > I don't want to write a Kubernetes controller. I don't even
         | know why it should exist.
         | 
          | You can take a look at Crossplane for a good example of
          | the capabilities that controllers allow for. They're
          | usually encapsulated in Kubernetes add-ons and plugins,
          | so, much as you might never have to write an operating
          | system driver yourself, you might never have to write a
          | Kubernetes controller yourself.
        
           | raffraffraff wrote:
            | One of the first really pleasant surprises I got while
            | learning was that the kubectl command itself was
            | extended (along with tab completion) by CRDs. So
            | install the External Secrets Operator and you get tab
            | completion on those resources and actions.
        
         | mugsie wrote:
          | Yeah, for a lot of companies, this is way overkill.
          | That's fine, don't use it! In the places I have seen it
          | used when it is actually needed, the controller makes a
          | lot of work for teams disappear. It exists because that's
          | how K8s itself works - how it translates from a
          | Deployment -> ReplicaSet -> Pod -> container.
          | 
          | Abstractions are useful to avoid hundreds of thousands of
          | lines of boilerplate code. Same reason we have Terraform
          | providers, Ansible modules, and, well, the same concepts
          | in programming ...
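That Deployment -> ReplicaSet -> Pod translation is driven by a reconcile loop: compute the children a parent implies, diff them against what exists, and converge. A toy Python sketch of just the diffing step, with plain dicts standing in for API objects (real controllers use informers and the Kubernetes client machinery; all names here are made up):

```python
# Toy reconcile step: a "parent" declares how many replicas it wants; the
# controller diffs desired child names against observed ones and returns
# the actions needed to converge. Plain dicts stand in for API objects.

def desired_children(parent):
    """A Deployment-like parent with N replicas implies N named children."""
    n = parent["spec"]["replicas"]
    return {f'{parent["name"]}-{i}' for i in range(n)}

def reconcile(parent, observed):
    """Return (to_create, to_delete) converging observed toward desired."""
    desired = desired_children(parent)
    return sorted(desired - observed), sorted(observed - desired)

parent = {"name": "web", "spec": {"replicas": 3}}
to_create, to_delete = reconcile(parent, {"web-0", "web-4"})
# web-1 and web-2 are missing; web-4 is no longer desired.
print(to_create, to_delete)
```

Everything an orchestrator adds on top - watches, caching, retries - is machinery for running this diff-and-converge step continuously until the diff is empty.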
        
         | stouset wrote:
         | Right now I'm typing on a glass screen that pretends to have a
         | keyboard on it that is running a web browser developed with a
         | UI toolkit in a programming language that compiles down to an
         | intermediate bytecode that's compiled to machine code that's
         | actually interpreted as microcode on the processor, half of it
         | is farmed out to accelerators and coprocessors of various
         | kinds, all assembled out of a gajillion transistors that neatly
         | hide the fact that we've somehow made it possible to make sand
         | think.
         | 
         | The number of layers of abstraction you're already relying on
         | just to post this comment is nigh uncountable. Abstraction is
         | literally the only way we've continued to make progress in any
         | technological endeavor.
        
           | petercooper wrote:
           | Then all of that data is turned into HTTP requests which turn
           | into TCP packets distributed over IP over wifi over Ethernet
           | over PPPoE over DSL and probably turned into light sent over
           | fiber optics at various stages... :-)
        
           | ok123456 wrote:
            | The problem isn't abstractions. The problem is leaky
            | abstractions that make it harder to reason about a
            | system and add lots of hidden state and configurations
            | of that state.
           | 
           | What could have been a static binary running a system service
           | has become a Frankenstein mess of opaque nested environments
           | operated by action at a distance.
        
           | zug_zug wrote:
            | I think the point is that there are abstractions that
            | require you to know almost nothing (e.g. that my laptop
            | has an SSD with blocks that are constantly dying is
            | abstracted into a filesystem that looks like a basic
            | tree structure).
           | 
            | Then there are abstractions that may actually
            | _increase_ cognitive load: "What if instead of thinking
            | about chairs, we philosophically think about ALL
            | standing furniture types - stools, tables, etc. They
            | may have 4 legs, 3, 6? What about car seats too?"
           | 
            | AFAICT writing a Kubernetes controller is probably an
            | overkill, challenge-yourself-level exercise (like a
            | quine in Brainfuck), because odds are that for any
            | resource you've ever needed to manage, somebody else
            | has built an automated way to do it first.
            | 
            | Would love to hear other perspectives, though, if
            | anybody has great examples of when you really couldn't
            | succeed without writing your own Kubernetes controller.
        
             | stouset wrote:
             | Those only require you to understand them because you're
             | working directly on top of them. If you were writing a
             | filesystem driver you would _absolutely_ need to know those
             | details. If you're writing a database backend, you probably
             | need to know a lot about the filesystem. If you're writing
             | an ORM, you need to know a lot about databases.
             | 
             | Some of these abstractions are leakier than others. Web
             | development coordinates a _lot_ of different technologies
              | so oftentimes you need to know about a wide variety of
             | topics, and sometimes a layer below those. Part of it is
             | that there's a lot less specialization in our profession
             | than in others, so we need lots of generalists.
        
               | zug_zug wrote:
               | I think you're sort of hand-waving here.
               | 
                | I think the concrete question is -- do you need to
                | learn more or fewer abstractions to use Kubernetes
                | versus, say, AWS?
                | 
                | And it looks like Kubernetes is more abstractions
                | in exchange for more customization. I can
                | understand why somebody would roll their eyes at a
                | system with as much abstraction as Kubernetes has
                | if their use-case is very concrete - they are
                | scaling a web app based on traffic.
        
           | zenethian wrote:
           | Seemingly endlessly layered abstraction is also why phones
           | and computers get faster and faster yet nothing seems to
           | actually run better. Nobody wants to write native software
           | anymore because there are too many variations of hardware and
           | operating systems but everyone wants their apps to run on
           | everything. Thus, we are stuck in abstraction hell.
           | 
           | I'd argue the exact opposite has happened. We have made very
           | little progress because everything is continually abstracted
           | out to the least common denominator, leaving accessibility
           | high but features low. Very few actual groundbreaking leaps
           | have been accomplished with all of this abstraction; we've
           | just made it easier to put dumb software on more devices.
        
             | stouset wrote:
             | I encourage you to actually work on a twenty year old piece
             | of technology. It's easy to forget that modern computers
             | are doing a _lot_ more. Sure, there's waste. But the
             | expectations from software these days are exponentially
             | greater than what we used to ship.
        
         | solatic wrote:
          | Current example from work: an extreme single-tenant
          | architecture, deployed for a large number N of tenants,
          | which need both logical and physical isolation; the cost
          | of the cloud provider's managed databases is considered
          | Too Expensive to create one per tenant, so an open-source
          | Kubernetes controller for the database is used instead.
          | 
          | Not all systems are small-N modern multi-tenant
          | architectures deployed at small scale.
        
         | dijit wrote:
         | > Why do devops keep piling abstractions on top of
         | abstractions?
         | 
         | Mostly, because developers keep trying to replace sysadmins
         | with higher levels of abstraction. Then when they realise that
         | they require (some new word for) sysadmins still, they pile on
         | more abstractions again and claim they don't need them.
         | 
          | The abstraction du jour at the moment is not Kubernetes,
          | it's FaaS. At some point managing those FaaS deployments
          | will require operators again, and another abstraction on
          | top of FaaS will exist - some kind of FaaS orchestrator -
          | and the cycle will continue.
        
           | robertlagrant wrote:
           | I think it's clear that Kubernetes et al aren't trying to
            | replace sysadmins. They're trying to massively increase
            | the number of machines each sysadmin can manage.
        
         | antonvs wrote:
         | If you're implementing a distributed system that needs to
         | manage many custom resources (of whatever kind, not Kubernetes-
         | specific), implementing a Kubernetes controller for it can save
         | a great deal of development time and give you a better system
         | in the end, with standard built-in observability,
         | manageability, deployment automation, and a whole lot else.
         | 
         | It's certainly true that some use of Kubernetes is overkill.
         | But if you actually need what it offers, it can be a game-
         | changer. That's a big reason why it caught on so fast in big
         | enterprises.
         | 
          | Don't fall into the trap of thinking that because you
          | don't understand the need for something, the need doesn't
          | exist.
        
       | clx75 wrote:
       | At work we are using Metacontroller to implement our "operators".
       | Quoted because these are not real operators but rather
       | Metacontroller plugins, written in Python. All the watch and
       | update logic - plus the resource caching - is outsourced to
       | Metacontroller (which is written in Go). We define - via its
       | CompositeController or DecoratorController CRDs - what kind of
       | resources it should watch and which web service it should call
       | into when it detects a change. The web service speaks plain HTTP
       | (or HTTPS if you want).
       | 
        | In the case of a CompositeController, the web service gets
        | the created/updated/deleted parent resource and any already
        | existing child resources (initially none). The web service
        | analyzes the parent and existing children, then responds
        | with the list of child resources whose existence and state
        | Metacontroller should ensure in the cluster. If something
        | is left out of the response compared to a previous
        | response, it is deleted.
       | 
       | Things we implemented using this pattern:
       | 
       | - Project: declarative description of a company project, child
       | resources include a namespace, service account, IAM role,
       | SMB/S3/FSX PVs and PVCs generated for project volumes (defined
       | under spec.volumes in the Project CR), ingresses for a set of
       | standard apps
       | 
       | - Job: high-level description of a DAG of containers, the web
       | service works as a compiler which translates this high-level
       | description into an Argo Workflow (this will be the child)
       | 
       | - Container: defines a dev container, expands into a pod running
       | an sshd and a Contour HTTPProxy (TCP proxy) which forwards TLS-
       | wrapped SSH traffic to the sshd service
       | 
       | - KeycloakClient: here the web service is not pure - it talks to
       | the Keycloak Admin REST API and creates/updates a client in
       | Keycloak whose parameters are given by the CRD spec
       | 
       | So far this works pretty well and makes writing controllers a
       | breeze - at least compared to the standard kubebuilder approach.
       | 
       | https://metacontroller.github.io/metacontroller/intro.html
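As described above, a CompositeController sync hook reduces to a pure function from {parent, observed children} to {status, desired children}. A minimal Python sketch of that shape (the request/response fields follow Metacontroller's sync-hook contract; the Project-to-children mapping is an invented stand-in, not the poster's actual code):

```python
# Sketch of a Metacontroller CompositeController sync hook as a pure
# function: it receives the parent object and observed children, and
# returns the parent status plus the full desired child list. Children
# omitted from a later response are deleted by Metacontroller.

def sync(request):
    parent = request["parent"]
    project = parent["metadata"]["name"]

    # Invented example children: one Namespace and one ServiceAccount per
    # Project CR. (A real hook would also emit PVs, PVCs, ingresses, etc.)
    children = [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": project}},
        {"apiVersion": "v1", "kind": "ServiceAccount",
         "metadata": {"name": "default-runner", "namespace": project}},
    ]
    # Whatever is returned under "status" lands on the parent's status.
    status = {"children": len(children), "phase": "Ready"}
    return {"status": status, "children": children}

print(sync({"parent": {"metadata": {"name": "acme"}}, "children": {}}))
```

Wrapping `sync` in any HTTP handler that decodes the POSTed JSON and returns the result as JSON is all Metacontroller needs; the CompositeController resource then points at that endpoint.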
        
         | fsniper wrote:
          | At work we are using nolar/kopf for writing controllers
          | that provision/manage our Kubernetes clusters. This also
          | includes managing any infrastructure-related apps that we
          | deploy on them.
          | 
          | We were using whitebox controller at the start, which is
          | also like Metacontroller in that it runs your scripts on
          | Kubernetes events. That was easy to write. However, not
          | having full control over the lifecycle of the controller
          | code gets in the way from time to time.
          | 
          | Considering you are also writing Python, did you review
          | kopf before deciding on Metacontroller?
        
         | ec109685 wrote:
          | Curious why you use controllers for these aspects versus
          | generating the K8s objects as part of your deployment
          | pipeline and just applying them? The latter gives you
          | versioned artifacts you can roll forward and back, and
          | independent deployment of these supporting pieces with
          | each app.
         | 
         | Is there runtime dynamism that you need the control loop to
         | handle beyond what the built-in primitives can handle?
        
       | neuroelectron wrote:
       | No not really
        
       ___________________________________________________________________
       (page generated 2025-01-26 23:00 UTC)