[HN Gopher] The hater's guide to Kubernetes
       ___________________________________________________________________
        
       The hater's guide to Kubernetes
        
       Author : paulgb
       Score  : 217 points
       Date   : 2024-03-03 16:44 UTC (6 hours ago)
        
 (HTM) web link (paulbutler.org)
 (TXT) w3m dump (paulbutler.org)
        
       | t3rabytes wrote:
       | My current company is split... maybe 75/25 (at this point)
       | between Kubernetes and a bespoke, Ansible-driven deployment
       | system that manually runs Docker containers on nodes in an AWS
       | ASG and will take care of deregistering/reregistering the nodes
       | with the ALB while the containers on a given node are getting
        | futzed with. The Ansible method works remarkably well for its
        | age, but the big thing I use to convince teams to move to
       | Kubernetes is that we can take your peak deploy times from, say,
       | a couple hours down to a few minutes, and you can autoscale far
       | faster and more efficiently than you can with CPU-based scaling
       | on an ASG.
       | 
       | From service teams that have done the migrations, the things I
       | hear consistently though are:
       | 
        | - when a Helm deploy fails, finding the reason why is a PITA (we
        | run with --atomic so it'll roll back on a failed deploy. What
        | failed? Was it bad code causing a pod to crash loop? A failed k8s
        | resource create? Who knows! Have fun finding out!)
       | 
       | - they have to learn a whole new way of operating, particularly
       | around in-the-moment scaling. A team today can go into the AWS
       | Console at 4am during an incident and change the ASG scaling
       | targets, but to do that with a service running in Kubernetes
        | means making sure they have kubectl (and its deps, for us that's
       | aws-cli) installed and configured, AND remembering the `kubectl
       | scale deployment X --replicas X` syntax.
       | 
       | [Both of those things are very much fixable]
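        | 
        | [A rough sketch of the commands involved in both cases -- the
        | release, namespace, and deployment names here are made up:]
        | 
        |     # why did the --atomic deploy roll back?
        |     helm history myrelease -n myns
        |     kubectl get events -n myns --sort-by=.lastTimestamp
        |     kubectl logs -l app=myapp -n myns --previous
        | 
        |     # the 4am manual scale-up
        |     kubectl scale deployment myapp -n myns --replicas=20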
        
         | dpflan wrote:
         | HPAs and VPAs are useful k8s concepts for your auto-scaling
         | needs.
        
           | t3rabytes wrote:
           | HPA is useful until your maxReplicas count is set too low and
           | you're already tapped out.
        
             | cogman10 wrote:
              | Sort of a learning thing though, right? Like, if you find
              | maxReplicas is too low you move that number up until it
              | isn't, right?
             | 
             | This is different from waking people up at 4am frequently
             | to bump up the number of replicas.
        
             | dpflan wrote:
             | You can edit your HPA live, in maybe as many commands or
             | keystrokes as manually scaling...until you commit the
             | change to your repo of configs.
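              | 
              | [A sketch of that live edit, with made-up names:]
              | 
              |     kubectl get hpa myapp -n myns   # current min/max/targets
              |     kubectl patch hpa myapp -n myns --type merge \
              |       -p '{"spec":{"maxReplicas":30}}'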
        
         | makestuff wrote:
          | I haven't used Kubernetes in a few years, but is there a good
          | UI for operations now? Something like your example of the AWS
          | console, where you can just log in and scale something in the
          | UI, but for Kubernetes. We run something similar on AWS right
          | now: during
         | an incident we log into the account with admin access to modify
         | something and then go back to configure that in the CDK post
         | incident.
        
           | t3rabytes wrote:
           | AWS has a UI for resources in the cluster but it relies on
           | the IAM role you're using in the console to have configured
           | perms in the cluster, and our AWS SSO setup prevents that
           | from working properly (this isn't usually the case for AWS
           | SSO users, it's a known quirk of our particular auth setup
           | between EKS and IAM -- we'll fix it sometime).
        
           | adhamsalama wrote:
           | https://k8slens.dev
        
         | cogman10 wrote:
         | For scaling, have you tried using either an HPA or keda?
         | 
         | We've had pretty good success with simple HPAs.
        
           | t3rabytes wrote:
           | Yep, I'd say >half of the teams with K8s services have
           | adopted KEDA, but we've got some HPA stragglers for sure.
        
             | dpflan wrote:
              | I have to say that when you have more buy-in from delivery
              | teams and adoption of HPAs, your system can become more
             | harmonious overall. Each team can monitor and tweak their
             | services, and many services are usually connected upstream
             | or downstream. When more components can ebb and flow
             | according to the compute context then the system overall
             | ebbs and flows better. #my2cents
        
         | freedomben wrote:
          | Personally, I don't like Helm. I think for the vast majority of
          | use cases, where all you need is some simple
          | templating/substitution, it just introduces way more complexity
          | and abstraction than it is worth.
         | 
         | I've been really happy with just using `envsubst` and
         | environment variables to generate a manifest at deploy time.
         | It's easy with most CI systems to "archive" the manifest, and
         | it can then be easily read by a human or downloaded/applied
          | manually for debugging. Deploys are also just `cat
          | k8s/${ENV}/deploy.yaml | envsubst > output.yaml && kubectl
          | apply -f output.yaml`
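          | 
          | [A sketch of that flow -- the file path, variable, and image
          | are made up. Passing a variable list to envsubst keeps it from
          | expanding every $ in the manifest:]
          | 
          |     # k8s/prod/deploy.yaml contains e.g.
          |     #   image: registry.example.com/myapp:${IMAGE_TAG}
          |     export IMAGE_TAG=v1.2.3
          |     envsubst '${IMAGE_TAG}' < k8s/prod/deploy.yaml > output.yaml
          |     kubectl apply -f output.yaml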
         | 
         | I've also experimented with using terraform. It's actually been
         | a good enough experience that I may go fully with terraform on
         | a new project and see how it goes.
        
           | linuxftw wrote:
            | You might like Kubernetes' kustomize if you don't care for
            | Helm (IMO, just embrace Helm; you can keep your charts very
            | simple and it's straightforward). Kustomize takes a little
            | getting used to, but it's a nice abstraction and widely used.
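            | 
            | [A minimal sketch of the kustomize flow -- the directory
            | layout is hypothetical:]
            | 
            |     # base/ holds the plain manifests; overlays/prod/
            |     # patches replicas, image tag, etc.
            |     kubectl kustomize overlays/prod   # render for review
            |     kubectl apply -k overlays/prod    # build and apply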
           | 
           | I cannot recommend terraform. I use it daily, and daily I
           | wish I did not. I think Pulumi is the future. Not as battle
           | tested, but terraform is a mountain of bugs anyway, so it
           | can't possibly be worse.
           | 
            | Just one example of where terraform sucks: you cannot both
            | deploy a kubernetes cluster (say an EKS/AKS cluster) and then
            | use the kubernetes_manifest provider in a single workspace.
            | You must do this across two separate terraform runs.
        
         | jonathaneunice wrote:
         | The problem with bespoke, homegrown, and DIY isn't that the
         | solutions are bad. Often, they are quite good--excellent, even,
         | within their particular contexts and constraints. And because
         | they're tailored and limited to your context, they can even be
         | quite a bit simpler.
         | 
         | The problem is that they're custom and homegrown. Your
         | organization alone invests in them, trains new staff in them,
         | is responsible for debugging and fixing when they break, has to
         | re-invest when they no longer do all the things you want. DIY
         | frameworks ultimately end up as byzantine and labyrinthine as
         | Kubernetes itself. The virtue of industry platforms like
         | Kubernetes is, however complex and only half-baked they start,
         | over time the entire industry trains on them, invests in them,
         | refines and improves them. They benefit from a long-term
         | economic virtuous cycle that DIY rarely if ever can. Even the
         | longest, strongest, best-funded holdouts for bespoke languages,
         | OSs, and frameworks--aerospace, finance, miltech--have largely
         | come 'round to COTS first and foremost.
        
       | api wrote:
       | I understand where most of the complexity in K8S comes from, but
       | it still horrifies and offends me and I hate it. But I don't
       | think it's Kubernetes' fault _directly_. I think the problem is
       | deeper in the foundation. It comes from the fact that we are
       | trying to build modern, distributed, high availability,
       | incrementally upgradeable, self-regulating systems on a
       | foundation of brittle clunky 1970s operating systems that are not
       | designed for any of that.
       | 
       | The whole thing is a bolt-on that has to spend a ton of time
       | working around the limitations of the foundation, and it shows.
       | 
       | Unfortunately there seems to be zero interest in fixing _that_
        | and so much sunk cost in existing Unix/Posix designs that it
       | seems like we are completely stuck with a basic foundation of
       | outdated brittleness.
       | 
       | What I think we need:
       | 
       | * An OS that runs hardware-independent code (WASM?) natively and
       | permits things like hot updates, state saving and restoration,
       | etc. Abstract away the hardware.
       | 
       | * Native built-in support for clustering, hot backups, live
       | process migration between nodes, and generally treating hardware
       | as a pure commodity in a RAIN (redundant array of inexpensive
       | nodes) configuration.
       | 
       | * A modern I/O API. Posix I/O APIs are awful. They could be
       | supported for backward compatibility via a compatibility library.
       | 
       | * Native built-in support for distributed clustered storage with
       | high availability. Basically a low or zero config equivalent of
       | Ceph or similar built into the OS as a first class citizen.
       | 
       | * Immutable OS that installs almost instantly on hardware, can be
       | provisioned entirely with code, and where apps/services can be
       | added and removed with no "OS rot." The concept of installing
       | software "on" the OS needs to be killed with fire.
       | 
       | * Shared distributed network stack where multiple machines can
       | have the same virtual network interfaces, IPs, and open TCP
       | connections can migrate. Built-in load balancing.
       | 
       | I'm sure people around here can think of more ideas that belong
       | in this list. These are not fringe things that are impossible to
       | build.
       | 
       | Basically you should have an immutable image OS that turns many
       | boxes into one box and you don't have to think about it. Storage
       | is automatically clustered. Processes automatically restart or,
       | if a hardware fault is detected in time, automatically _migrate_.
       | 
       | There were efforts to build such things (Mosix, Plan 9, etc.) but
       | they were bulldozed by the viral spread of free Unix-like OSes
       | that were "good enough."
       | 
       | Edit:
       | 
       | That being said, I'm not saying Kubernetes is good software
       | either. The core engine is actually decent and as the OP said has
       | a lot of complexity that's needed to support what it does. The
       | ugly nasty disgusting parts are the config interface, clunky shit
       | like YAML, and how generally arcane and unapproachable and _ugly_
       | the thing is to actually use.
       | 
       | I just _loathe_ software like this. I feel the same way about
       | Postgres and Systemd.  "Algorithmically" they are fine, but the
       | interface and the way you use them is arcane and makes me feel
       | like I'm using a 70s mainframe on a green VT220 monitor.
       | 
       | Either these things are designed by the sorts of "hackers" who
        | _like_ complexity and arcane-ness, or they're hacks that went
       | viral and matured into global infrastructure without planning. I
       | think it's a mix of both... though in the case of Postgres it's
       | also that the project is legitimately old. It feels like old-
       | school Unix clunkware because it is.
        
         | tayo42 wrote:
         | What would you fix if you could?
        
         | happymellon wrote:
          | I'm not entirely convinced that there isn't a better way. AWS
          | Lambda and alternatives that can run containers on demand, plus
          | OpenFaaS, all point to "a better way".
          | 
          | [Edit] The parent comment is almost entirely different after
          | that edit from what I responded to. But I think my point still
          | stands. One day, hopefully in my lifetime, we shall see it.
        
           | api wrote:
            | Yeah, I do think lambda-style coding, where you move away
            | from the idea of _processes_ toward functions and data, is
            | another possibly superior way.
           | 
           | The problem is that right now this gets you lock-in to a
           | proprietary cloud. There are some loose standards but the
           | devil's in the details and once you are deployed somewhere
            | it's damn hard, if not impossible, to move without serious
            | downtime and fixing.
        
             | Too wrote:
             | How about Erlang?
             | 
             | I can't say I know it myself. It always looks good on
             | paper. Strangely nobody uses it. There must be a catch that
             | detracts from it?
        
               | api wrote:
               | People want to program in their preferred language, not
               | be forced to use one language to have these benefits.
        
               | Izkata wrote:
               | I don't know it either, but a vague understanding I got
               | in the past was the language itself wasn't very user-
               | friendly. I think Elixir was supposed to solve that.
        
             | happymellon wrote:
              | Completely agree, but that's where OpenFaaS (or another
              | open standard) comes in.
              | 
              | Hopefully we will get both OpenFaaS and Lambda, in the same
              | way we have ECS and EKS: standardised ways to complete
              | tasks, rather than managing imaginary servers.
             | 
             | We are still early in the cycle.
        
         | throwaway892238 wrote:
         | Agreed. If Linux were a distributed OS, people would just be
         | running a distro with systemd instead of K8s. (Of course,
         | systemd is just another kubernetes, but without the emphasis on
         | running distributed systems)
        
           | p_l wrote:
            | CoreOS tried to distribute systemd, and it seemed that didn't
            | work all that well compared to optimizing for k8s.
        
             | geodel wrote:
              | Maybe they folded before their ideas could take root and be
              | backed by a decent implementation.
        
             | throwaway892238 wrote:
             | That whole concept is bizarre. It's like wanting to fly, so
             | rather than buy a plane, you take a Caprice Classic and try
             | to make it fly.
             | 
             | If CoreOS actually wanted to make distributed computing
             | easier, they'd make patches for the Linux kernel (or make
             | an entirely different kernel). See the many distributed OS
             | kernels that were made over 20 years ago. But that's a lot
             | of work. So instead they tried to go the cheap and easy
             | route. But the cheap and easy route ends up being much
             | shittier.
             | 
             | There's no commercial advantage to building a distributed
             | OS, which is why no distributed OS is successful today. You
             | would need a crazy person to work for 10 years on a pet
             | project until it's feature-complete, and then all of a
             | sudden everyone would want to use it. But until it's
             | complete, nobody would use it, and nobody would spend time
             | developing it. Even once it's created, if it's not popular,
             | still nobody will use it (you can use Plan9 today, but
             | nobody does).
             | 
             | https://en.wikipedia.org/wiki/Distributed_operating_system
        
         | kiitos wrote:
         | > These are not fringe things that are impossible to build.
         | 
         | Maybe not, but I'm confident that the system you're describing
         | is impossible to build in a way that is both general and
         | efficient.
        
       | Spivak wrote:
        | This matches our experience as well. As long as you treat your
        | managed k8s cluster as an autoscaling-group-as-a-service, you'll
        | do fine.
        | 
        | k8s's worst property is that it's a cleverness trap. You can do
        | anything in k8s, whether it's sane to do so or not. The biggest
        | guardrail against falling into it is managing your k8s with
        | something Terraform-ish, so that you don't find yourself in a
        | spot where "effort to do it right" >> "effort to hack it in YAML"
        | and you find your k8s cluster becoming spaghetti.
        
         | x86x87 wrote:
         | Why not just use an autoscaling group?
         | 
         | Re: cleverness trap. I feel like this is the tragedy of
         | software development. We like to be seen as clever. We are
         | doing "hard" things. I have way more respect for engineers that
         | do "simple" things that just work using boring tech and factor
         | in whole lifecycle of the product.
        
           | p_l wrote:
           | > Why not just use an autoscaling group?
           | 
           | Not everyone has money to burn, even back in ZIRP era.
           | 
            | And before you trot out wages for an experienced operations
            | team - I've regularly dealt with it being cheaper to pay for
            | one or two very experienced people than to deal with the AWS
            | bill.
            | 
            | For the very simple reason that cloud providers' prices are
            | scaled to the US market, and not everyone has US money
            | levels.
        
           | Spivak wrote:
           | Sorry, I could have explained that better. The biggest value
           | add that k8s has is that it gives you as many or as few
           | autoscaling groups as you need at a given time using only a
           | single pool (or at least fewer pools) of heterogeneous
           | servers. There's lots of fine print here but it really does
           | let you run the same workloads on less hardware and to me
           | that's the first and last reason you should be using it.
           | 
           | I wouldn't start with k8s and instead opt for ASGs until you
           | reach the point where you look at your AWS account and see a
           | bunch of EC2 instances sitting underutilized.
        
       | treesciencebot wrote:
       | > Above I alluded to the fact that we briefly ran ephemeral,
       | interactive, session-lived processes on Kubernetes. We quickly
       | realized that Kubernetes is designed for robustness and
       | modularity over container start times.
       | 
        | Is there a clear example of this? E.g. is Kubernetes inherently
        | unable to start a pod (assuming the same sequence of events, e.g.
        | a warm/cold image with streaming enabled) in under 500ms, 1s,
        | etc.?
       | 
        | I am asking this as someone who spent quite a bit of time and
        | wasn't able to bring it below the 2s mark, which eventually led
        | us to rewrite the latency-sensitive parts to use Nomad. But we
        | are currently in a state where we are re-considering Kubernetes
        | for its auxiliary tooling benefits, and would love to learn more
        | if anyone has experience with starting and stopping thousands of
        | pods with the lowest possible latencies, caring not about
        | utilization or placement but just about observable boot
        | latencies.
        
         | p_l wrote:
          | You'd have to:
          | 
          | a) preload all images, of course
          | 
          | b) make sure there are enough nodes with enough capacity
          | 
          | c) make sure the pods don't use anything with possibly higher
          | latency (high-latency CSI etc.)
          | 
          | d) maybe write a custom scheduler for your workloads (it could
          | take into account what images are preloaded where, etc.) --
          | see the sketch below
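          | 
          | [A sketch for a) and d) -- the image and pod names are made up,
          | and the custom scheduler itself is left out:]
          | 
          |     # a) pre-pull the image on every node (a DaemonSet that
          |     #    references the image also works)
          |     crictl pull registry.example.com/session-worker:latest
          | 
          |     # d) pods opt into a custom scheduler via
          |     #    spec.schedulerName; verify which one a pod got:
          |     kubectl get pod mypod -o jsonpath='{.spec.schedulerName}'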
        
         | paulgb wrote:
         | I do believe that with the right knowledge of Kubernetes
         | internals it's _probably_ possible to get k8s cold start times
         | competitive with where we landed without Kubernetes (generally
         | subsecond, often under 0.5s depending on how much the container
          | does before passing a health check), but we'd have to
         | understand k8s internals really well and would have ended up
         | throwing out much of what already existed. And we'd probably
         | end up breaking most of the reasons for using Kubernetes in the
         | first place in the process.
        
           | p_l wrote:
            | Not much internals knowledge needed, but an actual in-depth
            | understanding of the Pod kube-api, plus at least the basics
            | of how the scheduler, kubelet, and kubelet drivers interact.
            | 
            | A big possible win is custom scheduling, but barely anyone
            | seems to know it exists.
        
             | paulgb wrote:
             | Yeah, looking into writing a scheduler was basically where
             | we stepped back and said "if we write this ourselves, why
             | not the rest, too". As I see it, the biggest gains that we
             | were able to get were by making things happen in parallel
             | that would by default happen in sequence, and optimizing
             | for the happy path instead of optimizing for reducing
             | failure. In Kubernetes it's reasonable to have to wait for
             | a dozen things to serially go through RAFT consensus in
             | etcd before the pod runs, but we don't want that.
             | 
             | (I made up the dozen number, but my point is that that
             | design would be perfectly acceptable given Kubernetes'
             | design constraints)
        
             | cogman10 wrote:
             | Not surprising to me. People are complaining about how
             | difficult it is to know k8s when you talk about the basic
             | default objects. Getting into the weeds of how the api and
             | control plane work (especially since it has little impact
             | on day to day dev) is something devs tend to just avoid.
        
               | p_l wrote:
               | Honestly, devs of the applications that run on top
               | probably should not have to worry about it. Instead have
               | a platform team provide the necessary features.
        
           | hobofan wrote:
           | Yeah, with plain Kubernetes I'd also see the practical limit
            | around ~0.5s. If you are on GKE Autopilot, where you also
            | have little control over node startup, there is likely also a
            | lot more unpredictability.
           | 
           | Something like Knative can allow for faster startup times if
           | you follow the common best-practices (pre-fetching images,
           | etc.), but I'm not sure if it supports enough of the session-
            | related features that you were probably looking for to be a
           | stand-in for Plane.
        
       | fifilura wrote:
        | There is nothing wrong with k8s; it is a nice piece of
        | technology.
       | 
       | But the article trending here a couple of days ago describes it
       | well. https://www.theolognion.com/p/company-forgets-why-they-
       | exist...
       | 
       | It is complex enough to make the k8s maintainers the heroes of
       | the company. And this is where things tend to go sideways.
       | 
       | It has enough knobs and levers to distract the project from what
       | they are actually trying to achieve.
        
         | cedws wrote:
         | I see Kubernetes the same way as git. Elegant fundamental
         | design, but the interface to it is awful.
         | 
         | Kubernetes is designed to solve big problems and if you don't
         | have those problems, you're introducing a tonne of complexity
         | for very little benefit. An ideal orchestrator would be more
         | composable and not introduce more complexity than needed for
         | the scale you're running at. I'd really like to see a modern
         | alternative to K8S that learns from some of its mistakes.
        
           | pphysch wrote:
           | Git is a much more subtle abstraction than k8s though. You
           | can be blissfully unaware that a directory is a git repo, and
           | still read/patch files.
           | 
           | You cannot pretend k8s doesn't exist in a k8s system.
        
       | PUSH_AX wrote:
       | I was once talking to an ex google site reliability engineer. He
       | said there are maybe a handful of companies in the world that
        | _need_ k8s. I tend to agree. A lot of people practice
        | hype-driven development.
        
         | x86x87 wrote:
         | I tend to agree. K8s makes a lot of sense if you are running
         | your own bare metal servers at scale.
         | 
          | If you are already using the cloud, maybe leverage the
          | abstractions already available in that context.
        
           | candiddevmike wrote:
           | You either recreate a less reliable version of kubernetes for
           | workload ops or you go all in on your cloud provider and hope
           | they'll be responsible for your destiny.
           | 
           | Vanilla Kubernetes is just enough abstraction to avoid both
           | of those situations.
        
             | x86x87 wrote:
             | You cannot really be cloud agnostic these days - even when
             | using k8s. So learning to use the capabilities the cloud
             | provides is key.
        
               | p_l wrote:
                | Doesn't really mesh with my experience, especially the
                | longer k8s has been out.
               | 
               | It can be _cheaper_ to depend on cloud provider to ship
               | some features, but with tools like crossplane you can
               | abstract that out so developers can just  "order" a
               | database service etc. for their application.
        
             | PUSH_AX wrote:
             | Is "hope" the new replacement for SLAs? Or am I missing
             | something with that statement?
        
               | k8sToGo wrote:
                | SLAs do not prevent something from breaking,
                | unfortunately. They are just a blame construct.
        
               | p_l wrote:
               | "Hope" that your cloud provider matches as well your
               | needs as you thought, that vendor lock-in doesn't let
               | them milk you with high prices, etc. etc.
               | 
               | None of that is prevented with SLA
        
               | PUSH_AX wrote:
               | This requires the same skill and experience as figuring
               | out if k8s is going to be a good fit.
               | 
                | Arguably, if you can't evaluate the raw cloud offerings
                | and instead jump on a supposed silver bullet, you need to
                | stop immediately.
        
               | p_l wrote:
                | At this point I have found that k8s knowledge is more
                | portable, whereas your trove of $VENDOR_1 knowledge might
                | suddenly have issues because, for reasons outside of your
                | capacity to control, there's now a big spending contract
                | signed with $VENDOR_2 and a mandate to move.
                | 
                | And with smaller companies I tend to find k8s way more
                | cost-effective. I have pulled off things I wouldn't
                | otherwise be able to fit in the budget.
        
           | k8sToGo wrote:
           | I joined a team that used AWS without kubernetes. Thousands
           | of fragile weird python and bash scripts. Deployment was
           | always such a headache.
           | 
           | A few months later I transitioned the team to use containers
           | with proper CI/CD and EKS with Terraform and Argo CD. The
            | team and the managers like it, since we can now deploy
            | quite quickly.
        
             | evantbyrne wrote:
             | This is an apples-to-oranges comparison. You would still
             | have to write and maintain glue without the presence of a
             | proper CD.
        
             | PUSH_AX wrote:
             | Thanks for the anecdote k8sToGo
        
         | misiti3780 wrote:
          | If not k8s, what would other people be using? ECS?
        
           | k8sToGo wrote:
           | From my experience, classical VMs with self written Bash
           | scripts. The horror!
        
           | kenhwang wrote:
           | If you're on AWS, yeah, I'd say just use ECS until you need
           | more complexity. Our ECS deployments have been unproblematic
           | for years now.
           | 
            | Our K8s clusters never go more than a couple of days without
            | some sort of strange issue popping up. Arguably it could be
           | because my company outsourced maintenance of it to an army of
           | idiots. But K8s is a tool that is only as good as the
           | operator, and competence can be hard to come by at some
           | companies.
        
             | p_l wrote:
              | K8s or no K8s, outsource to the lowest bidder and you'll
              | get an unworkable platform :|
        
               | kenhwang wrote:
               | Agreed. But if you're already on AWS, I'd say the quality
               | floor is already higher than the potential at 95%+ of
               | other companies.
               | 
               | So I say unless you're at a company that pays top
               | salaries for the top 5% of engineering talent, you're
               | probably better off just using the AWS provided service.
        
               | p_l wrote:
                | I used to have a saying, back when Heroku was more in
                | favour, that you use Heroku because you want to go
                | bankrupt. AWS is at times similar.
                | 
                | Depending on your local market, AWS bills might be way
                | worse than the cost of a few bright ops people who will
                | let you choose from offerings including running dev envs
                | on a random assortment of dedicated servers and local
                | e-waste escapees.
        
           | liveoneggs wrote:
           | ECS is so nice and simple. http://kubernetestheeasyway.com
        
           | nprateem wrote:
            | Cloud Run, etc., but there seem to be some biggish gaps in
           | what those tools can do (probably because if deploying a
           | container was too easy the cloud providers would lose loads
           | of profit).
        
           | evantbyrne wrote:
           | I honestly think docker compose is the best default option
           | for single-machine orchestration. The catch is that you
           | either need to do some scripting to get fully automated zero
           | downtime deploys. I have to imagine someone will eventually
           | figure out a way to trivialize that, if they haven't already.
           | Or, you could just do the poor man's zero downtime deploy:
           | run two containers, deploy container a, wait for it to be
           | ready, then deploy container b, and let the reverse proxy do
           | the rest.
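            | 
            | [A sketch of that poor man's rollover -- service names,
            | ports, and the health path are made up:]
            | 
            |     docker compose up -d --no-deps app_a   # replace copy a
            |     until curl -fsS localhost:8081/healthz; do sleep 1; done
            |     docker compose up -d --no-deps app_b   # then copy b
            |     until curl -fsS localhost:8082/healthz; do sleep 1; done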
        
             | KronisLV wrote:
             | Docker Swarm takes the Compose format and takes it to
             | multi-node clusters with load balancing, while keeping
             | things pretty simple and manageable, especially with
             | something like Porainer!
             | 
             | For larger scale orchestratiom, Hashicorp Nomad can also be
             | a notable contender, while in some ways still being simpler
             | than Kubernetes.
             | 
             | And even when it comes to Kubernetes, distros like K3s and
             | tools like Portainer or Rancher can keep managing the
             | cluster easy.
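              | 
              | [A sketch of the Swarm flow, assuming a compose file with a
              | "web" service:]
              | 
              |     docker swarm init
              |     docker stack deploy -c docker-compose.yml myapp
              |     docker service scale myapp_web=3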
        
         | geodel wrote:
         | And that hype is in large part created by Google and other
         | cloud vendors.
         | 
          | To be honest, I hardly see any reasonable/actionable advice
          | from cloud/SaaS vendors. Either it is to sell their stuff, or
          | generic stuff like "one should be securing/monitoring their
          | stuff running in prod". Oh wow, never thought of or done any
          | such thing before.
        
         | imiric wrote:
          | That might be true, but unfortunately state-of-the-art
         | infrastructure tooling is mostly centered around k8s. This
         | means that companies choose k8s (or related technologies like
         | k3s, Microk8s, etc.) not because they strictly _need_ k8s, but
         | because it improves their workflows. Otherwise they would need
         | to invest a disproportionate amount of time and effort adopting
         | and maintaining alternative tooling, while getting an inferior
         | experience.
         | 
         | Choosing k8s is not just based on scaling requirements anymore.
         | There are also benefits of being compatible with a rich
         | ecosystem of software.
        
           | PUSH_AX wrote:
           | Can you specify what state of the art infra tooling you mean?
        
             | imiric wrote:
              | Continuous deployment systems like ArgoCD and Flux, user-
              | friendly local development environments with tools like
              | Tilt, and networking, distributed storage, and distributed
              | tracing systems that are basically plug-and-play. Search
              | for "awesome k8s" and you'll get many lists of these.
             | 
             | It's surely possible to cobble all of this together without
             | k8s, but k8s' main advantage is exposing a standardized API
             | that simplifies managing this entire ecosystem. It often
             | makes it worth the additional overhead of adopting,
             | understanding and managing k8s itself.
        
         | k8sToGo wrote:
         | I push for k8s because I _know_ it. Why not use something that
          | I know how to use? I know how to quickly set up a cluster, what
          | to deploy, and how to teach other team members the
          | fundamentals.
         | 
         | How many people out there really _need_ C# or object oriented
         | programming?
         | 
          | The argument you present might be valid if you decide to use a
          | tech stack prior to having much experience with it.
        
           | nprateem wrote:
           | Yeah that's the point. You know it and stuff everyone else.
        
             | p_l wrote:
              | Some custom bash/python/ansible monstrosity is only going
              | to be known by a few brains in the world.
              | 
              | With k8s it is remarkably easier to retain institutional
              | knowledge, as well as to spread it.
        
               | nprateem wrote:
               | If you're expecting app/FE devs to have to learn it
               | you're putting a ton of barriers in their way in terms of
               | deploying. Just chucking a container on a non-k8s managed
               | platform (e.g. Cloud Run) would be much simpler, and no
               | pile of bash scripts.
        
               | p_l wrote:
               | PaaSes are for companies with money to burn, most of the
               | time. A good k8s team (even a single person, to be quite
               | honest) is going to work towards providing your
               | application teams with simple templates to let them
               | deploy their software easily. Just let them do it.
               | 
               | Also, in my experience, you either have to spend
               | ridiculous amounts of money on SaaS/PaaS, or you find
               | that you have to host a lot more than just your
               | application and suddenly the deployment story becomes
               | more complex.
               | 
               | Depending on where you are and how much you're willing to
               | burn money, you might find out that k8s experts are
               | cheaper than the money saved by not going PaaS.
        
               | foverzar wrote:
               | > If you're expecting app/FE devs to have to learn it
               | 
               | Why would anyone expect it? It's not their job, is it? We
               | don't expect backend devs to know frontend and vice-
               | versa, or any of them to have AWS certification. Why
               | would it be different with k8s?
               | 
               | > Just chucking a container on a non-k8s managed platform
               | (e.g. Cloud Run) would be much simpler, and no pile of
               | bash scripts.
               | 
               | Simpler to deploy, sure, but not to actually run it
               | seriously in the long term. Though, if we are talking
               | about A container (as in singular), k8s would indeed be
                | some serious over-engineering.
        
             | k8sToGo wrote:
             | If it's about the knowledge of everyone else, why was I
             | hired as a _cloud_ engineer? Everyone else in my team was
              | more R&D.
        
         | planetafro wrote:
         | Just a thought as well in my corpo experience: Unfortunately,
         | there are some spaces that distribute solutions as k8s-only...
         | Which sucks. I've noticed this mostly in the data
         | science/engineering world. These are solutions that could be
         | easily served up in a small docker compose env. The
         | complexity/upsell/devops BS is strong.
         | 
         | To add insult to injury, I've seen more than one use IaC cloud
         | tooling as an install script vs a maintainable and idempotent
         | solution. It's all quite sad really.
        
         | p_l wrote:
          | There's a difference between _need it or you don't survive_
          | and _it improves our operations_.
          | 
          | The former is a very small set, involving huge amounts of bare
          | metal systems.
          | 
          | The latter is a surprisingly large set of companies, sometimes
          | even ones with a single server.
        
         | Thaxll wrote:
          | It's a dumb statement, especially from an SRE; it's typically a
          | comment from people who don't understand k8s and think that
          | k8s is only there to have the SLA of Google.
          | 
          | For most use cases k8s is not there to give you HA but to give
          | you a standard way of deploying a stack, be that in the cloud
          | or on-prem.
        
           | PUSH_AX wrote:
            | He understood it fully; he was running a multi-day course on
            | it when I spoke to him. He was candid about the tech; most of
            | us were there at the behest of our orgs.
        
             | p_l wrote:
             | In my personal experience, Google SREs as well as k8s devs
             | sometimes didn't grok how wide k8s usability was - they
             | also can be blind to financial aspects of companies living
             | outside of Silly Valley.
        
         | throwawaaarrgh wrote:
         | Most companies in the world don't need to develop software.
         | Software development itself is hype. But there's lots of money
         | in it, despite no actual value being created most of the time.
        
       | rwmj wrote:
       | > _It's also worth noting that we don't administer Kubernetes
       | ourselves_
       | 
       | This is the key point. Even getting to the point where I could
       | install Kubernetes myself on my own hardware took weeks, just
       | understanding what hardware was needed and which of the (far too
       | many) different installers I had to use.
        
         | LegibleCrimson wrote:
         | I found k3s pretty easy to spin up.
        
       | hobofan wrote:
       | OT: Can something be done about HN commenting culture so that the
       | comments stay more on topic?
       | 
        | Some technologies (like Kubernetes) tend to attract discussions
        | where half of the commenters completely ignore the original
        | article, so we end up having a weekly thread about Kubernetes
        | where the points of the article (which are interesting) can't be
        | discussed because they are drowned out by the same unstructured
        | OT discussions.
       | 
       | At the time of this posting there are ~20 comments with ~2
       | actually having anything to do with the points of the article
       | rather than Kubernetes in general.
        
         | cogman10 wrote:
         | Having read the article, isn't the point of the article
         | kubernetes in general and what the author prescribes you sign
         | up for/avoid?
         | 
          | Discussions of k8s pitfalls and successes in general seem to
          | be very much in line with what the article is advocating. And,
          | to that point, there's frankly just not a whole lot of interest
          | in this article for discussion: "We avoid YAML and
          | operators"... Neat.
        
           | hobofan wrote:
           | > Having read the article, isn't the point of the article
           | kubernetes in general and what the author prescribes you sign
           | up for/avoid?
           | 
           | Yeah, and I think that provides a good basis to discussion,
           | where people can critique/discuss whether the evaluation that
           | the author has made are correct (which a few comments are
           | doing). At the same time a lot of that discussion is being
           | displaced by what I would roughly characterize as "general
           | technology flaming" which isn't going anywhere productive.
        
         | geodel wrote:
          | Huh, this article has hardly anything deep, technical,
          | thought-provoking, or unique compared to ten thousand other
          | Kubernetes articles.
         | 
         | I am rather happy that people are having general purpose
         | discussion about K8s.
        
         | freedomben wrote:
          | What you're seeing is the early crowd. With most (not all)
         | posts, comments will eventually rise to the top that are more
         | what you're looking for. IME it usually takes a couple hours.
         | If it's a post where I really want to read the relevant
         | comments, I'll usually come back at least 8 to 12 hours later
         | and there's usually some good ones to choose from. Even topics
          | like Apple that attract the extreme lovers and haters tend to
          | trend this direction.
        
         | pvg wrote:
         | The solution to that is to flag boring/generic articles and/or
         | post/upvote more specific, interesting articles. Generic
        | articles produce generic, mostly repetitive comments, but then
        | again, that's the material the commenters are given.
        
       | Kab1r wrote:
       | I almost feel attacked for using plain yaml, helm, cert-manager
       | AND the ingress api just for personal homelab shenanigans.
        
         | cogman10 wrote:
          | Yeah, I disagree with the OP on the dangers there. They work
          | fairly well for us and aren't a source of headaches. Though I
          | still try to teach my dev teams that "just because Bitnami
          | puts variables everywhere doesn't mean you need to. We
          | aren't trying to make these apps deployable on homelabs."
        
       | __MatrixMan__ wrote:
       | > But we often do multiple deploys per day, and when our products
        | break, our customers' products break for their users. Even a
       | minute of downtime is noticed by someone.
       | 
       | Kubernetes might be the right tool for the job if we accept that
       | this is a necessary evil. But maybe it's not? The idea that I
       | might fail to collaborate with you because a third party failed
       | because a fourth party failed kind of smells like a recipe for
       | software that breaks all the time.
        
         | paulgb wrote:
         | It really comes down to, I don't ever want to have the
         | conversation "is this a good time to deploy, or should we wait
         | until tonight when there's less usage". We have had some
         | periods where our system was more fragile, and planning our
         | days around the least-bad deployment window was a time suck,
         | and didn't scale to our current reality of round-the-clock
         | usage.
        
           | hellcow wrote:
           | You can achieve this without k8s, though. If your goal is, "I
           | want zero-downtime deploys," that alone is not sufficient
           | reason to reach for something as massively complex as k8s.
           | Set up a reverse proxy and do blue-green deploys behind it.
        
             | paulgb wrote:
             | > Set up a reverse proxy and do blue-green deploys behind
             | it.
             | 
             | That's what I currently use Kubernetes for. What stack are
             | you proposing instead?
        
               | sureglymop wrote:
               | If you only need zero downtime deployments, compose and
               | traefik/caddy are enough.
               | 
               | If you need to replicate storage, share networks and
               | otherwise share resources across multiple hosts,
               | kubernetes is better suited.
               | 
               | But you'll also have much less control with compose, e.g.
               | no limiting of egress/ingress and more.
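                | 
                | [A sketch of the compose + Caddy version -- service
                | names, ports, and paths are made up:]
                | 
                |     docker compose up -d app_green
                |     curl -fsS localhost:8082/healthz
                |     # edit the Caddyfile upstream from :8081 to :8082
                |     docker compose exec caddy caddy reload \
                |       --config /etc/caddy/Caddyfile
                |     docker compose stop app_blue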
        
               | paulgb wrote:
               | As I see it, managed Kubernetes basically gives me the
               | same abstraction I'd have with Compose, except that I can
               | add nodes easily, have some nice observability through
               | GKE, etc. Compose might be simpler if I were running the
               | cluster myself, but because GKE takes care of that, it's
               | one less thing that I have to do.
        
             | danenania wrote:
             | "Set up a reverse proxy and do blue-green deploys behind
             | it."
             | 
             | I think this already introduces enough complexity and edge
             | cases to make reinventing the wheel a bad idea. There's a
             | lot involved in doing it robustly.
             | 
             | There are alternatives to Kubernetes (I prefer ECS/Fargate
             | if you're on AWS), but trying to do it yourself to a
             | production-ready standard sets you up for a lot of
             | unnecessary yak shaving imho.
        
               | boxed wrote:
               | For small scales you can use Dokku. I do. It's great and
               | simple.
        
             | freedomben wrote:
             | This sounds like terrible advice. Managing a reverse proxy
             | with blue-green deploys behind it is not going to be
             | trivial, and you have to roll most of that yourself. The
             | deployment scripts alone are going to be hairy. Getting the
             | same from K8s requires having a deploy.yaml file and a
             | `kubectl apply -f <file>`. K8s is way less complex.
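              | 
              | [A sketch with a made-up Deployment name:]
              | 
              |     kubectl apply -f deploy.yaml
              |     kubectl rollout status deployment/myapp --timeout=120s
              |     kubectl rollout undo deployment/myapp  # if it goes bad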
        
               | hellcow wrote:
                | I ran such a system in prod over 7 years with more than
                | five nines of uptime, multiple deploys per day, and
                | millions of users interacting with it. Our deploy scripts
                | were ~10-line
               | shell scripts, and any more complex logic (e.g. batching,
               | parallelization, health checks) was done in a short Go
               | program. Anyone could read and understand it in full. It
               | deployed much faster than our equivalent stack on k8s.
               | 
               | k8s is a large and complex tool. Anyone who's run it in
               | production at scale has had to deal with at least one
               | severe outage caused by it.
               | 
               | It's an appropriate choice when you have a team of k8s
               | operators full-time to manage it. It's not necessarily an
               | appropriate choice when you want a zero-downtime deploy.
        
               | freedomben wrote:
               | > _It 's an appropriate choice when you have a team of
               | k8s operators full-time to manage it._
               | 
               | Are you talking about a full self-run type of scenario
                | where you set up and administer k8s entirely yourself, or
               | a managed system or semi-managed (like OpenShift)?
               | Because if the former then I would agree with you,
               | although I wouldn't recommend a full self-run unless you
               | were a big enough corp to have said team. But if you're
               | talking about even a managed service, I would have to
               | disagree. I've been running for years on a managed
               | service (as the only k8s admin) and have never had a
                | severe outage caused by K8s.
        
               | esafak wrote:
               | Is your short Go program public? I'm curious how you
               | handled progressive rollouts, and automated rollbacks.
        
               | hellcow wrote:
               | It isn't, sadly, but the logic is straightforward. Have a
               | set of IPs you target, iterate with your deploy script
               | targeting each, check health before continuing. If
               | anything doesn't work (e.g. health check fails), stop the
               | deploy to debug. There's no automated rollback--simply
               | `git revert` and run the deploy script again.
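                | 
                | [Roughly, in shell -- hosts, port, path, and the restart
                | script are made up:]
                | 
                |     for host in 10.0.1.10 10.0.1.11 10.0.1.12; do
                |       ssh deploy@"$host" /srv/app/run-new-release.sh
                |       for i in $(seq 30); do
                |         curl -fsS "http://$host:8080/healthz" \
                |           && continue 2      # healthy: next host
                |         sleep 2
                |       done
                |       echo "$host unhealthy, stopping" >&2; exit 1
                |     done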
        
               | zer00eyz wrote:
               | >> Managing a reverse proxy with blue-green deploys
               | behind it is not going to be trivial, and you have to
               | roll most of that yourself.
               | 
               | There are a lot of reverse proxies that will do this.
               | Traditionally this was the job of a load balancer. With
               | that being done by "software" you get the fun job of
               | setting it up!
               | 
                | The hard part is doing it the first time, and having a
                | sane strategy. What you want to do is identify and
                | segment a portion of your traffic. Mostly this means
                | injecting a cookie into the segmented traffic's HTTP(S)
                | requests. If you don't have a group of users consistently
                | on the new service you get some odd behavior.
                | 
                | The deployment part is easy. Because you're running
                | things concurrently, ports matter. Just have the
                | alternate version deployed on a different port. This is
                | not a big deal and is super easy to do. In fact your
                | deployments are probably set up to swap ports anyway, so
                | all you're doing is not committing to the final step in
                | that process.
                | 
                | But... what if it is a service-to-service call inside
                | your network? That too should be easy. You're passing IDs
                | around between calls for tracing, right? Rather than a
                | "random cookie" you're just going to route based on
                | these. Again, easy to do in a reverse proxy, easier in a
                | load balancer.
                | 
                | It's not like easy blue-green deploys are some magic of
                | Kubernetes. We have been doing them for a long time.
                | They were easy to do once set up (and highly scripted as
                | a possible path for any normal deployment).
                | 
                | Kubernetes is to operations what Rails is to
                | programming... It's good, fast, helpful... till it isn't,
                | and then you're left with buyer's remorse.
        
       | teeray wrote:
       | Is there something like a k1s? What I'd love is "run this set of
       | containers on this machine. If the machine goes down, I don't
       | care--I will fix it." If it wired into nginx or caddy as well, so
       | much the better. Something like that for homelab use would be
       | wonderful.
        
         | eropple wrote:
         | You've basically described k3s, I think. I run it in my homelab
         | (though I am enough of a tryhard to have multiple control
         | planes) as well as on a couple of cloud servers as container
         | runtimes (trading some overhead for consistency).
         | 
         | k3s really hammers home the "kubernetes is a set of behaviors,
         | not a set of tools" stuff when you realize you can ditch etcd
         | entirely and use sqlite if you really want to, and is a good
         | learning environment.
        
         | szszrk wrote:
         | That's basically just a docker-compose.
         | 
         | If you want something crazy all-in-one for homelab check out
         | https://github.com/azukaar/Cosmos-Server
        
         | silverquiet wrote:
         | Docker Compose probably fits the bill for that. They also have
         | a built in minimalist orchestrator called Swarm if you do want
         | to extend to multiple machines. I suppose it's considered
         | "dead" since Kubernetes won mindshare, but it still gets
         | updates.
        
         | morbicer wrote:
         | For homelab, Docker compose should be enough
         | 
         | For something more production oriented
         | https://github.com/basecamp/kamal
        
         | pheatherlite wrote:
         | Docker bare bones or docker compose. Run as systemd services
         | and have docker run the container as a service account. Manual
         | orchestration is all you need. Anything else like rancher or
         | whatever are just fluff.
        
         | 7sidedmarble wrote:
         | That's called docker compose
        
         | blopker wrote:
         | I run all my projects on Dokku. It's a sweet spot for me
         | between a barebones VPS with Docker Compose and something a lot
         | more complicated like k8s. Dokku comes with a bunch of solid
         | plugins for databases that handle backups and such. Zero
         | downtime deploys, TLS cert management, reverse proxies, all out
         | of the box. It's simple enough to understand in a weekend and
         | has been quietly maintained for many years. The only downside
         | is it's meant mostly for single server deployments, but I've
         | never needed another server so far.
         | 
         | https://dokku.com/
        
           | josegonzalez wrote:
           | Just a note: Dokku has alternative scheduler plugins, the
           | newest of which wraps k3s to give you the same experience
           | you've always had with Dokku but across multiple servers.
        
           | boxed wrote:
           | Dokku really is a game changer for small business. It makes
           | me look like a magician with deploys in < 2m (most of which
           | is waiting for GitHub Actions to run the tests first!) and no
           | downtime.
        
       | throwaway892238 wrote:
       | When people call Kubernetes a "great piece of technology", I find
       | it the same as people saying the United States is the "greatest
       | country in the world". Oh yeah? Great in what sense? Large?
       | Absolutely. Powerful? Definitely. But then the adjectives sort of
       | take a turn... Complicated? Expensive? Problematic? Threatening?
       | A quagmire? You betcha.
       | 
       | If there were an alternative to Kubernetes that were just 10%
       | less confusing, complicated, opaque, monolithic, clunky, etc, we
       | would all be using it. But because Kubernetes exists, and
       | everyone is using it, there's no point in trying to make an
       | alternative. It would take years to reach feature parity, and
       | until you do, you can't really switch away. It's like you're
       | driving an 18-wheeler, and you think it kinda sucks, but you
       | can't just buy and then drive a completely different 18-wheeler
       | for only a couple of your deliveries.
       | 
       | You probably will end up using K8s at some point in the next 20
       | years. There's not really an alternative that makes sense. As
       | much as it sucks, and as much as it makes some things both more
       | complicated and harder, if you actually need everything it
       | provides, it makes no sense to DIY, and there is no equivalent
       | solution.
        
         | p_l wrote:
         | People forgot just how much of a mess Mesos environment was in
         | comparison.
         | 
         | And Nomad, which is still often pushed, to this day surprises
         | me by randomly missing a feature or two that turns out to be
         | impactful enough to make dealing with more complexity worth
         | it, because ultimately the result is less complexity in total.
        
       | madduci wrote:
       | I don't get most of the blame and reasoning.
       | 
       | Sure, everyone has their own product and experience and it's fine
       | to express it, but I don't get some of the other decisions, such
       | as "no to service meshes", "no to Helm", and many more.
       | 
       | Ideally, you don't want to reinvent the wheel for every workload
       | you need (say you need an OIDC endpoint, an existing
       | application): you may be tempted to write everything from
       | scratch yourself, which is also fine, but the point is: why?
       | 
       | Many products deliver their own Helm package. And if you are sick
       | of writing YAML, I would look for Terraform over Pulumi, for the
       | reason that you use the same tool for bringing up Infrastructure
       | and then workloads.
       | 
       | Kubernetes itself isn't easy to use, and in many cases you don't
       | need it, but it might bring you nice things straight out of the
       | box with less pain than other tooling (e.g. zero-downtime
       | deployments).
        
         | p_l wrote:
         | The problem with Helm is that it did the one thing you should
         | not do, and refused to fix it even when they promised to.
         | 
         | They do text-replacement templating for YAML.
         | 
         | I once spent a month, as a quite experienced k8s wrangler,
         | trying to figure out why Helm 2 was timing out, only to finally
         | trace it down to how we would sometimes get the wrong number of
         | spaces in some lines.
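         | 
         | To make the failure mode concrete (an illustration, not the
         | actual chart involved): with text templating you have to get
         | the indent count right by hand, e.g.
         | 
         |     spec:
         |       containers:
         |         - name: app
         |           resources:
         |     {{ toYaml .Values.resources | indent 12 }}
         | 
         | If that 12 is off by a couple of spaces, the rendered output
         | can still parse as YAML, just with the values hanging off the
         | wrong key, and nothing tells you until something downstream
         | misbehaves.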
        
         | eropple wrote:
         | I admit that I use some Helm stuff in my home environment, but
         | for production I'm genuinely worried about the need to support
         | whatever they've thrown into it. At minimum I'm going to have
         | to study the chart and understand exactly what they propose to
         | open-palm slam into my cluster, and for many/most applications
         | at that point it might genuinely be worth just writing a
         | manifest myself. Not always. Some applications are genuinely
         | complex and need to be! But often, this has been the case for
         | me. For all my stuff, though, I use kustomize and I'm pretty
         | happy with it; it's too stupid for me to be clever, and this is
         | good.
         | 
         | Service meshes are a different kettle of fish. They add exciting
         | new points of failure where they need not exist, and while
         | there are definitely use cases for them, I'd default to
         | avoiding them until somebody proves the need for one.
        
       | pheatherlite wrote:
       | Why are people still scared of k8s? At certain job thresholds,
       | it is worth every ounce of effort to maintain it. Better yet, go
       | managed.
        
         | cogman10 wrote:
         | I honestly don't understand it either. Familiarity? K8s has
         | like, what, 5 big concepts to know and once you are there the
         | other concepts (generally) just build from there.
         | 
         | - Containers
         | 
         | - Pods
         | 
         | - Deployments
         | 
         | - Services
         | 
         | - Ingresses
         | 
         | There are certainly other concepts you can learn, but you
         | aren't often dealing with them (just like you aren't dealing
         | with them when working with something like docker compose).
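         | 
         | To make the middle three concrete, a minimal sketch (names and
         | image are made up): a Deployment runs the pods, a Service sits
         | in front of them, and an Ingress would then route outside
         | traffic to the Service.
         | 
         |     apiVersion: apps/v1
         |     kind: Deployment
         |     metadata:
         |       name: web
         |     spec:
         |       replicas: 2
         |       selector:
         |         matchLabels: { app: web }
         |       template:
         |         metadata:
         |           labels: { app: web }
         |         spec:
         |           containers:
         |             - name: web
         |               image: example.com/web:1.0
         |               ports:
         |                 - containerPort: 8080
         |     ---
         |     apiVersion: v1
         |     kind: Service
         |     metadata:
         |       name: web
         |     spec:
         |       selector: { app: web }
         |       ports:
         |         - port: 80
         |           targetPort: 8080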
        
           | nprateem wrote:
           | Good luck fixing etcd when a major version upgrade breaks. It
           | took all weekend to fight that fire when it happened to us.
        
             | p_l wrote:
             | Been there, done that, didn't get a t-shirt but got to yell
             | at some people for setting up with undersized VMs and
             | forgetting to note it anywhere.
             | 
             | Haven't had an issue once I fixed sizing.
        
             | Thaxll wrote:
             | Use managed k8s, problem solved.
        
               | nprateem wrote:
               | That problem is solved, but there are plenty of other
               | things hiding in that 'simple' setup of just 5 concepts.
        
         | jakupovic wrote:
         | People don't understand k8s and are thus hating. K8s is a
         | wonderful tool for most things many teams need. It may not be
         | useful for homelab type of stuff as the learning curve is
         | steep, but for professional use it cannot be beat currently.
         | Just a bunch of "I know what I'm doing and don't need this
         | complicated thing I don't understand." Pretty simple, and
         | especially so in a forum such as HN where we are all "experts"
         | and need to explain to ourselves, and crucially to others, why
         | we are right not to use k8s. A bunch of children, really.
        
       | axpy906 wrote:
       | I clicked expecting something a bit more detailed here.
       | 
       | What are the best resources to learn simple k8s in 2024?
        
         | jakupovic wrote:
         | Try putting a simple https app on a managed k8s cluster and use
         | google/whatever to figure it out; that should get you started.
        
       | k8sToGo wrote:
       | Interesting that they avoid helm. It is the "plug and play"
       | solution for Kubernetes. However, that is only in theory. My
       | experience with most operators out there is that they were
       | clunky, buggy, or very limited and did not expose everything
       | needed. But I still end up using helm itself in combination
       | with ArgoCD.
        
         | habitue wrote:
         | Helm is just a mess. If you're going to deploy something from
         | helm, you're better off taking it apart and reconstructing it
         | yourself, rather than depending on it to work like a package
         | manager
        
           | hobofan wrote:
           | In my experience, if you use first-party charts (= published
           | by the same people that publish the packaged software) that
           | are likely also provided to enterprise customers you'll have
           | a good time (or at least a good starting point). For third-
           | party charts, especially for more niche software I'd also
           | rather avoid them.
        
         | szszrk wrote:
         | I think the important detail here is that he mentions he
         | doesn't use it because of operators. That may mean they tried
         | it in a previous major version, which used Tiller. That was
         | quite a long time ago.
         | 
         | That being said, helm templates are disgusting and I absolutely
         | hate how easily developers complicate their charts. Even the
         | default empty chart has helpers. Why, on Earth, why?
         | 
         | I almost fully relate to OP's approach to k8s, but I think with
         | their simplified approach helm (the current one) could work
         | quite well.
        
         | stackskipton wrote:
         | We avoided Helm as well. We found that Kustomize provides
         | enough templating to cover almost all the common use cases and
         | it's very easy for anyone to check their work, kubectl
         | kustomize > compiled.yaml. FluxCD handles postbuild find and
         | replace.
         | 
         | At most places, your cluster configuration is probably pretty
         | set in stone and doesn't vary a ton.
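         | 
         | For anyone who hasn't seen it, the base/overlay layout is
         | roughly this (paths, names and tag are made up):
         | 
         |     # overlays/prod/kustomization.yaml
         |     apiVersion: kustomize.config.k8s.io/v1beta1
         |     kind: Kustomization
         |     resources:
         |       - ../../base
         |     patches:
         |       - path: replica-count.yaml
         |     images:
         |       - name: myapp
         |         newTag: "1.4.2"
         | 
         | and `kubectl kustomize overlays/prod > compiled.yaml` shows
         | exactly what would be applied.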
        
       | neya wrote:
       | There were some "hype cycles" (in Gartner's lingo) that I avoided
       | during my career. The first one was the MongoDB/NoSQL hype -
       | "Let's use NoSQL for everything!" trend. I tried it in a medium
       | sized project and burnt my fingers, right around when HN was
       | flooded with "Why we migrated to MongoDB" stories.
       | 
       | The next one was Microservices. Everyone was doing something with
       | microservices and I was just on a good 'ole Ruby on Rails
       | monolith. Again, the HN stories came and went "Why we broke down
       | our simple CRUD app into 534 microservices".
       | 
       | The final one was Kubernetes. I was a Cloud consultant in my past
       | life and had to work with a lot of my peers who had the freedom
       | to deploy in any architecture they saw fit. A bunch of them were
       | on Kubernetes and I was just on a standard Compute VM for my
       | clients.
       | 
       | We had a requirement from our management that all of us had to
       | take some certification courses so we would be easier to pitch
       | to clients. So, I prepped for one and read about Kubernetes and
       | tried deploying a bunch of applications only to realize it was a
       | very complex set of moving parts - unnecessarily so, I might
       | add. I was never able to understand why this was pushed as
       | normal. It only made my decision not to use it stronger.
       | 
       | Over the course of the 5 year journey, my peers' apps would
       | randomly fail and they would be sometimes pulled over the
       | weekends to push fixes to avert the P1 situation whilst I would
       | be casually chilling in a bar with my friends. My Compute Engine
       | VM, to its credit, has had only one P1 situation to date.
       | And that was because the client forgot to renew their domain
       | name.
       | 
       | Out of all the 3 hype cycles that I avoided in my career,
       | Kubernetes is the one I am most thankful for evading. This sort
       | of complexity should not be normalised. I know this may be an
       | unpopular opinion on HN, but I am willing to bite the bullet and
       | save my time and my clients' money. So, thanks for the hater's
       | guide. But I prefer to remain one. I'd rather call a spade a
       | spade.
        
         | dminor wrote:
         | Early on in the container hype cycle we decided to convert some
         | of our services from VMs to ECS. It was easy to manage and the
         | container build times were so much better than AMI build times.
         | 
         | Some time down the road we got acquired, and the company that
         | acquired us ran their services in their own Kubernetes cluster.
         | 
         | When we were talking with their two person devops team about
         | our architecture, I explained that we deployed some of our
         | services on ECS. "Have you ever used it?" I asked them.
         | 
         | "No, thank goodness" one of them said jokingly.
         | 
         | By this time it was clear that Kubernetes had won and AWS was
         | planning its managed Kubernetes offering. I assumed that after
         | I became familiar with Kubernetes I'd feel the same way.
         | 
         | After a few months though it became clear that all these guys
         | did was babysit their Kubernetes cluster. Upgrading it was a
         | routine chore and every crisis they faced was related to some
         | problem with the cluster.
         | 
         | Meanwhile our ECS deploys continued to be relatively hassle
         | free. We didn't even have a devops team.
         | 
         | I grew to understand that managing Kubernetes was fun for them,
         | despite the fact that it was overkill for their situation. They
         | had architected for scale that didn't exist.
         | 
         | I felt much better about having chosen a technology that didn't
         | "win".
        
           | jakupovic wrote:
           | So you don't use things you don't understand; valid point.
           | But saying others are using k8s as a way to use up free time
           | is pretty useless too, as we have managed k8s offerings and
           | thus don't need the exercise. If you don't need k8s, don't
           | use it, thanks. Pretty useless story, honestly.
        
           | p_l wrote:
           | A lot depended on whether ECS fit what you needed. ECSv1,
           | even with Fargate, was so limited that my first k8s use was
           | pretty much impossible on it at sensible price points, for
           | example.
        
       | therealfiona wrote:
       | Something struck me here that I've been thinking about. OP
       | says a human should never wait for a pod. Agreed, it is annoying
       | and sometimes means waiting for an EC2 and the pod.
       | 
       | We have jobs that users initiate that use 80+GB of memory and a
       | few dozen cores. We run only one pod per node because the next
       | size up EC2 costs a fortune and performance tops out on our
       | current size.
       | 
       | These jobs are triggered via a button click that trigger a lambda
       | that submits a job to the cluster. If it is a fresh node, the
       | user has to wait for the 1 GB image to download from ECR. But it
       | is the same image that the automated jobs that kick off every
       | few minutes also use, so rarely is there any waiting. But
       | sometimes there is.
       | 
       | Should we be running some sort of clustering job scheduler that
       | gets the job request and distributes work amongst long running
       | pods in the cluster? My fear is that we just create another layer
       | of complexity and still end up waiting for the EC2, waiting for
       | the pod image to download, waiting for the agent now running on
       | this pod to join the work distribution cluster.
       | 
       | However, we probably could be more proactive with this because we
       | could spin up an extra pod+EC2 when the work cluster is 1:1
       | job:ec2.
       | 
       | Thoughts?
       | 
       | We're in the process of moving to Karpenter, so all this may be
       | solved for us very soon with some clever configuration.
        
         | p_l wrote:
         | If you don't want to change the setup too much, consider
         | running your nodes off an AMI with the image pre-loaded. Maybe
         | also check how exactly the images are layered, so that if
         | necessary you can reduce the amount of "first boot patch"
         | download.
        
         | Too wrote:
         | There is a difference between waiting and waiting.
         | 
         | For an hourly batch job that already takes 10 minutes to run,
         | the extra time for pod scheduling and container downloading is
         | negligible anyway.
         | 
         | What you shouldn't do is put pod scheduling in places where
         | thousands of users per minute expect sub-second latency.
         | 
         | In your case, if the time for starting up the EC2 becomes a
         | bigger factor than the job itself, you can add placeholder pods
         | that just sleep, requiring exactly that machine config but
         | requesting 0 CPUs, just to make sure a node stays online.
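         | 
         | Roughly something like this; the nodeSelector label is made
         | up, and in practice you'd probably wrap it in a Deployment so
         | it gets rescheduled, but a running pod is generally enough to
         | keep the autoscaler from taking the node away:
         | 
         |     apiVersion: v1
         |     kind: Pod
         |     metadata:
         |       name: warm-node-placeholder
         |     spec:
         |       nodeSelector:
         |         workload: big-memory-jobs
         |       containers:
         |         - name: sleep
         |           image: registry.k8s.io/pause:3.9
         |           resources:
         |             requests:
         |               cpu: "0"
         |               memory: "0"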
        
       | the_duke wrote:
       | I know it's fashionable to hate on Kubernetes these days, and it
       | is overly complex and has plenty of problems.
       | 
       | But what other solution allows you to:
       | 
       | * declaratively define your infrastructure
       | 
       | * gives you load balancing, automatic recovery and scaling
       | 
       | * provides great observability into your whole stack (kubectl,
       | k9s, ...)
       | 
       | * has a huge amount of pre-packaged software available (helm
       | charts)
       | 
       | * and most importantly: allows you to stand up mostly the same
       | infrastructure in the cloud, on your own servers (k3s), and
       | locally (KIND), and thus doesn't tie you into a specific cloud
       | provider
       | 
       | The answer is: there isn't any.
       | 
       | Kubernetes could have been much simpler, and probably was
       | intentionally built to not be easy to use end to end.
       | 
       | But it's still by far the best we've got.
        
         | dijit wrote:
         | Cloud and Terraform give you those.
         | 
         | You're right that kubernetes is a bit batteries included, and
         | for that it's tempting to take it off the shelf because it "does
         | a lot of needed things", but you don't _need_ one tool to do
         | all of those things.
         | 
         | It is ok to have domain specific processes or utilities to
         | solve those.
        
           | theossuary wrote:
           | You missed what I think is the most important point in OP's
           | list: it does all of the above in a cloud agnostic way. If I
           | want to move clouds with TF I'm rewriting everything to fit
           | into a new cloud's paradigm. With Kubernetes there's a dozen
           | providers built in (storage, loadbalancing, networking, auto
           | scaling, etc.) or easy to pull in (certificates, KMS secrets,
           | DNS); and they make moving clouds (and more importantly)
           | running locally much easier.
           | 
           | Kubernetes is currently the best way to wrap up workloads in
           | a cloud agnostic way. I've written dozens of services for K8s
           | using different deployment mechanisms (Helm, Carvel's kapp,
           | Flux, Kustomize) and I can run them just as easily in my home
           | K8s cluster and in GCP. It's honestly incredible; I don't
           | know of any other cloud tech that lets me do that.
           | 
           | One thing I think a lot of people miss too, is how good the
           | concepts around Operators in Kubernetes are. It's hard to see
           | unless you've written some yourself, but the theory around
           | how operators work is very reminiscent of reactive coding in
           | front end frameworks (or robotics closed-loop control, which
           | is what they were originally inspired by). When written well
           | they're
           | _extremely_ resilient and incredibly powerful, and a lot of
           | that power comes from etcd and the established patterns
           | they're written with.
           | 
           | I think Kubernetes is really painful sometimes, and huge
           | parts of it aren't great due to limitations of the language
           | it's written in; but I also think it's the best thing
           | available that I can run locally and in a cloud with a FOSS
           | license.
        
             | dijit wrote:
             | > it does all of the above in a cloud agnostic way.
             | 
             | I'll give you the benefit of the doubt here and say that
             | some of the basics are indeed cloud agnostic.
             | 
             | However, it's plain for many or most to see that outside of
             | extremely "toy" workloads you will be learning a specific
             | "flavour" of Kubernetes. EKS/GKE/AKS etc; They have, at
             | minimum, custom resource definitions to handle a lot of
             | things and at their worst have implementation specific
             | (hidden) details between equivalent things (persistent
             | volume claims on AWS vs GCP for example are quite
             | substantially different).
        
               | theossuary wrote:
               | For multicloud I usually think of my local K8s cluster
               | and GKE, it's been a few years since I touched EKS. I'd
               | love to hear your opinions on the substantive differences
               | you run into. When switching between clouds I'm usually
               | able to get away with only changing annotations on
               | resources, which is easy enough to put in a values.yml
               | file. I can't remember the last time I had to use a cloud
               | specific CRD. What CRD's do you have to reach for
               | commonly?
               | 
               | Thinking about it; the things I see as very cloud
               | agnostic: Horizontal pod autoscaling, Node autoscaling,
               | Layer 4 loadbalancing, Persistent volumes, Volume
               | snapshots, Certificate management, External DNS, External
               | secrets, Ingress (when run in cluster, not through a
               | cloud service).
               | 
               | That ends up covering a huge swath of my usecases,
               | probably 80-90%. The main pain points I usually run into
               | are IAM and trying to use cloud layer 7 ingress
               | (application load balancers?).
               | 
               | I totally agree the underlying implementation of
               | resources can be very different, but that's not the fault
               | of Kubernetes; it's an issue with the implementation from
               | the operator of the K8s cluster. All abstractions are
               | going to be leaky at this level. But for PVCs I feel like
               | storageclasses capture that well, and can be used to pick
               | the level of performance you need per cloud; without
               | having to rewrite the common provisioning of block devices.
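               | 
               | Concretely, something like this, where the class name is
               | made up and the provisioner is the per-cloud CSI driver:
               | 
               |     # GKE
               |     apiVersion: storage.k8s.io/v1
               |     kind: StorageClass
               |     metadata:
               |       name: fast
               |     provisioner: pd.csi.storage.gke.io
               |     parameters:
               |       type: pd-ssd
               | 
               | On EKS the same class name would point at
               | ebs.csi.aws.com with type: gp3, and the app's PVC just
               | asks for storageClassName: fast either way.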
        
             | elktown wrote:
             | Something feels very off and mantra-like about how often
             | cloud-migration benefits are presented as very important,
             | compared to how often such migrations actually happen in
             | practice. Not to mention that it also assumes that simpler
             | setups are automatically harder to move around between
             | clouds, or at least that there is a significant difference
             | in required effort.
        
               | theossuary wrote:
               | When I say it's easy to move between clouds, I'm not
               | referring to an org needing to pick up everything and
               | move from AWS to GCP. That is rare, and takes quite a bit
               | of rearchitecting no matter what.
               | 
               | When I say something is easy to move, I mean that when I
               | build on top of it, it's easy for users to run it in
               | their cloud of choice with changes in config. It also
               | means I have flexibility with where I choose to run
               | something after I've developed it. For example I develop
               | most stuff against minikube, then deploy it to GCP or a
               | local production k8s. If I were using Terraform I couldn't
               | do that.
        
           | the_duke wrote:
           | > Cloud and terraform gives you those
           | 
           | * your stack almost always ends up closely tied to one cloud
           | provider. I've done and seen cloud migrations. They are so
           | painful and costly that they often just aren't attempted.
           | 
           | * Cloud services make it much harder to run your stack
           | locally and on CI. There are solutions and workarounds, but
           | they are all painful. And you always end up tied to the
           | behaviour of the particular cloud services
           | 
           | > but you don't need one tool to do all of those things
           | 
           | To get the same experience, you do. And I don't see why you
           | would want multiple tools.
           | 
           | If anything, Kubernetes isn't nearly integrated and full-
           | featured enough, because it has too many pluggable parts
           | leading to too much choice and interfacing complexity. Like
           | pluggable ingress, pluggable state database, pluggable
           | networking stack, no simple "end to end app" solution
           | (Knative, etc), ... This overblown flexibility is what leads
           | to most of the pain and perceived complexity, IMO.
        
             | osigurdson wrote:
             | Perhaps a little on the tinfoil hat side of things, but it
             | isn't completely unreasonable to think that some of the FUD
             | could originate from cloud providers. Kubernetes is a
             | commoditizing force to some extent.
        
             | foverzar wrote:
             | > This overblown flexibility is what leads to most of the
             | pain and perceived complexity, IMO.
             | 
             | Huh, I guess you are spot on. My first experience with
             | kubernetes was k3s, and for a long time I couldn't figure
             | out what all the fuss was about and where all that
             | complexity people talk so much about was hiding. But then
             | I tried vanilla kubernetes.
        
           | Too wrote:
           | Far from it. TF is mostly writing static content, maybe
           | reading one or two things. It's missing the runtime aspect,
           | as are most cloud offerings without excessive configuration:
           | rollouts, health probes, logs, service discovery, just to
           | name a few.
        
         | treflop wrote:
         | Aren't you just describing the basic features of an
         | orchestrator?
         | 
         | Docker Swarm has all those features for example.
         | 
         | (Not that I am recommending Docker Swarm.)
        
         | mad_vill wrote:
         | "Automatic recovery"
         | 
         | That's a joke.
        
         | yjftsjthsd-h wrote:
         | The thing is, "kubernetes" doesn't give you that either. You
         | want a LB? Here's a list of them that you can add to a cluster.
         | But actually pick multiple, because the one you picked in AWS
         | doesn't support bare metal.
        
           | hobofan wrote:
           | > because the one you picked in AWS doesn't support bare
           | metal
           | 
           | That's just because AWS's Kubernetes offering is laughably
           | bad.
           | 
           | There is huge difference in your experience whether you use
           | Kubernetes via GKE (Autopilot) or any other solution (at
           | least as long you don't have a dedicated infrastructure
           | team).
        
           | davkan wrote:
           | Bare metal kubernetes is certainly a lot less complete out of
           | the box when it comes to networking and storage, but people
           | can, and often should, use a managed k8s service which
           | provides all those things out of the box. And if you're on
           | bare metal once the infra team has abstracted away everything
           | into LoadBalancers and StorageClasses it's basically the same
           | experience for end users of the cluster.
        
             | eropple wrote:
             | If you're talking about OpenShift on rented commodity
             | compute, maybe. If you're talking about GKE/AKS/EKS or
             | similar, I disagree wholeheartedly; you're then paying
             | several multiples on the compute _and_ a little extra for
             | Kubernetes.
        
         | osigurdson wrote:
         | Naw, just use systemd, HAProxy and bash scripts. That is much
         | "simpler" (for some definition of simple).
         | 
         | Kidding of course. If you need anything approximating
         | Kubernetes, use it. If you just need one machine maybe don't.
        
         | p-o wrote:
         | I like to think that most people who are upset at Kubernetes
         | don't hate on all of it. I think the configuration aspect
         | (YAML) and the very high level of abstraction are what get
         | people lost, and as a result they get frustrated by it. I've
         | certainly fallen into that category while trying to learn how to
         | operate multiple clusters using different topologies and cloud
         | providers.
         | 
         | But from an operational standpoint, when things are working, it
         | usually behaves very well until you hit some rough edge cases
         | (upgrades were much harder to achieve a couple of years back).
         | But rough edges exist everywhere, and when I get to a point
         | where K8s hits a problem, I would think that it would be much
         | worse if I wasn't using it.
        
           | koolba wrote:
           | > I like to think that most people who are upset at
           | Kubernetes don't hate on all of it. I think the configuration
           | aspect (YAML) ...
           | 
           | I question the competence of anyone who does not question
           | (and rag on) the prevalence of templating YAML.
           | 
           | > But rough edges exist everywhere, and when I get to a point
           | where K8s hits a problem, I would think that it would be much
           | worse if I wasn't using it.
           | 
           | Damn straight. It's only bad because everything else is
           | strictly worse.
        
             | dfee wrote:
             | Helm isn't YAML. It's a go template file that should
             | compile to YAML, masquerading as YAML with that extension.
             | 
             | So yaml formatters break it, humans struggle to generate
             | code with proper indents, and it's an insane mess. It's
             | horrendous.
        
           | garrettgrimsley wrote:
           | >I think the configuration aspect (YAML)
           | 
           | What are the reasons to _not_ use JSON rather than YAML? From
           | my admittedly-shallow experience with k8s, I have yet to
           | encounter a situation in which I couldn't use JSON. Does
           | this issue only pop up once you start using Helm charts?
        
             | kbar13 wrote:
             | At the surface level YAML is a lot easier for a human to
             | read and write: fewer quotes. But once you start using it
             | for complex configuration it becomes unwieldy, and at that
             | point JSON is also no better than YAML.
             | 
             | After using CDK, I think that writing TypeScript to define
             | infra is a significantly better experience.
        
             | smokel wrote:
             | One of the most annoying limitations of JSON is that it
             | does not allow for comments.
        
         | mgaunard wrote:
         | Ever heard of Nomad?
        
         | politelemon wrote:
         | > and thus doesn't tie you into a specific cloud provider
         | 
         | It ties you to k8s instead, and it ties you to a few company
         | wide heroes, and that is not a 'benefit' as it's being touted
         | here.
         | 
         | Being tied to a cloud is not a horrible situation either. I
         | suspect "being tied to a cloud" is a boogeyman that k8s
         | proponents would like to spread, but just like with k8s, with
         | the right choices, cloud integration is a huge benefit.
        
           | kortilla wrote:
           | Being tied to the cloud is fine if you don't care about
           | money. Eventually companies do
        
         | matrss wrote:
         | > * declarative define your infrastructure
         | 
         | > [...]
         | 
         | > * has a huge amount of pre-packaged software available (helm
         | charts)
         | 
         | > * and most importantly: allows you to stand up mostly the
         | same infrastructure in the cloud, on your own servers (k3s),
         | and locally (KIND), and thus doesn't tie you into a specific
         | cloud provider
         | 
         | NixOS. I have no clue about kubernetes, but I think NixOS even
         | goes much deeper in these points (e.g. kubernetes is at the
         | "application layer" and doesn't concern itself with
         | declaratively managing the OS underneath, if I understand
         | right). The other points seem much more situational, and if
         | needed kubernetes might well be worth it. For something that
         | could be a single server running a handful of services, NixOS
         | is amazing.
        
           | the_duke wrote:
           | I use NixOS, both on servers and on my machines, but it
           | solves a completely orthogonal problem.
           | 
           | Kubernetes manages a cluster, NixOS manages a single machine.
        
             | matrss wrote:
             | I wouldn't say completely orthogonal. E.g. the points I've
             | cited are overlap between the two, and ultimately both are
             | meant to host some kind of services. But yes NixOS by
             | itself manages a single machine (although combined with
             | terraform it can become very convenient to also manage a
             | fleet of NixOS machines). Kubernetes manages services on a
             | cluster, but given how powerful a single machine can be I
             | do think that many of those clusters could also just be one
             | beefy server (and maybe a second one with some fail over
             | mechanism, if needed).
             | 
             | If the cluster is indeed necessary though, I think NixOS
             | can be a great base to stand up a Kubernetes cluster on top
             | of.
        
             | pxc wrote:
             | There are lots of native NixOS tools for managing whole
             | clusters (NixOps, Disnix, Colmena, deploy-rs, Morph, krops,
             | Bento, ...). Lots of people deploy whole fleets of NixOS
             | servers or clusters for specific applications without
             | resorting to Kubernetes. (Kube integrations are also
             | popular, though.) Some of those solutions are very old,
             | too.
             | 
             | Disnix has been around for a long time, probably since
             | before you ever heard of NixOS.
        
         | Lucasoato wrote:
         | There is no easy solution to manage services and
         | infrastructure: people who hate kubernetes complexity often
         | underestimate the efforts of developing on your own all the
         | features that k8s provides.
         | 
         | At the same time, people who suggest that everyone use
         | kubernetes independently of the company's maturity often forget
         | how easy it is to run a service on a simple virtual machine.
         | 
         | In the multidimensional space that contains every software
         | project, there is no hyperplane that separates when it's worth
         | using kubernetes or not. It depends on the company, the
         | employees, the culture, the business.
         | 
         | Of course there are general best practices, like for example if
         | you're just getting started with kubernetes, and already in the
         | cloud, using a managed k8s service from your cloud provider
         | could be a good idea. But again, even for this you're going to
         | find opposing views online.
        
         | g9yuayon wrote:
         | When I reflect on what Netflix did back in 2010ish on AWS:
         | 
         | * The declarative infra is EC2/ASG configurations plus Jenkins
         | configurations
         | 
         | * Client-side load balancing
         | 
         | * ASG for autoscaling and recovery
         | 
         | * Amazing observability with a home-grown monitoring system by
         | 4 amazing engineers
         | 
         | Most of all, each of the above items was built and run by one or
         | two people, except the observability stack with four. Oh,
         | standing up a new region was truly a non-event. It just
         | happened and as a member of the cloud platform team I couldn't
         | even recall what I did for the project. It's not that Netflix's
         | infra was better or worse than using k8s. I'm just amazed how
         | happy I have been with an infra built more than 10 years ago,
         | and how simple it was for end users. In that regard, I often
         | ask myself what I have missed in the whole movement of k8s
         | platform engineering, other than that people do need a robust
         | solution for orchestrating containers.
        
           | p_l wrote:
           | A big chunk was companies that don't have netflix-money
           | having to bin-pack compute for efficiency.
           | 
           | Or at least that's how I got into k8s, because it allowed me
           | to ship for 1/10th the price of my competitor.
        
         | throwawaaarrgh wrote:
         | Yes... And? We don't have to be happy with our lot if it sucks.
        
         | kiitos wrote:
         | There are an enormous number of tools that meet these
         | requirements, most obviously Nomad. But really any competently-
         | designed system, defined in terms of any cloud-agnostic
         | provisioning system (Chef, Puppet, Salt, Ansible, home-grown
         | scripts) would qualify.
         | 
         | And, for the record, observability is something very much
         | unrelated to kubectl or k9s.
        
       | liampulles wrote:
       | Good article. I used to be a k8s zealot (both CKAD and CKA
       | certified) but have come to think that the good parts of k8s are
       | the bare essentials (deployments, services, configmaps) and the
       | rest should be left for exceptional circumstances.
       | 
       | Our team is happy to write raw YAML and use kustomize, because we
       | prefer keeping the config plain and obvious, but we otherwise
       | pretty much follow everything here.
        
       | izietto wrote:
       | > Hand-writing YAML. YAML has enough foot-guns that I avoid it as
       | much as possible. Instead, our Kubernetes resource definitions
       | are created from TypeScript with Pulumi.
       | 
       | LOL so, rather than linting YAML, bring in a whole programming
       | language runtime plus third party library, adding yet another
       | vendor lock-in, having to maintain versions, project compiling,
       | moving away from K8S, adding mental overhead...
        
         | p_l wrote:
         | Managing structures in a programming language is easier than
         | dealing with a finicky _optional_ serialization format.
         | 
         | I have drastically reduced the amount of errors, mistakes,
         | bugs, plain old wtf-induced hair pulling, by just mandating
         | avoidance of YAML (and Helm) and using Jsonnet. Sure, there was
         | some up-front work to write library code, but afterwards? I had
         | people introduced to Jsonnet with an example deployment on one
         | day, and shipping a production-ready deployment for another app
         | the next day.
         | 
         | Something we couldn't get with YAML.
        
         | paulgb wrote:
         | We use Pulumi for IAC of non-k8s cloud resources too, so it
         | doesn't introduce anything extra. In reality all but the
         | smallest Kubernetes services will want _something_ other than
         | hand-written YAML: Helm-style templating, HCL, etc. TypeScript
         | gives us type safety, and _composable_ type safety. E.g. we
         | have a function that encapsulates our best practices for
         | speccing a deployment, and we get type safety for free across
         | that function call boundary. Can't do that with YAML.
        
         | Aurornis wrote:
         | Most devops disaster stories I've heard lately are the result
         | of endless addition of new tools. People join the company, see
         | a problem, and then add another layer of tooling to address it,
         | introducing new problems in the process. Then they leave the
         | company, new people join, see the problems from that new
         | tooling, add yet another layer of tooling, continuing the
         | cycle.
         | 
         | I was talking to someone from a local startup a couple weeks
         | ago who was trying to explain their devops stack. The number of
         | different tools and platforms they were using was in the range
         | of 50 different things, and they were asking for advice about
         | how to integrate yet another thing to solve yet another self-
         | inflicted problem.
         | 
         | It was as though they forgot what the goal was and started
         | trying to collect as much experience with as many different
         | tools as they could.
        
           | izietto wrote:
           | Would you believe that there is a company that is using cdk8s
           | to handle its K8S configuration, and that such an
           | "infrastructure as code" repo ("infrastructure as code", this
           | is the current hype) counts 76k YAML LoCs and 24k TypeScript
           | LoCs to manage a bunch of Rails apps together with their
           | related services? Like, some of those apps have fewer LoCs.
        
         | bananapub wrote:
         | yaml is objectively a bad language for complicated
         | configurations, and once you add string formatting on top of
         | it, you now have a complicated and shitty system, yay.
         | 
       | hopefully jsonnet or that Apple thing will get more traction
       | and popularity.
        
       | nusl wrote:
       | k8s is really about you and whether it makes sense for your use
       | case.
       | It's not universally bad or universally good, and I don't feel
       | that there is a minimum team size required for it to make sense.
       | 
       | Managing k8s, for me at least, is a lot easier than juggling
       | multiple servers with potentially different hardware, software,
       | or whatever else. It's rare that businesses will have machines
       | that are all identical. Trying to keep adding machines to a pool
       | that you manage manually and keep them running can be very messy
       | and get out of control if you're not on top of it.
       | 
       | k8s can also get out of control though it's also easier to reason
       | about and understand in this context. Eg you have eight machines
       | of varying specs but all they really have installed is what's
       | required to run k8s, so you haven't got as much divergence there.
       | You can then use k8s to schedule work across them or ask
       | questions about the machines.
        
       | liveoneggs wrote:
       | We've found kubernetes to be surprisingly fragile, buggy,
       | inflexible, and strictly imperative.
       | 
       | People make big claims but then it's not declarative enough to
       | look up a resource or build a dependency tree and then your
       | context deadline is exceeded.
        
       | elktown wrote:
       | I think an underestimated issue with k8s (et al) is on a cultural
       | level. Once you let in complex generic things, it doesn't stop
       | there. A chain reaction has started, and before you know it,
       | you've got all kinds of components reinforcing each other, that
       | are suddenly required due to some real, or just perceived,
       | problems that are only there in the first place because of a
       | previous step in the chain reaction.
       | 
       | I remember back when the Cloud first started getting a foothold
       | that what people were drawn to was that it would enable _reducing_
       | complexity of managing the most frustrating things like the load-
       | balancer and the database, albeit at a price of course, but it
       | was still worth it.
       | 
       | Stateless app servers, however, were certainly not a large
       | maintenance problem. But somehow we've managed to squeeze
       | things like k8s in there anyway; we just needed to evangelize
       | microservices to create a problem that didn't exist before. Now
       | that this is part of the "culture" it's hard to even get beyond
       | hand-wavy rationalizations that microservices are a must,
       | presumably because it's the initial spark that triggered the
       | whole chain reaction of complexity.
        
         | jupp0r wrote:
         | Cloud providers automate things like lease renewals, dealing
         | with customs and part time labor contract compliance disputes
         | for that datacenter in that Asian country that you don't know
         | the language of.
         | 
         | I'm constantly fascinated by how people handwavingly
         | underestimate the cost and headaches of actually running on-
         | prem global infrastructure.
        
           | robertlagrant wrote:
           | > I'm constantly fascinated
           | 
           | To call a halt to your constant fascination: they don't all
           | have that problem. They still get the complexity of cloudy
           | things regardless when they use one.
        
             | jupp0r wrote:
             | They also get some of the complexity of cloudy things when
             | they run their own datacenter. In the end you find stuff
             | like OpenStack which becomes its own nightmare universe.
        
               | eropple wrote:
               | YMMV, but more and more I see people moving to k8s to get
               | _away_ from OpenStack, to varying but generally positive
               | success.
        
           | ricardobeat wrote:
           | There are at least five shades in between on-prem and a
           | managed k8s cloud.
        
             | d0mine wrote:
             | Could you mention three?
        
               | pmalynin wrote:
               | colo racks, rented dedicated servers, ec2 / managed vm
               | offerings?
        
           | hipadev23 wrote:
           | And colo providers solved those hurdles for decades. Let's
           | not act like the only options are cloud, or build your own
           | datacenter.
        
             | bradfox2 wrote:
             | My startup hosts our own training servers in a colo-ed
             | space 10 min from our office. Took less than 40 hours to
             | get moved in, with most of the time tinkering with
             | fortigate network appliance settings.
             | 
             | Cloudflare zero trust for free is a huge timesaver
        
           | kortilla wrote:
           | I'm constantly fascinated by people who think they need on
           | prem global infrastructure when the vast majority of
           | applications either have very loose latency requirements
           | (multiple seconds) or no users outside of the home country.
           | 
           | Two datacenters on opposite sides of the US from different
           | providers will get you more uptime than a cloud provider and
           | is super simple.
        
             | jupp0r wrote:
             | While some of the complexity goes away when it's on prem in
             | two parts of the US, having to order actual hardware,
             | putting it into racks, hiring, training, retaining the
             | people there to debug actual hardware issues when they
             | arise, dealing with HVAC concerns, etc is a lot of
             | complexity that's probably completely outside of your core
             | business expertise but that you'll have to spend mental
             | cycles on when actually operating your own data center.
             | 
             | It's totally worth it for some companies to do that, but
             | you need to have some serious size to be concerned with
             | spending your efforts on lowering your AWS bill by
             | introducing details like that into your own organization
             | when you could alternatively spend those dollars to make
             | your core business run better. Usually your efforts are
             | better spent on the latter unless you are Netflix or Amazon
             | or Google.
        
               | protomikron wrote:
               | Why is it always public cloud (aws, gcp, azure) vs.
               | "bring your own hardware and deploy it in racks".
               | 
               | There are multiple providers that offer VPS and
               | ingress/egress for a fraction of the cost of public
               | clouds and they mostly have good uptime.
        
               | pclmulqdq wrote:
               | I recently rented a rack with a telecom and put some of
               | my own hardware in it (it's custom weird stuff with
               | hardware accelerators and all the FIPS 140 level 4
               | requirements), but even the telecom provider was offering
               | a managed VPS product when I got on the phone with them.
               | 
               | The uptime in these DCs is very good (certainly better
               | than AWS's us-east-1), and you get a very good price with
               | tons of bandwidth. Most datacenter and colo providers can
               | do this now.
               | 
               | I think people believe that "on prem" means actually
               | racking the servers in your closet, but you can get
               | datacenter space with fantastic power, cooling, and
               | security almost anywhere these days.
        
               | Zircom wrote:
               | >I think people believe that "on prem" means actually
               | racking the servers in your closet, but you can get
               | datacenter space with fantastic power, cooling, and
               | security almost anywhere these days.
               | 
               | That's because that is what on prem means. What you're
               | describing is colocating.
        
               | pclmulqdq wrote:
               | When clouds define "on-prem" in opposition to their
               | services (for sales purposes), colo facilities are lumped
               | into that bucket. They're not exactly wrong, except a
               | rack at a colo is an extension of your premises with a
               | landlord who understands your needs.
        
               | jupp0r wrote:
               | It's a spectrum:
               | 
               | On top is AWS lambda or something where you are
               | completely removed from the actual hardware that's
               | running your code.
               | 
               | At the bottom is a free acre of land where you start
               | construction and talk to utilities to get electricity and
               | water there. You build your own data center, hire people
               | to run and extend it, etc.
               | 
               | There is tons of space in between where compromises are
               | made by either paying a provider to do something for you
               | or doing it yourself. Is somebody from the datacenter
               | where you rented a rack or two going in and pressing a
               | reset button after you called them a form of cloud
               | automation? How about you renting a root VM at Hetzner?
               | Is that VM on prem? People who paint these tradeoffs in a
               | black and white manner and don't acknowledge that there
               | are different choices for different companies and
               | scenarios are not doing the discussion a service.
               | 
               | On the other hand, somebody who built their business on
               | AppEngine or Cloudflare Workers could look at that other
               | company who is renting a pet pool of EC2 instances and
               | ask if they are even in the cloud or if they are just
               | simulating on-prem.
        
               | cangeroo wrote:
               | Because their arguments are disingenuous.
               | 
               | It reads like propaganda sponsored by the clouds.
               | Scaremongering.
               | 
               | Clouds are incredibly lucrative.
               | 
               | But don't worry. You can make the prices more reasonable
               | by making a 3-year commitment to run old outdated
               | hardware.
        
             | jupp0r wrote:
             | There are tons of examples where low latency is good for
             | business, even small businesses. I'm sure you've seen the
             | studies from Amazon that every 100ms of page load latency
             | is costing them 1% of revenue, etc. Also everything
             | communication-related is very latency sensitive.
             | 
             | Of course there are plenty of scenarios where latency does
             | not matter at all.
        
               | groestl wrote:
               | So you can trade off 300ms of additional roundtrip time
               | (on anything non-CDNable) at a cost of 3% revenue and
               | reduce your infrastructure complexity a lot
        
               | throwaway22032 wrote:
               | Not every business is based on impulse buys. Amazon is a
               | pretty biased sample there.
        
               | jupp0r wrote:
               | Is this agreeing with "Of course there are plenty of
               | scenarios where latency does not matter at all." or are
               | you trying to make a point?
        
               | threeseed wrote:
               | That latency is correlated with revenue is not exclusive
               | to Amazon.
               | 
               | And many people who aren't impulse buying will not stick
               | around on slow sites.
        
               | pdimitar wrote:
               | Disagreed, once we're not talking a worldwide shop for
               | non-critical buys like Amazon the picture changes
               | dramatically. Many people on local markets have no choice
               | and will stick around no matter how slow the service is.
               | 
               | Evidence: my wife buying our groceries for delivery at
               | home. We have 4-5 choices in our city. All their websites
               | are slow as hell, and I mean adding an item to a cart
               | takes a good 5-10 seconds. Search takes 20+ seconds.
               | 
               | She curses at them every time yet there's nothing we can
               | do. The alternative is for both of us to travel on foot
               | 20 minutes to the local mall and wait in queues. 2-3
               | times a
               | week. She figured the slow websites are the lesser evil.
        
             | threeseed wrote:
             | > Vast majority of applications have no users outside home
             | country
             | 
             | Any evidence to back this up? Because on the surface it
             | seems like a ridiculous statement.
        
           | elktown wrote:
           | Not sure how you could jump all the way to running your
           | own Asian datacenter from my post. A bit amusing though :). I
           | even wrote that it's worth running the LB/DB in the Cloud?
        
             | jupp0r wrote:
             | Oh it was more of an addition to your point about "reducing
             | complexity of managing the most frustrating things like the
             | load-balancer and the database, albeit at a price of
             | course". There is a whole mountain of complexity that most
             | software engineers never think about when they dream about
             | going back to the good old on prem days.
        
               | elktown wrote:
               | Alright, it just feels like taking it a bit too far into
               | the exceptions. Even back then only large companies would
               | consider that. What would be a startup today would rent
               | servers, rent a server rack (co-location), or even just
               | run an in-office server rack.
        
       | aeturnum wrote:
       | > _if a human is ever waiting for a pod to start, Kubernetes is
       | the wrong choice._
       | 
       | As someone who is always working "under" a particular set of
       | infrastructure choices, I want people who write this kind of
       | article to understand something: the people who dislike
       | particular infrastructure systems are by and large those who are
       | working under sub-optimal uses of them. Nobody who has the space
       | to think about what effects their infrastructure choices will
       | have in the future hates any infrastructure system. Their life
       | is good. They get to choose, and most everyone agrees that any
       | system can be done well.
       | 
       | The haters come from situations where a system has not been done
       | well - where, for whatever combination of reasons, they are
       | stuck using a system that's the wrong mix of complex /
       | monitorable / fragile / etc. It's true enough that if the system
       | had been built with more attention to its needs, people would
       | not hate it - but that's just not how people come to hate k8s
       | (or any other tool).
        
       | shrubble wrote:
       | If Kubernetes is the answer ... you very likely asked the wrong
       | questions.
       | 
       | Reading about Jamsocket and what it does, it seems that it
       | essentially lets you run Docker instances inside the Jamsocket
       | infrastructure.
       | 
       | Why not just take Caddy in a clustered configuration, add some
       | modules to control Docker startup/shutdown, and reduce your
       | service usage by 50%? As one example.
        
         | paulgb wrote:
         | I'm not sure what you mean about that reducing service usage.
         | 
         | The earliest version of the product really was just nginx in
         | front of some containers, but we outgrew the functionality of
         | existing proxies pretty quickly. See e.g. keys
         | (https://plane.dev/developing/keys) which would not be possible
         | with clustered Caddy alone.
        
           | shrubble wrote:
           | My understanding was that K8s itself has overhead, which
           | ultimately has to be paid for, even if using a managed
           | service (it might be included in the cost of what you pay, of
           | course).
           | 
           | I did add the caveat of "with modules", and the idea of
           | sharing values around to different servers would be easy to
           | do, since you have Postgres around as a database to hold
           | those values/statuses.
        
             | paulgb wrote:
             | HTTP proxying is not much of our codebase. I wouldn't want
             | to shoehorn what we're doing into being a module of a proxy
             | service just to avoid writing that part. That proxy doesn't
             | run on Kubernetes currently anyway, so it wouldn't change
             | anything we currently use Kubernetes for.
        
       | siliconc0w wrote:
       | IMO the big win with Kubernetes is Helm or operators. If you're
       | going to pay the complexity costs, you might as well get the
       | wins, which essentially amount to a huge 'app store' of popular
       | infrastructure components and an entirely programmatic way to
       | manage your operations (deployments, updates, fail-overs,
       | backups, etc).
       | 
       | For example, if you want to set up something complex like Ceph,
       | Rook is a really nice way to do it. It's a very leaky
       | abstraction, so you aren't hiding all the complexity of Ceph,
       | but the declarative interface is generally a much nicer way to
       | manage Ceph than a boatload of Ansible scripts or whatever we
       | generally had before. The key thing to understand is that Helm
       | and operators don't magically turn infrastructure into managed
       | 'turn-key' appliances; you generally still need to understand
       | how the thing works.
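       | 
       | As a rough sketch (chart, release, and file names here are
       | placeholders as I recall them from the Rook docs), getting the
       | operator in place is a couple of Helm commands, and the storage
       | cluster itself is then declared as a custom resource:
       | 
       |     # add the Rook chart repo and install the operator
       |     helm repo add rook-release https://charts.rook.io/release
       |     helm install rook-ceph rook-release/rook-ceph \
       |       --namespace rook-ceph --create-namespace
       | 
       |     # the Ceph cluster itself is then described declaratively as
       |     # a CephCluster custom resource and applied with kubectl
       |     kubectl apply -f my-cephcluster.yaml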
        
       | jakupovic wrote:
       | This article talks about using k8s while trying to use as little
       | of it as possible. The first example is operators, which are the
       | underlying mechanism that makes k8s possible. To me, taking a
       | stance of using k8s but not operators is less than optimal, or
       | plain stupid. The whole stack is built on operators, which you
       | inherently trust by using k8s, and yet you choose not to use
       | them. Sorry, but this is hard to read.
       | 
       | The only thing I learned is about Caddy as a cert-manager
       | replacement, even though I have used, extended, and been pretty
       | happy with cert-manager. The rest is hard to read ;(.
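       | 
       | For context on the Caddy-vs-cert-manager point: this is roughly
       | what cert-manager needs declared before certificates start
       | flowing (a Let's Encrypt ClusterIssuer written from memory; the
       | email and ingress class are placeholders), whereas Caddy does
       | the equivalent with essentially no configuration:
       | 
       |     apiVersion: cert-manager.io/v1
       |     kind: ClusterIssuer
       |     metadata:
       |       name: letsencrypt-prod
       |     spec:
       |       acme:
       |         server: https://acme-v02.api.letsencrypt.org/directory
       |         email: ops@example.com        # placeholder
       |         privateKeySecretRef:
       |           name: letsencrypt-prod-key
       |         solvers:
       |           - http01:
       |               ingress:
       |                 class: nginx          # placeholder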
        
       | hintymad wrote:
       | When I checked out an operator repo for some stateful service,
       | say, Elasticsearch, the repo would most likely contain tens of
       | thousands of lines of YAML and tens of thousands of lines of Go
       | code. Is this due to the essential complexity of implementing
       | auto-pilot for a complex service, or is it due to massive
       | integration with the k8s operator framework?
        
       | patmcc wrote:
       | Missed opportunity to title this the h8r's guide to k8s.
        
       | kube-system wrote:
       | The people who dislike kubernetes are, in my experience, people
       | who don't need to do all of the things kubernetes does. If you
       | just need to run an application, it's not what you want.
        
       | erulabs wrote:
       | I suppose I'm the guy pushing k8s on midsized companies. If
       | there have been unhappy engineers along the way, they've for the
       | most part stayed quiet and lied about being happier on surveys.
       | 
       | Yes, k8s is complex. The tool matches the problem: complex. But
       | having a standard is so much better than having a somewhat
       | simpler undocumented chaos. "kubectl explain X" is a thousand
       | times better than even AWS documentation, which in turn was a
       | game changer compared to that-one-whiteboard-above-Dave's-desk.
       | Standards are tricky, but worth the effort.
       | 
       | Personally I'm also very judicious with operators and CRDs -
       | both can be somewhat hidden from beginners. However, the
       | operator pattern is wonderful. Another amazing feature is
       | ultra-simple leader election - genuinely difficult outside of
       | k8s, a five-minute task inside. I agree with Paul's take here,
       | though, of at least being extremely careful about which
       | operators you introduce.
       | 
       | At any rate, yes, k8s is more complex than your bash deploy
       | script, of course it is. It's also much more capable and works
       | the same way as it did at all your developers' previous jobs.
       | Velocity is the name of the game!
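       | 
       | For reference, a minimal sketch of what that leader election
       | looks like with client-go's leaderelection helper and a Lease
       | lock (the lease name, namespace, and timings below are
       | placeholders, not recommendations):
       | 
       |     package main
       | 
       |     import (
       |         "context"
       |         "log"
       |         "os"
       |         "time"
       | 
       |         metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
       |         "k8s.io/client-go/kubernetes"
       |         "k8s.io/client-go/rest"
       |         "k8s.io/client-go/tools/leaderelection"
       |         "k8s.io/client-go/tools/leaderelection/resourcelock"
       |     )
       | 
       |     func main() {
       |         // running in-cluster: credentials come from the pod's
       |         // service account
       |         cfg, err := rest.InClusterConfig()
       |         if err != nil {
       |             log.Fatal(err)
       |         }
       |         client := kubernetes.NewForConfigOrDie(cfg)
       | 
       |         // each replica identifies itself by hostname (the pod
       |         // name), so the current leader is visible in the Lease
       |         id, _ := os.Hostname()
       | 
       |         lock := &resourcelock.LeaseLock{
       |             LeaseMeta: metav1.ObjectMeta{
       |                 Name:      "my-app-leader", // placeholder
       |                 Namespace: "default",       // placeholder
       |             },
       |             Client:     client.CoordinationV1(),
       |             LockConfig: resourcelock.ResourceLockConfig{Identity: id},
       |         }
       | 
       |         leaderelection.RunOrDie(context.Background(),
       |             leaderelection.LeaderElectionConfig{
       |                 Lock:            lock,
       |                 LeaseDuration:   15 * time.Second,
       |                 RenewDeadline:   10 * time.Second,
       |                 RetryPeriod:     2 * time.Second,
       |                 ReleaseOnCancel: true,
       |                 Callbacks: leaderelection.LeaderCallbacks{
       |                     OnStartedLeading: func(ctx context.Context) {
       |                         log.Println("became leader; doing the work")
       |                     },
       |                     OnStoppedLeading: func() {
       |                         log.Println("lost leadership; standing by")
       |                     },
       |                 },
       |             })
       |     }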
        
         | paulgb wrote:
         | Good point about k8s vs. AWS docs -- a lot of the time people
         | say "just use ECS" or the AWS service of the day, and it will
         | invariably be more confusing to me and more vendor-tied than
         | just doing the thing in k8s.
        
           | p_l wrote:
            | And then, if you're unlucky, you might hit one of the areas
            | where the AWS documentation has a "teaser" about some
            | functionality that is critical for your project. You spend
            | months looking for the rest of the documentation when the
            | initial foray doesn't work, and the highly paid AWS-internal
            | consultants disappear into thin air when asked about the
            | features.
           | 
           | So nearly a year later you end up writing the whole feature
           | from scratch yourself.
        
         | pclmulqdq wrote:
         | I have to say that I don't believe the problem is all that
         | complex unless you make it hard. But on the flip side, if
         | you're a competent Kubernetes person, the correct Kubernetes
         | config is also not that complex.
         | 
         | I think a lot of the reaction here is a result of the age-old
         | issues of "management is pushing software on me that I don't
         | want" and people adopting it without knowing how to use it
         | because it's considered a "best practice."
         | 
         | In other words, the reaction you probably have to an Oracle
         | database is the same reaction that others have to Kubernetes
         | (although Oracle databases are objectively crappy).
        
       | throwitaway222 wrote:
       | The article makes a good point.
       | 
       | Allow k8s; disallow any service meshes.
        
       | bedobi wrote:
       | the "best" infra i ever had was a gig where we
       | 
       | * built a jar (it was a jvm shop)
       | 
       | * put it on a docker image
       | 
       | * put that on an ami
       | 
       | * then had a regular aws load balancer that just combined the ami
       | with the correctly specced (for each service) ec2 instances to
       | cope with load
       | 
       | it was SIMPLE + it meant we could super easily spin up the
       | previous version ami + ec2s in case of any issues on deploys (in
       | fact, when deploying, we could keep the previous ones running and
       | just repoint the load balancer to them)
       | 
       | ps putting the jar on a docker image was arguably unnecessary, we
       | did it mostly to avoid "it works on my machine" style problems
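       | 
       | (for the curious, with today's application load balancers the
       | "repoint" step is roughly a one-liner; the arns below are
       | placeholders)
       | 
       |     # point the listener at the target group whose instances
       |     # run the desired ami (old or new), for rollout or rollback
       |     aws elbv2 modify-listener \
       |       --listener-arn "$LISTENER_ARN" \
       |       --default-actions "Type=forward,TargetGroupArn=$NEW_TG_ARN"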
        
       ___________________________________________________________________
       (page generated 2024-03-03 23:01 UTC)