[HN Gopher] How we migrated onto K8s in less than 12 months
       ___________________________________________________________________
        
       How we migrated onto K8s in less than 12 months
        
       Author : ianvonseggern
       Score  : 97 points
       Date   : 2024-08-08 16:07 UTC (6 hours ago)
        
 (HTM) web link (www.figma.com)
 (TXT) w3m dump (www.figma.com)
        
       | jb1991 wrote:
       | Can anyone advise what is the most common language used in
       | enterprise settings for interfacing with K8s?
        
         | JohnMakin wrote:
         | IME almost exclusively golang.
        
           | roshbhatia wrote:
           | ++, most controllers are written in go, but there's plenty of
           | client libraries for other languages.
           | 
           | A common pattern you'll see though is skipping writing any
           | sort of code and instead using a higher level dsl-ish
           | configuration usually via yaml, using tools like Kyverno.
        
         | gadflyinyoureye wrote:
          | Depends on what you mean. Helm will control a lot. You can
          | generate the yaml files in any language. You can also admin it
          | from command-line tools - so again any language, but often zsh
          | or bash.
        
         | cortesoft wrote:
         | A lot of yaml
        
           | yen223 wrote:
           | The kind of yaml that has a lot of {{ }} in them that breaks
           | your syntax highlighter.
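For context, a minimal (hypothetical) Helm template shows why highlighters struggle - the file is YAML interleaved with Go template directives:

```yaml
# templates/deployment.yaml - a hypothetical Helm chart fragment.
# The {{ ... }} directives are Go template syntax, not valid YAML,
# which is what trips up YAML-only syntax highlighters. The
# "mychart.labels" helper is an assumed named template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
  labels:
    {{- include "mychart.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```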
        
         | mplewis wrote:
         | I have seen more Terraform than anything else.
        
         | akdor1154 wrote:
         | On the platform consumer side (app infra description) - well
         | schema'd yaml, potentially orchestrated by helm ("templates to
         | hellish extremes") or kustomize ("no templates, this is the
         | hill we will die on").
         | 
         | On the platform integration/hook side (app code doing
         | specialised platform-specific integration stuff, extensions to
         | k8s itself), golang is the lingua franca but bindings for many
         | languages are around and good.
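A sketch of the kustomize side of that split - declarative patches layered over plain manifests instead of templating (file and resource names are illustrative):

```yaml
# kustomization.yaml - illustrative kustomize overlay. No template
# directives anywhere: plain base YAML is reused as-is and modified
# via structured patches and field overrides.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # un-templated base manifests
patches:
  - path: replica-patch.yaml
    target:
      kind: Deployment
      name: web
images:
  - name: example.com/web
    newTag: v1.2.3      # per-environment image tag override
```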
        
       | JohnMakin wrote:
        | I like how this article clearly and articulately states the
        | reasons it stands to benefit from Kubernetes. Many make the jump
        | without knowing what they even stand to gain, or whether they
        | need to in the first place - the reasons given here are good.
        
         | nailer wrote:
         | I was about to write the opposite - the logic is poor and
         | circular - but multiple other commenters have already raised
         | this: https://news.ycombinator.com/item?id=41194506
         | https://news.ycombinator.com/item?id=41194420
        
           | JohnMakin wrote:
           | I don't really see those rebuttals as all that valid. The
           | reasons given in this article are completely valid, from my
           | perspective of someone who's worked heavily with
           | Kubernetes/ECS.
           | 
           | Helm, for instance, is a great time saver for installing
           | software. Often software will support nothing but helm. Ease
           | of deployment is a good consideration. Their points on
           | networking are absolutely spot on. The scaling considerations
           | are spot on. Killing/isolating unhealthy containers is
           | completely valid. I could go on a lot more, but I don't see a
           | single point listed as invalid.
        
           | samcat116 wrote:
           | They're quite specific in that they mention that teams would
           | like to make use of existing helm charts for other software
           | products. Telling them to build and maintain definitions for
           | those services from scratch is added work in their mind.
        
       | dijksterhuis wrote:
       | > When applied, Terraform code would spin up a template of what
       | the service should look like by creating an ECS task set with
       | zero instances. Then, the developer would need to deploy the
       | service and clone this template task set [and do a bunch of
       | manual things]
       | 
       | > This meant that something as simple as adding an environment
       | variable required writing and applying Terraform, then running a
       | deploy
       | 
       | This sounds less like a problem with ECS and more like an
       | overcomplication in how they were using terraform + ECS to manage
       | their deployments.
       | 
       | I get the generating templates part for verification prior to
       | live deploys. But this seems... dunno.
        
         | wfleming wrote:
         | Very much agree. I have built infra on ECS with terraform at
         | two companies now, and we have zero manual steps for actions
         | like this, beyond "add the env var to a terraform file, merge
         | it and let CI deploy". The majority of config changes we would
         | make are that process.
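For example, the kind of change being described can be a one-line edit to a container definition in Terraform, which CI then applies (resource names and values here are hypothetical):

```hcl
# Illustrative Terraform fragment - names are hypothetical. Adding an
# env var is one line in the container definition; merging lets CI run
# `terraform apply`, which registers a new task definition revision
# and rolls the ECS service onto it.
resource "aws_ecs_task_definition" "web" {
  family = "web"
  container_definitions = jsonencode([
    {
      name  = "web"
      image = "example.com/web:latest"
      environment = [
        { name = "LOG_LEVEL", value = "info" } # the new env var
      ]
    }
  ])
}
```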
        
           | dijksterhuis wrote:
           | Yeah.... thinking about it a bit more i just don't see why
           | they didn't set up their CI to deploy a short lived
           | environment on a push to a feature branch.
           | 
           | To me that seems like the simpler solution.
        
         | roshbhatia wrote:
         | I'm with you here -- ECS deploys are pretty painless and
          | uncomplicated, but I can picture a few scenarios where this
          | ends up being necessary - for example, if they have a lot of
          | services deployed on ECS and it bloats the size of the
         | Terraform state. That'd slow down plans and applies
         | significantly, which makes sharding the Terraform state by
         | literally cloning the configuration based on a template a lot
         | safer.
        
           | freedomben wrote:
           | > ECS deploys are pretty painless and uncomplicated
           | 
           | Unfortunately in my experience, this is true until it isn't.
           | Once it isn't true, it can quickly become a painful blackbox
           | debugging exercise. If your org is big enough to have
           | dedicated AWS support then they can often get help from
           | engineers, but if you aren't then life can get really
           | complicated.
           | 
           | Still not a bad choice for most apps though, especially if
           | it's just a run-of-the-mill HTTP-based app
        
         | ianvonseggern wrote:
         | Hey, author here, I totally agree that this is not a
         | fundamental limitation of ECS and we could have iterated on
          | this setup and made something better. Because of that
          | distinction, I intentionally listed this under work we decided
          | to scope into the migration process, and not under the
          | fundamental reasons we undertook the migration.
        
       | Aeolun wrote:
       | Honestly, I find the reasons they name for using Kubernetes
       | flimsy as hell.
       | 
       | "ECS doesn't support helm charts!"
       | 
       | No shit sherlock, that's a thing literally built on Kubernetes.
        | It's like a government RFP that can only be fulfilled by a single
        | vendor.
        
         | Carrok wrote:
         | > We also encountered many smaller paper cuts, like attempting
         | to gracefully terminate a single poorly behaving EC2 machine
         | when running ECS on EC2. This is easy on Amazon's Elastic
         | Kubernetes Service (EKS), which allows you to simply cordon off
         | the bad node and let the API server move the pods off to
         | another machine while respecting their shutdown routines.
         | 
         | I dunno, that seems like a very good reason to me.
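The EKS workflow being praised is roughly the standard cordon-and-drain sequence (the node name below is illustrative):

```shell
# Mark the bad node unschedulable so no new pods land on it.
kubectl cordon ip-10-0-1-23.ec2.internal

# Evict its pods, respecting graceful shutdown routines and
# PodDisruptionBudgets; DaemonSet pods are skipped.
kubectl drain ip-10-0-1-23.ec2.internal \
  --ignore-daemonsets \
  --delete-emptydir-data

# Once drained, the EC2 instance can be terminated and replaced.
```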
        
           | watermelon0 wrote:
           | I assume that ECS Fargate would solve this, because one
           | misbehaving ECS task would not affect others, and stopping it
           | should still respect the shutdown routines.
        
             | ko_pivot wrote:
             | Fargate is very expensive at scale. Great for small or
             | bursty workloads, but when you're at Figma scale, you
             | almost always go EC2 for cost-effectiveness.
        
         | aranelsurion wrote:
         | To be fair there are many benefits of running on the platform
         | that has the most mindshare.
         | 
         | Unless they are in this space competing against k8s, it's
         | reasonable for them if they want to use Helm charts, to move
         | where they can.
         | 
          | Also, Helm doesn't work with ECS, and neither do <50 other
          | tools and tech from the CNCF map>.
        
         | cwiggs wrote:
          | I think what they should have said is "there isn't a tool like
          | Helm for ECS". If you want to deploy a full Prometheus,
          | Grafana, Alertmanager, etc. stack on ECS, good luck with that:
          | no one has written the task definitions for you to consume and
          | override values on.
         | 
         | With k8s you can easily deploy a helm chart that will deploy
         | lots of things that all work together fairly easily.
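Concretely, the kind of one-command install being described (using the real community kube-prometheus-stack chart; the release name and value override are illustrative):

```shell
# Install Prometheus + Grafana + Alertmanager as one pre-wired stack.
# On ECS, each of these would need hand-written task definitions and
# inter-service wiring.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=changeme   # illustrative override
```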
        
         | JohnMakin wrote:
         | It's almost like people factor in a piece of software's tooling
         | environment before they use the software - wild.
        
       | vouwfietsman wrote:
       | Maybe its normal for a company this size, but I have a hard time
       | following much of the decision making around these gigantic
       | migrations or technology efforts because the decisions don't seem
       | to come from any user or company need. There was a similar post
       | from Figma earlier, I think around databases, that left me
       | feeling the same.
       | 
       | For instance: they want to go to k8s because they want to use
       | etcd/helm, which they can't on ECS? Why do you want to use
       | etcd/helm? Is it really this important? Is there really no other
       | way to achieve the goals of the company than exactly like that?
       | 
       | When a decision is founded on a desire of the user, its easy to
       | validate that downstream decisions make sense. When a decision is
       | founded on a technological desire, downstream decisions may make
       | sense in the context of the technical desire, but do they make
       | sense in the context of the user, still?
       | 
       | Either I don't understand organizations of this scale, or it is
       | fundamentally difficult for organizations of this scale to
       | identify and reason about valuable work.
        
         | WaxProlix wrote:
         | People move to K8s (specifically from ECS) so that they can use
         | cloud provider agnostic tooling and products. I suspect a lot
         | of larger company K8s migrations are fueled by a desire to be
         | multicloud or hybrid on-prem, mitigate cost, availability, and
         | lock-in risk.
        
           | timbotron wrote:
           | there's a pretty direct translation from ECS task definition
           | to docker-compose file
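The correspondence is fairly mechanical - a compose service maps almost field-for-field onto an ECS container definition (service names and values here are illustrative):

```yaml
# docker-compose.yml - each key has a near-direct ECS counterpart:
# image -> image, environment -> environment, ports -> portMappings,
# deploy.resources -> cpu/memory on the task definition.
services:
  web:
    image: example.com/web:latest
    environment:
      LOG_LEVEL: info
    ports:
      - "8080:8080"
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
```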
        
           | zug_zug wrote:
           | I've heard all of these lip-service justifications before,
           | but I've yet to see anybody actually publish data showing how
           | they saved any money. Would love to be proven wrong by some
           | hard data, but something tells me I won't be.
        
             | nailer wrote:
              | Likewise. I'm not sure Kubernetes' famous complexity (and
              | the resulting staff requirements) is worth it to
              | preemptively avoid vendor lock-in, or whether the problem
              | wouldn't be solved more efficiently by migrating to another
              | cloud provider's native tools if the need arises.
        
             | bryanlarsen wrote:
             | I'm confident Figma isn't paying published rates for AWS.
             | The transition might have helped them in their rate
             | negotiations with AWS, or it might not have. Hard data on
             | the money saved would be difficult to attribute.
        
             | jgalt212 wrote:
              | True, but if AWS knows your lock-in is less locked-in, I'd
              | bet they'd be more flexible when contracts are up for
              | renewal. I mean, it's possible the blog post's primary
              | purpose was a shot across the bow of their AWS account
              | manager.
        
               | logifail wrote:
                | > it's possible the blog post's primary purpose was a
                | shot across the bow of their AWS account manager
               | 
               | Isn't it slightly depressing that this explanation is
               | fairly (the most?) plausible?
        
               | jiggawatts wrote:
               | Our state department of education is one of the biggest
               | networks in the world with about half a million devices.
               | They would occasionally publicly announce a migration to
               | Linux.
               | 
               | This was just a Microsoft licensing negotiation tactic.
               | Before he was CEO, Ballmer flew here to negotiate one of
               | the contracts. The discounts were _epic_.
        
             | tengbretson wrote:
             | There are large swaths of the b2b space where (for whatever
             | reason) being in the same cloud is a hard business
             | requirement.
        
             | vundercind wrote:
             | The vast majority of corporate decisions are never
             | justified by useful data analysis, before or after the
             | fact.
             | 
             | Many are so-analyzed, but usually in ways that anyone who
             | paid attention in high school science or stats classes can
             | tell are so flawed that they're meaningless.
             | 
             | We can't even measure manager efficacy to any useful
             | degree, in nearly all cases. We can come up with numbers,
             | but they don't mean anything. Good luck with anything more
             | complex.
             | 
             | Very small organizations can probably manage to isolate
             | enough variables to know how good or bad some move was in
             | hindsight, if they try and are competent at it (... if).
             | Sometimes an effect is so huge for a large org that it
             | overwhelms confounders and you can be pretty confident that
             | it was at least good or bad, even if the degree is fuzzy.
             | Usually, no.
             | 
             | Big organizations are largely flying blind. This has only
             | gotten worse with the shift from people-who-know-the-work-
             | as-leadership to professional-managers-as-leadership.
        
             | Alupis wrote:
             | Why would you assume it's lip-service?
             | 
             | Being vendor-locked into ECS means you _must_ pay whatever
             | ECS wants... using k8s means you can feasibly pick up and
             | move if you are forced.
             | 
             | Even if it doesn't save money _today_ it might save a
             | tremendous amount in the future and /or provide a much
             | stronger position to negotiate from.
        
               | greener_grass wrote:
               | Great in theory but in practice when you do K8s on AWS,
               | the AWS stuff leaks through and you still have lock-in.
        
               | Alupis wrote:
               | Then don't use the AWS stuff. You can bring your own
               | anything that they provide.
        
               | cwiggs wrote:
               | It doesn't have to be that way though. You can use the
               | AWS ingress controller, or you can use ingress-nginx. You
               | can use external secrets operator and tie it into AWS
               | Secrets manager, or you can tie it into 1pass, or
               | Hashicorp Vault.
               | 
               | Just like picking EKS you have to be aware of the pros
               | and cons of picking the cloud provider tool or not.
               | Luckily the CNCF is doing a lot for reducing vender lock
               | in and I think it will only continue.
        
               | elktown wrote:
               | I don't understand why this "you shouldn't be vendor-
               | locked" rationalization is taken at face value at all?
               | 
               | 1. The time it will take to move to another cloud is
               | proportional to the complexity of your app. For example,
               | if you're a Go shop using managed persistence are you
               | more vendor locked in any meaningful way than k8s? What's
               | the delta here?
               | 
                | 2. Do you really think you can haggle with the fuel
                | producers like you're Maersk? No, you're more likely just
                | a car driving around looking for a gas station, with
                | increasingly diminishing returns.
        
               | Alupis wrote:
               | This year alone we've seen significant price increases
               | from web services, including critical ones such as Auth.
               | If you are vendor-locked into, say Auth0, and they
               | increase their price 300%[1]... What choice do you have?
               | What negotiation position do you have? None... They know
               | you cannot leave.
               | 
               | It's even worse when your entire platform is vendor-
               | locked.
               | 
               | There is nothing but upside to working towards a vendor-
               | neutral position. It gives you options. Even if you never
               | use those options, they are there.
               | 
               | > Do you really think you can haggle
               | 
               | At the scale of someone like Figma? Yes, they do
               | negotiate rates - and a competent account manager will
               | understand Figma's position and maximize the revenue they
               | can extract. Now, if the account rep doesn't play ball,
               | Figma can actually move their stuff somewhere else.
               | There's literally nothing but upside.
               | 
               | I swear, it feels like some people are just allergic to
               | anything k8s and actively seek out ways to hate on it.
               | 
               | [1] https://auth0.com/blog/upcoming-pricing-changes-for-
               | the-cust...
        
               | elktown wrote:
               | Why skip point 1 and do some strange tangent on a SaaS
               | product unrelated to using k8s or not?
               | 
                | Most people looking into (and using) k8s who are being
                | told the "you must avoid vendor lock-in!" selling point
                | are nowhere near the size where it matters.
               | there's essentially bulk-pricing, as we have it where I
               | work as well. That it's because of picking k8s or not
               | however is an extremely long stretch, and imo mostly
                | rationalization. There's nothing saying that a cloud move
                | _without_ k8s couldn't be done within the same amount of
                | time. Or that k8s workloads are even the main problem - I
                | imagine they aren't, since they're usually supposed to be
                | stateless apps.
        
               | Alupis wrote:
               | The point was about vendor lock, which you asserted is
               | not a good reason to make a move, such as this. The
               | "tangent" about a SaaS product was to make it clear what
               | happens when you build your system in such a way as-to
               | become entirely dependent on that vendor. Just because
               | Auth0 is not part of one of the big "cloud" providers,
               | doesn't make it any less vendor-locky. Almost all of the
               | vendor services offered on the big clouds are extremely
               | vendor-locked and non-portable.
               | 
               | Where you buy compute from is just as big of a deal as
               | where you buy your other SaaS' from. In all of the cases,
               | if you cannot move even if you had to (ie. it'll take 1
               | year+ to move), then you are not in a good position.
               | 
                | Addressing your #1 point - if you use a regular database
                | that happens to be offered by a cloud provider (e.g.
                | Postgres, MySQL, MongoDB), then you can pick up and move.
                | If you use something proprietary like Cosmos DB, then you
                | are stuck or face a significant migration effort.
               | 
               | With k8s, moving to another cloud can be as simple as
               | creating an account and updating your configs to point at
               | the new cluster. You can run every service you need
               | inside your cluster if you wanted. You have freedom of
               | choice and mobility.
               | 
                | > Most people looking into (and using) k8s that are being
                | told the "you must avoid vendor lock-in!" selling point
                | are nowhere near the size where it matters.
               | 
               | This is just simply wrong, as highlighted by the SaaS
               | example I provided. If you think you are too small so it
               | doesn't matter, and decide to embrace all of the cloud
               | vendor's proprietary services... what happens to you when
                | that cloud provider decides to change their billing
                | model, or dramatically increases prices? You are screwed
                | and have no option but to cough up more money.
               | 
               | There's more decisions to make and consider regarding
               | choosing a cloud platform and services than just whatever
               | is easiest to use today - for any size of business.
        
               | watermelon0 wrote:
               | I would assume that the migration from ECS to something
               | else would be a lot easier, compared to migrating from
               | other managed services, such as S3/SQS/Kinesis/DynamoDB,
               | and especially IAM, which ties everything together.
        
               | otterley wrote:
               | Amazon ECS is and always has been free of charge. You pay
               | for the underlying compute and other resources (just like
               | you do with EKS, too), but not the orchestration service.
        
             | WaxProlix wrote:
              | It may look like I'm implying that companies succeed in
              | getting those things from a K8s transition, but I wasn't
              | trying to say that - just thinking of the times when I've
              | seen these migrations happen and relaying the stated aims.
             | I agree, I think it can be a burner of dev time and a
             | burden on the business as devs acquire the new skillset
             | instead of doing more valuable work.
        
           | OptionOfT wrote:
           | Flexibility was a big thing for us. Many different
           | jurisdictions required us to be conscious of where exactly
           | data was stored & processed.
           | 
           | K8s makes this really easy. Don't need to worry whether
           | country X has a local Cloud data center of Vendor Y.
           | 
           | Plus it makes hiring so much easier as you only need to
           | understand the abstraction layer.
           | 
           | We don't hire people for ARM64 or x86. We have abstraction
           | layers. Multiple even.
           | 
            | We'd be fooling ourselves not to use them.
        
           | fazkan wrote:
           | This, most of it, I think is to support on-prem, and cloud-
           | flexibility. Also from the customers point of view, they can
           | now sell the entire figma "box" to controlled industries for
           | a premium.
        
           | teyc wrote:
            | People move to K8s so that their resumes and job ads are
            | cloud provider agnostic. People's careers stagnate when their
            | employer's platform is home-baked tech, or specific offerings
            | from cloud providers. Employers find moving to a common
            | platform makes recruiting easier.
        
         | samcat116 wrote:
         | > I have a hard time following much of the decision making
         | around these gigantic migrations or technology efforts because
         | the decisions don't seem to come from any user or company need
         | 
         | I mean the blog post is written by the team deciding the
         | company needs. They explained exactly why they can't easily use
         | etcd on ECS due to technical limitations. They also talked
         | about many other technical limitations that were causing them
         | issues and increasing cost. What else are you expecting?
        
         | Flokoso wrote:
          | Managing 500 or more VMs is a lot of work.
          | 
          | The VM upgrades alone - plus auth, backup, log rotation, etc.
         | 
          | With k8s I can give everyone a namespace, policies, volumes,
          | and automatic log aggregation via DaemonSets and k8s/cloud-
          | native stacks.
         | 
         | Self healing and more.
         | 
         | It's hard to describe how much better it is.
        
         | ianvonseggern wrote:
         | Hey, author here, I think you ask a good question and I think
         | you frame it well. I agree that, at least for some major
         | decisions - including this one, "it is fundamentally difficult
         | for organizations of this scale to identify and reason about
         | valuable work."
         | 
          | At its core, we are a platform team building tools, often for
          | other platform teams, that in turn build tools supporting the
          | developers at Figma creating the actual product experience. It
         | is often harder to reason about what the right decisions are
         | when you are further removed from the end user, although it
         | also gives you great leverage. If we do our jobs right the
         | multiplier effect of getting this platform right impacts the
         | ability of every other engineer to do their job efficiently and
         | effectively (many indirectly!).
         | 
         | You bring up good examples of why this is hard. It was
         | certainly an alternative to say sorry we can't support etcd and
         | helm and you will need to find other ways to work around this
         | limitation. This was simply two more data points helping push
         | us toward the conclusion that we were running our Compute
         | platform on the wrong base building blocks.
         | 
          | While difficult to reason about, I do think it's still very
          | much worth trying to do this reasoning well. It's how, as a
          | platform team, we ensure we are tackling the right work to get
          | to the best platform we can. That's why we spent so much time
          | making the decision to go ahead with this, and part of why I
          | thought it was an interesting topic to write about.
        
           | felixgallo wrote:
           | I have a constructive recommendation for you and your
           | engineering management for future cases such as this.
           | 
           | First, when some team says "we want to use helm and etcd for
           | some reason and we haven't been able to figure out how to get
           | that working on our existing platform," start by asking them
           | what their actual goal is. It is obscenely unlikely that helm
           | (of all things) is a fundamental requirement to their work.
           | Installing temporal, for example, doesn't require helm and is
           | actually simple, if it turns out that temporal is the best
           | workflow orchestrator for the job and that none of the
           | probably 590 other options will do.
           | 
           | Second, once you have figured out what the actual goal is,
           | and have a buffet of options available, price them out. Doing
           | some napkin math on how many people were involved and how
           | much work had to go into it, it looks to me that what you
           | have spent to completely rearchitect your stack and
           | operations and retrain everyone -- completely discounting
            | opportunity cost -- is unlikely to break even for about five
            | years, even under my most generous estimate of increased
            | productivity. More likely, the increased cost of the platform
           | switch, the lack of likely actual velocity accrual, and the
           | opportunity cost make this a net-net bad move except for the
           | resumes of all of those involved.
        
           | Spivak wrote:
           | > we can't support etcd and helm and you will need to find
           | other ways to work around this limitation
           | 
            | So am I reading this right that either downstream platform
            | teams or devs wanted to leverage existing helm templates to
            | provision infrastructure, being on ECS locked you out of
            | those, and the water eventually boiled over? If so, that's a
            | pretty strong statement about the platform effect of k8s.
        
         | wg0 wrote:
         | If you haven't broken down your software into 50+ different
         | separate applications written in 15 different languages using 5
         | different database technologies - you'll find very little use
         | for k8s.
         | 
         | All you need is a way to roll out your artifact to production
         | in a roll over or blue green fashion after the preparations
         | such as required database alterations be it data or schema
         | wise.
        
           | javaunsafe2019 wrote:
           | But you do know which problems the k8s abstraction solves,
           | right? Cause it has nothing to do with many languages nor
           | many services but things like discovery, scaling, failover
           | and automation ...
        
           | imiric wrote:
           | > All you need is a way to roll out your artifact to
           | production in a roll over or blue green fashion after the
           | preparations such as required database alterations be it data
           | or schema wise.
           | 
           | Easier said than done.
           | 
           | You can start by implementing this yourself and thinking how
           | simple it is. But then you find that you also need to decide
           | how to handle different environments, configuration and
           | secret management, rollbacks, failover, load balancing, HA,
           | scaling, and a million other details. And suddenly you find
           | yourself maintaining a hodgepodge of bespoke infrastructure
           | tooling instead of your core product.
           | 
           | K8s isn't for everyone. But it sure helps when someone else
           | has thought about common infrastructure problems and solved
           | them for you.
        
             | mattmanser wrote:
              | You need to remove a lot of things from that list. Almost
              | all of that functionality has been available in build
              | tools for decades. I want to emphasize the DECADES.
              | 
              | And then all you're left with is scaling. Which most
              | businesses do not need.
             | 
             | Almost everything you've written there is a standard
             | feature of almost any CI toolchain, teamcity, Jenkins,
             | Azure DevOps, etc., etc.
             | 
             | We were doing it before k8s was even written.
        
           | mplewis wrote:
           | Yeah, all you need is a rollout system that supports blue-
           | green! Very easy to homeroll ;)
        
         | friendly_deer wrote:
         | Here's a theory about why at least some of these come about:
         | 
         | https://lethain.com/grand-migration/
        
       | tedunangst wrote:
       | How long will it take to migrate off?
        
         | codetrotter wrote:
         | It's irreversible.
        
       | wrs wrote:
       | A migration with the goal of improving the infrastructure
       | foundation is great. However, I was surprised to see that one of
       | the motivations was to allow teams to use Helm charts rather than
       | converting to Terraform. I haven't seen in practice the
       | consistent ability to actually use random Helm charts unmodified,
       | so by encouraging its use you end up with teams forking and
       | modifying the charts. And Helm is such a horrendous tool, you
       | don't really want to be maintaining your own bespoke Helm charts.
       | IMO you're actually _better off_ rewriting in Terraform so at
       | least your local version is maintainable.
       | 
       | Happy to hear counterexamples, though -- maybe the "indent 4"
       | insanity and multi-level string templating in Helm is gone
       | nowadays?
        
         | smellybigbelly wrote:
         | Our team also suffered from the problems you described with
         | public helm charts. There is always something you need to
         | customise to make things work on your own environment. Our
         | approach has been to use the public helm chart as-is and do any
         | customisation with `kustomize --enable-helm`.
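         | 
         | For anyone unfamiliar with that pattern, a sketch of the
         | kustomization.yaml (chart, repo, version and patch file are
         | all illustrative):

```yaml
# Render a public chart as-is, then layer local customisation on
# top as kustomize patches, so the upstream chart is never forked.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: ingress-nginx
    repo: https://kubernetes.github.io/ingress-nginx
    version: 4.11.1  # illustrative version
    releaseName: ingress-nginx
    namespace: ingress-nginx
    valuesInline:
      controller:
        replicaCount: 2
patches:
  - path: local-tweaks.yaml  # hypothetical environment-specific patch
```

         | Rendering is then `kustomize build --enable-helm .`.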
        
         | BobbyJo wrote:
         | Helm is quite often the default supported way of launching
         | containerized third-party products. I have worked at two
         | separate startups whose 'on prem' products were offered this
         | way.
        
           | freedomben wrote:
           | Indeed. I try hard to minimize the amount of Helm we use, but
           | a significant amount of tools are only shipped as Helm
           | charts. Fortunately I'm increasingly seeing people provide
           | "raw k8s" yaml, but it's far from universal.
        
         | cwiggs wrote:
         | Helm Charts and Terraform are different things IMO. Terraform
         | is better used to deploying cloud resources (s3 bucket, EKS
         | cluster, EKS workers, RDS, etc). Sure you can manage your k8s
         | workloads with Terraform, but I wouldn't recommend it.
         | Terraform having state when you already have your state in
         | k8s makes working with Terraform + k8s a pain. Helm is
         | purpose-built for k8s; Terraform is not.
         | 
         | I'm not a fan of Helm either though; templated yaml sucks,
         | and you still have the "indent 4" insanity too. Kustomize is
         | nice when things are simple, but once your app is complex
         | Kustomize is worse than Helm IMO. Try to deploy an app that
         | has an Ingress, a TLS cert and external-dns with Kustomize
         | for multiple environments; you have to patch the resources 3
         | times instead of just having 1 variable you use in 3 places.
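         | 
         | For contrast, the Helm side of that example, sketched with a
         | hypothetical chart: one `host` value feeds the Ingress rule,
         | the TLS block and the external-dns annotation.

```yaml
# templates/ingress.yaml (excerpt): a single .Values.host is used
# in three places; per-environment values files only override host.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}
  annotations:
    external-dns.alpha.kubernetes.io/hostname: {{ .Values.host }}
spec:
  tls:
    - hosts:
        - {{ .Values.host }}
      secretName: {{ .Release.Name }}-tls
  rules:
    - host: {{ .Values.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Release.Name }}
                port:
                  number: 80
```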
         | 
         | Helm is popular and Terraform is popular, so they both get
         | talked about a lot, but IMO there is a tool yet to become
         | popular that will replace both of them.
        
           | wrs wrote:
           | I agree, I wouldn't generate k8s from Terraform either,
           | that's just the alternative I thought the OP was presenting.
           | But I'd still rather convert charts from Helm to pretty much
           | anything else than maintain them.
        
           | stackskipton wrote:
           | Lack of Variable substitution in Kustomize is downright
           | frustrating. We use Flux so we have the feature anyways, but
           | I wish it was built into Kustomize.
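           | 
           | For reference, the Flux feature in question is `postBuild`
           | variable substitution; roughly like this (names and values
           | are illustrative):

```yaml
# Flux Kustomization: plain ${cluster_domain} placeholders in the
# rendered manifests are substituted after `kustomize build` runs.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: app
  namespace: flux-system
spec:
  interval: 10m
  path: ./deploy
  prune: true
  sourceRef:
    kind: GitRepository
    name: app
  postBuild:
    substitute:
      cluster_domain: prod.example.com
```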
        
         | gouggoug wrote:
         | Talking about helm - I personally have come to profoundly
         | loathe it. It was amazing when it came out and met a real
         | need.
         | 
         | However it is loaded with so many footguns that I spend my time
         | redoing and debugging others engineers work.
         | 
         | I'm hoping this new tool called << timoni >> picks up steam.
         | It fixes pretty much every qualm I have with helm.
         | 
         | So if, like me, you're looking for a better solution, go
         | check out timoni.
        
         | JohnMakin wrote:
         | It's completely cursed, but I've started deploying helm via
         | terraform lately. Many people, myself ironically included,
         | find that managing deployments via terraform is an
         | anti-pattern.
         | 
         | I'm giving it a try and I don't despise it yet, but it feels
         | gross - application configs are typically far more mutable
         | and dynamic than cloud infrastructure configs, and IME,
         | terraform does not like super dynamic configs.
        
       | andrewguy9 wrote:
       | I look forward to the blog post where they get off K8s, in
       | just 18 months.
        
       | surfingdino wrote:
       | ECS makes sense when you are building and breaking stuff. K8s
       | makes sense when you are mature (as an org).
        
       | xiwenc wrote:
       | I'm baffled to see so many anti-k8s sentiments on HN. Is it
       | because most commenters are developers used to services like
       | heroku, fly.io, render.com, etc., or who run their apps on VMs?
        
         | elktown wrote:
         | I think some are just pretty sick and tired of the explosion of
         | needless complexity we've seen in the last decade or so in
         | software, and rightly so. This is an industry-wide problem of
         | deeply misaligned incentives (& some amount of ZIRP gold rush),
         | not specific to this particular case - if this one is even a
         | good example of this to begin with.
         | 
         | Honestly, as it stands, I think we'd be seen as pretty useless
         | craftsmen in any other field due to an unhealthy obsession
         | with our tooling and meta-work - consistently throwing any
         | kind of
         | sensible resource usage out of the window in favor of just
         | getting to work with certain tooling. It's some kind of a
         | "Temporarily embarrassed FAANG engineer" situation.
        
           | cwiggs wrote:
           | I agree with this somewhat. The other day I was driving
           | home and saw a sprinkler head that had broken on the side
           | of the road and was spraying water everywhere. It made me
           | think: why aren't sprinkler systems designed with HA in
           | mind? Why aren't there dual water lines with dual sprinkler
           | heads everywhere, with an electronic component that detects
           | a break in a line and automatically switches to the backup
           | water line? It's because the downside of water spraying
           | everywhere and the grass becoming unhealthy or dying is
           | less than what it would cost to deploy it HA.
           | 
           | In the software/tech industry it's commonplace to just
           | accept that your app can't be down for any amount of time no
           | matter what. No one checked to see how much more it would
           | cost (engineering time & infra costs) to deploy the app so it
           | would be HA, so no one checked to see if it would be worth
           | it.
           | 
           | I blame this logic on the low interest rates for a decade. I
           | could be wrong.
        
         | maayank wrote:
         | It's one of those technologies where there's merit to use them
         | in some situations but are too often cargo culted.
        
         | caniszczyk wrote:
         | Hating is a sign of success in some ways :)
         | 
         | In some ways, it's nice to see companies move to use mostly
         | open source infrastructure, a lot of it coming from CNCF
         | (https://landscape.cncf.io), ASF and other organizations out
         | there (on top of the random things on github).
        
         | tryauuum wrote:
         | For me it is about VMs. I feel uneasy knowing that any kernel
         | vulnerability will allow malicious code to escape the
         | container and explore the kubernetes host.
         | 
         | There are kata-containers I think, they might solve my angst
         | and make me enjoy k8s
         | 
         | Overall... There's just nothing cool in kubernetes to me.
         | Containers, load balancers, megabytes of yaml -- I've seen it
         | all. Nothing feels interesting enough to try
        
           | stackskipton wrote:
           | vs the Application getting hacked and running loose on the
           | VM?
           | 
           | If you have never dealt with "I have to run these 50
           | containers plus Nginx/CertBot while figuring out which node
           | is best to run them on," yeah, I can see you not being
           | thrilled about Kubernetes. For the rest of us, though,
           | Kubernetes helps out with that easily.
        
       | solatic wrote:
       | I don't get the hate for Kubernetes in this thread. TFA is from
       | _Figma_. You can talk all day long about how early startups just
       | don't need the kind of management benefits that Kubernetes
       | offers, but the article isn't written by someone working for a
       | startup, it's written by a company that nearly got sold to Adobe
       | for $20 billion.
       | 
       | Y'all really don't think a company like Figma stands to benefit
       | from the flexibility that Kubernetes offers?
        
         | BobbyJo wrote:
         | Kubernetes isn't even that complicated, and first-party
         | support from cloud providers often means you're doing
         | something in K8s in lieu of doing it in a cloud-specific way
         | (like Ingress vs cloud-specific load balancer setups).
         | 
         | At a certain scale, K8s is the simple option.
         | 
         | I think much of the hate on HN comes from the "ruby on rails is
         | all you need" crowd.
        
           | JohnMakin wrote:
           | > I think much of the hate on HN comes from the "ruby on
           | rails is all you need" crowd.
           | 
           | Maybe - people seem really gung-ho about serverless
           | solutions here too.
        
         | logifail wrote:
         | > it's written by a company that nearly got sold to Adobe for
         | $20 billion
         | 
         | (Apologies if this is a dumb question) but isn't Figma big
         | enough to want to do any of their stuff on their own hardware
         | yet? Why would they still be paying AWS rates?
         | 
         | Or is it the case that a high-profile blog post about K8S and
         | being provider-agnostic gets you sufficient discount on your
         | AWS bill to still be value-for-money?
        
           | jeffbee wrote:
           | There are a lot of ex-Dropbox people at Figma who might have
           | learned firsthand that bringing your stuff on-prem under a
           | theory of saving money is an intensely stupid idea.
        
             | logifail wrote:
             | > There are a lot of ex-Dropbox people at Figma who might
             | have learned firsthand that bringing your stuff on-prem
             | under a theory of saving money is an intensely stupid idea
             | 
             | Well, that's one hypothesis.
             | 
             | Another is that "Every maturing company with predictable
             | products must be exploring ways to move workloads out of
             | the cloud. AWS took your margin and isn't giving it back."
             | ( https://news.ycombinator.com/item?id=35235775 )
        
           | ozim wrote:
           | They are preparing for the next blog post in a year: "how
           | we cut costs by xx% by moving to our own servers".
        
           | hyperbolablabla wrote:
           | I work for a company making ~$9B in annual revenue and we use
           | AWS for everything. I think a big aspect of that is just
           | developer buy-in, as well as reliability guarantees, and
           | being able to blame Amazon when things do go down
        
           | NomDePlum wrote:
           | Much bigger companies use AWS for very practical well thought
           | out reasons.
           | 
           | Not managing procurement of hardware, upgrades, etc., plus
           | a defined standard operating model with accessible
           | documentation, the ability to hire people with experience,
           | and needing fewer people because you are doing less, is
           | enough to build a viable and demonstrable business case.
           | 
           | Scale beyond a certain point is hard without support and
           | delegated responsibility.
        
         | cwiggs wrote:
         | k8s is complex, if you don't need the following you probably
         | shouldn't use it:
         | 
         | * Service discovery
         | 
         | * Auto bin packing
         | 
         | * Load Balancing
         | 
         | * Automated rollouts and rollbacks
         | 
         | * Horizontal scaling
         | 
         | * Probably more I forgot about
         | 
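         | As one data point for how little glue some of these need,
         | horizontal scaling is a single manifest (numbers and the
         | target name are made up):

```yaml
# Declarative horizontal scaling: keep average CPU near 70% by
# running between 2 and 10 replicas of a Deployment named "web".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

         | 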
         | You also have secret and config management built in. If you use
         | k8s you also have the added benefit of making it easier to move
         | your workloads between clouds and bare metal. As long as you
         | have a k8s cluster you can mostly move your app there.
         | 
         | Problem is most companies I've worked at in the past 10 years
         | needed multiple of the features above, and they decided to roll
         | their own solution with Ansible/Chef, Terraform, ASGs, Packer,
         | custom scripts, custom apps, etc. The solutions have always
         | been worse than what k8s provides, and they're bespoke tools
         | that you can't hire for.
         | 
         | For what k8s provides, it isn't complex, and it's all
         | documented very well, AND it's extensible so you can build your
         | own apps on top of it.
         | 
         | I think there are more SWEs on HN than
         | Infra/Platform/Devops/buzzword engineers. As a result there are
         | a lot of people who don't have a lot of experience managing
         | infra and think that spinning up their docker container on a VM
         | is the same as putting an app in k8s. That's my opinion on why
         | k8s gets so much hate on HN.
        
           | Osiris wrote:
           | Those all seem important to even moderately sized products.
        
             | worldsayshi wrote:
             | As long as your requirements are simple the config doesn't
             | need to be complex either. Not much more than docker-
             | compose.
             | 
             | But once you start using k8s you probably tend to scope
             | creep and find a lot of shiny things to add to your set up.
        
       | breakingcups wrote:
       | I feel so out of touch when I read a blog post which casually
       | mentions 6 CNCF projects with kool names that I've never heard
       | of, for gaining seemingly simple functionality.
       | 
       | I'm really wondering if I'm aging out of professional software
       | development.
        
         | renewiltord wrote:
         | Nah, there's lots of IC work. It just means that you're
         | unfamiliar with one approach to org scaling: abstracting over
         | hardware, logging, retrying handled by platform team.
         | 
         | It's not the only approach so you may well be familiar with
         | others.
        
       | twodave wrote:
       | TL;DR because they already ran everything in containers. Having
       | performed a migration where this wasn't the case, the path from
       | non-containerized to containerized is way more effort than going
       | from containerized non-k8s to k8s.
        
       | _pdp_ wrote:
       | In my own experience, AWS Fargate is easier, more secure and
       | way more robust than running your own K8S, even with EKS.
        
         | watermelon0 wrote:
         | Do you mean ECS Fargate? Because you can use AWS Fargate with
         | EKS, with some limitations.
        
       | ko_pivot wrote:
       | I'm not surprised that the first reason they state for moving off
       | of ECS was the lack of support for stateful services. The lack of
       | integration between EBS and ECS has always felt really strange to
       | me, considering that AWS already built all the logic to integrate
       | EKS with EBS in a StatefulSet compliant way.
        
         | datatrashfire wrote:
         | https://aws.amazon.com/about-aws/whats-new/2024/01/amazon-ec...
         | 
         | This was actually added beginning of the year. Definitely was
         | on my most wanted list for a while. You could technically use
         | EFS, but that's a very expensive way to run anything IO
         | intensive.
        
           | ko_pivot wrote:
           | This adds support for ephemeral EBS volumes. When a task is
           | created a volume gets created, and when the task is
           | destroyed, for whatever reason, the volume is destroyed too.
           | It has no concept of task identity. If the task needs to be
           | moved to a new host, the volume is destroyed.
        
       | julienmarie wrote:
       | I personally love k8s. I run multiple small but complex custom
       | e-commerce shops and handle all the tech on top of marketing,
       | finance and customer service.
       | 
       | I was running on dedicated servers before. My stack is quite
       | complicated and deploys were a nightmare. In the end the dread of
       | deploying was slowing down the little company.
       | 
       | Learning and moving to k8s took me a month. I run around 25
       | different services ( front ends, product admins, logistics
       | dashboards, delivery routes optimizers, orsm, ERP, recommendation
       | engine, search, etc.... ).
       | 
       | It forced me to clean my act and structure things in a repeatable
       | way. Having all your cluster config in one place allows you to
       | exactly know the state of every service, which version is
       | running.
       | 
       | It allowed me to do rolling deploys with no downtime.
       | 
       | Yes it's complex. As programmers we are used to complex. An Nginx
       | config file is complex as well.
       | 
       | But the more you dive into it, the more you understand the
       | architecture of k8s and how it makes sense. It forces you to
       | respect the twelve factors to the letter.
       | 
       | And yes, HA is more than nice, especially when your income is
       | directly linked to the availability and stability of your stack.
       | 
       | And it's not that expensive. I pay around 400 USD a month in
       | hosting.
        
       | xyst wrote:
       | Of course there's no mention of performance loss or gain after
       | migration.
       | 
       | I remember when microservices architecture was the latest hot
       | trend that came off the presses. Small and big firms were racing
       | to redesign/reimplement apps. But most forgot they weren't
       | Google/Netflix/Facebook.
       | 
       | I remember end user experience ended up being _worse_ after the
       | implementation. There was a saturation point where a single micro
       | service called by all of the other micro services would cause
       | complete system meltdown. There was also the case of an
       | "accidental" dependency loop (S1 -> S2 -> S3 -> S1). Company
       | didn't have an easy way to trace logs across different services
       | (way before distributed tracing was a thing). Turns out only a
       | specific condition would trigger the dependency loop (maybe, 1 in
       | 100 requests?).
       | 
       | Good times. Also, job safety.
        
         | api wrote:
         | This is a very fad driven industry. One of the things you
         | gain after being in it for a long time is intuition for
         | spotting fads and gratuitous complexity traps.
        
       ___________________________________________________________________
       (page generated 2024-08-08 23:00 UTC)