[HN Gopher] How we migrated onto K8s in less than 12 months
       ___________________________________________________________________
        
       How we migrated onto K8s in less than 12 months
        
       Author : ianvonseggern
       Score  : 263 points
        Date   : 2024-08-08 16:07 UTC (1 day ago)
        
 (HTM) web link (www.figma.com)
 (TXT) w3m dump (www.figma.com)
        
       | jb1991 wrote:
       | Can anyone advise what is the most common language used in
       | enterprise settings for interfacing with K8s?
        
         | JohnMakin wrote:
         | IME almost exclusively golang.
        
           | roshbhatia wrote:
            | ++, most controllers are written in go, but there are plenty
            | of client libraries for other languages.
            | 
            | A common pattern you'll see, though, is skipping code
            | entirely and instead using a higher-level, DSL-ish
            | configuration, usually via yaml, with tools like Kyverno.
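            | 
            | As a rough sketch (the policy below is made up, not from any
            | real cluster), a Kyverno policy is plain YAML with no
            | controller code involved:
            | 
            |     apiVersion: kyverno.io/v1
            |     kind: ClusterPolicy
            |     metadata:
            |       name: require-team-label   # hypothetical policy name
            |     spec:
            |       validationFailureAction: Enforce
            |       rules:
            |         - name: check-team-label
            |           match:
            |             any:
            |               - resources:
            |                   kinds:
            |                     - Pod
            |           validate:
            |             message: "Pods need a `team` label."
            |             pattern:
            |               metadata:
            |                 labels:
            |                   team: "?*"   # any non-empty value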
        
           | angio wrote:
           | I'm seeing a lot of custom operators written in Rust
           | nowadays. Obviously biased because I do a lot of rust myself
           | so people I'm talking to also do rust.
        
         | gadflyinyoureye wrote:
          | Depends on what you mean. Helm will control a lot. You can
          | generate the yaml files from any language. You can also
          | administer it from command-line tools, so again any language,
          | but often zsh or bash.
        
         | cortesoft wrote:
         | A lot of yaml
        
           | yen223 wrote:
            | The fun kind of yaml that has a lot of {{ }} in it that
           | breaks your syntax highlighter.
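            | 
            | Something like this (the chart helper and value names are
            | invented) - ordinary YAML to you, but not to your
            | highlighter:
            | 
            |     apiVersion: v1
            |     kind: ConfigMap
            |     metadata:
            |       # "myapp.fullname" is a made-up helper name
            |       name: {{ include "myapp.fullname" . }}
            |       labels:
            |         {{- include "myapp.labels" . | nindent 4 }}
            |     data:
            |       LOG_LEVEL: {{ .Values.logLevel | quote }}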
        
         | mplewis wrote:
         | I have seen more Terraform than anything else.
        
         | akdor1154 wrote:
         | On the platform consumer side (app infra description) - well
         | schema'd yaml, potentially orchestrated by helm ("templates to
         | hellish extremes") or kustomize ("no templates, this is the
         | hill we will die on").
         | 
         | On the platform integration/hook side (app code doing
         | specialised platform-specific integration stuff, extensions to
         | k8s itself), golang is the lingua franca but bindings for many
         | languages are around and good.
        
         | bithavoc wrote:
          | If you're talking about connecting to Kubernetes and creating
          | resources programmatically, Pulumi allows you to interface
          | with it from all the languages they support (js, ts, go, c#,
          | python), including wrapping up Helm charts and injecting
          | secrets (my personal favorite).
         | 
         | If you want to build your own Kubernetes Custom Resources and
         | Controllers, Go lang works pretty well for that.
        
       | JohnMakin wrote:
        | I like how this article clearly and articulately states the
        | ways it stands to benefit from Kubernetes. Many make the jump
        | without knowing what they even stand to gain, or whether they
        | need to in the first place - the reasons given here are good.
        
         | nailer wrote:
         | I was about to write the opposite - the logic is poor and
         | circular - but multiple other commenters have already raised
         | this: https://news.ycombinator.com/item?id=41194506
         | https://news.ycombinator.com/item?id=41194420
        
           | JohnMakin wrote:
           | I don't really see those rebuttals as all that valid. The
           | reasons given in this article are completely valid, from my
            | perspective as someone who's worked heavily with
           | Kubernetes/ECS.
           | 
           | Helm, for instance, is a great time saver for installing
           | software. Often software will support nothing but helm. Ease
           | of deployment is a good consideration. Their points on
           | networking are absolutely spot on. The scaling considerations
           | are spot on. Killing/isolating unhealthy containers is
           | completely valid. I could go on a lot more, but I don't see a
           | single point listed as invalid.
        
           | samcat116 wrote:
           | They're quite specific in that they mention that teams would
           | like to make use of existing helm charts for other software
           | products. Telling them to build and maintain definitions for
           | those services from scratch is added work in their mind.
        
       | dijksterhuis wrote:
       | > When applied, Terraform code would spin up a template of what
       | the service should look like by creating an ECS task set with
       | zero instances. Then, the developer would need to deploy the
       | service and clone this template task set [and do a bunch of
       | manual things]
       | 
       | > This meant that something as simple as adding an environment
       | variable required writing and applying Terraform, then running a
       | deploy
       | 
       | This sounds less like a problem with ECS and more like an
       | overcomplication in how they were using terraform + ECS to manage
       | their deployments.
       | 
       | I get the generating templates part for verification prior to
       | live deploys. But this seems... dunno.
        
         | wfleming wrote:
         | Very much agree. I have built infra on ECS with terraform at
         | two companies now, and we have zero manual steps for actions
         | like this, beyond "add the env var to a terraform file, merge
         | it and let CI deploy". The majority of config changes we would
         | make are that process.
        
           | dijksterhuis wrote:
            | Yeah.... thinking about it a bit more I just don't see why
           | they didn't set up their CI to deploy a short lived
           | environment on a push to a feature branch.
           | 
           | To me that seems like the simpler solution.
        
         | roshbhatia wrote:
         | I'm with you here -- ECS deploys are pretty painless and
         | uncomplicated, but I can picture a few scenarios where this
          | ends up being necessary, for example if they have a lot of
          | services
         | deployed on ECS and it ends up bloating the size of the
         | Terraform state. That'd slow down plans and applies
         | significantly, which makes sharding the Terraform state by
         | literally cloning the configuration based on a template a lot
         | safer.
        
           | freedomben wrote:
           | > ECS deploys are pretty painless and uncomplicated
           | 
           | Unfortunately in my experience, this is true until it isn't.
           | Once it isn't true, it can quickly become a painful blackbox
           | debugging exercise. If your org is big enough to have
           | dedicated AWS support then they can often get help from
           | engineers, but if you aren't then life can get really
           | complicated.
           | 
           | Still not a bad choice for most apps though, especially if
           | it's just a run-of-the-mill HTTP-based app
        
         | ianvonseggern wrote:
         | Hey, author here, I totally agree that this is not a
         | fundamental limitation of ECS and we could have iterated on
         | this setup and made something better. I intentionally listed
         | this under work we decided to scope into the migration process,
         | and not under the fundamental reasons we undertook the
         | migration because of that distinction.
        
       | Aeolun wrote:
       | Honestly, I find the reasons they name for using Kubernetes
       | flimsy as hell.
       | 
       | "ECS doesn't support helm charts!"
       | 
       | No shit sherlock, that's a thing literally built on Kubernetes.
       | It's like a government RFP that can only be fulfilled by a single
        | vendor.
        
         | Carrok wrote:
         | > We also encountered many smaller paper cuts, like attempting
         | to gracefully terminate a single poorly behaving EC2 machine
         | when running ECS on EC2. This is easy on Amazon's Elastic
         | Kubernetes Service (EKS), which allows you to simply cordon off
         | the bad node and let the API server move the pods off to
         | another machine while respecting their shutdown routines.
         | 
         | I dunno, that seems like a very good reason to me.
        
           | watermelon0 wrote:
           | I assume that ECS Fargate would solve this, because one
           | misbehaving ECS task would not affect others, and stopping it
           | should still respect the shutdown routines.
        
             | ko_pivot wrote:
             | Fargate is very expensive at scale. Great for small or
             | bursty workloads, but when you're at Figma scale, you
             | almost always go EC2 for cost-effectiveness.
        
               | ihkasfjdkabnsk wrote:
               | this isn't really true. It was very expensive when it was
               | first released but now it's pretty cost competitive with
               | EC2, especially when you consider easier scale down/up.
        
               | Aeolun wrote:
               | I think when you are at Figma scale you should have
               | learned to keep things simpler. At this point I don't
                | think the (slightly) lower costs of EC2 outweigh
               | the benefits of Fargate.
        
         | aranelsurion wrote:
         | To be fair there are many benefits of running on the platform
         | that has the most mindshare.
         | 
          | Unless they're competing in this space against k8s, if they
          | want to use Helm charts it's reasonable for them to move
          | somewhere they can.
          | 
          | Also, it's not just Helm that doesn't work with ECS - neither
          | do <50 other tools and tech from the CNCF map>.
        
         | cwiggs wrote:
          | I think what they should have said is "there isn't a tool like
          | Helm for ECS". If you want to deploy a full prometheus,
          | grafana, alertmanager, etc. stack on ECS, good luck with that
          | - no one has written the task definitions for you to consume
          | and override values on.
         | 
         | With k8s you can easily deploy a helm chart that will deploy
         | lots of things that all work together fairly easily.
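          | 
          | For example, a small values override for something like the
          | kube-prometheus-stack chart (keys shown are illustrative -
          | check the chart's own values.yaml) flips whole subsystems on,
          | already wired to each other:
          | 
          |     # values.yaml passed via `helm install -f values.yaml`
          |     grafana:
          |       enabled: true
          |     alertmanager:
          |       enabled: true
          |     prometheus:
          |       prometheusSpec:
          |         retention: 15d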
        
         | JohnMakin wrote:
         | It's almost like people factor in a piece of software's tooling
         | environment before they use the software - wild.
        
         | liveoneggs wrote:
         | recipes and tutorials say "helm" so we need "helm"
        
       | vouwfietsman wrote:
       | Maybe its normal for a company this size, but I have a hard time
       | following much of the decision making around these gigantic
       | migrations or technology efforts because the decisions don't seem
       | to come from any user or company need. There was a similar post
       | from Figma earlier, I think around databases, that left me
       | feeling the same.
       | 
       | For instance: they want to go to k8s because they want to use
       | etcd/helm, which they can't on ECS? Why do you want to use
       | etcd/helm? Is it really this important? Is there really no other
       | way to achieve the goals of the company than exactly like that?
       | 
       | When a decision is founded on a desire of the user, its easy to
       | validate that downstream decisions make sense. When a decision is
       | founded on a technological desire, downstream decisions may make
       | sense in the context of the technical desire, but do they make
       | sense in the context of the user, still?
       | 
       | Either I don't understand organizations of this scale, or it is
       | fundamentally difficult for organizations of this scale to
       | identify and reason about valuable work.
        
         | WaxProlix wrote:
         | People move to K8s (specifically from ECS) so that they can use
         | cloud provider agnostic tooling and products. I suspect a lot
         | of larger company K8s migrations are fueled by a desire to be
         | multicloud or hybrid on-prem, mitigate cost, availability, and
         | lock-in risk.
        
           | timbotron wrote:
           | there's a pretty direct translation from ECS task definition
           | to docker-compose file
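            | 
            | e.g. a task definition's container settings map roughly onto
            | compose like this (service name and numbers are made up):
            | 
            |     services:
            |       web:                       # hypothetical service
            |         image: myorg/web:latest  # placeholder image
            |         environment:
            |           - LOG_LEVEL=info
            |         ports:
            |           - "8080:8080"
            |         deploy:
            |           resources:
            |             limits:
            |               cpus: "0.5"   # ~ task definition cpu: 512
            |               memory: 512M  # ~ task definition memory: 512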
        
           | zug_zug wrote:
           | I've heard all of these lip-service justifications before,
           | but I've yet to see anybody actually publish data showing how
           | they saved any money. Would love to be proven wrong by some
           | hard data, but something tells me I won't be.
        
             | nailer wrote:
              | Likewise. I'm not sure Kubernetes' famous complexity (and
              | the resulting staff requirements) is worth it to
              | preemptively avoid vendor lock-in, or whether the problem
              | wouldn't be solved more efficiently by migrating to
              | another cloud provider's native tools if the need ever
              | arises.
        
             | bryanlarsen wrote:
             | I'm confident Figma isn't paying published rates for AWS.
             | The transition might have helped them in their rate
             | negotiations with AWS, or it might not have. Hard data on
             | the money saved would be difficult to attribute.
        
             | jgalt212 wrote:
              | True, but if AWS knows your lock-in is less locked-in, I'd
              | bet they'd be more flexible when contracts are up for
              | renewal. I mean, it's possible the blog post's primary
              | purpose was a shot across the bow to their AWS account
              | manager.
        
               | logifail wrote:
               | > it's possible the blog post's primary purpose was a
               | shot across bow to their AWS account manager
               | 
               | Isn't it slightly depressing that this explanation is
               | fairly (the most?) plausible?
        
               | jiggawatts wrote:
               | Our state department of education is one of the biggest
               | networks in the world with about half a million devices.
               | They would occasionally publicly announce a migration to
               | Linux.
               | 
               | This was just a Microsoft licensing negotiation tactic.
               | Before he was CEO, Ballmer flew here to negotiate one of
               | the contracts. The discounts were _epic_.
        
             | tengbretson wrote:
             | There are large swaths of the b2b space where (for whatever
             | reason) being in the same cloud is a hard business
             | requirement.
        
             | vundercind wrote:
             | The vast majority of corporate decisions are never
             | justified by useful data analysis, before or after the
             | fact.
             | 
             | Many are so-analyzed, but usually in ways that anyone who
             | paid attention in high school science or stats classes can
             | tell are so flawed that they're meaningless.
             | 
             | We can't even measure manager efficacy to any useful
             | degree, in nearly all cases. We can come up with numbers,
             | but they don't mean anything. Good luck with anything more
             | complex.
             | 
             | Very small organizations can probably manage to isolate
             | enough variables to know how good or bad some move was in
             | hindsight, if they try and are competent at it (... if).
             | Sometimes an effect is so huge for a large org that it
             | overwhelms confounders and you can be pretty confident that
             | it was at least good or bad, even if the degree is fuzzy.
             | Usually, no.
             | 
             | Big organizations are largely flying blind. This has only
             | gotten worse with the shift from people-who-know-the-work-
             | as-leadership to professional-managers-as-leadership.
        
             | Alupis wrote:
             | Why would you assume it's lip-service?
             | 
             | Being vendor-locked into ECS means you _must_ pay whatever
             | ECS wants... using k8s means you can feasibly pick up and
             | move if you are forced.
             | 
              | Even if it doesn't save money _today_ it might save a
              | tremendous amount in the future and/or provide a much
              | stronger position to negotiate from.
        
               | greener_grass wrote:
               | Great in theory but in practice when you do K8s on AWS,
               | the AWS stuff leaks through and you still have lock-in.
        
               | Alupis wrote:
               | Then don't use the AWS stuff. You can bring your own
               | anything that they provide.
        
               | greener_grass wrote:
               | This requires iron discipline. Maybe with some kind of
               | linter for Terraform / kubectl it could be done.
        
               | cwiggs wrote:
               | It doesn't have to be that way though. You can use the
               | AWS ingress controller, or you can use ingress-nginx. You
               | can use external secrets operator and tie it into AWS
               | Secrets manager, or you can tie it into 1pass, or
               | Hashicorp Vault.
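                | 
                | For instance, with external secrets operator the only
                | AWS-specific part is the store reference, so swapping
                | backends is a config change (names below are made up):
                | 
                |     apiVersion: external-secrets.io/v1beta1
                |     kind: ExternalSecret
                |     metadata:
                |       name: app-db-creds        # hypothetical
                |     spec:
                |       refreshInterval: 1h
                |       secretStoreRef:
                |         kind: ClusterSecretStore
                |         # the only AWS-specific bit; could point at
                |         # Vault or 1Password instead
                |         name: aws-secrets-manager
                |       target:
                |         name: app-db-creds
                |       data:
                |         - secretKey: DB_PASSWORD
                |           remoteRef:
                |             key: prod/app/db-password  # hypothetical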
               | 
               | Just like picking EKS you have to be aware of the pros
               | and cons of picking the cloud provider tool or not.
                | Luckily the CNCF is doing a lot to reduce vendor lock-in
                | and I think it will only continue.
        
               | elktown wrote:
               | I don't understand why this "you shouldn't be vendor-
               | locked" rationalization is taken at face value at all?
               | 
               | 1. The time it will take to move to another cloud is
               | proportional to the complexity of your app. For example,
               | if you're a Go shop using managed persistence are you
               | more vendor locked in any meaningful way than k8s? What's
               | the delta here?
               | 
                | 2. Do you really think you can haggle with the fuel
                | producers like you're Maersk? No, more likely you're
                | just a car driving around looking for a gas station,
                | with increasingly diminishing returns.
        
               | Alupis wrote:
               | This year alone we've seen significant price increases
               | from web services, including critical ones such as Auth.
               | If you are vendor-locked into, say Auth0, and they
               | increase their price 300%[1]... What choice do you have?
               | What negotiation position do you have? None... They know
               | you cannot leave.
               | 
               | It's even worse when your entire platform is vendor-
               | locked.
               | 
               | There is nothing but upside to working towards a vendor-
               | neutral position. It gives you options. Even if you never
               | use those options, they are there.
               | 
               | > Do you really think you can haggle
               | 
               | At the scale of someone like Figma? Yes, they do
               | negotiate rates - and a competent account manager will
               | understand Figma's position and maximize the revenue they
               | can extract. Now, if the account rep doesn't play ball,
               | Figma can actually move their stuff somewhere else.
               | There's literally nothing but upside.
               | 
               | I swear, it feels like some people are just allergic to
               | anything k8s and actively seek out ways to hate on it.
               | 
               | [1] https://auth0.com/blog/upcoming-pricing-changes-for-
               | the-cust...
        
               | elktown wrote:
               | Why skip point 1 and do some strange tangent on a SaaS
               | product unrelated to using k8s or not?
               | 
                | Most people looking into (and using) k8s who are being
                | told the "you must avoid vendor lock-in!" selling point
                | are nowhere near the size where it matters. But I know
                | there's essentially bulk pricing, as we have it where I
                | work as well. That it's because of picking k8s or not,
                | however, is an extremely long stretch, and imo mostly
                | rationalization. There's nothing saying that a cloud
                | move _without_ k8s couldn't be done within the same
                | amount of time. Or even that k8s is the main problem - I
                | imagine it isn't, since it's usually supposed to be
                | stateless apps.
        
               | Alupis wrote:
               | The point was about vendor lock, which you asserted is
               | not a good reason to make a move, such as this. The
               | "tangent" about a SaaS product was to make it clear what
                | happens when you build your system in such a way as to
               | become entirely dependent on that vendor. Just because
               | Auth0 is not part of one of the big "cloud" providers,
               | doesn't make it any less vendor-locky. Almost all of the
               | vendor services offered on the big clouds are extremely
               | vendor-locked and non-portable.
               | 
               | Where you buy compute from is just as big of a deal as
               | where you buy your other SaaS' from. In all of the cases,
               | if you cannot move even if you had to (ie. it'll take 1
               | year+ to move), then you are not in a good position.
               | 
               | Addressing your #1 point - if you use a regular database
               | that happens to be offered by a cloud provider (ie.
               | Postgres, MySQL, MongoDB, etc) then you can pick up and
                | move. If you use something proprietary like Cosmos DB,
                | then you are stuck or face a significant effort to
                | migrate.
               | 
               | With k8s, moving to another cloud can be as simple as
               | creating an account and updating your configs to point at
               | the new cluster. You can run every service you need
               | inside your cluster if you wanted. You have freedom of
               | choice and mobility.
               | 
                | > Most people looking into (and using) k8s who are being
                | told the "you must avoid vendor lock-in!" selling point
               | are nowhere near the size where it matters.
               | 
               | This is just simply wrong, as highlighted by the SaaS
               | example I provided. If you think you are too small so it
               | doesn't matter, and decide to embrace all of the cloud
               | vendor's proprietary services... what happens to you when
               | that cloud provider decides to change their billing
                | model, or dramatically increases prices? You are screwed
                | and have no option but to cough up more money.
               | 
               | There's more decisions to make and consider regarding
               | choosing a cloud platform and services than just whatever
               | is easiest to use today - for any size of business.
               | 
               | I have found that, in general, people are afraid of using
               | k8s because it isn't trivial to understand for most
               | developers. People often mistakenly believe k8s is only
               | useful when you're "google scale". It solves a lot of
               | problems, including reduced vendor-lock.
        
               | watermelon0 wrote:
               | I would assume that the migration from ECS to something
               | else would be a lot easier, compared to migrating from
               | other managed services, such as S3/SQS/Kinesis/DynamoDB,
               | and especially IAM, which ties everything together.
        
               | otterley wrote:
               | Amazon ECS is and always has been free of charge. You pay
               | for the underlying compute and other resources (just like
               | you do with EKS, too), but not the orchestration service.
        
             | WaxProlix wrote:
             | It looks like I'm implying that companies are successful in
             | getting those things from a K8s transition, but I wasn't
             | trying to say that, just thinking of the times when I've
             | seen these migrations happen and relaying the stated aims.
             | I agree, I think it can be a burner of dev time and a
             | burden on the business as devs acquire the new skillset
             | instead of doing more valuable work.
        
           | OptionOfT wrote:
           | Flexibility was a big thing for us. Many different
           | jurisdictions required us to be conscious of where exactly
           | data was stored & processed.
           | 
           | K8s makes this really easy. Don't need to worry whether
           | country X has a local Cloud data center of Vendor Y.
           | 
           | Plus it makes hiring so much easier as you only need to
           | understand the abstraction layer.
           | 
           | We don't hire people for ARM64 or x86. We have abstraction
           | layers. Multiple even.
           | 
            | We'd be fooling ourselves not to use them.
        
           | fazkan wrote:
            | This. Most of it, I think, is to support on-prem and cloud
            | flexibility. Also, from the customers' point of view, Figma
            | can now sell the entire "box" to controlled industries for a
            | premium.
        
           | teyc wrote:
            | People move to K8s so that their resumes and job ads are
            | cloud provider agnostic. People's careers stagnate when
            | their employer platforms on home-baked tech, or on specific
            | offerings from cloud providers. Employers find moving to a
            | common platform makes recruiting easier.
        
         | samcat116 wrote:
         | > I have a hard time following much of the decision making
         | around these gigantic migrations or technology efforts because
         | the decisions don't seem to come from any user or company need
         | 
          | I mean, the blog post is written by the team deciding what the
          | company needs. They explained exactly why they can't easily use
         | etcd on ECS due to technical limitations. They also talked
         | about many other technical limitations that were causing them
         | issues and increasing cost. What else are you expecting?
        
         | Flokoso wrote:
          | Managing 500 or more VMs is a lot of work.
          | 
          | The VM upgrades, auth, backups, log rotation etc. alone.
          | 
          | With k8s I can give everyone a namespace, policies, volumes,
          | and have automatic log aggregation thanks to daemon sets and
          | k8s/cloud native stacks.
          | 
          | Self healing and more.
          | 
          | It's hard to describe how much better it is.
        
         | ianvonseggern wrote:
         | Hey, author here, I think you ask a good question and I think
         | you frame it well. I agree that, at least for some major
         | decisions - including this one, "it is fundamentally difficult
         | for organizations of this scale to identify and reason about
         | valuable work."
         | 
          | At its core we are a platform team building tools, often for
         | other platform teams, that are building tools that support the
         | developers at Figma creating the actual product experience. It
         | is often harder to reason about what the right decisions are
         | when you are further removed from the end user, although it
         | also gives you great leverage. If we do our jobs right the
         | multiplier effect of getting this platform right impacts the
         | ability of every other engineer to do their job efficiently and
         | effectively (many indirectly!).
         | 
         | You bring up good examples of why this is hard. It was
         | certainly an alternative to say sorry we can't support etcd and
         | helm and you will need to find other ways to work around this
          | limitation. These were simply two more data points helping push
         | us toward the conclusion that we were running our Compute
         | platform on the wrong base building blocks.
         | 
          | While difficult to reason about, I do think it's still very
          | worth trying to do this reasoning well. It's how, as a
          | platform team, we ensure we are tackling the right work to get
          | to the best platform we can. That's why we spent so much time
          | making the decision to go ahead with this, and part of why I
          | thought it was an interesting topic to write about.
        
           | felixgallo wrote:
           | I have a constructive recommendation for you and your
           | engineering management for future cases such as this.
           | 
           | First, when some team says "we want to use helm and etcd for
           | some reason and we haven't been able to figure out how to get
           | that working on our existing platform," start by asking them
           | what their actual goal is. It is obscenely unlikely that helm
           | (of all things) is a fundamental requirement to their work.
           | Installing temporal, for example, doesn't require helm and is
           | actually simple, if it turns out that temporal is the best
           | workflow orchestrator for the job and that none of the
           | probably 590 other options will do.
           | 
           | Second, once you have figured out what the actual goal is,
           | and have a buffet of options available, price them out. Doing
           | some napkin math on how many people were involved and how
           | much work had to go into it, it looks to me that what you
           | have spent to completely rearchitect your stack and
           | operations and retrain everyone -- completely discounting
           | opportunity cost -- is likely not to break even in even my
           | most generous estimate of increased productivity for about
           | five years. More likely, the increased cost of the platform
           | switch, the lack of likely actual velocity accrual, and the
           | opportunity cost make this a net-net bad move except for the
           | resumes of all of those involved.
        
           | Spivak wrote:
           | > we can't support etcd and helm and you will need to find
           | other ways to work around this limitation
           | 
            | So am I reading this right that either downstream platform
            | teams or devs wanted to leverage existing helm templates to
            | provision infrastructure, being on ECS locked you out of
            | those, and the water eventually boiled over? If so, that's a
            | pretty strong statement about the platform effect of k8s.
        
           | vouwfietsman wrote:
           | Hi! Thanks for the thoughtful reply.
           | 
           | I understand what you're saying, the thing that worries me
           | though is that the input you get from other technical teams
           | is very hard to verify. Do you intend to measure the
           | development velocity of the teams before and after the
           | platform change takes effect?
           | 
           | In my experience it is extremely hard to measure the real
           | development velocity (in terms of value-add, not arbitrary
           | story points) of a single team, not to mention a group of
           | teams over time, not to mention _as a result of a change_.
           | 
           | This is not necessarily criticism of Figma, as much as it is
           | criticism of the entire industry maybe.
           | 
           | Do you have an approach for measuring these things?
        
             | felixgallo wrote:
             | You're right that the input from other technical teams is
             | hard to verify. On the other hand, that's fundamental table
             | stakes, especially for a platform team that has a broad
             | impact on an organization. The purpose of the platform is
             | to delight the paying customer, and every change should
             | have a clear and well documented and narrated line of sight
             | to either increasing that delight or decreasing the
             | frustration.
             | 
             | The canonical way to do that is to ensure that the incoming
             | demand comes with both the ask and also the solid
             | justification. Even at top tier organizations, frequently
             | asks are good ideas, sensible ideas, nice ideas, probably
             | correct ideas -- but none of that is good enough/acceptable
             | enough. The proportion of good/sensible/nice/probably
             | correct ideas that are justifiable is about 5% in my lived
             | experience of 38 years in the industry. The onus is on the
             | asking team to provide that full true and complete
             | justification with sufficiently detailed data and in the
             | manner and form that convinces the platform team's
             | leadership. The bar needs to be high and again, has to
             | provide a clear line of sight to improving the life of the
             | paying customer. The platform team has the authority and
             | agency necessary to defend the customer, operations and
             | their time, and can (and often should) say no. It is not
             | the responsibility of the platform team to try to prove or
             | disprove something that someone wants, and it's not
             | 'pushing back' or 'bureaucracy', it's basic sober purpose-
             | of-the-company fundamentals. Time and money are not
             | unlimited. Nothing is free.
             | 
             | Frequently the process of trying to put together the
             | justification reveals to the asking team that they do not
             | in fact have the justification, and they stop there and a
             | disaster is correctly averted.
             | 
             | Sometimes, the asking team is probably right but doesn't
             | have the data to justify the ask. Things like 'Let's move
             | to K8s because it'll be better' are possibly true but also
             | possibly not. Vibes/hacker news/reddit/etc are beguiling to
             | juniors but do not necessarily delight paying customers.
             | The platform team has a bunch of options if they receive
             | something of that form. "No" is valid, but also so is
             | "Maybe" along with a pilot test to perform A/B testing
             | measurements and to try to get the missing data; or even
             | "Yes, but" with a plan to revert the situation if it turns
             | out to be too expensive or ineffective after an
             | incrementally structured phase 1. A lot depends on the
             | judgement of the management and the available bandwidth,
             | opportunity cost, how one-way-door the decision is, etc.
             | 
             | At the end of the day, though, if you are not making a
             | data-driven decision (or the very closest you can get to
             | one) and doing it off naked/unsupported asks/vibes/resume
             | enhancement/reddit/hn/etc, you're putting your paying
             | customer at risk. At best you'll be accidentally correct.
             | Being accidentally correct is the absolute worst kind of
             | correct, because inevitably there will come a time when
             | your luck runs out and you just killed your
             | team/organization/company because you made a wrong choice,
             | your paying customers got a worse/slower-to-improve/etc
             | experience, and they deserted you for a more soberly run
             | competitor.
        
         | wg0 wrote:
         | If you haven't broken down your software into 50+ different
         | separate applications written in 15 different languages using 5
         | different database technologies - you'll find very little use
         | for k8s.
         | 
          | All you need is a way to roll out your artifact to production
          | in a rolling or blue-green fashion, after preparations such as
          | any required database alterations, be they data or schema
          | changes.
        
           | javaunsafe2019 wrote:
           | But you do know which problems the k8s abstraction solves,
           | right? Cause it has nothing to do with many languages nor
           | many services but things like discovery, scaling, failover
           | and automation ...
        
             | wg0 wrote:
              | If all you have is one single application listening on
              | port 8080 with SSL terminated elsewhere, why would you
              | need so many abstractions in the first place?
        
           | imiric wrote:
           | > All you need is a way to roll out your artifact to
           | production in a roll over or blue green fashion after the
           | preparations such as required database alterations be it data
           | or schema wise.
           | 
           | Easier said than done.
           | 
           | You can start by implementing this yourself and thinking how
           | simple it is. But then you find that you also need to decide
           | how to handle different environments, configuration and
           | secret management, rollbacks, failover, load balancing, HA,
           | scaling, and a million other details. And suddenly you find
           | yourself maintaining a hodgepodge of bespoke infrastructure
           | tooling instead of your core product.
           | 
           | K8s isn't for everyone. But it sure helps when someone else
           | has thought about common infrastructure problems and solved
           | them for you.
        
             | mattmanser wrote:
             | You need to remove a lot of things from that list. Almost
             | all of that functionality is available in build tools that
             | have been available for decades. I want to emphasize the
             | DECADES.
             | 
             | And then all you're left with is scaling. Which most
             | business do not need.
             | 
             | Almost everything you've written there is a standard
             | feature of almost any CI toolchain, teamcity, Jenkins,
             | Azure DevOps, etc., etc.
             | 
             | We were doing it before k8s was even written.
        
               | imiric wrote:
               | > Almost all of that functionality is available in build
               | tools that have been available for decades.
               | 
               | Build tools? These are runtime and operational concerns.
               | No build tool will handle these things.
               | 
               | > And then all you're left with is scaling. Which most
               | business do not need.
               | 
               | Eh, sure they do. They might not need to hyperscale, but
               | they could sure benefit from simple scaling, autoscaling
               | at peak hours, and scaling down to cut costs.
               | 
               | Whether they need k8s specifically to accomplish this is
               | another topic, but every business needs to think about
               | scaling in some way.
               | 
               | > Almost everything you've written there is a standard
               | feature of almost any CI toolchain, teamcity, Jenkins,
               | Azure DevOps, etc., etc.
               | 
               | Huh? Please explain how a CI pipeline will handle load
               | balancing, configuration and secret management, and other
               | operational tasks for your services. You may use it for
               | automating commands that do these things, but CI systems
               | are entirely decoupled from core infrastructure.
               | 
               | > We were doing it before k8s was even written.
               | 
               | Sure. And k8s isn't the absolute solution to these
               | problems. But what it does give you is a unified set of
               | interfaces to solve common infra problems. Whatever
               | solutions we had before, and whatever you choose to
               | compose from disparate tools, will not be as unified and
               | polished as what k8s offers. It's up to you to decide the
               | right trade-off, but I find the head-in-the-sand
               | dismissal of it equally as silly as cargo culting it.
        
           | mplewis wrote:
           | Yeah, all you need is a rollout system that supports blue-
           | green! Very easy to homeroll ;)
        
             | wg0 wrote:
             | Not easy, but already a solved problem.
        
         | friendly_deer wrote:
         | Here's a theory about why at least some of these come about:
         | 
         | https://lethain.com/grand-migration/
        
         | lmm wrote:
         | > For instance: they want to go to k8s because they want to use
         | etcd/helm, which they can't on ECS? Why do you want to use
         | etcd/helm? Is it really this important? Is there really no
         | other way to achieve the goals of the company than exactly like
         | that?
         | 
         | I'm no fan of Helm, but there are surprisingly few good
         | alternatives to etcd (i.e. highly available but consistent
         | datastores, suitable for e.g. the distributed equivalent of a
         | .pid file) - Zookeeper is the only one that comes to mind, and
         | it's a real pain on the ops side of things, requiring ancient
         | JVM versions and being generally flaky even then.
        
       | tedunangst wrote:
       | How long will it take to migrate off?
        
         | codetrotter wrote:
         | It's irreversible.
        
         | hujun wrote:
          | depends on how much "k8s native" code you have; there are
          | applications designed to run on k8s which use a lot of the k8s
          | api. Also, if your app is already micro-serviced, it is not
          | straightforward to change it back.
        
       | wrs wrote:
       | A migration with the goal of improving the infrastructure
       | foundation is great. However, I was surprised to see that one of
       | the motivations was to allow teams to use Helm charts rather than
       | converting to Terraform. I haven't seen in practice the
       | consistent ability to actually use random Helm charts unmodified,
       | so by encouraging its use you end up with teams forking and
       | modifying the charts. And Helm is such a horrendous tool, you
       | don't really want to be maintaining your own bespoke Helm charts.
       | IMO you're actually _better off_ rewriting in Terraform so at
       | least your local version is maintainable.
       | 
       | Happy to hear counterexamples, though -- maybe the "indent 4"
       | insanity and multi-level string templating in Helm is gone
       | nowadays?
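        | 
        | (For anyone who hasn't hit it, the kind of thing I mean is a
        | chart stringifying a values block and then re-indenting it to
        | fit - a rough sketch, not from any real chart:
        | 
        |     containers:
        |       - name: app
        |         resources:
        |     {{ toYaml .Values.resources | indent 6 }}
        |         env:
        |           {{- tpl (toYaml .Values.extraEnv) . | nindent 6 }}
        | 
        | where the first template call has to hug the left margin of the
        | file or the indentation comes out wrong, and `tpl` re-renders a
        | string that is itself a template.)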
        
         | smellybigbelly wrote:
          | Our team also suffered from the problems you described with
          | public helm charts. There is always something you need to
          | customise to make things work in your own environment. Our
         | approach has been to use the public helm chart as-is and do any
         | customisation with `kustomize --enable-helm`.
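          | 
          | Roughly like this (chart and values are just an example):
          | 
          |     # kustomization.yaml, built with
          |     # `kustomize build --enable-helm`
          |     helmCharts:
          |       - name: ingress-nginx
          |         repo: https://kubernetes.github.io/ingress-nginx
          |         releaseName: ingress-nginx
          |         namespace: ingress-nginx
          |         valuesInline:
          |           controller:
          |             replicaCount: 2
          |     patches:
          |       - path: patch-controller.yaml  # local tweaks live here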
        
         | BobbyJo wrote:
          | Helm is quite often the default supported way of launching
          | containerized third-party products. I have worked at two
          | separate startups whose 'on prem' product was offered this
          | way.
        
           | freedomben wrote:
           | Indeed. I try hard to minimize the amount of Helm we use, but
           | a significant amount of tools are only shipped as Helm
           | charts. Fortunately I'm increasingly seeing people provide
           | "raw k8s" yaml, but it's far from universal.
        
         | cwiggs wrote:
          | Helm Charts and Terraform are different things IMO. Terraform
          | is better suited to deploying cloud resources (s3 bucket, EKS
          | cluster, EKS workers, RDS, etc). Sure you can manage your k8s
          | workloads with Terraform, but I wouldn't recommend it.
          | Terraform having its own state when you already have your
          | state in k8s makes working with Terraform + k8s a pain. Helm
          | is purpose built for k8s, Terraform is not.
          | 
          | I'm not a fan of Helm either though; templated yaml sucks, and
          | you still have the "indent 4" insanity too. Kustomize is nice
          | when things are simple, but once your app is complex Kustomize
          | is worse than Helm IMO. Try to deploy an app that has an
          | Ingress, a TLS cert, and external-dns with Kustomize for
          | multiple environments; you have to patch the resources 3 times
          | instead of just having 1 variable you can use in 3 places.
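          | 
          | As a made-up sketch, an overlay patch like this one ends up
          | repeated (with a different hostname) in dev, staging, and
          | prod:
          | 
          |     # overlays/staging/ingress-patch.yaml (hypothetical app);
          |     # the same host appears in the TLS block, the rule, and
          |     # whatever external-dns/cert-manager read
          |     apiVersion: networking.k8s.io/v1
          |     kind: Ingress
          |     metadata:
          |       name: myapp
          |     spec:
          |       tls:
          |         - hosts:
          |             - app.staging.example.com
          |           secretName: myapp-staging-tls
          |       rules:
          |         - host: app.staging.example.com
          |           http:
          |             paths:
          |               - path: /
          |                 pathType: Prefix
          |                 backend:
          |                   service:
          |                     name: myapp
          |                     port:
          |                       number: 80
          | 
          | versus a single host value in a helm values file that gets
          | templated into all three spots.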
         | 
          | Helm is popular and Terraform is popular, so they both get
          | talked about a lot, but IMO there is a tool yet to become
          | popular that will replace both of them.
        
           | wrs wrote:
           | I agree, I wouldn't generate k8s from Terraform either,
           | that's just the alternative I thought the OP was presenting.
           | But I'd still rather convert charts from Helm to pretty much
           | anything else than maintain them.
        
           | stackskipton wrote:
           | Lack of Variable substitution in Kustomize is downright
           | frustrating. We use Flux so we have the feature anyways, but
           | I wish it was built into Kustomize.
        
             | no_circuit wrote:
             | I don't miss variable substitution at all.
             | 
             | For my setup anything that needs to be variable or secret
             | gets specified in a custom json/yaml file which is read by
             | a plugin which in turn outputs the rendered manifest if I
             | can't write it as a "patch". That way the CI/CD runner can
             | access things like the resolved secrets for production
             | without being accessible by developers without elevated
             | access. It requires some digging but there are even
             | annotations that can be used to control things like if
             | Kustomize should add a hash suffix or not to ConfigMap or
             | Secret manifests you generate with plugins.
        
           | tionate wrote:
            | Re your kustomize complaint, just create a complete env-
            | specific ingress for each env instead of patching.
            | 
            | - it is not really any more lines
            | - doesn't break if dev upgrades to a different version of
            |   the resource (has happened before)
            | - allows you to experiment in dev with other setups (eg
            |   additional ingresses, different paths etc) instead of
            |   changing a base config which will impact other envs
            | 
            | TLDR patch things that are more or less the same in each
            | env; create complete resources for things that change more.
            | 
            | There is a bit of duplication but it is a lot more simple
            | (see 'Simple Made Easy' - Rich Hickey) than tracing through
            | patches/templates.
        
         | gouggoug wrote:
         | Talking about helm - I personally have come to profoundly
         | loathe it. It was amazing when it came out and filled a much
         | needed gap.
         | 
         | However it is loaded with so many footguns that I spend my time
          | redoing and debugging other engineers' work.
         | 
          | I'm hoping this new tool called << timoni >> picks up steam.
          | It fixes pretty much every qualm I have with helm.
         | 
         | So if like me you're looking for a better solution, go check
         | timoni.
        
         | JohnMakin wrote:
         | It's completely cursed, but I've started deploying helm via
         | terraform lately. Many people, ironically me included, find
         | that managing deployments via terraform is an anti pattern.
         | 
         | I'm giving it a try and I don't despise it yet, but it feels
         | gross - application configs are typically far more mutable and
         | dynamic than cloud infrastructure configs, and IME, terraform
          | does not like super dynamic configs.
        
         | solatic wrote:
          | My current employer (BigCo) has Terraform managing both infra
          | and deployments, at (ludicrous) scale. It's a nightmare. The
          | problem with Terraform is that you _must_ plan your workspaces
          | such that you will not exceed the best-practice amount of
          | resources per workspace (~100-200), or else plans will
          | drastically slow down your time-to-deploy, checking stuff like
          | databases and networking that you haven't touched and have no
         | desire to touch. In practice this means creating a latticework
         | of Terraform workspaces that trigger each other, and there are
         | currently no good open-source tools that support it.
         | 
         | Best practice as I can currently see it is to have Terraform
         | set up what you need for continuous delivery (e.g. ArgoCD) as
         | part of the infrastructure, then use the CD tool to handle day-
         | to-day deployments. Most CD tooling then asks you to package
         | your deployment in something like Helm.
        
           | chucky_z wrote:
           | You can setup dependent stacks in CDKTF. It's far from as
           | clean as a standard TF DAG plan/apply but I'm having a lot of
           | success with it right now. If I were actively using k8s at
           | the moment I would probably setup dependent cluster resources
           | using this method, e.g: ensure a clean, finished CSI daemon
           | deployment before deploying a deployment using that CSI
           | provider :)
        
             | solatic wrote:
             | You're right that CDKTF with dependent stacks is probably
             | better than nothing, but (a) CDKTF's compatibility with
             | OpenTofu depends on a lack of breaking changes in CDKTF,
             | since the OpenTofu team didn't fork CDKTF, so this is a
             | little hairy for building production infrastructure; (b)
             | CDKTF stacks, even when they can run in parallel, still run
             | on the same machine that invoked CDKTF. When you have
             | (ludicrous scale) X number of "stacks", this isn't a good
             | fit. It's something that _should_ be doable in one of the
             | managed Terraform services, but the pricing if you try to
             | do (ludicrous scale) parallelism gets to be _insane_.
        
         | mnahkies wrote:
         | Whilst I'll agree that writing helm charts isn't particularly
         | delightful, consuming them can be.
         | 
         | In our case we have a single application/service base helm
         | chart that provides sane defaults and all our deployments
         | extend from. The amount of helm values config required by the
         | consumers is minimal, and there has been very little occasion
         | for a consumer to include their own templates - the base chart
         | exposes enough knobs to avoid this.
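          | 
          | As a sketch (the keys below are our own base chart's
          | convention, not anything standard), a consuming service's
          | entire helm config can be as small as:
          | 
          |     # myservice/values.yaml - everything else comes from the
          |     # base chart's defaults
          |     image:
          |       repository: registry.example.com/myservice
          |       tag: "1.4.2"
          |     replicaCount: 3
          |     resources:
          |       requests:
          |         cpu: 250m
          |         memory: 256Mi
          |     ingress:
          |       host: myservice.internal.example.com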
         | 
         | When it comes to third-party charts, many we've been able to
         | deploy as is (sometimes with some PRs upstream to add extra
         | functionality), and occasionally we've needed to wrap/fork
         | them. We've deployed far more third-party charts as-is than not
         | though.
         | 
          | One thing probably worth mentioning w.r.t. maintaining our
         | custom charts is the use of helm unittest
         | (https://github.com/helm-unittest/helm-unittest) - it's been a
         | big help to avoid regressions.
         | 
         | We do manage a few kubernetes resources through terraform,
         | including Argocd (via the helm provider which is rather slow
         | when you have a lot of CRDs), but generally we've found helm
          | charts deployed through Argocd to be much more manageable and
         | productive.
        
       | andrewguy9 wrote:
       | I look forward to the blog post where they get off K8, in just 18
       | months.
        
       | surfingdino wrote:
       | ECS makes sense when you are building and breaking stuff. K8s
       | makes sense when you are mature (as an org).
        
       | xiwenc wrote:
        | I'm baffled to see so much anti-k8s sentiment on HN. Is it
        | because most commenters are developers used to services like
        | heroku, fly.io, render.com etc., or run their apps on VMs?
        
         | elktown wrote:
         | I think some are just pretty sick and tired of the explosion of
         | needless complexity we've seen in the last decade or so in
         | software, and rightly so. This is an industry-wide problem of
         | deeply misaligned incentives (& some amount of ZIRP gold rush),
         | not specific to this particular case - if this one is even a
         | good example of this to begin with.
         | 
         | Honestly, as it stands, I think we'd be seen as pretty useless
          | craftsmen in any other field due to an unhealthy obsession with
         | our tooling and meta-work - consistently throwing any kind of
         | sensible resource usage out of the window in favor of just
         | getting to work with certain tooling. It's some kind of a
         | "Temporarily embarrassed FAANG engineer" situation.
        
           | cwiggs wrote:
            | I agree with this somewhat. The other day I was driving home
            | and saw a sprinkler head that had broken on the side of the
            | road and was spraying water everywhere. It made me think:
            | why aren't sprinkler systems designed with HA in mind? Why
            | aren't there dual water lines with dual sprinkler heads
            | everywhere, with an electronic component that detects a
            | break in a line and automatically switches to the backup
            | water line? It's because the downside of having the water
            | spray everywhere and the grass become unhealthy or die is
            | less than what it would cost to deploy it HA.
           | 
            | In the software/tech industry it's commonplace to just
           | accept that your app can't be down for any amount of time no
           | matter what. No one checked to see how much more it would
           | cost (engineering time & infra costs) to deploy the app so it
           | would be HA, so no one checked to see if it would be worth
           | it.
           | 
           | I blame this logic on the low interest rates for a decade. I
           | could be wrong.
        
             | loire280 wrote:
             | This week we had a few minutes of downtime on an internal
             | service because of a node rotation that triggered an alert.
             | The responding engineer started to put together a plan to
             | make the service HA (which would have tripled the cost to
             | serve). I asked how frequently the service went down and
             | how many people would be inconvenienced if it did. They
             | didn't know, but when we checked the metrics it had single-
             | digit minutes of downtime this year and fewer than a dozen
             | daily users. We bumped the threshold on the alert to longer
             | than it takes for a pod to be re-scheduled and resolved the
             | ticket.
        
             | fragmede wrote:
             | Why would wanting redundancy be a ZIRP thing? Is blaming
             | everything on ZIRP like "Mercury was in retrograde" but for
             | economics dorks?
        
               | felixgallo wrote:
               | Because the company overhired to the point where people
               | were sitting around dreaming up useless features just to
               | justify their workday.
        
               | consteval wrote:
               | It depends on the cost of complexity you're adding.
               | Adding another database or whatever is really not that
               | complex so yeah sure, go for it.
               | 
               | But a lot of companies are building distributed systems
               | purely because they want this ultra-low downtime.
               | Distributed systems are HARD. You get an entire set of
               | problems you don't get otherwise, and the complexity
               | explodes.
               | 
               | Often, in my opinion, this is not justified. Saving a few
               | minutes of downtime in exchange for making your
               | application orders of magnitude more complex is just not
               | worth it.
               | 
               | Distributed systems solve distributed problems. They're
               | overkill if you just want better uptime or crisis
               | recovery. You can do that with a monolith and a database
               | and get 99.99% of the way there. That's good enough.
        
               | addaon wrote:
               | Redundancy, like most engineering choices, is a
               | cost/benefit tradeoff. If the costs are distorted, the
               | result of the tradeoff study will be distorted from the
               | decisions that would be made in "more normal" times.
        
             | zerkten wrote:
             | You assume that the teams running these systems achieve
             | acceptable uptime and companies aren't making refunds for
             | missed uptime targets when contracts enforce that, or
             | losing customers. There is definitely a vision for HA at
             | many companies, but they are struggling with and without
             | k8s.
        
           | bobobob420 wrote:
           | Any software engineer who thinks K8 is complex shouldn't be a
           | software engineer. It's really not that hard to manage.
        
             | LordKeren wrote:
             | I think the key word is "needless" in terms of complexity.
             | There are a lot of k8s projects that probably could
             | benefit from a simpler orchestration system -- especially
             | at smaller firms
        
               | fragmede wrote:
               | do you have a simpler orchestration system you'd
               | recommend?
        
               | qohen wrote:
               | Nomad, from Hashicorp, comes to mind.
               | 
               | https://www.nomadproject.io/
               | 
               | https://github.com/hashicorp/nomad
        
               | ahoka wrote:
               | How is it more simple?
        
               | Ramiro wrote:
               | Every time I read about Nomad, I wonder the same. I swear
               | I'm not trolling here, I honestly don't get how running
               | Nomad is simpler than Kubernetes. Especially considering
               | that there are substantially more resources and help on
               | Kubernetes than Nomad.
        
               | javadevmtl wrote:
               | For me it was DC/OS with Marathon and Mesos! It worked,
               | it was a tank, and its model was simple. There were also
               | some nice 3rd party open source systems around Mesos that
               | were also simple to use. Unfortunately Kube won.
               | 
               | While Nomad can be interesting, again it's a single
               | "smallish" vendor pushing an "open" source project (see
               | the debacle with Terraform).
        
             | javadevmtl wrote:
             | No, it just looks and feels like enterprisy SOAP XML
        
           | methodical wrote:
            | Fair point, but I think the key distinction here is
            | unnecessary complexity versus necessary complexity. Are
            | zero-downtime deployments and load balancing unnecessary?
            | Perhaps for a personal project, but for any company with a
            | consistent userbase I'd argue these are non-negotiable, or
            | should be, anyway. In a situation where this is the
            | expectation, k8s seems like the simplest answer, or near
            | enough to it.
        
           | darby_nine wrote:
           | > It's some kind of a "Temporarily embarrassed FAANG
           | engineer" situation.
           | 
           | FAANG engineers made the same mistake, too, even though the
           | analogy implies comparative competency or value.
        
         | maayank wrote:
         | It's one of those technologies that have merit in some
         | situations but are too often cargo-culted.
        
         | caniszczyk wrote:
         | Hating is a sign of success in some ways :)
         | 
         | In some ways, it's nice to see companies move to use mostly
         | open source infrastructure, a lot of it coming from CNCF
         | (https://landscape.cncf.io), ASF and other organizations out
         | there (on top of the random things on github).
        
         | tryauuum wrote:
         | For me it is about VMs. I feel uneasy knowing that any kernel
         | vulnerability could allow malicious code to escape the
         | container and explore the kubernetes host.
         | 
         | There are kata-containers, I think; they might solve my angst
         | and make me enjoy k8s.
         | 
         | Overall... There's just nothing cool in kubernetes to me.
         | Containers, load balancers, megabytes of yaml -- I've seen it
         | all. Nothing feels interesting enough to try
        
           | stackskipton wrote:
            | vs the application getting hacked and running loose on the
            | VM?
            | 
            | If you have never dealt with "I have to run these 50
            | containers plus Nginx/CertBot while figuring out which node
            | is best to run them on," yeah, I can see you not being
            | thrilled about Kubernetes. For the rest of us though,
            | Kubernetes helps out with that easily.
        
         | moduspol wrote:
         | For me personally, I get a little bit salty about it due to
         | imagined, theoretical business needs of being multi-cloud, or
         | being able to deploy on-prem someday if needed. It's tough to
         | explain just how much longer it'll take, how much more
         | expertise is required, how much more fragile it'll be, and how
         | much more money it'll take to build out on Kubernetes instead
         | of your AWS deployment model of choice (VM images on EC2, or
         | Elastic Beanstalk, or ECS / Fargate, or Lambda).
         | 
         | I don't want to set up or maintain my own ELK stack, or
         | Prometheus. Or wrestle with CNI plugins. Or Kafka. Or high
         | availability Postgres. Or Argo. Or Helm. Or control plane
         | upgrades. I can get up and running with the AWS equivalent
         | almost immediately, with almost no maintenance, and usually
         | with linear costs starting near zero. I can solve business
         | problems so, so much faster and more efficiently. It's the
         | difference between me being able to blow away expectations and
         | my whole team being quarters behind.
         | 
         | That said, when there is a genuine multi-cloud or on-prem
         | requirement, I wouldn't want to do it with anything other than
         | k8s. And it's probably not as bad if you do actually work at a
         | company big enough to have a lot of skilled engineers that
         | understand k8s--that just hasn't been the case anywhere I've
         | worked.
        
           | drawnwren wrote:
           | Genuine question: how are you handling load balancing, log
           | aggregation, failure restart + readiness checks, deployment
           | pipelines, and machine maintenance schedules with these
           | "simple" setups?
           | 
           | Because as annoying as getting the prometheus + loki + tempo
           | + promtail stack going on k8s is --- I don't really believe
           | that writing it from scratch is easier.
        
             | felixgallo wrote:
             | He named the services. Go read about them.
        
               | drawnwren wrote:
               | I'm not sure which services you think were named that
               | solve the problems I mentioned, but none were. You're
               | welcome to go read about them, I do this for a living.
        
             | Bjartr wrote:
             | Depending on use case specifics, Elastic Beanstalk can do
             | that just fine.
        
             | moduspol wrote:
             | * Load balancing is handled pretty well by ALBs, and there
             | are integrations with ECS autoscaling for health checks and
             | similar
             | 
             | * Log aggregation happens out of the box with CloudWatch
             | Logs and CloudWatch Log Insights. It's configurable if you
             | want different behavior
             | 
             | * On ECS, you configure a "service" which describes how
             | many instances of a "task" you want to keep running at a
             | given time. It's the abstraction that handles spinning up
             | new tasks when one fails
             | 
             | * ECS supports ready checks, and (as noted above)
             | integrates with ALB so that requests don't get sent to
             | containers until they pass a readiness check
             | 
             | * Machine maintenance schedules are non-existent if you use
             | ECS / Fargate, or at least they're abstracted from you. As
             | long as your application is built such that it can spin up
             | a new task to replace your old one, it's something that
             | will happen automatically when AWS decommissions the
             | hardware it's running on. If you're using ECS without
             | Fargate, it's as simple as changing the autoscaling group
             | to use a newer AMI. By default, this won't replace all of
             | the old instances, but will use the new AMI when spinning
             | up new instances
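              | 
              | To make the ECS side concrete, this is roughly what the
              | service definition looks like in CloudFormation (a minimal
              | sketch, not a real template; Cluster, WebTaskDef,
              | WebTargetGroup, the subnets, and the security group are
              | assumed to be defined elsewhere):
              |     # all !Ref names here are hypothetical placeholders
              |     Resources:
              |       WebService:
              |         Type: AWS::ECS::Service
              |         Properties:
              |           Cluster: !Ref Cluster
              |           LaunchType: FARGATE
              |           DesiredCount: 2
              |           TaskDefinition: !Ref WebTaskDef
              |           HealthCheckGracePeriodSeconds: 30
              |           LoadBalancers:
              |             - ContainerName: web
              |               ContainerPort: 8080
              |               TargetGroupArn: !Ref WebTargetGroup
              |           NetworkConfiguration:
              |             AwsvpcConfiguration:
              |               Subnets: [!Ref SubnetA, !Ref SubnetB]
              |               SecurityGroups: [!Ref WebSecurityGroup]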
             | 
             | But again, though: the biggest selling point is the lack of
             | maintenance / babysitting. If you set up your stack using
             | ECS / Fargate and an ALB five years ago, it's still
             | working, and you've probably done almost nothing to keep it
             | that way.
             | 
             | You might be able to do the same with Kubernetes, but your
             | control plane will be out of date, your OSes will have many
             | missed security updates. Might even need a major version
             | update to the next LTS. Prometheus, Loki, Tempo, Promtail
             | will be behind. Your helm charts will be revisions behind.
             | Newer ones might depend on newer apiVersions that your
             | control plane won't support until you update it. And don't
             | forget to update your CNI plugin across your cluster, too.
             | 
             | It's at least one full time job just keeping all that stuff
             | working and up-to-date. And it takes a lot more know-how
             | than just ECS and ALB.
        
               | NewJazz wrote:
               | It seems like you are comparing ECS to a self-managed
               | Kubernetes cluster. Wouldn't it make more sense to
               | compare to EKS or another managed Kubernetes offering?
               | Many of your points don't apply in that case, especially
               | around updates.
        
               | moduspol wrote:
               | A managed Kubernetes offering removes only some of the
               | pain, and adds more in other areas. You're still on the
               | hook for updating whatever add-ons you're using, though
               | yes, it'll depend on how many you're using, and how
               | painful it will be varies depending on how well your
               | cloud provider handles it.
               | 
               | Most of my managed Kubernetes experience is through
               | Amazon's EKS, and the pain I remember included
               | frustration from the supported Kubernetes versions being
               | behind the upstream versions, lack of visibility for
               | troubleshooting control nodes, and having to explain /
               | understand delays in NIC and EBS appropriation /
               | attachments for pods. Also the ALB ingress controller was
               | something I needed to install and maintain independently
               | (though that may be different now).
               | 
               | Though that was also without us going neck-deep into
               | being vendor agnostic. Using EKS just for the Kubernetes
               | abstractions without trying hard to be vendor agnostic is
               | valid--it's just not what I was comparing above because
               | it was usually that specific business requirement that
               | steered us toward Kubernetes in the first place.
               | 
               | If you ARE using EKS with the intention of keeping as
               | much as possible vendor agnostic, that's also valid, but
               | then now you're including a lot of the stuff I complained
               | about in my other comment: your own metrics stack, your
               | own logging stack, your own alarm stack, your own CNI
               | configuration, etc.
        
               | drawnwren wrote:
               | (Apologies for the snark, someone else made a short
               | snarky comment that I felt was also wrong and I thought
               | this thread was in reply to them before I typed it out --
               | thank you for the reply)
               | 
               | - ALBs -- yeah this is correct. However ALBs have much
               | longer startup/health check times than Envoy/Traefik
               | 
                | - CloudWatch - this is true, however the "configurable"
                | behavior makes CloudWatch trash out of the box. You get,
                | e.g., exceptions split across multiple log entries with
                | the default configuration
                | 
                | - ECS tasks - yep, but the failure behavior of tasks is
                | horrible because there are no notifications out of the
                | box (you can configure it)
               | 
                | - Fargate does allow you to avoid maintenance, however it
                | has some very hairy edges, e.g. you can't use any
                | container that expects to know its own IP address on a
                | private VPC without writing a custom script. Networking
                | in general is pretty arcane on Fargate and you're going
                | to have to manually write and maintain fixes for the
                | breakages from all this
               | 
               | > You might be able to do the same with Kubernetes, but
               | your control plane will be out of date, your OSes will
               | have many missed security updates. Might even need a
               | major version update to the next LTS. Prometheus, Loki,
               | Tempo, Promtail will be behind. Your helm charts will be
               | revisions behind. Newer ones might depend on newer
               | apiVersions that your control plane won't support until
               | you update it. And don't forget to update your CNI plugin
               | across your cluster, too.
               | 
                | I think maybe you haven't used K8s in years. Karpenter,
                | EKS, plus a GitOps tool (Flux or Argo) gives you the
                | same machine maintenance feeling as ECS, but on K8s and
                | without any of the annoyances of dealing with ECS. All
                | your app
               | versions can be pinned or set to follow latest as you
               | prefer. You get rolling updates each time you switch
               | machines (same as ECS, and if you really want to you can
               | run on top of Fargate).
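                | 
                | For context, the GitOps piece is basically one manifest
                | per app -- something like this Argo CD Application (a
                | sketch; the repo URL, paths, and names are made up):
                |     apiVersion: argoproj.io/v1alpha1
                |     kind: Application
                |     metadata:
                |       name: web
                |       namespace: argocd
                |     spec:
                |       project: default
                |       source:
                |         # made-up repo URL
                |         repoURL: https://github.com/example/deploy
                |         targetRevision: main
                |         path: apps/web
                |       destination:
                |         server: https://kubernetes.default.svc
                |         namespace: web
                |       syncPolicy:
                |         automated:
                |           prune: true
                |           selfHeal: true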
               | 
               | By contrast, if your ECS/Fargate instance fails you
               | haven't mentioned any notifications in your list -- so if
               | you forgot to configure and test that correctly, your ECS
               | could legitimately be stuck on a version of your app code
               | that is 3 years old and you might not know if you haven't
               | inspected the correct part of amazon's arcane interface.
               | 
               | By the way, you're paying per use for all of this.
               | 
               | At the end of the day, I think modern Kubernetes is
               | strictly simpler, cheaper, and better than ECS/Fargate
               | out of the box and has the benefit of not needing to rely
               | on 20 other AWS specific services that each have their
               | own unique ways of failing and running a bill up if you
               | forget to do "that one simple thing everyone who uses
               | this niche service should know".
        
               | mrgaro wrote:
                | ECS+Fargate does give you zero maintenance, both in
                | theory and in practice. As someone who runs k8s at home
                | and manages two clusters at work, I still recommend that
                | our teams use ECS+Fargate+ALB if it satisfies their
                | requirements for stateless apps, and they all love it
                | because it is literally zero maintenance, unlike what
                | you just described k8s as requiring.
                | 
                | Sure, there are a lot of great features in k8s which ECS
                | cannot do, but when ECS does satisfy the requirements, it
                | will require less maintenance, no matter what kind of k8s
                | you compare it against.
        
           | angio wrote:
           | I think you're just used to AWS services and don't see the
           | complexity there. I tried running some stateful services on
           | ECS once and it took me hours to have something _not_
           | working. In Kubernetes it takes me literally minutes to
           | achieve the same task (+ automatic chart updates with
           | renovatebot).
        
             | moduspol wrote:
             | I'm not saying there's no complexity. It exists, and there
             | are skills to be learned, but once you have the skills,
             | it's not that hard.
             | 
             | Obviously that part's not different from Kubernetes, but
             | here's the part that is: maintenance and upgrades are
             | either completely out of my scope or absolutely minimal. On
             | ECS, it might involve switching to a more recently built
             | AMI every six months or so. AWS is famously good about not
             | making backward incompatible changes to their APIs, so for
             | the most part, things just keep working.
             | 
             | And don't forget you'll need a lot of those AWS skills to
             | run Kubernetes on AWS, too. If you're lucky, you'll get
             | simple use cases working without them. But once PVCs aren't
             | getting mounted, or pods are stuck waiting because you ran
             | out of ENI slots on the box, or requests are timing out
             | somewhere between your ALB and your pods, you're going to
             | be digging into the layer between AWS and Kubernetes to
             | troubleshoot those things.
             | 
             | I run Kubernetes at home for my home lab, and it's not zero
             | maintenance. It takes care and feeding, troubleshooting,
             | and resolution to keep things working over the long term.
             | And that's for my incredibly simple use cases (single node
             | clusters with no shared virtualized network, no virtualized
             | storage, no centralized logs or metrics). I've been in
             | charge of much more involved ones at work and the
             | complexity ceiling is almost unbounded. Running a
             | distributed, scalable container orchestration platform is a
             | lot more involved than piggy backing on ECS (or Lambda).
        
           | mountainriver wrote:
           | I hear a lot of comments that sound like people who used K8s
           | years ago and not since. The clouds have made K8s management
           | stupid simple at this point, you can absolutely get up and
           | running immediately with no worry of upgrades on a modern
           | provider like GKE
        
         | archenemybuntu wrote:
         | Kubernetes itself is built around mostly solid distributed
         | system principles.
         | 
         | It's the ecosystem around it which turns things needlessly
         | complex.
         | 
          | Just because you have kubernetes, you don't necessarily need
          | istio, helm, Argo CD, cilium, and whatever half-baked stuff
          | was pushed by the CNCF yesterday.
         | 
          | For example, take a look at helm. Its templating is atrocious,
          | and, if I'm remembering correctly, it doesn't have a way to
          | order resources properly except hooks. Sometimes resource A (a
          | deployment) depends on resource B (some CRD).
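          | 
          | The usual workaround is hook annotations on the thing that has
          | to exist first, roughly like this (a minimal sketch; the Job
          | name and image are made up):
          |       apiVersion: batch/v1
          |       kind: Job
          |       metadata:
          |         name: install-crds   # hypothetical
          |         annotations:
          |           "helm.sh/hook": pre-install
          |           "helm.sh/hook-weight": "-5"
          |           "helm.sh/hook-delete-policy": hook-succeeded
          |       spec:
          |         template:
          |           spec:
          |             restartPolicy: Never
          |             containers:
          |               - name: install-crds
          |                 image: example/crd-installer:1.0  # made up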
         | 
          | The culture around kubernetes dictates that you bring in
          | everything pushed by the CNCF. And most of this stuff is
          | half-baked MVPs.
         | 
         | ---
         | 
          | The word devops has created the expectation that backend
          | developers should be fighting kubernetes if something goes
          | wrong.
         | 
         | ---
         | 
          | Containerization is done poorly by many orgs - no care about
          | security or image size. That's a rant for another day, and I
          | suspect it isn't a big reason for the kubernetes hate here.
        
       | solatic wrote:
       | I don't get the hate for Kubernetes in this thread. TFA is from
       | _Figma_. You can talk all day long about how early startups just
       | don't need the kind of management benefits that Kubernetes
       | offers, but the article isn't written by someone working for a
       | startup, it's written by a company that nearly got sold to Adobe
       | for $20 billion.
       | 
       | Y'all really don't think a company like Figma stands to benefit
       | from the flexibility that Kubernetes offers?
        
         | BobbyJo wrote:
         | Kubernetes isn't even that complicated, and first party support
         | from cloud providers often means you're doing something in K8s
          | in lieu of doing it in a cloud-specific way (like ingress vs
         | cloud specific load balancer setups).
         | 
         | At a certain scale, K8s is the simple option.
         | 
         | I think much of the hate on HN comes from the "ruby on rails is
         | all you need" crowd.
        
           | JohnMakin wrote:
           | > I think much of the hate on HN comes from the "ruby on
           | rails is all you need" crowd.
           | 
            | Maybe - people seem really gung-ho about serverless solutions
           | here too
        
             | dexwiz wrote:
             | The hype for serverless cooled after that article about
             | Prime Video dropping lambda. No one wants a product that a
             | company won't dogfood. I realize Amazon probably uses
             | lambda elsewhere, but it was still a bad look.
        
               | LordKeren wrote:
               | I think it was much more about one specific use case of
               | lambda that was a bad fit for the prime video team's need
               | and not a rejection of lambda/serverless. TBH, it kind of
               | reflected more poorly on the team than lambda as a
               | product
        
               | cmckn wrote:
               | > Amazon probably uses lambda elsewhere
               | 
               | Yes, you could say that. :)
        
               | JohnMakin wrote:
               | not probably, their lambda service powers much of their
               | control plane.
        
           | eutropia wrote:
           | I guess the ones who quietly ship dozens of rails apps on k8s
           | are too busy getting shit done to stop and share their boring
           | opinions about pragmatically choosing the right tool for the
           | job :)
        
             | BobbyJo wrote:
             | "But you can run your rails app on a single host with
             | embedded SQLite, K8s is unnecessary."
        
               | ffsm8 wrote:
                | And there is truth to that. _Most_ deployments are at
                | that level, and it absolutely is way more performant than
                | the alternative. It just comes with several tradeoffs...
                | But these tradeoffs are usually worth it for deployments
                | with <10k concurrent users. Which Figma certainly _isn't_.
                | 
                | You probably could still do it, but it's likely more
                | trouble than it's worth.
               | 
               | (The 10k is just an arbitrary number I made up, there is
               | no magic number which makes this approach unviable, it
               | all depends on how the users interact with the
               | platform/how often and where the data is inserted)
        
               | threeseed wrote:
               | Always said by people who haven't spent much time in the
               | cloud.
               | 
               | Because single hosts will always go down. Just a question
               | of when.
        
               | BossingAround wrote:
               | I love k8s, but bringing back up a single app that
               | crashed is a very different problem from "our k8s is
               | down" - because if you think your k8s won't go down,
               | you're in for a surprise.
               | 
                | You can also view a single k8s cluster as a single host,
                | which will go down at some point (e.g. a botched upgrade,
                | cloud network partition, or something similar). While
                | much less frequent, it's also much more difficult to get
                | out of.
               | 
               | Of course, if you have a multi-cloud setup with automatic
               | (and periodically tested!) app migration across clouds,
               | well then... Perhaps that's the answer nowadays.. :)
        
               | solatic wrote:
               | > if you think your k8s won't go down, you're in for a
               | surprise
               | 
               | Kubernetes is a remarkably reliable piece of software.
               | I've administered (large X) number of clusters that often
               | had several years of cluster lifetime, each, everything
               | being upgraded through the relatively frequent Kubernetes
               | release lifecycle. We definitely needed some maintenance
               | windows sometimes, but well, no, Kubernetes didn't
               | unexpectedly crash on us. Maybe I just got lucky, who
               | knows. The closest we ever got was the underlying etcd
               | cluster having heartbeat timeouts due to insufficient
               | hardware, and etcd healed itself when the nodes were
               | reprovisioned.
               | 
               | There's definitely a whole lotta stuff in the Kubernetes
               | ecosystem that isn't nearly as reliable, but that has to
               | be differentiated from Kubernetes itself (and the
               | internal etcd dependency).
               | 
               | > You can view a single k8s also as a single host, which
               | will go down at some point (e.g. a botched upgrade, cloud
               | network partition, or something similar)
               | 
               | The managed Kubernetes services solve the whole "botched
               | upgrade" concern. etcd is designed to tolerate cloud
               | network partitions and recover.
               | 
               | Comparing this to sudden hardware loss on a single-VM app
               | is, quite frankly, insane.
        
               | cyberpunk wrote:
               | Even if your entire control plane disappears your nodes
               | will keep running and likely for enough time to build an
               | entirely new cluster to flip over to.
               | 
               | I don't get it either. It's not hard at all.
        
               | BossingAround wrote:
               | Your nodes & containers keep running, but is your
               | networking up when your control plane is down?
        
               | __turbobrew__ wrote:
               | If you start using more esoteric features the reliability
               | of k8s goes down. Guess what happens when you enable the
               | in place vertical pod scaling feature gate?
               | 
               | It restarts every single container in the cluster at the
               | same time:
               | https://github.com/kubernetes/kubernetes/issues/122028
               | 
               | We have also found data races in the statefulset
               | controller which only occurs when you have thousands of
               | statefulsets.
               | 
               | Overall, if you stay on the beaten path k8s reliability
               | is good.
        
               | kayodelycaon wrote:
               | I've been working with rails since 1.2 and I've never
               | seen anyone actually do this. Every meaningful deployment
               | I've seen uses postgres or mysql. (Or god forbid
                | mongodb.) It takes very little time with your SQL
                | statements.
               | 
               | You can run rails on a single host using a database on
               | the same server. I've done it and it works just fine as
               | long as you tune things correctly.
        
               | a_bored_husky wrote:
               | > as long as you tune things correctly
               | 
               | Can you elaborate?
        
               | kayodelycaon wrote:
               | I don't remember the exact details because it was a long
               | time ago, but what I do remember is
               | 
               | - Limiting memory usage and number of connections for
               | mysql
               | 
               | - Tracking maximum memory size of rails application
                | servers so you didn't run out of memory by running too
               | many of them
               | 
               | - Avoid writing unnecessarily memory intensive code (This
               | is pretty easy in ruby if you know what you're doing)
               | 
               | - Avoiding using gems unless they were worth the memory
               | use
               | 
               | - Configuring the frontend webserver to start dropping
               | connections before it ran out of memory (I'm pretty sure
               | that was just a guess)
               | 
               | - Using the frontend webserver to handle traffic whenever
               | possible (mostly redirects)
               | 
               | - Using IP tables to block traffic before hitting the
               | webserver
               | 
               | - Periodically checking memory use and turning off
               | unnecessary services and cronjobs
               | 
               | I had the entire application running on a 512mb VPS with
               | roughly 70mb to spare. It was a little less spare than I
               | wanted but it worked.
               | 
               | Most of this was just rate limiting with extra steps. At
               | the time rails couldn't use threads, so there was a hard
               | limit on the number of concurrent tasks.
               | 
               | When the site went down it was due to rate limiting and
               | not the server locking up. It was possible to ssh in and
               | make firewall adjustments instead of a forced restart.
        
               | a_bored_husky wrote:
               | Thank you.
        
           | bamboozled wrote:
           | Agreed, we're a small team and we benefit greatly from
           | managed k8s (EKS). I have to say the whole ecosystem just
           | continues to improve as far as I can tell and the developer
           | satisfaction is really high with it.
           | 
           | Personally I think k8s is where it's at now. The innovation
           | and open source contributions are immense.
           | 
           | I'm glad we made the switch. I understand the frustrations of
           | the past, but I think it was much harder to use 4+ years ago.
           | Now, I don't see how anyone could mess it up so hard.
        
           | MrDarcy wrote:
           | > Kubernetes isn't even that complicated
           | 
           | I've been struggling to square this sentiment as well. I
           | spend all day in AWS and k8s and k8s is at least an order of
           | magnitude simpler than AWS.
           | 
           | What are all the people who think operating k8s is too
           | complicated operating on? Surely not AWS...
        
             | tbrownaw wrote:
             | The thing you already know tends to be less complicated
             | than the thing you don't know.
        
             | rco8786 wrote:
             | I think "k8s is complicated" and "AWS is even more
             | complicated" can both be true.
             | 
             | Doing _anything_ in AWS is like pulling teeth.
        
             | brainzap wrote:
              | The sum is complex, especially with the custom operators.
        
           | chrischen wrote:
           | There are also a lot of cog-in-the-machine engineers here
           | that totally do not get the bigger picture or the vantage
           | point from another department.
        
           | dorianmariefr wrote:
           | ruby on rails is all you need
        
         | logifail wrote:
         | > it's written by a company that nearly got sold to Adobe for
         | $20 billion
         | 
         | (Apologies if this is a dumb question) but isn't Figma big
         | enough to want to do any of their stuff on their own hardware
         | yet? Why would they still be paying AWS rates?
         | 
         | Or is it the case that a high-profile blog post about K8S and
         | being provider-agnostic gets you sufficient discount on your
         | AWS bill to still be value-for-money?
        
           | jeffbee wrote:
           | There are a lot of ex-Dropbox people at Figma who might have
           | learned firsthand that bringing your stuff on-prem under a
           | theory of saving money is an intensely stupid idea.
        
             | logifail wrote:
             | > There are a lot of ex-Dropbox people at Figma who might
             | have learned firsthand that bringing your stuff on-prem
             | under a theory of saving money is an intensely stupid idea
             | 
             | Well, that's one hypothesis.
             | 
             | Another is that "Every maturing company with predictable
             | products must be exploring ways to move workloads out of
             | the cloud. AWS took your margin and isn't giving it back."
             | ( https://news.ycombinator.com/item?id=35235775 )
        
           | ozim wrote:
            | They are preparing for the next blog post in a year - "how
            | we cut costs by xx% by moving to our own servers".
        
           | hyperbolablabla wrote:
           | I work for a company making ~$9B in annual revenue and we use
           | AWS for everything. I think a big aspect of that is just
           | developer buy-in, as well as reliability guarantees, and
           | being able to blame Amazon when things do go down
        
             | st3fan wrote:
             | Also, you don't have to worry about half of your stack? The
             | shared responsibility model really works.
        
               | consteval wrote:
               | No, you still do. You just replace those sysadmins with
               | AWS dev ops people. But ultimately your concerns haven't
               | gone down, they've changed. It's true you don't have to
                | worry about hardware. But, then again, you can use colo
                | datacenters or even a VPS.
        
           | NomDePlum wrote:
           | Much bigger companies use AWS for very practical well thought
           | out reasons.
           | 
            | Not managing procurement of hardware, upgrades, etc., having
            | a defined standard operating model with accessible
            | documentation, being able to hire people with experience,
            | and needing to hire fewer people because you are doing less
            | is enough to build a viable and demonstrable business case.
           | 
           | Scale beyond a certain point is hard without support and
           | delegated responsibility.
        
           | tayo42 wrote:
            | There must be a prohibitively expensive upfront cost to buy
            | enough servers to do this, plus bringing in all the skill
            | they don't currently have to stand up and run something like
            | what they would require.
            | 
            | I wonder if, as time goes on, the skill to use hardware is
            | disappearing. New engineers don't learn it, and the ones who
            | had it slowly forget. I'm not that sharp on anything I
            | haven't done in years, even if it's in a related domain.
        
           | j_kao wrote:
           | Companies like Netflix with bigger market caps are still on
           | AWS.
           | 
           | I can imagine the productivity of spinning up elastic cloud
           | resources vs fixed data center resourcing being more
           | important, especially considering how frequently a company
           | like Figma ships new features.
        
           | manquer wrote:
            | A valuation is just a headline number which has no
            | operational bearing.
            | 
            | Their ARR in 2022 was around $400M-450M. Say the infra budget
            | at a typical 10% would be $50M. While that is a lot of money,
            | it is not build-your-own-hardware money, and not all of it
            | would be compute budget. They would also be spending on other
            | SaaS apps (say, Snowflake) and on special workloads like
            | those needing GPUs, so not all workloads would be in-house
            | ready. I would be surprised if their commodity compute/k8s is
            | more than half of their overall budget.
            | 
            | It is a lot more likely that focusing on this now would slow
            | product growth, especially since they were/are still growing
            | rapidly.
           | 
           | Larger SaaS companies than them in ARR still find using cloud
           | exclusively is more productive/efficient.
        
           | sangnoir wrote:
           | > Why would they still be paying AWS rates?
           | 
           | They are almost certainly not paying sticker prices. Above a
           | certain size, companies tend to have bespoke prices and SLAs
           | that are negotiated in confidence.
        
           | ijidak wrote:
           | It's a fair question.
           | 
           | Data centers are wildly expensive to operate if you want
           | proper security, redundancy, reliability, recoverability,
           | bandwidth, scale elasticity, etc.
           | 
           | And when I say security, I'm not just talking about software
           | level security, but literal armed guards are needed at the
           | scale of a company like Figma.
           | 
           | Bandwidth at that scale means literally negotiating to buy up
           | enough direct fiber and verifying the routes that fiber takes
           | between data centers.
           | 
           | At one of the companies I worked at, it was not uncommon to
           | lose data center connectivity because a farmer's tractor cut
           | a major fiber line we relied on.
           | 
           | Scalability might include tracking square footage available
           | for new racks in physical buildings.
           | 
           | As long as your company is profitable, at anything but
           | Facebook like scale, it may not be worth the trouble to try
           | to run your own data center.
           | 
           | Even if the cloud doesn't save money, it saves mental energy
           | and focus.
        
             | mcpherrinm wrote:
             | There's a ton of middle ground between a fully managed
             | cloud like AWS and building your own hyperscaler datacenter
             | like Facebook.
             | 
             | Renting a few hundred cabinets from Equinix or Digital
             | Realty is going to potentially be hugely cheaper than AWS,
             | but you probably need a team of people to run it. That can
             | be worthwhile if your growth is predictable and especially
             | if your AWS bandwidth bill is expensive.
             | 
             | But then you're building on bare metal. Gotta deploy your
             | own databases, maybe kubernetes for running workloads, or
             | something like VMware for VMs. And you don't get any
             | managed cloud services, so that's another dozen employees
             | you might need.
        
             | shrubble wrote:
             | This is a 20-years-ago take. If your datacenter provider
             | doesn't have multiple fiber entry into the building with
             | multiple carriers, you chose the wrong provider at this
             | point.
        
         | cwiggs wrote:
          | k8s is complex; if you don't need the following, you probably
          | shouldn't use it:
         | 
         | * Service discovery
         | 
         | * Auto bin packing
         | 
         | * Load Balancing
         | 
         | * Automated rollouts and rollbacks
         | 
          | * Horizontal scaling
         | 
         | * Probably more I forgot about
         | 
         | You also have secret and config management built in. If you use
         | k8s you also have the added benefit of making it easier to move
         | your workloads between clouds and bare metal. As long as you
         | have a k8s cluster you can mostly move your app there.
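          | 
          | Horizontal scaling is a decent example of how little you have
          | to write once you're on it -- a minimal sketch (assumes a
          | Deployment named "web" already exists; the numbers are made
          | up):
          |       apiVersion: autoscaling/v2
          |       kind: HorizontalPodAutoscaler
          |       metadata:
          |         name: web
          |       spec:
          |         scaleTargetRef:
          |           apiVersion: apps/v1
          |           kind: Deployment
          |           name: web
          |         minReplicas: 2
          |         maxReplicas: 10
          |         metrics:
          |           - type: Resource
          |             resource:
          |               name: cpu
          |               target:
          |                 type: Utilization
          |                 averageUtilization: 70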
         | 
         | Problem is most companies I've worked at in the past 10 years
         | needed multiple of the features above, and they decided to roll
         | their own solution with Ansible/Chef, Terraform, ASGs, Packer,
         | custom scripts, custom apps, etc. The solutions have always
         | been worse than what k8s provides, and it's a bespoke tool that
         | you can't hire for.
         | 
         | For what k8s provides, it isn't complex, and it's all
         | documented very well, AND it's extensible so you can build your
         | own apps on top of it.
         | 
          | I think there are more SWEs on HN than
         | Infra/Platform/Devops/buzzword engineers. As a result there are
         | a lot of people who don't have a lot of experience managing
         | infra and think that spinning up their docker container on a VM
         | is the same as putting an app in k8s. That's my opinion on why
         | k8s gets so much hate on HN.
        
           | Osiris wrote:
           | Those all seem important to even moderately sized products.
        
             | worldsayshi wrote:
             | As long as your requirements are simple the config doesn't
             | need to be complex either. Not much more than docker-
             | compose.
             | 
             | But once you start using k8s you probably tend to scope
             | creep and find a lot of shiny things to add to your set up.
        
             | doctorpangloss wrote:
             | Some ways to tell if someone is a great developer are easy.
             | JetBrains IDE? Ample storage space? Solving problems with
             | the CLI? Consistently formatted code using the language's
             | packaging ecosystem? No comments that look like this:
             | # A verbose comment that starts capitalized, followed by a
             | single line of code, cuing you that it was written by a
             | ChatBot.
             | 
              | Some ways to tell if someone is a great developer are hard.
             | You can't tell if someone is a brilliant shipper of
             | features, choosing exactly the right concerns to worry
             | about at the moment, like doing more website authoring and
             | less devops, with a grand plan for how to make everything
             | cohere later; or, if the guy just doesn't know what the
             | fuck he is doing.
             | 
             | Kubernetes adoption is one of those, hard ones. It isn't a
             | strong, bright signal like using PEP 8 and having a
             | `pyproject.toml` with dependencies declared. So it may be
             | obvious to you, "People adopt Kubernetes over ad-hoc
             | decoupled solutions like Terraform because it has, in a
             | Darwinian way, found the smallest set of easily
             | surmountable concerns that should apply to most good
             | applications." But most people just see, "Ahh! Why can't I
             | just write the method bodies for Python function signatures
             | someone else wrote for me, just like they did in CS50!!!"
        
           | maccard wrote:
           | For anyone who thinks this is a laundry list - running two
           | instances of your app with a database means you need almost
           | all of the above.
           | 
           | The _minute_ you start running containers in the cloud you
           | need to think of "what happens if it goes down/how do I
           | update it/how does it find the database", and you need an
           | orchestrator of some sort, IMO. A managed service (I prefer
           | ECS personally as it's just stupidly simple) is the way to go
           | here.
        
             | hnav wrote:
             | Eh, you can easily deploy containers to EC2/GCE and have an
             | autoscaling group/MIG with healthchecks. That's what I'd be
             | doing for a first pass or if I had a monolith (a lot of
             | business is still deploying a big ball of PHP). K8s really
             | comes into its own once you're running lots of
             | heterogeneous stuff all built by different teams. Software
             | reflects organizational structure so if you don't have a
             | centralized infra team you likely don't want container
             | orchestration since it's basically your own cloud.
        
               | maccard wrote:
                | By containers on EC2 you mean installing docker on AMIs?
               | How do you deploy them?
               | 
               | I really do think Google Cloud Run/Azure Container Apps
               | (and then in AWS-land ECS-on-fargate) is the right
               | solution _especially_ in that case - you just shove a
               | container on and tell it the resources you need and
               | you're done.
        
               | gunapologist99 wrote:
               | From https://stackoverflow.com/questions/24418815/how-do-
               | i-instal... , here's an example that you can just paste
               | into your load balancing LaunchConfig and never have to
               | log into an instance at all (just add your own _runcmd:_
                | section -- and, hey, it's even YAML like everyone loves):
                | 
                |     #cloud-config
                |     apt:
                |       sources:
                |         docker.list:
                |           source: deb [arch=amd64]
                |             https://download.docker.com/linux/ubuntu
                |             $RELEASE stable
                |           keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
                |     packages:
                |       - docker-ce
                |       - docker-ce-cli
        
               | cwiggs wrote:
               | Sure you can use an AWS ASG, but I assume you also tie
                | that into an AWS ALB/NLB. Then you use ACM for certs and
               | now you are locked in to AWS times 3.
               | 
               | Instead you can do those 3 and more in k8s and it would
               | be the same manifests regardless which k8s cluster you
               | deploy to, EKS, AKS, GKE, on prem, etc.
               | 
               | Plus you don't get service discovery across VMs, you
               | don't get a CSI so good luck if your app is stateful. How
               | do you handle secrets, configs? How do you deploy
               | everything, Ansible, Chef? The list goes on and on.
               | 
                | If your app is simple, sure; but I haven't seen a simple
                | app in years.
        
               | maccard wrote:
               | I've never worked anywhere that has benefitted from
               | avoiding lock-in. We would have saved thousands in dev-
               | hours if we just used an ALB instead of tweaking nginx
               | and/or caddy.
               | 
               | Also, if you can't convert an ALB into an Azure Load
               | balancer, then you probably have no business doing any
               | sort of software development.
        
               | MrDarcy wrote:
               | I did this. It's not easier than k8s, GKE, EKS, etc....
               | It's harder cause you have to roll it yourself.
               | 
               | If you do this just use GKE autopilot. It's cheaper and
               | done for you.
        
           | gunapologist99 wrote:
           | It's worth bearing in mind that, although any of these can be
           | accomplished with any number of other products as you point
           | out, LB and Horizontal Scaling, in particular, have been
           | solved problems for more than 25 years (or longer depending
           | on how you count)
           | 
           | For example, even servers (aka instances/vms/vps) with load
           | balancers (aka fabric/mesh/istio/traefik/caddy/nginx/ha
           | proxy/ATS/ALB/ELB/oh just shoot me) in front existed for apps
           | that are LARGER than can fit on a single server (virtually
           | the definition of horizontally scalable). These apps are
           | typically monoliths or perhaps app tiers that have fallen out
           | of style (like the traditional n-tier architecture of app
           | server-cache-database, swap out whatever layers you like).
           | 
           | However, K8s is actually more about microservices. Each
           | microservice can act like a tiny app on its own, but they are
           | often inter-dependent and, especially at the beginning, it's
           | often seen as not cost-effective to dedicate their own
           | servers to them (along with the associate load balancing,
           | redundant and cross-AZ, etc). And you might not even know
           | what the scaling pain points for an app is, so this gives you
           | a way to easily scale up without dedicating slightly
           | expensive instances or support staff to running each cluster;
           | your scale point is on the entire k8s cluster itself.
           | 
           | Even though that is ALL true, it's also true that k8s' sweet
           | spot is actually pretty narrow, and many apps _and teams_
            | probably won't benefit from it that much (or not at all and
           | it actually ends up being a net negative, and that's not even
           | talking about the much lower security isolation between
           | containers compared to instances; yes, of course, k8s can
           | schedule/orchestrate VMs as well, but no one really does
           | that, unfortunately.)
           | 
           | But, it's always good resume fodder, and it's about the
           | closest thing to a standard in the industry right now, since
           | everyone has convinced themselves that the standard multi-AZ
           | configuration of 2014 is just too expensive or complex to run
           | compared to k8s, or something like that.
        
           | drdaeman wrote:
           | > For what k8s provides, it isn't complex, and it's all
           | documented very well
           | 
           | I had a different experience. Some years ago I wanted to set
           | up a toy K8s cluster over an IPv6-only network. It was a
           | total mess - documentation did not cover this case (at least
           | I have not found it back then) and there was _a lot_ of code
           | to dig through to learn that it was not really supported back
           | then as some code was hardcoded with AF_INET assumptions (I
            | think it's all fixed nowadays). And maybe it's just me, but
            | I really had a much easier time navigating Linux kernel
            | source than digging through K8s and CNI codebases.
           | 
           | This, together with a few very trivial crashes of "normal"
           | non-toy clusters that I've seen (like two nodes suddenly
           | failing to talk to each other, typically for simple textbook
           | reasons like conntrack issues), resulted in an opinion "if
           | something about this breaks, I have very limited ideas what
           | to do, and it's a huge behemoth to learn". So I believe that
           | simple things beat complex contraptions (assuming a simple
           | system can do all you want it to do, of course!) in the long
           | run because of the maintenance costs. Yeah, deploying K8s and
           | running payloads is easy. Long-term maintenance - I'm not
           | convinced that it can be easy, for a system of that scale.
           | 
           | I mean, I try to steer away from K8s until I find a use case
           | for it, but I've heard that when K8s fails, a lot of people
           | just tend to deploy a replacement and migrate all payloads
           | there, because it's easier to do so than troubleshoot. (Could
           | be just my bubble, of course.)
        
           | YZF wrote:
           | There are other out of the box features that are useful:
           | 
           | * Cert manager.
           | 
           | * External-dns.
           | 
           | * Monitoring stack (e.g. Grafana/Prometheus.)
           | 
           | * Overlay network.
           | 
           | * Integration with deployment tooling like ArgoCD or
           | Spinnaker.
           | 
           | * Relatively easy to deploy anything that comes with a helm
           | chart (your database or search engine or whatnot).
           | 
           | * Persistent volume/storage management.
           | 
           | * High availability.
           | 
            | It's also about using containers, which means there's a lot
            | less to manage on the hosts.
           | 
           | I'm a fan of k8s. There's a learning curve but there's a huge
           | ecosystem and I also find the docs to be good.
           | 
           | But if you don't need any of it - don't use it! It is
           | targeting a certain scale and beyond.
        
             | epgui wrote:
             | To this I would also add the ability to manage all of your
             | infrastructure with k8s manifests (eg.: crossplane).
        
             | kachapopopow wrote:
             | I started with kubernetes and have never looked back. Being
             | able to bring up a network copy, deploy a clustered
             | database, deploy a distributed fs all in 10 minutes
             | (including the install of k3s or k8s) has been a game-
             | changer for me.
             | 
             | You can run monolithic apps with no-downtime restarts quite
             | easily with k8s using a rolling restart, which is very
             | useful when applications take minutes to start.
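             | 
             | As a rough illustration of that last point, the pieces that
             | give you the no-downtime restarts are the rolling update
             | strategy plus a readiness probe (the names and image below
             | are made up):
             | 
             |   apiVersion: apps/v1
             |   kind: Deployment
             |   metadata:
             |     name: slow-monolith
             |   spec:
             |     replicas: 2
             |     selector:
             |       matchLabels:
             |         app: slow-monolith
             |     strategy:
             |       type: RollingUpdate
             |       rollingUpdate:
             |         maxUnavailable: 0  # never kill the old pod early
             |         maxSurge: 1        # start the new pod alongside it
             |     template:
             |       metadata:
             |         labels:
             |           app: slow-monolith
             |       spec:
             |         containers:
             |           - name: app
             |             image: myregistry/slow-monolith:1.2.3
             |             readinessProbe:  # traffic waits for this
             |               httpGet:
             |                 path: /healthz
             |                 port: 8080
             |               initialDelaySeconds: 30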
        
               | BobbyJo wrote:
               | 100%
               | 
               | I can bring up a service, connect it to a
               | postgres/redis/minio instance, and do almost anything
               | locally that I can do in the cloud. It's a massive help
               | for iterating.
               | 
               | There is a learning curve, but you learn it and you can
               | do _so damn much so damn easily_.
        
               | methodical wrote:
               | In the same vein here.
               | 
               | Every time I see one of these posts and the ensuing
               | comments I always get a little bit of inverse imposter
               | syndrome. All of these people saying "Unless you're at
               | 10k users+ scale you don't need k8s". If you're running a
               | personal project with a single-digit user count, then
               | sure - but even then, only on cost-to-performance
               | grounds would I call k8s unreasonable. Any scale larger,
               | however, and I struggle to reconcile this position with
               | the reality that anything with a consistent user base
               | _should_ have zero-downtime deployments, load balancing,
               | etc. Maybe I'm just incredibly OOTL, but when did these
               | features, which are simple to implement and essentially
               | free from a cost standpoint, become optional? Perhaps I'm
               | just
               | misunderstanding the argument, and the argument is that
               | you should use a Fly or Vercel-esque platform that
               | provides some of these benefits without needing to
               | configure k8s. Still, the problem with this mindset is
               | that vendor lock-in is a lot harder to correct once a
               | platform is in production and being used consistently
               | without prolonged downtime.
               | 
               | Personally, I would do early builds with Fly and once I
               | saw a consistent userbase I'd switch to k8s for scale,
               | but this is purely due to the cost of a minimal k8s
               | instance (especially on GKE or EKS). This, in essence,
               | allows scaling from ~0 to ~1M+ with the only bottleneck
               | being DB scaling (if you're using a single DB like
               | CloudSQL).
               | 
               | Still, I wish I could reconcile my personal disconnect
               | with the majority of people here who regard k8s as overly
               | complicated and unnecessary. Are there really that many
               | shops out there who consider the advantages of k8s above
               | them or are they just achieving the same result in a
               | different manner?
               | 
               | One could certainly learn enough k8s in a weekend to
               | deploy a simple cluster. Now I'm not recommending this
               | for someone's company's production instance, due to the
               | foot guns if improperly configured, but the argument of
               | k8s being too complicated to learn seems unfounded.
               | 
               | /rant
        
               | shakiXBT wrote:
               | I've been in your shoes for quite a long time. By now
               | I've accepted that a lot of folks on HN and other similar
               | forums simply don't know or care about the issues that
               | Kubernetes resolves, or that someone else in their
               | company takes care of those for them.
        
               | AndrewKemendo wrote:
               | It's actually much simpler than that
               | 
               | k8s makes it easier to build over-engineered
               | architectures for applications that don't need that
               | level of complexity.
               | 
               | So while you are correct that it is not actually that
               | difficult to learn and implement k8s, it's also almost
               | always completely unnecessary, even at the largest
               | scale.
               | 
               | Given that you can do the largest-scale stuff without it,
               | and you should do most small-scale stuff without it, the
               | number of people for whom the risks and costs actually
               | balance out is much smaller than the amount of promotion
               | and pushing it has received.
               | 
               | And given that orchestration layers are a critical part
               | of infrastructure, handing over (or changing) that much
               | of the relationship between your data and your compute
               | environment is a non-trivial one-way door.
        
               | uaas wrote:
               | With the simplicity and cost of k3s and alternatives it
               | can also make sense for personal projects from day one.
        
           | tbrownaw wrote:
           | > _k8s is complex, if you don 't need the following you
           | probably shouldn't use it:_
           | 
           | I use it (specifically, the canned k3s distro) for running a
           | handful of single-instance things like for example plex on my
           | utility server.
           | 
           | Containers are a very nice UX for isolating apps from the
           | host system, and k8s is a very nice UX for running things
           | made out of containers. Sure it's _designed_ for complex
           | distributed apps with lots of separate pieces, but it still
           | handles the degenerate case (single instance of a single
           | container) just fine.
        
           | st3fan wrote:
           | If you don't need any of those things then your use of k8s
           | just becomes simpler.
           | 
           | I find k8s an extremely nice platform to deploy simple things
           | in that don't need any of the advanced features. All you do
           | is package your programs as containers and write a minimal
           | manifest and there you go. You need to learn a few new
           | things, but the number of things you no longer have to worry
           | about is a really great return.
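           | 
           | To give a sense of what "minimal" means here, a sketch of a
           | manifest for a simple web app (the image and names are
           | placeholders) is roughly:
           | 
           |   apiVersion: apps/v1
           |   kind: Deployment
           |   metadata:
           |     name: hello-web
           |   spec:
           |     replicas: 1
           |     selector:
           |       matchLabels:
           |         app: hello-web
           |     template:
           |       metadata:
           |         labels:
           |           app: hello-web
           |       spec:
           |         containers:
           |           - name: web
           |             image: myregistry/hello-web:1.0.0
           |             ports:
           |               - containerPort: 8080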
           | 
           | Nomad is a good contender in that space but I think HashiCorp
           | is letting it slowly become EOL and there are basically zero
           | Nomad-As-A-Service providers.
        
             | hylaride wrote:
             | If you don't need any of those things, going for a
             | "serverless" option like fargate or whatever other cloud
             | equivalents exist is a far better value prop. Then you
             | never have to worry about k8s support or upgrades (of
             | course, ECS/fargate is shit in its own ways, in particular
             | the deployments being tied to new task definitions...).
        
         | tracerbulletx wrote:
         | Even on a small project it's actually better imo than tying
         | everything to a platform like Netlify or Vercel. I have this
         | little notepad app that I deploy to a two-node cluster from a
         | GitHub Action and it's an excellent workflow. The k8s config
         | to get everything deployed and TLS provisioned on every commit
         | is like 150 lines of mostly boilerplate YAML, and I could
         | pretty easily make it support branch previews or whatever too.
         | https://github.com/SteveCastle/modelpad
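         | 
         | For anyone curious, the TLS part of that boilerplate is mostly
         | just an Ingress along these lines (the host, names and issuer
         | are placeholders, and it assumes cert-manager is installed in
         | the cluster):
         | 
         |   apiVersion: networking.k8s.io/v1
         |   kind: Ingress
         |   metadata:
         |     name: modelpad
         |     annotations:
         |       cert-manager.io/cluster-issuer: letsencrypt-prod
         |   spec:
         |     tls:
         |       - hosts: [notes.example.com]
         |         secretName: modelpad-tls  # filled in by cert-manager
         |     rules:
         |       - host: notes.example.com
         |         http:
         |           paths:
         |             - path: /
         |               pathType: Prefix
         |               backend:
         |                 service:
         |                   name: modelpad
         |                   port:
         |                     number: 80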
        
         | saaspirant wrote:
         | Unrelated: What does _TFA_ mean here? Google and GPT didn't
         | help (even with context)
        
           | solatic wrote:
           | The Featured Article.
           | 
           | (or, if you read it in a frustrated voice, The F**ing
           | Article.)
        
             | Xixi wrote:
             | Related acronyms: RTFA (Read The F**ing Article) and RTFM
             | (Read The F**ing Manual). The latter was a very common
             | answer when struggling with Linux in the early 2000s...
        
         | osigurdson wrote:
         | Kubernetes is the most amazing piece of software engineering
         | that I have ever seen. Most of the hate is merely being
         | directed at the learning curve.
        
           | otabdeveloper4 wrote:
           | No, k8s is shit.
           | 
           | It's only useful for the degenerate "run lots of instances of
           | webapp servers running slow interpreted languages" use case.
           | 
           | Trying to do anything else in it is madness.
           | 
           | And for the "webapp servers" use case they could have built
           | something a thousand times simpler and more robust. Serving
           | templated html ain't rocket science. (At least compared to
           | e.g. running an OLAP database cluster.)
        
             | shakiXBT wrote:
             | Could you please bless us with another way to easily
             | orchestrate thousands of containers in a cloud vendor
             | agnostic fashion? Thanks!
             | 
             | Oh, and just in case your first rebuttal is "having
             | thousands of containers means you've already failed" - not
             | everyone works in a mom n pop shop
        
               | otabdeveloper4 wrote:
               | Read my post again.
               | 
               | Just because k8s is the only game in town doesn't mean it
               | is technically any good.
               | 
               | As a technology it is a total shitshow.
               | 
               | Luckily, the problem it solves ("orchestrating" slow
               | webapp containers) is not a problem most professionals
               | care about.
               | 
               | Feature creep of k8s into domains it is utterly
               | unsuitable for because devops wants a pay raise is a
               | different issue.
        
               | shakiXBT wrote:
               | > Orchestrating containers is not a problem most
               | professionals care about
               | 
               | I truly wish you were right, but maybe it's good job
               | security for us professionals!
        
               | osigurdson wrote:
               | >> As a technology it is a total shitshow.
               | 
               | What aspects are you referring to?
               | 
               | >> is not a problem most professionals care about
               | 
               | professional as in True Scotsman?
        
               | otabdeveloper4 wrote:
               | > professional as in True Scotsman?
               | 
               | No, I mean that Kubernetes solves a super narrow and
               | specific problem that most developers do not need to
               | solve.
        
               | candiddevmike wrote:
               | > Oh, and just in case your first rebuttal is "having
               | thousands of containers means you've already failed" -
               | not everyone works in a mom n pop shop
               | 
               | The majority of folks, whether or not they admit it,
               | probably do...
        
             | osigurdson wrote:
             | Does this meet your definition of madness?
             | 
             | https://openai.com/index/scaling-kubernetes-to-7500-nodes/
        
               | otabdeveloper4 wrote:
               | Yeah, they basically spent a shitload of effort
               | developing their own cluster management platform that
               | turns off all the Kubernetes functionality in Kubernetes.
               | 
               | Must be some artifact of hosting on Azure, because I
               | can't imagine any other reason to do something this
               | contorted.
        
           | spmurrayzzz wrote:
           | I agree with respect to admiring it from afar. I've gone
           | through large chunks of the source many times and always have
           | an appreciation for what it does and how it accomplishes it.
           | It has a great, supportive community around it as well (if a
           | tiny bit proselytizing at times, which doesn't really bother
           | me).
           | 
           | With all that said, while I have no "hate" for the stack, I
           | still have no plans to migrate our container infrastructure
           | to it now or in the foreseeable future. I say that precisely
           | _because_ I've seen the source, not in spite of it. The net
           | ROI on subsuming that level of complexity for most
           | application ecosystems just doesn't strike me as obvious.
        
           | hylaride wrote:
           | Not to be rude, but K8s has had some very glaring issues,
           | especially early on when the hype was at max.
           | 
           | * Its secrets management was terrible, and for a while it
           | stored them in plaintext in etcd.
           | 
           | * The learning curve was real, and that's dangerous as there
           | were no "best practice" guides or lessons learned. There are
           | lots of horror stories of upgrades gone wrong, bugs, etc.
           | Complexity leaves a greater chance of misconfiguration, which
           | can cause security or stability problems.
           | 
           | * It was often redundant. If you're in the cloud, you already
           | had load balancers, service discovery, etc.
           | 
           | * Upgrades were dangerous and painful in its early days.
           | 
           | * It initially had glaring third party tooling integration
           | issues, which made monitoring or package management harder
           | (and led to third party apps like Helm, etc).
           | 
           | A lot of these have been rectified, but a lot of us have been
           | burned by the promise of a tool that google said was used
           | internally, which was a bit of a lie as kubernetes was a
           | rewrite of Borg.
           | 
           | Kubernetes is powerful, but you can do powerful in simple(r)
           | ways, too. If it was truly "the most amazing" it would have
           | been designed to be simple by default, with only as much
           | complexity as each deployment actually needs. It wasn't.
        
         | globular-toast wrote:
         | I don't get the hate even if you are a small company. K8s has
         | massively simplified our deployments. It used to be each app
         | had its own completely different deployment process. Could
         | have been a shell script that SSHed to some VM. Who managed
         | said VM? Did it do its own TLS termination? Fuck knows. Maybe
         | they used Ansible. Great, but that's another tool to learn and
         | do I really need to set up bare metal hosts from scratch for
         | every service? No, so there's probably some other Ansible
         | config somewhere that sets them up. And the secrets are stored
         | where? Etc etc.
         | 
         | People who say "you don't need k8s" never say what you do need.
         | K8s gives us a uniform interface that works for everything. We
         | just have a few YAML files for each app and it just works. We
         | can just chuck new things on there and don't even have to think
         | about networking. Just add a Service and it's magically
         | available with a name to everything in the cluster. I know how
         | to do this stuff from scratch and I do not want to be doing it
         | every single time.
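         | 
         | As a rough sketch of what "just add a Service" looks like (the
         | names are made up), something like this is enough for the app
         | to be reachable as http://orders from anything else in the
         | same namespace:
         | 
         |   apiVersion: v1
         |   kind: Service
         |   metadata:
         |     name: orders       # becomes the in-cluster DNS name
         |   spec:
         |     selector:
         |       app: orders      # matches the pods of the orders app
         |     ports:
         |       - port: 80       # what other services connect to
         |         targetPort: 8080  # what the container listens on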
        
           | shakiXBT wrote:
           | if you don't need High Availability you can even deploy to a
           | single node k3s cluster. It's still miles better than having
           | to set up systemd services, an Apache/NGINX proxy, etc. etc.
        
             | globular-toast wrote:
             | Yep, and you can get far with k3s's "fake" load balancer
             | (ServiceLB). Then when you need a more "real" cluster,
             | basically all the concepts are the same; you just move to a
             | new cluster.
        
         | roydivision wrote:
         | One thing I learned when I started learning Kubernetes is that
         | it is two disciplines that overlap, but are distinct
         | nonetheless:
         | 
         | - Platform build and management
         | - App build and management
         | 
         | Getting a stable K8s cluster up and running is quite different
         | to building and running apps on it. Obviously there is overlap
         | in the knowledge required, but there is a world of difference
         | between using a cloud-based cluster and your own home-made
         | one.
         | 
         | We are a very small team and opted for cloud managed clusters,
         | which really freed me up to concentrate on how to build and
         | manage applications running on it.
        
         | smitelli wrote:
         | I can only speak for myself, and some of the reasons why K8s
         | has left a bad taste in my mouth:
         | 
         | - It can be complex depending on the third-party controllers
         | and operators in use. If you're not anticipating how they're
         | going to make your resources behave differently than the
         | documentation examples suggest they will, it can be exhausting
         | to trace down what's making them act that way.
         | 
         | - The cluster owners encounter forced software updates that
         | seem to come at the most inopportune times. Yes, staying fresh
         | and new is important, but we have other actual business goals
         | we have to achieve at the same time and -- especially with the
         | current cost-cutting climate -- care and feeding of K8s is
         | never an organizational priority.
         | 
         | - A bunch of the controllers we relied on felt like alpha grade
         | toy software. We went into each control plane update (see
         | previous point) expecting some damn thing to break and require
         | more time investment to get the cluster simply working like it
         | was before.
         | 
         | - While we (cluster owners) _begrudgingly_ updated, software
         | teams that used the cluster absolutely did not. Countless
         | support requests for broken deployments, which were all
         | resolved by hand-holding the team through a Helm chart update
         | that we advised them they'd need to do months earlier.
         | 
         | - It's really not cheaper than e.g. ECS, again, in my
         | experience.
         | 
         | - Maybe this has/will change with time, but I really didn't see
         | the "onboarding talent is easier because they already know it."
         | They absolutely did not. If you're coming from a shop that used
         | Istio/Argo and move to a Linkerd/Flux shop, congratulations,
         | now there's a bunch to unlearn and relearn.
         | 
         | - K8s is the first environment where I palpably felt like we as
         | an industry reached a point where there were so many layers and
         | layers on top of abstractions of abstractions that it became
         | un-debuggable _in practice_. This is points #1-3 coming
         | together to manifest as weird latency spikes, scaling runaways,
         | and oncall runbooks that were tantamount to "turn it off and
         | back on."
         | 
         | Were some of these problems organizational? Almost certainly.
         | But K8s had always been sold as this miracle technology that
         | would relieve so many pain points that we would be better off
         | than we had been. In my experience, it did not do that.
        
           | ahoka wrote:
           | What would be the alternative?
        
       | breakingcups wrote:
       | I feel so out of touch when I read a blog post which casually
       | mentions 6 CNCF projects with kool names that I've never heard
       | of, for gaining seemingly simple functionality.
       | 
       | I'm really wondering if I'm aging out of professional software
       | development.
        
         | renewiltord wrote:
         | Nah, there's lots of IC work. It just means that you're
         | unfamiliar with one approach to org scaling: abstracting over
         | hardware, logging, and retries, all handled by a platform team.
         | 
         | It's not the only approach so you may well be familiar with
         | others.
        
       | twodave wrote:
       | TL;DR because they already ran everything in containers. Having
       | performed a migration where this wasn't the case, the path from
       | non-containerized to containerized is way more effort than going
       | from containerized non-k8s to k8s.
        
       | _pdp_ wrote:
       | In my own experience, AWS Fargate is easier, more secure and way
       | more robust than running your own K8s, even with EKS.
        
         | watermelon0 wrote:
         | Do you mean ECS Fargate? Because you can use AWS Fargate with
         | EKS, with some limitations.
        
           | _pdp_ wrote:
           | Yes, ECS Fargate.
        
       | ko_pivot wrote:
       | I'm not surprised that the first reason they state for moving off
       | of ECS was the lack of support for stateful services. The lack of
       | integration between EBS and ECS has always felt really strange to
       | me, considering that AWS already built all the logic to integrate
       | EKS with EBS in a StatefulSet compliant way.
        
         | datatrashfire wrote:
         | https://aws.amazon.com/about-aws/whats-new/2024/01/amazon-ec...
         | 
         | This was actually added at the beginning of the year. It was
         | definitely on my most-wanted list for a while. You could
         | technically use EFS, but that's a very expensive way to run
         | anything IO
         | intensive.
        
           | ko_pivot wrote:
           | This adds support for ephemeral EBS volumes. When a task is
           | created a volume gets created, and when the task is
           | destroyed, for whatever reason, the volume is destroyed too.
           | It has no concept of task identity. If the task needs to be
           | moved to a new host, the volume is destroyed.
        
       | julienmarie wrote:
       | I personally love k8s. I run multiple small but complex custom
       | e-commerce shops and handle all the tech on top of marketing,
       | finance and customer service.
       | 
       | I was running on dedicated servers before. My stack is quite
       | complicated and deploys were a nightmare. In the end the dread of
       | deploying was slowing down the little company.
       | 
       | Learning and moving to k8s took me a month. I run around 25
       | different services (front ends, product admins, logistics
       | dashboards, delivery route optimizers, orsm, ERP, recommendation
       | engine, search, etc.).
       | 
       | It forced me to clean up my act and structure things in a
       | repeatable way. Having all your cluster config in one place lets
       | you know exactly the state of every service and which version is
       | running.
       | 
       | It allowed me to do rolling deploys with no downtime.
       | 
       | Yes it's complex. As programmers we are used to complex. An Nginx
       | config file is complex as well.
       | 
       | But the more you dive into it the more you understand the
       | architecture of k8s and how it makes sense. It forces you to
       | respect the twelve factors to the letter.
       | 
       | And yes, HA is more than nice, especially when your income is
       | directly linked to the availability and stability of your stack.
       | 
       | And it's not that expensive. I pay around 400 USD a month in
       | hosting.
        
         | maccard wrote:
         | Figma were running on ECS before, so they weren't just running
         | dedicated servers.
         | 
         | I'm a K8S believer, but it _is_ complicated. It solves hard
         | problems. If you're multi-cloud, it's a no brainer. If you're
         | doing complex infra that you want a 1:1 mapping of locally, it
         | works great.
         | 
         | But if you're less than 100 developers and are deploying
         | containers to just AWS, I think you'd be insane to use EKS over
         | ECS + Fargate in 2024.
        
           | epgui wrote:
           | I don't know if it's just me, but I really don't see how
           | kubernetes is more complex than ECS. Even for a one-man show.
        
             | mrgaro wrote:
             | Kubernetes needs regular updates, just as everything else
             | (unless you carefully freeze your environment and somehow
             | manage the vulnerability risks) and that requires manual
             | work.
             | 
             | ECS+Fargate, however, does not. If you are a small team
             | managing the entire stack, you need to factor this in. For
             | example, EKS forces you to upgrade the cluster to keep up
             | with the main Kubernetes release cycle, albeit you can
             | delay it somewhat.
             | 
             | I personally run k8s at home and another two clusters at
             | work, and I recommend our teams use ECS+Fargate+ALB if it
             | is enough for them.
        
               | metaltyphoon wrote:
               | > Kubernetes needs regular updates, just as everything
               | else (unless you carefully freeze your environment and
               | somehow manage the vulnerability risks) and that requires
               | manual work
               | 
               | Just use a managed K8s solution that deals with this?
               | AKS, EKS and GKE all do this for you.
        
               | ttymck wrote:
               | It doesn't do everything for you. You still need to
               | update applications that use deprecated APIs.
               | 
               | This sort of "just" thinking is a great way for teams to
               | drown in ops toil.
        
               | metaltyphoon wrote:
               | Are you assuming the workloads have to use K8s APIs?
               | Where is this coming from? If that's not the case can you
               | actually explain with a concrete example?
        
               | Ramiro wrote:
               | I agree with @metaltyphoon on this. Even for small teams,
               | a managed version of Kubernetes takes away most of the
               | pain. I've used both ECS+Fargate and Kubernetes, but
               | these days, I prefer Kubernetes mainly because the
               | ecosystem is way bigger, both vendor and open source.
               | The answer to most of the problems we run into is usually
               | one search or open source project away.
        
           | mountainriver wrote:
           | This just feels like a myth to me at this point. Kubernetes
           | isn't hard; the clouds have made it so simple now that it's
           | in no way more difficult than ECS, and it's way more
           | flexible.
        
             | davewritescode wrote:
             | I'm not saying I agree with the comment above you but
             | Kubernetes upgrades and keeping all your addons/vpc stuff
             | up to date can be a never ending slog of one-way upgrades
             | that, when they go wrong, can cause big issues.
        
               | organsnyder wrote:
               | Those are all issues that should be solved by the managed
               | provider.
               | 
               | It's been a while since I spun up a k8s instance on AWS,
               | Azure, or the like, but when I did I was astounded at how
               | many implementation decisions I had to make and how much
               | toil I had to take on myself. Hosted k8s should be
               | plug-and-play unless you
               | have a very specialized use-case.
        
       | xyst wrote:
       | Of course there's no mention of performance loss or gain after
       | migration.
       | 
       | I remember when microservices architecture was the latest hot
       | trend that came off the presses. Small and big firms were racing
       | to redesign/reimplement apps. But most forgot they weren't
       | Google/Netflix/Facebook.
       | 
       | I remember end user experience ended up being _worse_ after the
       | implementation. There was a saturation point where a single micro
       | service called by all of the other micro services would cause
       | complete system meltdown. There was also the case of an
       | "accidental" dependency loop (S1 -> S2 -> S3 -> S1). Company
       | didn't have an easy way to trace logs across different services
       | (way before distributed tracing was a thing). It turns out only
       | a specific condition would trigger the dependency loop (maybe 1
       | in 100 requests?).
       | 
       | Good times. Also, job safety.
        
         | api wrote:
         | This is a very fad driven industry. One of the things you earn
         | after being in it for a long time is intuition for spotting
         | fads and gratuitous complexity traps.
        
           | makeitdouble wrote:
           | I think that aspect is indirectly covered, as one of the main
           | motivations was to get onto a popular platform that helps
           | with hiring.
           | 
           | I agree on how it's technically a waste of time to pursue
           | fads, but it's also a huge PITA to have a platform that good
           | engineers actively try to avoid, as their careers would
           | stagnate (even as they themselves know that it's half a fad)
        
             | sangnoir wrote:
             | I avoid working at organisations with NIH syndrome - if
             | they are below a certain size (i.e. they lack a standing
             | dev-eng team to support their homegrown K8s "equivalent").
             | Extra red flags if said homegrown system was developed
             | by _that guy[1]_ who's ostensibly a genius and has very
             | strong opinions about his system. Give me k8s' YAML-hell
             | any day instead; at least that bloat has transferable
             | skills, and I can actually Google common resolutions.
             | 
             | 1. Has been at the org for so long that management
             | condones them flouting the rules, like pushing straight to
             | prod. Hates the "inefficiency" of open source platforms
             | and purpose-built something "suitable for the company" by
             | themselves, with no documentation; you have to ask them to
             | fix issues because they don't accept code or suggestions
             | from others. The DSL they developed is inconsistent and
             | has no parser/linter.
        
               | fragmede wrote:
               | Yeah. If you think Kubernetes is too complicated, the
               | flip side of that is someone built the simpler thing, but
               | then unfortunately it grew and grew, and now you've got
               | this mess of a system. You could have just used a hosted
               | k8s or k3s system from the start instead of reinventing
               | the wheel.
               | 
               | Absolutely start as simple as you can, but plan to move
               | to a hosted kube-something ASAP instead of writing your
               | own base images, unless that's a differentiator for your
               | company.
        
           | keybored wrote:
           | 1. You have to constantly learn to keep up!
           | 
           | 2. Fad-driven
           | 
           | I wonder why I don't often see (1) critiqued on the basis of
           | (2).
        
             | api wrote:
             | There's definitely a connection. Some of the change is
             | improvement, like memory safe languages, but a ton of it is
             | fads and needless complexity invented to provide a reason
             | for some business or consultancy or group in a FAANG to
             | exist. The rest of the industry cargo cults the big
             | companies.
        
         | teleforce wrote:
         | > There was also the case of an "accidental" dependency loop
         | (S1 -> S2 -> S3 -> S1).
         | 
         | The classic dependency loop example that you thought you would
         | never encounter again for the rest of your life after OS class.
        
         | pram wrote:
         | The best part is when it's all architected to depend on
         | something that becomes essentially a single point of failure,
         | like Kafka.
        
           | bamboozled wrote:
           | Not saying you're wrong, but what is your grand plan? I've
           | never seen anything perfect.
        
           | intelVISA wrote:
           | Shhh, you're ruining the party!
        
           | arctek wrote:
           | Isn't this somewhat better, at least when it fails it's in a
           | single place?
           | 
           | As someone using Kafka, I'd like to know what the (good)
           | alternatives are if you have suggestions.
        
             | happymellon wrote:
             | It really depends on what your application is.
             | 
             | Where I'm at, most Kafka usage adds nothing of note and
             | could be replaced with a REST service. It sounds good that
             | Kafka makes everything execute in order, but honestly just
             | making requests block does the same thing.
             | 
             | At least then I could autoscale, which Kafka prevents.
        
             | NortySpock wrote:
             | NATS JetStream seemed to support horizontal scaling (either
             | hierarchical via leaf nodes or a flat RAFT quorum) and back
             | pressure when I played with it.
             | 
             | I found it easy to get up and running, even as a RAFT
             | cluster, but I have not tried to use JetStream mode heavily
             | yet.
        
           | lmm wrote:
           | At least Kafka can be properly master-master HA. How people
           | ever got away with building massively redundant fault-
           | tolerant applications that were completely dependent on a
           | single SQL server, I'll never understand.
        
             | trog wrote:
             | I think because it works pretty close to 100 percent of the
             | time with only the most basic maintenance and care (like
             | making sure you don't run out of disk space and keeping up
             | with security updates). You can go amazingly far with this,
             | and adding read only replicas gets you a lot further with
             | little extra effort.
        
             | slt2021 wrote:
             | Stack Overflow has no problem serving the entire planet
             | from just _four_ SQL Servers
             | (https://nickcraver.com/blog/2016/02/17/stack-overflow-the-
             | ar...)
             | 
             | There is really nothing wrong with a large, vertically
             | scaled-up SQL server. You need to be either at a really,
             | really large scale, or really, really UNSKILLED in SQL,
             | keeping your relational model and working set in such bad
             | shape that you reach its limits.
        
               | HideousKojima wrote:
               | >or really really UNSKILLED in sql as to keep your
               | relational model and working set in SQL so bad that you
               | reach its limits
               | 
               | Sadly that's the case at my current job. Zero thought put
               | into table design, zero effort into even formatting our
               | stored procedures in a remotely readable way, zero
               | attempts to cache data on the application side even when
               | it's glaringly obvious. We actually brought in a
               | consultant to diagnose our SQL Server performance issues
               | (I'm sure we paid a small fortune for that) and the DB
               | team and all of the other higher ups capable of actually
               | enforcing change rejected every last one of his
               | suggestions.
        
             | lelanthran wrote:
             | > How people ever got away with building massively
             | redundant fault-tolerant applications that were completely
             | dependent on a single SQL server, I'll never understand.
             | 
             | It works, with a lower cognitive burden than that of
             | horizontally scaling.
             | 
             | For the loading concern (i.e. is this enough to handle the
             | load):
             | 
             | For most businesses, being able to serve 20k _concurrent_
             | requests is way more than they need _anyway_: an internal
             | app used by 500k users typically has fewer than 20k
             | concurrent requests in flight at peak.
             | 
             | A cheap VPS running PostgreSQL can easily handle that.[1]
             | 
             | For the "if something breaks" concern:
             | 
             | Each "fault-tolerance" criterion added adds some cost. At
             | some point the cost of being resistant to errors exceeds
             | the cost of downtime. The mechanisms to reduce downtime
             | when the single large SQL server shits the bed (failovers,
             | RO followers, whatever) can reduce that downtime to mere
             | minutes.
             | 
             | What is the benefit to removing 3 minutes of downtime?
             | $100? $1k? $100k? $1m? The business will have to decide
             | what those 3 minutes are worth, and if that worth exceeds
             | the cost of using something other than a single large SQL
             | server.
             | 
             | Until and unless you reach the load and downtime-cost of
             | Google, Amazon, Twitter, FB, Netflix, etc, you're simply
             | prematurely optimising for a scenario that, even in the
             | business's best-case projections, might never exist.
             | 
             | The best thing to do, TBH, is ask the business for their
             | best-case projections and build to handle 90% of that.
             | 
             | [1] An expensive VPS running PostgreSQL can handle _a lot
             | more_ than you think.
        
               | brazzy wrote:
               | > Each "fault-tolerance" criteria added adds some cost.
               | At some point the cost of being resistant to errors
               | exceeds the cost of downtime.
               | 
               | Not to forget: those costs are not just in money and
               | time, but also in complexity. And added complexity comes
               | with its own downtime risks. It's not that uncommon for
               | systems to go down due to problems with mechanisms or
               | components that would not exist in a simpler, "not fault
               | tolerant" system.
        
               | zelphirkalt wrote:
               | The business can try to decide what those 3min are worth,
               | but ultimately the customers vote by either staying or
               | leaving that service.
        
               | lelanthran wrote:
               | > The business can try to decide what those 3min are
               | worth, but ultimately the customers vote by either
               | staying or leaving that service.
               | 
               | That's still a business decision.
               | 
               | Customers don't vote with their feet based on what tech
               | stack the business chose, they vote based on a range of
               | other factors, few, if any, of which are related to 3m
               | of downtime.
               | 
               | There are few services I know of that would lose
               | customers over 3m of downtime per week.
               | 
               | IOW, 3m of downtime is mostly an imaginary problem.
        
               | zelphirkalt wrote:
               | That's really too broad a generalization.
               | 
               | Services that people might leave because of downtime
               | are, for example, a git hoster or a password manager.
               | When
               | people cannot push their commits and this happens
               | multiple times, they may leave for another git hoster. I
               | have seen this very example when gitlab was less stable
               | and often unreachable for a few minutes. When people need
               | some credentials, but cannot reach their online password
               | manager, they cannot work. They cannot trust that service
               | to be available in critical moments. Not being able to
               | access your credentials leaves a very bad impression.
               | Some will look for more reliable ways of storing their
               | credentials.
        
               | Bjartr wrote:
               | The user experience of "often unreachable" means way more
               | than 3m per week in practice.
        
               | skydhash wrote:
               | Why does a password manager need to be online? I
               | understand the need for synchronization, but being
               | exclusively online is a very bad decision. And git
               | synchronization is basically ssh, and if you mess that up
               | on a regular basis, you have no business being in
               | business in the first place. These are examples, but
               | there are a few
               | things that do not need to be online unless your computer
               | is a thin client or you don't trust it at all.
        
               | scott_w wrote:
               | >> What is the benefit to removing 3 minutes of downtime?
               | 
               | > The business can try to decide what those 3min are
               | worth, but ultimately the customers vote by either
               | staying or leaving that service.
               | 
               | What do you think the business is doing when it evaluates
               | what 3 minutes are worth?
        
               | zelphirkalt wrote:
               | There is no "the business". Businesses do all kinds of
               | f'ed up things and lie to themselves all the time as
               | well.
               | 
               | I don't understand what people are arguing about here.
               | Are we really arguing about customers making their own
               | choice? Since that is all I stated. The business can jump
               | up and down all it wants, if the customers decide to
               | leave. Is that not very clear?
        
               | lelanthran wrote:
               | > The business can jump up and down all it wants, if the
               | customers decide to leave.
               | 
               | I think the point is that, for a few minutes of downtime,
               | businesses lose so few customers that it's not worth
               | avoiding that downtime.
               | 
               | Just now, we had a 5m period where disney+ stopped
               | responding. We aren't going to cut off our toddler from
               | Peppa Pig and Bluey for 5m of downtime _per day_,
               | nevermind _per week_.
               | 
               | You appeared to be under the impression that 3m
               | downtime/week is enough to make people leave. This is
               | simply not true, especially for internet services where
               | the users are conditioned to simply wait.
        
               | consteval wrote:
               | True, but what people should understand about databases
               | is they're incredibly mature software. They don't fail,
               | they just don't. It's not like the software we're used to
               | using where "whoopsie! Something broke!" is common.
               | 
               | I've never, in my life, seen an error in SQL Server
               | related to SQL Server. It's always been me, the app code
               | developer.
               | 
               | Now, to be fair, the server itself or the hardware CAN
               | fail. But having active/passive database configurations
               | is simple, tried and tested.
        
               | skydhash wrote:
               | And the server itself can be very resilient if you run
               | something like Debian or FreeBSD. Even on Arch, I've
               | rarely seen things fail unless it's fringe/proprietary
               | code (bluetooth, nvidia, the browser and 3D accelerated
               | graphics, ...). That presumes you use boring tech which
               | is heavily tested by people around the world, not
               | something "new" and "hyped" which is still on 0.x.
               | consteval wrote:
               | I agree 100%. Unfortunately my company is pretty tied to
               | windows and windows server, which is a pain. Upgrading
               | and sysadmin-type work is still very manual and there's a
               | lot of room for human error.
               | 
               | I wish we would use something like Debian and take
               | advantage of tech like systemd. But alas, we're still
               | using COM and Windows Services and we still need to
               | remote desktop in and click around on random GUIs to get
               | stuff to work.
               | 
               | Luckily, SQL Server itself is very stable and reliable.
               | But even SQL Server runs on Linux.
        
               | ClumsyPilot wrote:
               | > For most businesses, being able to serve 20k concurrent
               | requests is way more than they need anyway: an internal
               | app used by 500k
               | 
               | This is a very simple distinction and I am not sure why
               | it is not understood.
               | 
               | For some reason people design public apps the same as
               | internal apps
               | 
               | The largest companies employ circa 1 million people -
               | that's Walmart, Amazon, etc. Most giants, like Shell,
               | have ~100k tops. That can be handled by 1 beefy server.
               | 
               | Successful consumer-facing apps have hundreds of millions
               | to a billion users. That's 3 orders of magnitude of
               | difference.
               | 
               | I have seen a company with 5k employees invest in a
               | mega-scalable, event-driven microservice architecture,
               | and I was thinking: I hope they realise what they are
               | doing and that it's just CV-driven development.
        
             | hylaride wrote:
             | Because people are beholden to costs and it's often out of
             | our hands when to spend money on redundancy (or opportunity
             | costs elsewhere).
             | 
             | It's less true today when redundancy is baked into SaaS
             | products (like AWS Aurora, where even if you have a single
             | database instance, it's easy to spin up a replacement one
             | if the hardware on the one running fails).
        
             | ClumsyPilot wrote:
             | Ow yeah, I am looking at that problem right now
        
           | vergessenmir wrote:
           | I'm sorry, but I can't tell if you're being serious or not
           | since you commented without qualification.
           | 
           | One of the most stable system architectures I've built was
           | on Kafka AND it was running with microservices managed by
           | teams across multiple geographies and time zones. It was one
           | of the most reliable systems in the bank. There are
           | situations where it isn't appropriate, which can be said for
           | most tech e.g( K8S vs ECS vs Nomad vs bare metal)
           | 
           | Every system has failure characteristics. Kafka's is defined
           | as Consistent and Available and your system architecture
           | needs to take that into consideration. Also the
           | transactionality of tasks across multiple services and
           | process boundaries is important.
           | 
           | Let's not pretend that kubernetes (or the tech of the day) is
           | at fault while completely ignoring the complex architectural
           | considerations that are being juggled
        
             | pram wrote:
             | Basically because people end up engineering their
             | microservices as a shim to funnel data into the "magical
             | black hole"
             | 
             | From my experience most microservices aren't engineered to
             | handle back pressure. If there is a sudden upsurge in
             | traffic or data the Kafka cluster is expected to absorb all
             | of the throughput. If the cluster starts having IO issues
             | then literally everything in your "distributed" application
             | is now slowly failing until the consumers/brokers can catch
             | up.
        
         | jamesfinlayson wrote:
         | > There was a saturation point where a single micro service
         | called by all of the other micro services would cause complete
         | system meltdown.
         | 
         | Yep - saw that at a company recently - something in AWS was
         | running a little slower than usual which cascaded to cause
         | massive failures. Dozens of people were trying to get to the
         | bottom of it, it mysteriously fixed itself and no one could
         | offer any good explanation.
        
           | spyspy wrote:
           | Any company that doesn't have some form of distributed
           | tracing in this day and age is acting with pure negligence
           | IMO. Literally flying blind.
        
             | bboygravity wrote:
             | Distributed tracing?
             | 
             | Is that the same as what Elixir/Erlang call supervision
             | trees?
        
               | williamdclt wrote:
               | Likely about Opentelemetry
        
             | ahoka wrote:
             | Oh, come on! Just make sure your architecture is sound. If
             | you need to run an expensive data analysis cluster
             | connected to massive streams of collected call information
             | to see that you have loops in your architecture, then you
             | have a bigger issue.
        
               | hylaride wrote:
               | I don't know if you're being sarcastic, which if you are:
               | heh.
               | 
               | But to the point, if you're going to build a distributed
               | system, you need tools to track problems across the
               | distributed system that also works across teams. A poorly
               | performing service could be caused by up/downstream
               | components and doing that without some kind of tracing is
               | hard even if your stack is linear.
               | 
               | The same is true for a giant monolithic app, but the
               | sophisticated tools are just different.
        
             | consteval wrote:
             | Solution: don't build a distributed system. Just have a
             | computer somewhere running .NET or Java or something. If
             | you really want data integrity and safety, just make the
             | data layer distributed.
             | 
             | There's very little reason to distribute application code.
             | It's very, very rare that the limiting factor in an
             | application is compute. Typically, it's the data layer,
             | which you can change independently of your application.
        
               | MaKey wrote:
               | I have yet to personally see an application where
               | distribution of its parts was beneficial. For most
               | applications a boring monolith works totally fine.
        
               | consteval wrote:
               | I'm sure it exists when the problem itself is
               | distributed. For example, I can imagine something like
               | YouTube would require a complex distributed system.
               | 
               | But I think very few problems fit into that archetype.
               | Instead, people build distributed systems for reliability
               | and integrity. But it's overkill, because you bring all
               | the baggage and complexity of distributed computing. This
               | area is incredibly difficult. I view it similar to
               | parallelism. If you can avoid it for your problem, then
               | avoid it. If you really can't, then take a less complex
               | approach. There's no reason to jump to "scale to X
               | threads and every thread is unaware of where it's
               | running" type solutions, because those are complex.
        
         | jiggawatts wrote:
         | Even if the microservices platform is running at 1% capacity,
         | it's guaranteed to have worse performance than almost any
         | monolith architecture.
         | 
         | It's very rare to meet a developer who has even the vaguest
         | notion of what an RPC call costs in terms of microseconds.
         | 
         | Fewer still know about issues such as head-of-line
         | blocking, the effects of load balancer modes such as hash
         | versus round-robin, or the CPU overheads of protocol ser/des.
         | 
         | If you have an architecture that involves about 5 hops combined
         | with sidecars, envoys, reverse proxies, and multiple zones
         | you're almost certainly spending 99% to 99.9% of the wall clock
         | time _just waiting_. The useful compute time can rapidly start
         | to approach zero.
         | 
         | This is how you end up with apps like Jira taking a solid
         | minute to show an empty form.
        
           | throwaway48540 wrote:
           | Are you talking about cloud Jira? I use it daily and it's
           | very quick, even search results appear immediately...
        
             | nikau wrote:
             | Have you ever used an old ticketing system like remedy?
             | Ugly as sin, but screens appear instantly.
             | 
             | I think Web apps have been around so long now people have
             | forgotten how unresponsive things are vs old 2 tier stuff.
        
               | BeefWellington wrote:
               | There's a definite vibe these days of "this is how it's
               | always been" when it really hasn't.
        
               | throwaway48540 wrote:
               | The SPA Jira Cloud is faster than anything
               | server-rendered for me; my connection is shit. In Jira
               | I can at least move to static forms quickly without
               | downloading the entire page on every move.
        
               | chuckadams wrote:
               | I've used Remedy in three different shops, and the
               | experience varied dramatically. The entry screen might
               | have popped instantly, but RPC timeouts on submission
               | were common. Tab order for controls was the order they
               | were added, not position on the screen. Remedy could be
               | pleasant, but it was very dependent on a competent admin
               | to set up and maintain it.
        
             | abrookewood wrote:
             | Maybe you're being honest, but you're using a throwaway
             | account and I use Cloud Jira every day. It's slow and
             | bloated and drives me crazy.
        
               | kahmeal wrote:
               | Has a LOT to do with how it's configured and what plugins
               | are installed.
        
               | jiggawatts wrote:
               | It _really doesn't_. This is the excuse trotted out by
               | Atlassian staff when defending their products in public
               | forums, essentially "corporate propaganda". They have a
               | history of gaslighting users, either telling them to
               | disregard the evidence of their own lying eyes, or that
               | it's _their own fault_ for using the product wrong
               | somehow.
               | 
               | I tested the Jira cloud service with a new, blank
               | account. Zero data, zero customisations, zero security
               | rules. _Empty._
               | 
               | Almost all basic operations took tens of seconds to run,
               | even when run repeatedly to warm up any internal caches.
               | Opening a new issue ticket form was especially bad,
               | taking nearly a _minute_.
               | 
               | Other Atlassian excuses included: corporate web proxy
               | servers (I have none), slow Internet (gigabit fibre),
               | slow PCs (gaming laptop on "high performance" settings),
               | browser security plugins (none), etc...
        
               | ffsm8 wrote:
               | > Opening a new issue ticket form was especially bad,
               | taking nearly a minute.
               | 
               | At that point, something must've been wrong with your
               | instance. I'd never call Jira fast, but the new-ticket
               | dialog on an unconfigured instance opens within <10s
               | (which is absolutely horrendous performance, to be
               | clear; anything more than 200-500ms is).
        
               | jiggawatts wrote:
               | Cloud Jira is notably slower than on-prem Jira, which
               | takes on the order of 10 seconds like you said.
        
               | ffsm8 wrote:
               | That does not mirror my own experience, and it's very
               | easy to validate: just create a free Jira Cloud
               | instance, which takes about 1 minute
               | ( https://www.atlassian.com/software/jira/try ), and
               | click new issue.
               | 
               | It opens within 1-2 seconds (which is still bad
               | performance, objectively speaking; it's an empty
               | instance, after all, and already >1s).
        
               | jiggawatts wrote:
               | Ah, there's a new UI now. The last time I tested this,
               | the entire look & feel was different and everything was
               | in slow motion.
               | 
               | It's still sluggish compared to a desktop app from the
               | 1990s, but it's _much_ faster than just a couple of years
               | ago.
        
               | throwaway48540 wrote:
               | This is just not true. The create-new-issue form
               | appears nearly immediately. I have created two tickets
               | just now, in less than a minute including writing a few
               | sentences.
        
               | hadrien01 wrote:
               | Atlassian themselves don't use JIRA Cloud. They use the
               | datacenter edition (on-premise) for their public bug
               | tracker, and it's sooooo much faster than the Cloud
               | version: https://jira.atlassian.com/browse/
        
             | lmm wrote:
             | With cloud Jira you get thrown on a shared instance with no
             | control over who you're sharing with, so it's random
             | whether yours will be fast or extremely slow.
        
             | Lucasoato wrote:
             | I'm starting to think that these people praising Jira are
             | just part of an architected PR campaign that tries to deny
             | what's evident to the end users: Jira is slow, bloated,
             | and in many cases almost unusable.
        
           | randomdata wrote:
            | _> It's very rare to meet a developer who has even the
           | vaguest notion of what an RPC call costs in terms of
           | microseconds._
           | 
           | To be fair, small time units are difficult to internalize.
           | Just look at what happens when someone finds out that it
           | takes tens of nanoseconds to call a C function in Go (gc).
           | They regularly conclude that it's completely unusable, and
           | not just in a tight loop with an unfathomable number of
           | calls, but even for a single call in their program that runs
           | once per day. You can flat out tell another developer exactly
           | how many microseconds the RPC is going to add and they still
           | aren't apt to get it.
           | 
           | It is not rare to find developers who understand that RPC has
           | a higher cost than a local function, though, and with enough
           | understanding of that to know that there could be a problem
           | if overused. Where they often fall down, however, is when the
            | tools and frameworks try to hide the complexity by making
            | RPC _look_ like a local function. It then becomes easy to
            | miss that there is additional overhead to consider. Make
            | the complexity explicit and you won't find many developers
            | oblivious to it.
        
             | j16sdiz wrote:
             | Those time costs need to be contextualized with time
             | budgets for each service. Without that, it's always
             | somebody else's problem in an RPC world.
        
             | HideousKojima wrote:
             | >Just look at what happens when someone finds out that it
             | takes tens of nanoseconds to call a C function in Go (gc).
             | 
             | I'm not too familiar with Go but my default assumption is
             | that it's just used as a convenient excuse to avoid
             | learning how to do FFI.
        
           | lmm wrote:
           | > This is how you end up with apps like Jira taking a solid
           | minute to show an empty form.
           | 
           | Nah. Jira was horrible and slow long before the microservice
           | trend.
        
             | p_l wrote:
             | JIRA slowness usually involved under-provisioned server
             | resources, in my experience.
        
           | otabdeveloper4 wrote:
           | It's actually worse than what you said. In 2024 the network
           | is the only resource we can't upgrade on demand. There are
           | physical limits we can't change. (I.e., there are only so
           | many wires connecting your machines, and any significant
           | upgrade involves building your own data center.)
           | 
           | So really eventually we'll all be optimizing around network
           | interfaces as the bottleneck.
        
             | jiggawatts wrote:
             | _cough_ speed of light _cough_
        
         | beeandapenguin wrote:
         | Extremely slow times - from development to production, backend
         | to frontend. Depending on how bad things are, you might catch
         | the microservice guys complaining over microseconds from a team
         | downstream, in front of a FE dev who's spent his week
         | optimizing the monotonically-increasing JS bundles with code
         | splitting heuristics.
         | 
         | Of course, it was because the client app recently went over the
         | 100MB JS budget. Which they decided to make because the last
         | time that happened, customers abroad reported seeing "white
         | screens". International conversion dropped sharply not long
         | after that.
         | 
         | It's pretty silly. So ya, good times indeed. Time to learn k8s.
        
         | dangus wrote:
         | This article specifically mentions that they are not running
         | microservices and has pretty clearly defined motivations for
         | making the migration.
        
         | ec109685 wrote:
         | Why would moving to Kubernetes from ECS introduce
         | performance issues?
         | 
         | They already had their architecture, largely, and just moved
         | it over to K8s.
         | 
         | They even mention they aren't a microservices company.
        
         | trhway wrote:
         | >there's no mention of performance loss or gain after migration
         | 
         | To illustrate the performance cost, I usually ask people
         | what ping they see from, say, one component/pod/service to
         | another, and to compare that value to the ping they'd get
         | between two Linux boxes sitting on that gorgeous
         | 10Gb/40Gb/100Gb or even 1000Gb network that they are running
         | their modern microservices architecture over.
        
         | ellieh wrote:
         | I imagine because the article mentions:
         | 
         | > More broadly, we're not a microservices company, and we don't
         | plan to become one
        
           | dgb23 wrote:
           | It seems the "micro" implies that services are separated by
           | high level business terms, like "payment" or "inventory" with
           | each having their own databases instead of computational
           | terms like "storage", "load balancing" or "preprocessing"
           | etc.
           | 
           | Is this generally correct? How well is this term defined?
           | 
            | If yes, then I'm not surprised this type of design has
            | become a target for frustration. State is smeared across
            | the system, which implies a lot of messaging and arbitrary
            | connections between services.
           | 
           | That type of design is useful if you are an application
           | platform (or similar) where you have no say in what the
           | individual entities are, and actually have no idea what they
           | will be.
           | 
           | But if you have the birds-eye view and implement all of it,
           | then why would you do that?
        
         | rco8786 wrote:
         | Or cost increase or decrease!
        
         | crossroadsguy wrote:
         | I see this in Android. Every few years (sometimes multiple
         | times in a year) a new architecture becomes the fad, and
         | every TDH dev starts hankering one way or the other along
         | the lines of "why are you not doing X...". The problem is
         | that at immature firms (especially startups) the director-
         | and senior-manager-level leaders happily agree to the
         | rewrite, i.e. the re-architecture, because they usually
         | leave every season for the next place and get to talk about
         | that new thing, which probably didn't last beyond their stay
         | or might have conked off before that.
         | 
         | And the "test cases" porn! Goodness! People want "coverage"
         | and that's it. It's a box to tick, irrespective of whether
         | those test cases, and the way they are written, are actually
         | meaningful in any way. Then there are charades like "let's
         | have dependency injection everywhere".
        
         | alexpotato wrote:
         | As often, Grug has a great line about microservices:
         | 
         |     grug wonder why big brain take hardest problem,
         |     factoring system correctly, and introduce network call
         |     too
         |     seem very confusing to grug
         | 
         | https://grugbrain.dev/#grug-on-microservices
        
           | kiesel wrote:
           | Thanks for the link to this awesome page!
        
           | Escapado wrote:
            | Thanks for that link, I genuinely laughed out loud while
            | reading some of those points! Love the presentation, and
            | what a wonderful reality check; I couldn't agree more.
        
         | tsss wrote:
         | > single micro service called by all of the other micro
         | services
         | 
         | So they didn't do microservices correctly. Big surprise.
        
           | zbentley wrote:
           | I mean, that pattern is pretty common in the micro service
           | world. Services for things like authz, locking,
           | logging/tracing, etc. are often centralized SPOFs.
           | 
           | There are certainly ways to mitigate the SPOFiness of each of
           | those cases, but that doesn't make having them an
           | antipattern.
        
         | jknoepfler wrote:
         | One of the points of a microservice architecture (on k8s or
         | otherwise) is that you can easily horizontally scale the
         | component that was under pressure without having to scale out a
         | monolithic application... that just sounds like people being
         | clueless, not a failure of microservice architecture...
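         | 
         | For what it's worth, that targeted scaling is only a few
         | lines of YAML once the hot component is its own Deployment.
         | A rough sketch (the name "checkout" and the numbers here are
         | made up; you'd tune them for your workload):
         | 
         |     apiVersion: autoscaling/v2
         |     kind: HorizontalPodAutoscaler
         |     metadata:
         |       name: checkout              # hypothetical hot service
         |     spec:
         |       scaleTargetRef:
         |         apiVersion: apps/v1
         |         kind: Deployment
         |         name: checkout
         |       minReplicas: 2
         |       maxReplicas: 20
         |       metrics:
         |         - type: Resource
         |           resource:
         |             name: cpu
         |             target:
         |               type: Utilization
         |               averageUtilization: 70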
        
         | okr wrote:
         | And if it was a single application, how would that have
         | solved it? You would still have that loop, no?
         | 
         | Personally, I think it does not have to be hundreds of
         | microservices, with basically each function a service. But I
         | see it more like the web on the internet: things are
         | sometimes unreachable or overloaded. I think that is normal
         | life.
        
       | jmdots wrote:
       | Please just use it as a warehouse-scale computer and don't
       | make node groups into pets.
        
       | jokethrowaway wrote:
       | In which universe migrating from docker containers in ECS to
       | Kubernetes is an effort measured in years?
        
       | strivingtobe wrote:
       | > At the time we did not auto-scale any of our containerized
       | services and were spending a lot of unnecessary money to keep
       | services provisioned such that they could always handle peak
       | load, even on nights and weekends when our traffic is much lower.
       | 
       | Huh? You've been running on AWS for how long and haven't been
       | using auto scaling AT ALL? How was this not priority number one
       | for the company to fix? You're just intentionally burning money
       | at that point!
       | 
       | > While there is some support for auto-scaling on ECS, the
       | Kubernetes ecosystem has robust open source offerings such as
       | Keda for auto-scaling. In addition to simple triggers like CPU
       | utilization, Keda supports scaling on the length of an AWS Simple
       | Queue Service (SQS) queue as well as any custom metrics from
       | Datadog.
       | 
       | ECS autoscaling is easy, and supports these things. Fair play if
       | you just really wanted to use CNCF projects, but this just seems
       | like you didn't really utilize your previous infrastructure very
       | well.
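       | 
       | To be concrete about the Keda bit the article calls out: a
       | ScaledObject that scales a worker Deployment on SQS queue
       | depth looks roughly like this (a sketch from memory; the names
       | are placeholders and the AWS auth config is omitted, so check
       | the Keda docs for the exact fields):
       | 
       |     apiVersion: keda.sh/v1alpha1
       |     kind: ScaledObject
       |     metadata:
       |       name: worker-scaler
       |     spec:
       |       scaleTargetRef:
       |         name: worker                # Deployment to scale
       |       minReplicaCount: 1
       |       maxReplicaCount: 50
       |       triggers:
       |         - type: aws-sqs-queue
       |           metadata:
       |             queueURL: https://sqs.us-east-1.amazonaws.com/1234/jobs
       |             queueLength: "100"      # target msgs per replica
       |             awsRegion: us-east-1
       | 
       | The ECS counterpart is a target-tracking or step-scaling
       | policy on the service via Application Auto Scaling, driven by
       | a CloudWatch alarm on the same SQS metric; hardly more work.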
        
       | kachapopopow wrote:
       | It appears to me that people don't really understand kubernetes
       | here.
       | 
       | Kubernetes does not mean microservices, it does not mean
       | containerization and isolation, hell, it doesn't even mean
       | service discovery most of the time.
       | 
       | The default smallest kubernetes installation provides you two
       | things: kubelet (the scheduling agent) and kubeapi.
       | 
       | What do these two allow you to do? KubeApi provides an API to
       | interact with kubelet instances by telling them what to do via
       | manifests.
       | 
       | That's all, that's all kubernetes is, just a dumb agent with some
       | default bootstrap behavior that allows you to interact with a
       | backend database.
       | 
       | Now, let's get into kubernetes default extensions:
       | 
       | - CoreDNS - linking service names to service addresses.
       | 
       | - KubeProxy - routing traffic from host to services.
       | 
       | - CNI(many options) - Networking between various service
       | resources.
       | 
       | After that, kubernetes is whatever you want it to be. It can
       | be something you use to spawn a few test databases, deploy an
       | entire production-certified clustered database, or run a full
       | distributed fs with automatic device discovery. Deploy backend
       | services if you want to take advantage of service discovery,
       | autoscaling, and networking. Or it can be something as small
       | as deploying monitoring (such as node-exporter) to every
       | instance.
       | 
       | And as a bonus, it allows you to do it from the comfort of your
       | own local computer.
       | 
       | This article says that Figma migrated the necessary services
       | to kubernetes to improve developer experience, and clearly
       | said that things that don't need to be on kubernetes aren't.
       | For all we know they still run their services on raw instances
       | and only use kubernetes for their storage and databases. And
       | to add to all of that, kubernetes doesn't care where it runs,
       | which is a great way to increase competition between cloud
       | providers, lowering costs for everyone.
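       | 
       | To make the "deploy monitoring to every instance" point
       | concrete, that whole use case is one small DaemonSet. A rough
       | sketch (the namespace and image tag are placeholders):
       | 
       |     apiVersion: apps/v1
       |     kind: DaemonSet
       |     metadata:
       |       name: node-exporter
       |       namespace: monitoring         # assumes this exists
       |     spec:
       |       selector:
       |         matchLabels:
       |           app: node-exporter
       |       template:
       |         metadata:
       |           labels:
       |             app: node-exporter
       |         spec:
       |           hostNetwork: true         # use the host's network
       |           containers:
       |             - name: node-exporter
       |               image: prom/node-exporter:v1.8.1
       |               ports:
       |                 - containerPort: 9100
       | 
       | The scheduler then runs one copy on every node, including any
       | node added later, with no extra tooling.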
        
         | tbrownaw wrote:
         | Is it very common to use it without containers?
        
           | otabdeveloper4 wrote:
           | It is impossible.
        
             | lisnake wrote:
             | it's possible with virtlet or kubevirt
        
           | darby_nine wrote:
           | You can use VMs instead. I don't think the distinction
           | matters very much though.
        
           | kachapopopow wrote:
            | I run it that way on my Windows machines; the image is
            | downloaded and executed directly.
        
         | lmm wrote:
         | Kubernetes absolutely means containerisation in practice. There
         | is no other supported way of doing things with it. And "fake"
         | abstraction where you pretend something is generic but it's
         | actually not is one of the easiest ways to overcomplicate
         | anything.
        
           | kachapopopow wrote:
            | If you disable the security policy and remount to PID 1,
            | you escape any encapsulation. Or you can use a k8s
            | implementation that just extracts the image and runs it.
        
         | consteval wrote:
         | To be fair, at the small scales you're talking about (maybe
         | 1-2 machines) systemd does the same stuff, just better and
         | with less complexity. And there are various much simpler
         | ways to automate your deployments.
         | 
         | If you don't have a distributed system, then personally I
         | think k8s makes no sense.
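         | 
         | For the "simpler ways" bucket: on one or two machines, even
         | a short Compose file (just one option; a plain systemd unit
         | works as well) covers most of it. A minimal sketch with a
         | made-up image name:
         | 
         |     services:
         |       app:
         |         image: ghcr.io/example/app:latest   # placeholder
         |         restart: unless-stopped
         |         ports:
         |           - "8080:8080"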
        
           | kachapopopow wrote:
            | I did say that with a few machines it can be overkill,
            | but when you have more than a dozen sets of 2-3 machines,
            | or 6+ machines, it gets overwhelming really fast.
            | Kubernetes in its smallest form is around 50MiB of memory
            | and 0.1 CPU.
        
       | ec109685 wrote:
       | > At high-growth companies, resources are precious
       | 
       | Yeah, at those low growth companies, you have unlimited resources
       | /s
        
       | datadeft wrote:
       | > Migrating onto Kubernetes can take years
       | 
         | What the heck am I reading? For whom? I am not sure why
         | companies even bother with such migrations. Where is the
         | business value? Where is the gain for the customer? Is this
         | one of those "L'art pour l'art" projects that Figma does
         | just because it can?
        
         | xorcist wrote:
         | It solves the "we have recently been acquired and have a lot of
         | resources that we must put to use" problem.
        
         | kevstev wrote:
         | FWIW... I was pretty taken aback by this statement as well- and
         | also the "brag" that they moved onto K8s in less than a year.
         | At a very well established firm ~30 years old and with the
         | baggage that came with it, we moved to K8s in far less time-
         | though we made zero attempt to move everything to k8s, just
         | stuff that could benefit from it. Our pitch was more or less-
         | move to k8s and when we do the planned datacenter move at the
         | end of the year, you don't have to do anything aside from a
         | checkout. Otherwise you will have to redeploy your apps to new
         | machines or VMs and deal with all the headache around that. Or
         | you could just containerize now if you aren't already and we
         | take care of the rest. Most migrated and were very happy with
         | the results.
         | 
         | There were plenty of services that were latency-sensitive or
         | in the HPC realm where it made no sense to force a
         | migration, though, and there was no attempt to shoehorn them
         | in.
        
       | rayrrr wrote:
       | Just out of curiosity, is there any other modern system or
       | service that anyone here can think of where anyone in their
       | right mind would brag about migrating to it in less than a
       | year?
        
         | jjice wrote:
         | It's a hard question to answer. Not all systems are equal in
         | size, scope, and impact. K8s as a system is often the core
         | of your infra, meaning everything running on it will be
         | impacted. That, coupled with the team constraints described
         | in the article, makes it sound like a year isn't awful.
         | 
         | One system I can think of off the top of my head is when
         | Amazon moved away from Oracle to fully Amazon/OSS RDBMSs a
         | while ago, but that was multi-year, I think. If they could
         | have done it in less than a year, they'd definitely be
         | bragging.
        
         | therealdrag0 wrote:
         | I've seen many migrations take over a year. It's less about the
         | technology and more about your tech debt, integration
         | complexity, and resourcing.
        
       | 05bmckay wrote:
       | I don't think this is the flex they think it is...
        
       | ravedave5 wrote:
       | Completely left out of this post and most of the conversation
       | is that being on K8s makes it much, much easier to go
       | multi-cloud. K8s is K8s.
        
       | syngrog66 wrote:
       | k8s and "12 months" -> my priors likely confirmed. ha
        
       | sjkoelle wrote:
       | The title alone is a teensie bit hilarious
        
       | Ramiro wrote:
       | I love reading these "reports from the field"; I always pick up a
       | thing or two. Thanks for sharing @ianvonseggern!
        
       ___________________________________________________________________
       (page generated 2024-08-09 23:02 UTC)