[HN Gopher] How we migrated onto K8s in less than 12 months
___________________________________________________________________
How we migrated onto K8s in less than 12 months
Author : ianvonseggern
Score : 263 points
Date : 2024-08-08 16:07 UTC (1 day ago)
(HTM) web link (www.figma.com)
(TXT) w3m dump (www.figma.com)
| jb1991 wrote:
| Can anyone advise what is the most common language used in
| enterprise settings for interfacing with K8s?
| JohnMakin wrote:
| IME almost exclusively golang.
| roshbhatia wrote:
| ++, most controllers are written in go, but there's plenty of
| client libraries for other languages.
|
| A common pattern you'll see though is skipping writing any
| sort of code and instead using a higher-level DSL-ish
| configuration, usually via yaml, with tools like Kyverno.
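|
| e.g. a Kyverno policy that just enforces a label looks roughly
| like this (a sketch from memory, names made up):
|
|   apiVersion: kyverno.io/v1
|   kind: ClusterPolicy
|   metadata:
|     name: require-team-label
|   spec:
|     validationFailureAction: Enforce
|     rules:
|       - name: check-team-label
|         match:
|           any:
|             - resources:
|                 kinds: [Pod]
|         validate:
|           message: "Pods must carry a team label."
|           pattern:
|             metadata:
|               labels:
|                 team: "?*"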
| angio wrote:
| I'm seeing a lot of custom operators written in Rust
| nowadays. Obviously biased because I do a lot of rust myself
| so people I'm talking to also do rust.
| gadflyinyoureye wrote:
| Depends on what you mean. Helm will control a lot. You can make
| the yaml file in any language. Also you can admin it from
| command line tools. So again any language but often zsh or
| bash.
| cortesoft wrote:
| A lot of yaml
| yen223 wrote:
| The fun kind of yaml that has a lot of {{ }} in them that
| breaks your syntax highlighter.
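|
| Something along these lines, which helm is perfectly happy with
| but a plain yaml parser (and most highlighters) is not
| (.Values.extraEnv is made up):
|
|   env:
|     {{- range $key, $val := .Values.extraEnv }}
|     - name: {{ $key }}
|       value: {{ $val | quote }}
|     {{- end }}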
| mplewis wrote:
| I have seen more Terraform than anything else.
| akdor1154 wrote:
| On the platform consumer side (app infra description) - well
| schema'd yaml, potentially orchestrated by helm ("templates to
| hellish extremes") or kustomize ("no templates, this is the
| hill we will die on").
|
| On the platform integration/hook side (app code doing
| specialised platform-specific integration stuff, extensions to
| k8s itself), golang is the lingua franca but bindings for many
| languages are around and good.
| bithavoc wrote:
| If you're talking about connecting to Kubernetes and creating
| resources programmatically, Pulumi allows you to interface with
| it from all the languages they support (js, ts, go, c#, python),
| including wrapping up Helm charts and injecting secrets (my
| personal favorite).
|
| If you want to build your own Kubernetes Custom Resources and
| Controllers, Go works pretty well for that.
| JohnMakin wrote:
| I like how this article clearly and articulately states the
| reasons it stands to gain from Kubernetes. Many make the jump
| without knowing what they even stand to gain, or if they need to
| in the first place - the reasons given here are good.
| nailer wrote:
| I was about to write the opposite - the logic is poor and
| circular - but multiple other commenters have already raised
| this: https://news.ycombinator.com/item?id=41194506
| https://news.ycombinator.com/item?id=41194420
| JohnMakin wrote:
| I don't really see those rebuttals as all that valid. The
| reasons given in this article are completely valid, from my
| perspective of someone who's worked heavily with
| Kubernetes/ECS.
|
| Helm, for instance, is a great time saver for installing
| software. Often software will support nothing but helm. Ease
| of deployment is a good consideration. Their points on
| networking are absolutely spot on. The scaling considerations
| are spot on. Killing/isolating unhealthy containers is
| completely valid. I could go on a lot more, but I don't see a
| single point listed as invalid.
| samcat116 wrote:
| They're quite specific in that they mention that teams would
| like to make use of existing helm charts for other software
| products. Telling them to build and maintain definitions for
| those services from scratch is added work in their mind.
| dijksterhuis wrote:
| > When applied, Terraform code would spin up a template of what
| the service should look like by creating an ECS task set with
| zero instances. Then, the developer would need to deploy the
| service and clone this template task set [and do a bunch of
| manual things]
|
| > This meant that something as simple as adding an environment
| variable required writing and applying Terraform, then running a
| deploy
|
| This sounds less like a problem with ECS and more like an
| overcomplication in how they were using terraform + ECS to manage
| their deployments.
|
| I get the generating templates part for verification prior to
| live deploys. But this seems... dunno.
| wfleming wrote:
| Very much agree. I have built infra on ECS with terraform at
| two companies now, and we have zero manual steps for actions
| like this, beyond "add the env var to a terraform file, merge
| it and let CI deploy". The majority of config changes we would
| make are that process.
| dijksterhuis wrote:
| Yeah.... thinking about it a bit more i just don't see why
| they didn't set up their CI to deploy a short lived
| environment on a push to a feature branch.
|
| To me that seems like the simpler solution.
| roshbhatia wrote:
| I'm with you here -- ECS deploys are pretty painless and
| uncomplicated, but I can picture a few scenarios where this
| ends up being necessary, for example if they have a lot of services
| deployed on ECS and it ends up bloating the size of the
| Terraform state. That'd slow down plans and applies
| significantly, which makes sharding the Terraform state by
| literally cloning the configuration based on a template a lot
| safer.
| freedomben wrote:
| > ECS deploys are pretty painless and uncomplicated
|
| Unfortunately in my experience, this is true until it isn't.
| Once it isn't true, it can quickly become a painful blackbox
| debugging exercise. If your org is big enough to have
| dedicated AWS support then they can often get help from
| engineers, but if you aren't then life can get really
| complicated.
|
| Still not a bad choice for most apps though, especially if
| it's just a run-of-the-mill HTTP-based app
| ianvonseggern wrote:
| Hey, author here, I totally agree that this is not a
| fundamental limitation of ECS and we could have iterated on
| this setup and made something better. I intentionally listed
| this under work we decided to scope into the migration process,
| and not under the fundamental reasons we undertook the
| migration because of that distinction.
| Aeolun wrote:
| Honestly, I find the reasons they name for using Kubernetes
| flimsy as hell.
|
| "ECS doesn't support helm charts!"
|
| No shit, Sherlock, that's a thing literally built on Kubernetes.
| It's like a government RFP that can only be fulfilled by a single
| vendor.
| Carrok wrote:
| > We also encountered many smaller paper cuts, like attempting
| to gracefully terminate a single poorly behaving EC2 machine
| when running ECS on EC2. This is easy on Amazon's Elastic
| Kubernetes Service (EKS), which allows you to simply cordon off
| the bad node and let the API server move the pods off to
| another machine while respecting their shutdown routines.
|
| I dunno, that seems like a very good reason to me.
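|
| For anyone who hasn't done it, the whole workflow is roughly
| (node name made up):
|
|   # stop new pods from being scheduled on the node
|   kubectl cordon ip-10-0-1-23.ec2.internal
|   # evict its pods, respecting their graceful termination
|   kubectl drain ip-10-0-1-23.ec2.internal \
|     --ignore-daemonsets --delete-emptydir-data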
| watermelon0 wrote:
| I assume that ECS Fargate would solve this, because one
| misbehaving ECS task would not affect others, and stopping it
| should still respect the shutdown routines.
| ko_pivot wrote:
| Fargate is very expensive at scale. Great for small or
| bursty workloads, but when you're at Figma scale, you
| almost always go EC2 for cost-effectiveness.
| ihkasfjdkabnsk wrote:
| this isn't really true. It was very expensive when it was
| first released but now it's pretty cost competitive with
| EC2, especially when you consider easier scale down/up.
| Aeolun wrote:
| I think when you are at Figma scale you should have
| learned to keep things simpler. At this point I don't
| think the (slightly) lower costs of EC2 weigh up against
| the benefits of Fargate.
| aranelsurion wrote:
| To be fair there are many benefits of running on the platform
| that has the most mindshare.
|
| Unless they are in this space competing against k8s, it's
| reasonable for them if they want to use Helm charts, to move
| where they can.
|
| Also, Helm doesn't work with ECS, and neither do <50 other tools
| and tech from the CNCF map>.
| cwiggs wrote:
| I think what they should have said is "there isn't a tool like
| Helm for ECS" If you want to deploy a full prometheus, grafana,
| alertmanager, etc stack on ECS, good luck with that, no one has
| written the task definition for you to consume and override
| values.
|
| With k8s you can easily deploy a helm chart that will deploy
| lots of things that all work together fairly easily.
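|
| e.g. the whole kube-prometheus-stack is the usual two commands
| (release/namespace names are just examples):
|
|   helm repo add prometheus-community \
|     https://prometheus-community.github.io/helm-charts
|   helm install monitoring \
|     prometheus-community/kube-prometheus-stack \
|     --namespace monitoring --create-namespace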
| JohnMakin wrote:
| It's almost like people factor in a piece of software's tooling
| environment before they use the software - wild.
| liveoneggs wrote:
| recipes and tutorials say "helm" so we need "helm"
| vouwfietsman wrote:
| Maybe its normal for a company this size, but I have a hard time
| following much of the decision making around these gigantic
| migrations or technology efforts because the decisions don't seem
| to come from any user or company need. There was a similar post
| from Figma earlier, I think around databases, that left me
| feeling the same.
|
| For instance: they want to go to k8s because they want to use
| etcd/helm, which they can't on ECS? Why do you want to use
| etcd/helm? Is it really this important? Is there really no other
| way to achieve the goals of the company than exactly like that?
|
| When a decision is founded on a desire of the user, its easy to
| validate that downstream decisions make sense. When a decision is
| founded on a technological desire, downstream decisions may make
| sense in the context of the technical desire, but do they make
| sense in the context of the user, still?
|
| Either I don't understand organizations of this scale, or it is
| fundamentally difficult for organizations of this scale to
| identify and reason about valuable work.
| WaxProlix wrote:
| People move to K8s (specifically from ECS) so that they can use
| cloud provider agnostic tooling and products. I suspect a lot
| of larger company K8s migrations are fueled by a desire to be
| multicloud or hybrid on-prem, mitigate cost, availability, and
| lock-in risk.
| timbotron wrote:
| there's a pretty direct translation from ECS task definition
| to docker-compose file
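|
| e.g. the containerDefinitions fields (image, portMappings,
| environment, cpu/memory) map almost one-to-one onto something
| like this (values made up):
|
|   services:
|     web:
|       image: myorg/web:1.2.3
|       ports:
|         - "8080:8080"
|       environment:
|         - LOG_LEVEL=info
|       deploy:
|         resources:
|           limits:
|             cpus: "0.5"
|             memory: 512M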
| zug_zug wrote:
| I've heard all of these lip-service justifications before,
| but I've yet to see anybody actually publish data showing how
| they saved any money. Would love to be proven wrong by some
| hard data, but something tells me I won't be.
| nailer wrote:
| Likewise. I'm not sure Kubernetes famous complexity (and
| the resulting staff requirements) are worth it to
| preemptively avoid vendor lockin, and wouldn't be solved
| more efficiently by migrating to another cloud provider's
| native tools if the need arises.
| bryanlarsen wrote:
| I'm confident Figma isn't paying published rates for AWS.
| The transition might have helped them in their rate
| negotiations with AWS, or it might not have. Hard data on
| the money saved would be difficult to attribute.
| jgalt212 wrote:
| True but if AWS knows your lock-in is less locked-in, I'd
| bet they'd be more flexible when contracts are up for renewal.
| I mean it's possible the blog post's primary purpose was a
| shot across the bow of their AWS account manager.
| logifail wrote:
| > it's possible the blog post's primary purpose was a
| shot across the bow of their AWS account manager
|
| Isn't it slightly depressing that this explanation is
| fairly (the most?) plausible?
| jiggawatts wrote:
| Our state department of education is one of the biggest
| networks in the world with about half a million devices.
| They would occasionally publicly announce a migration to
| Linux.
|
| This was just a Microsoft licensing negotiation tactic.
| Before he was CEO, Ballmer flew here to negotiate one of
| the contracts. The discounts were _epic_.
| tengbretson wrote:
| There are large swaths of the b2b space where (for whatever
| reason) being in the same cloud is a hard business
| requirement.
| vundercind wrote:
| The vast majority of corporate decisions are never
| justified by useful data analysis, before or after the
| fact.
|
| Many are so-analyzed, but usually in ways that anyone who
| paid attention in high school science or stats classes can
| tell are so flawed that they're meaningless.
|
| We can't even measure manager efficacy to any useful
| degree, in nearly all cases. We can come up with numbers,
| but they don't mean anything. Good luck with anything more
| complex.
|
| Very small organizations can probably manage to isolate
| enough variables to know how good or bad some move was in
| hindsight, if they try and are competent at it (... if).
| Sometimes an effect is so huge for a large org that it
| overwhelms confounders and you can be pretty confident that
| it was at least good or bad, even if the degree is fuzzy.
| Usually, no.
|
| Big organizations are largely flying blind. This has only
| gotten worse with the shift from people-who-know-the-work-
| as-leadership to professional-managers-as-leadership.
| Alupis wrote:
| Why would you assume it's lip-service?
|
| Being vendor-locked into ECS means you _must_ pay whatever
| ECS wants... using k8s means you can feasibly pick up and
| move if you are forced.
|
| Even if it doesn't save money _today_ it might save a
| tremendous amount in the future and/or provide a much
| stronger position to negotiate from.
| greener_grass wrote:
| Great in theory but in practice when you do K8s on AWS,
| the AWS stuff leaks through and you still have lock-in.
| Alupis wrote:
| Then don't use the AWS stuff. You can bring your own
| anything that they provide.
| greener_grass wrote:
| This requires iron discipline. Maybe with some kind of
| linter for Terraform / kubectl it could be done.
| cwiggs wrote:
| It doesn't have to be that way though. You can use the
| AWS ingress controller, or you can use ingress-nginx. You
| can use external secrets operator and tie it into AWS
| Secrets manager, or you can tie it into 1pass, or
| Hashicorp Vault.
|
| Just like picking EKS you have to be aware of the pros
| and cons of picking the cloud provider tool or not.
| Luckily the CNCF is doing a lot for reducing vender lock
| in and I think it will only continue.
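|
| e.g. with external-secrets the workload only ever references a
| store, so swapping AWS Secrets Manager for Vault is mostly a
| store swap (rough sketch, names made up):
|
|   apiVersion: external-secrets.io/v1beta1
|   kind: ExternalSecret
|   metadata:
|     name: app-db
|   spec:
|     secretStoreRef:
|       name: aws-secrets-manager   # or a vault/1password store
|       kind: ClusterSecretStore
|     target:
|       name: app-db
|     data:
|       - secretKey: password
|         remoteRef:
|           key: prod/app/db-password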
| elktown wrote:
| I don't understand why this "you shouldn't be vendor-
| locked" rationalization is taken at face value at all?
|
| 1. The time it will take to move to another cloud is
| proportional to the complexity of your app. For example,
| if you're a Go shop using managed persistence are you
| more vendor locked in any meaningful way than k8s? What's
| the delta here?
|
| 2. Do you really think you can haggle with the fuel-
| producers like you're Maersk? No, you're more likely just
| a car driving around looking for a gas station with increasingly
| diminishing returns.
| Alupis wrote:
| This year alone we've seen significant price increases
| from web services, including critical ones such as Auth.
| If you are vendor-locked into, say Auth0, and they
| increase their price 300%[1]... What choice do you have?
| What negotiation position do you have? None... They know
| you cannot leave.
|
| It's even worse when your entire platform is vendor-
| locked.
|
| There is nothing but upside to working towards a vendor-
| neutral position. It gives you options. Even if you never
| use those options, they are there.
|
| > Do you really think you can haggle
|
| At the scale of someone like Figma? Yes, they do
| negotiate rates - and a competent account manager will
| understand Figma's position and maximize the revenue they
| can extract. Now, if the account rep doesn't play ball,
| Figma can actually move their stuff somewhere else.
| There's literally nothing but upside.
|
| I swear, it feels like some people are just allergic to
| anything k8s and actively seek out ways to hate on it.
|
| [1] https://auth0.com/blog/upcoming-pricing-changes-for-
| the-cust...
| elktown wrote:
| Why skip point 1 and do some strange tangent on a SaaS
| product unrelated to using k8s or not?
|
| Most people looking into (and using) k8s that are being
| told the "you must avoid vendor lock-in!" selling point
| are nowhere near the size where it matters. But I know
| there's essentially bulk pricing, as we have it where I
| work as well. That it's because of picking k8s or not,
| however, is an extremely long stretch, and imo mostly
| rationalization. There's nothing saying that a cloud move
| _without_ k8s couldn't be done within the same amount of
| time. Or that k8s is even the main problem; I imagine it
| isn't, since it's usually supposed to be stateless apps.
| Alupis wrote:
| The point was about vendor lock, which you asserted is
| not a good reason to make a move, such as this. The
| "tangent" about a SaaS product was to make it clear what
| happens when you build your system in such a way as-to
| become entirely dependent on that vendor. Just because
| Auth0 is not part of one of the big "cloud" providers,
| doesn't make it any less vendor-locky. Almost all of the
| vendor services offered on the big clouds are extremely
| vendor-locked and non-portable.
|
| Where you buy compute from is just as big of a deal as
| where you buy your other SaaS' from. In all of the cases,
| if you cannot move even if you had to (ie. it'll take 1
| year+ to move), then you are not in a good position.
|
| Addressing your #1 point - if you use a regular database
| that happens to be offered by a cloud provider (ie.
| Postgres, MySQL, MongoDB, etc) then you can pick up and
| move. If you use something proprietary like Cosmos DB, then
| you are stuck or face significant efforts to migrate.
|
| With k8s, moving to another cloud can be as simple as
| creating an account and updating your configs to point at
| the new cluster. You can run every service you need
| inside your cluster if you wanted. You have freedom of
| choice and mobility.
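|
| In the happy path that really can be a handful of commands
| (context names made up):
|
|   kubectl config use-context eks-prod
|   kubectl apply -k overlays/prod
|   # same manifests, pointed at another provider's cluster
|   kubectl config use-context gke-prod
|   kubectl apply -k overlays/prod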
|
| > Most people looking into (and using) k8s that are being
| told the "you most avoid vendor lock in!" selling point
| are nowhere near the size where it matters.
|
| This is just simply wrong, as highlighted by the SaaS
| example I provided. If you think you are too small so it
| doesn't matter, and decide to embrace all of the cloud
| vendor's proprietary services... what happens to you when
| that cloud provider decides to change their billing
| model, or dramatically increases prices? You are screwed
| and have no option but to cough up more money.
|
| There's more decisions to make and consider regarding
| choosing a cloud platform and services than just whatever
| is easiest to use today - for any size of business.
|
| I have found that, in general, people are afraid of using
| k8s because it isn't trivial to understand for most
| developers. People often mistakenly believe k8s is only
| useful when you're "google scale". It solves a lot of
| problems, including reduced vendor-lock.
| watermelon0 wrote:
| I would assume that the migration from ECS to something
| else would be a lot easier, compared to migrating from
| other managed services, such as S3/SQS/Kinesis/DynamoDB,
| and especially IAM, which ties everything together.
| otterley wrote:
| Amazon ECS is and always has been free of charge. You pay
| for the underlying compute and other resources (just like
| you do with EKS, too), but not the orchestration service.
| WaxProlix wrote:
| It looks like I'm implying that companies are successful in
| getting those things from a K8s transition, but I wasn't
| trying to say that, just thinking of the times when I've
| seen these migrations happen and relaying the stated aims.
| I agree, I think it can be a burner of dev time and a
| burden on the business as devs acquire the new skillset
| instead of doing more valuable work.
| OptionOfT wrote:
| Flexibility was a big thing for us. Many different
| jurisdictions required us to be conscious of where exactly
| data was stored & processed.
|
| K8s makes this really easy. Don't need to worry whether
| country X has a local Cloud data center of Vendor Y.
|
| Plus it makes hiring so much easier as you only need to
| understand the abstraction layer.
|
| We don't hire people for ARM64 or x86. We have abstraction
| layers. Multiple even.
|
| We'd be fooling ourselves not to use them.
| fazkan wrote:
| This, most of it, I think is to support on-prem, and cloud-
| flexibility. Also from the customers point of view, they can
| now sell the entire figma "box" to controlled industries for
| a premium.
| teyc wrote:
| People move to K8s so that their resumes and job ads are
| cloud provider agnostic. People's careers stagnate when their
| employer builds on home-baked tech, or on specific
| offerings from cloud providers. Employers find moving to a
| common platform makes recruiting easier.
| samcat116 wrote:
| > I have a hard time following much of the decision making
| around these gigantic migrations or technology efforts because
| the decisions don't seem to come from any user or company need
|
| I mean the blog post is written by the team deciding the
| company needs. They explained exactly why they can't easily use
| etcd on ECS due to technical limitations. They also talked
| about many other technical limitations that were causing them
| issues and increasing cost. What else are you expecting?
| Flokoso wrote:
| Managing 500 or more VMs is a lot of work.
|
| The VM upgrades, auth, backups, log rotation, etc. alone add up.
|
| With k8s I can give everyone a namespace, policies, volumes,
| and have automatic log aggregation thanks to DaemonSets and
| k8s/cloud-native stacks.
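|
| A namespace plus a quota is a handful of lines of yaml (rough
| sketch, names and numbers made up):
|
|   apiVersion: v1
|   kind: Namespace
|   metadata:
|     name: team-payments
|   ---
|   apiVersion: v1
|   kind: ResourceQuota
|   metadata:
|     name: team-payments-quota
|     namespace: team-payments
|   spec:
|     hard:
|       requests.cpu: "20"
|       requests.memory: 64Gi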
|
| Self healing and more.
|
| It's hard to describe how much better it is.
| ianvonseggern wrote:
| Hey, author here, I think you ask a good question and I think
| you frame it well. I agree that, at least for some major
| decisions - including this one, "it is fundamentally difficult
| for organizations of this scale to identify and reason about
| valuable work."
|
| At its core we are a platform team building tools, often for
| other platform teams, that are building tools that support the
| developers at Figma creating the actual product experience. It
| is often harder to reason about what the right decisions are
| when you are further removed from the end user, although it
| also gives you great leverage. If we do our jobs right the
| multiplier effect of getting this platform right impacts the
| ability of every other engineer to do their job efficiently and
| effectively (many indirectly!).
|
| You bring up good examples of why this is hard. It was
| certainly an alternative to say sorry we can't support etcd and
| helm and you will need to find other ways to work around this
| limitation. This was simply two more data points helping push
| us toward the conclusion that we were running our Compute
| platform on the wrong base building blocks.
|
| While difficult to reason about, I do think it's still very
| worth trying to do this reasoning well. It's how as a platform
| team we ensure we are tackling the right work to get to the
| best platform we can. That's why we spent so much time making
| the decision to go ahead with this and part of why I thought it
| was an interesting topic to write about.
| felixgallo wrote:
| I have a constructive recommendation for you and your
| engineering management for future cases such as this.
|
| First, when some team says "we want to use helm and etcd for
| some reason and we haven't been able to figure out how to get
| that working on our existing platform," start by asking them
| what their actual goal is. It is obscenely unlikely that helm
| (of all things) is a fundamental requirement to their work.
| Installing temporal, for example, doesn't require helm and is
| actually simple, if it turns out that temporal is the best
| workflow orchestrator for the job and that none of the
| probably 590 other options will do.
|
| Second, once you have figured out what the actual goal is,
| and have a buffet of options available, price them out. Doing
| some napkin math on how many people were involved and how
| much work had to go into it, it looks to me that what you
| have spent to completely rearchitect your stack and
| operations and retrain everyone -- completely discounting
| opportunity cost -- is likely not to break even in even my
| most generous estimate of increased productivity for about
| five years. More likely, the increased cost of the platform
| switch, the lack of likely actual velocity accrual, and the
| opportunity cost make this a net-net bad move except for the
| resumes of all of those involved.
| Spivak wrote:
| > we can't support etcd and helm and you will need to find
| other ways to work around this limitation
|
| So am I reading this right that either downstream platform
| teams or devs wanted to leverage existing helm templates to
| provision infrastructure and being on ECS locked you out of
| those, and the water eventually boiled over? If so that's a
| pretty strong statement about the platform effect of k8s.
| vouwfietsman wrote:
| Hi! Thanks for the thoughtful reply.
|
| I understand what you're saying, the thing that worries me
| though is that the input you get from other technical teams
| is very hard to verify. Do you intend to measure the
| development velocity of the teams before and after the
| platform change takes effect?
|
| In my experience it is extremely hard to measure the real
| development velocity (in terms of value-add, not arbitrary
| story points) of a single team, not to mention a group of
| teams over time, not to mention _as a result of a change_.
|
| This is not necessarily criticism of Figma, as much as it is
| criticism of the entire industry maybe.
|
| Do you have an approach for measuring these things?
| felixgallo wrote:
| You're right that the input from other technical teams is
| hard to verify. On the other hand, that's fundamental table
| stakes, especially for a platform team that has a broad
| impact on an organization. The purpose of the platform is
| to delight the paying customer, and every change should
| have a clear and well documented and narrated line of sight
| to either increasing that delight or decreasing the
| frustration.
|
| The canonical way to do that is to ensure that the incoming
| demand comes with both the ask and also the solid
| justification. Even at top tier organizations, frequently
| asks are good ideas, sensible ideas, nice ideas, probably
| correct ideas -- but none of that is good enough/acceptable
| enough. The proportion of good/sensible/nice/probably
| correct ideas that are justifiable is about 5% in my lived
| experience of 38 years in the industry. The onus is on the
| asking team to provide that full true and complete
| justification with sufficiently detailed data and in the
| manner and form that convinces the platform team's
| leadership. The bar needs to be high and again, has to
| provide a clear line of sight to improving the life of the
| paying customer. The platform team has the authority and
| agency necessary to defend the customer, operations and
| their time, and can (and often should) say no. It is not
| the responsibility of the platform team to try to prove or
| disprove something that someone wants, and it's not
| 'pushing back' or 'bureaucracy', it's basic sober purpose-
| of-the-company fundamentals. Time and money are not
| unlimited. Nothing is free.
|
| Frequently the process of trying to put together the
| justification reveals to the asking team that they do not
| in fact have the justification, and they stop there and a
| disaster is correctly averted.
|
| Sometimes, the asking team is probably right but doesn't
| have the data to justify the ask. Things like 'Let's move
| to K8s because it'll be better' are possibly true but also
| possibly not. Vibes/hacker news/reddit/etc are beguiling to
| juniors but do not necessarily delight paying customers.
| The platform team has a bunch of options if they receive
| something of that form. "No" is valid, but also so is
| "Maybe" along with a pilot test to perform A/B testing
| measurements and to try to get the missing data; or even
| "Yes, but" with a plan to revert the situation if it turns
| out to be too expensive or ineffective after an
| incrementally structured phase 1. A lot depends on the
| judgement of the management and the available bandwidth,
| opportunity cost, how one-way-door the decision is, etc.
|
| At the end of the day, though, if you are not making a
| data-driven decision (or the very closest you can get to
| one) and doing it off naked/unsupported asks/vibes/resume
| enhancement/reddit/hn/etc, you're putting your paying
| customer at risk. At best you'll be accidentally correct.
| Being accidentally correct is the absolute worst kind of
| correct, because inevitably there will come a time when
| your luck runs out and you just killed your
| team/organization/company because you made a wrong choice,
| your paying customers got a worse/slower-to-improve/etc
| experience, and they deserted you for a more soberly run
| competitor.
| wg0 wrote:
| If you haven't broken down your software into 50+ different
| separate applications written in 15 different languages using 5
| different database technologies - you'll find very little use
| for k8s.
|
| All you need is a way to roll out your artifact to production
| in a roll over or blue green fashion after the preparations
| such as required database alterations be it data or schema
| wise.
| javaunsafe2019 wrote:
| But you do know which problems the k8s abstraction solves,
| right? Cause it has nothing to do with many languages nor
| many services but things like discovery, scaling, failover
| and automation ...
| wg0 wrote:
| If all you have is one single application listening on port
| 8080 with SSL terminated elsewhere, why would you need so
| many abstractions in the first place?
| imiric wrote:
| > All you need is a way to roll out your artifact to
| production in a roll over or blue green fashion after the
| preparations such as required database alterations be it data
| or schema wise.
|
| Easier said than done.
|
| You can start by implementing this yourself, thinking about how
| simple it is. But then you find that you also need to decide
| how to handle different environments, configuration and
| secret management, rollbacks, failover, load balancing, HA,
| scaling, and a million other details. And suddenly you find
| yourself maintaining a hodgepodge of bespoke infrastructure
| tooling instead of your core product.
|
| K8s isn't for everyone. But it sure helps when someone else
| has thought about common infrastructure problems and solved
| them for you.
| mattmanser wrote:
| You need to remove a lot of things from that list. Almost
| all of that functionality is available in build tools that
| have been available for decades. I want to emphasize the
| DECADES.
|
| And then all you're left with is scaling. Which most
| business do not need.
|
| Almost everything you've written there is a standard
| feature of almost any CI toolchain, teamcity, Jenkins,
| Azure DevOps, etc., etc.
|
| We were doing it before k8s was even written.
| imiric wrote:
| > Almost all of that functionality is available in build
| tools that have been available for decades.
|
| Build tools? These are runtime and operational concerns.
| No build tool will handle these things.
|
| > And then all you're left with is scaling. Which most
| business do not need.
|
| Eh, sure they do. They might not need to hyperscale, but
| they could sure benefit from simple scaling, autoscaling
| at peak hours, and scaling down to cut costs.
|
| Whether they need k8s specifically to accomplish this is
| another topic, but every business needs to think about
| scaling in some way.
|
| > Almost everything you've written there is a standard
| feature of almost any CI toolchain, teamcity, Jenkins,
| Azure DevOps, etc., etc.
|
| Huh? Please explain how a CI pipeline will handle load
| balancing, configuration and secret management, and other
| operational tasks for your services. You may use it for
| automating commands that do these things, but CI systems
| are entirely decoupled from core infrastructure.
|
| > We were doing it before k8s was even written.
|
| Sure. And k8s isn't the absolute solution to these
| problems. But what it does give you is a unified set of
| interfaces to solve common infra problems. Whatever
| solutions we had before, and whatever you choose to
| compose from disparate tools, will not be as unified and
| polished as what k8s offers. It's up to you to decide the
| right trade-off, but I find the head-in-the-sand
| dismissal of it equally as silly as cargo culting it.
| mplewis wrote:
| Yeah, all you need is a rollout system that supports blue-
| green! Very easy to homeroll ;)
| wg0 wrote:
| Not easy, but already a solved problem.
| friendly_deer wrote:
| Here's a theory about why at least some of these come about:
|
| https://lethain.com/grand-migration/
| lmm wrote:
| > For instance: they want to go to k8s because they want to use
| etcd/helm, which they can't on ECS? Why do you want to use
| etcd/helm? Is it really this important? Is there really no
| other way to achieve the goals of the company than exactly like
| that?
|
| I'm no fan of Helm, but there are surprisingly few good
| alternatives to etcd (i.e. highly available but consistent
| datastores, suitable for e.g. the distributed equivalent of a
| .pid file) - Zookeeper is the only one that comes to mind, and
| it's a real pain on the ops side of things, requiring ancient
| JVM versions and being generally flaky even then.
| tedunangst wrote:
| How long will it take to migrate off?
| codetrotter wrote:
| It's irreversible.
| hujun wrote:
| depends on how much "k8s native" code you have; there are
| applications designed to run on k8s that use a lot of the k8s
| API. Also, if your app is already micro-serviced, it is not
| straightforward to change it back
| wrs wrote:
| A migration with the goal of improving the infrastructure
| foundation is great. However, I was surprised to see that one of
| the motivations was to allow teams to use Helm charts rather than
| converting to Terraform. I haven't seen in practice the
| consistent ability to actually use random Helm charts unmodified,
| so by encouraging its use you end up with teams forking and
| modifying the charts. And Helm is such a horrendous tool, you
| don't really want to be maintaining your own bespoke Helm charts.
| IMO you're actually _better off_ rewriting in Terraform so at
| least your local version is maintainable.
|
| Happy to hear counterexamples, though -- maybe the "indent 4"
| insanity and multi-level string templating in Helm is gone
| nowadays?
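|
| (For anyone who hasn't had the pleasure, I mean things like
| this, with the include forced to column 0 so the filter can
| re-indent it -- "mychart.labels" being a made-up helper:
|
|   metadata:
|     labels:
| {{ include "mychart.labels" . | indent 4 }}
|
| as opposed to the slightly saner `nindent` form.)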
| smellybigbelly wrote:
| Our team also suffered from the problems you described of
| public helm charts. There is always something you need to
| customise to make things work on your own environment. Our
| approach has been to use the public helm chart as-is and do any
| customisation with `kustomize --enable-helm`.
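|
| Roughly, the kustomization ends up looking like this (chart and
| version picked purely as an example):
|
|   apiVersion: kustomize.config.k8s.io/v1beta1
|   kind: Kustomization
|   helmCharts:
|     - name: ingress-nginx
|       repo: https://kubernetes.github.io/ingress-nginx
|       version: 4.10.0
|       releaseName: ingress-nginx
|       namespace: ingress-nginx
|       valuesInline:
|         controller:
|           replicaCount: 2
|   patches:
|     - path: our-local-tweaks.yaml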
| BobbyJo wrote:
| Helm is quite often the default supported way of launching
| containerized third-party products. I have worked at two
| separate startups whose 'on prem' product was offered this way.
| freedomben wrote:
| Indeed. I try hard to minimize the amount of Helm we use, but
| a significant amount of tools are only shipped as Helm
| charts. Fortunately I'm increasingly seeing people provide
| "raw k8s" yaml, but it's far from universal.
| cwiggs wrote:
| Helm Charts and Terraform are different things IMO. Terraform
| is better suited to deploying cloud resources (s3 bucket, EKS
| cluster, EKS workers, RDS, etc). Sure you can manage your k8s
| workloads with Terraform, but I wouldn't recommend it.
| Terraform having state when you already have your state in k8s
| makes working with Terraform + k8s a pain. Helm is purpose
| built for k8s, Terraform is not.
|
| I'm not a fan of Helm either though; templated yaml sucks, and
| you still have the "indent 4" insanity too. Kustomize is nice
| when things are simple, but once your app is complex Kustomize
| is worse than Helm IMO. Try to deploy an app that has an
| Ingress, with a TLS cert and external-dns, with Kustomize for
| multiple environments; you have to patch the resources 3 times
| instead of just having 1 variable you can use in 3 places.
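|
| Concretely, one environment overlay ends up as something like
| this (hostname made up):
|
|   patches:
|     - target:
|         kind: Ingress
|         name: myapp
|       patch: |-
|         - op: replace
|           path: /spec/rules/0/host
|           value: myapp.prod.example.com
|         - op: replace
|           path: /spec/tls/0/hosts/0
|           value: myapp.prod.example.com
|         - op: replace
|           path: /metadata/annotations/external-dns.alpha.kubernetes.io~1hostname
|           value: myapp.prod.example.com
|
| ...the same hostname patched three times, for every environment.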
|
| Helm is popular and Terraform is popular, so they both get
| talked about a lot, but IMO there is a tool yet to become
| popular that will replace both of these tools.
| wrs wrote:
| I agree, I wouldn't generate k8s from Terraform either,
| that's just the alternative I thought the OP was presenting.
| But I'd still rather convert charts from Helm to pretty much
| anything else than maintain them.
| stackskipton wrote:
| Lack of Variable substitution in Kustomize is downright
| frustrating. We use Flux so we have the feature anyways, but
| I wish it was built into Kustomize.
| no_circuit wrote:
| I don't miss variable substitution at all.
|
| For my setup anything that needs to be variable or secret
| gets specified in a custom json/yaml file which is read by
| a plugin which in turn outputs the rendered manifest if I
| can't write it as a "patch". That way the CI/CD runner can
| access things like the resolved secrets for production
| without being accessible by developers without elevated
| access. It requires some digging but there are even
| annotations that can be used to control things like if
| Kustomize should add a hash suffix or not to ConfigMap or
| Secret manifests you generate with plugins.
| tionate wrote:
| Re your Kustomize complaint, just create a complete env-
| specific ingress for each env instead of patching.
|
| - it is not really any more lines
| - doesn't break if dev upgrades to a different version of the
|   resource (has happened before)
| - allows you to experiment in dev with other setups (eg
|   additional ingresses, different paths etc) instead of
|   changing a base config which will impact other envs
|
| TLDR: patch things that are more or less the same in each env;
| create complete resources for things that change more.
|
| There is a bit of duplication but it is a lot more simple
| (see 'Simple Made Easy' - Rich Hickey) than tracing through
| patches/templates.
| gouggoug wrote:
| Talking about helm - I personally have come to profoundly
| loathe it. It was amazing when it came out and filled a much
| needed gap.
|
| However it is loaded with so many footguns that I spend my time
| redoing and debugging others engineers work.
|
| I'm hoping this new tool called << timoni >> picks up steam. It
| fixes pretty much every qualm I have with helm.
|
| So if like me you're looking for a better solution, go check
| timoni.
| JohnMakin wrote:
| It's completely cursed, but I've started deploying helm via
| terraform lately. Many people, ironically me included, find
| that managing deployments via terraform is an anti pattern.
|
| I'm giving it a try and I don't despise it yet, but it feels
| gross - application configs are typically far more mutable and
| dynamic than cloud infrastructure configs, and IME, terraform
| does not like super dynamic configs.
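|
| For the curious, it's just the helm provider's helm_release
| resource (chart picked as an example):
|
|   resource "helm_release" "ingress_nginx" {
|     name       = "ingress-nginx"
|     repository = "https://kubernetes.github.io/ingress-nginx"
|     chart      = "ingress-nginx"
|     namespace  = "ingress-nginx"
|
|     set {
|       name  = "controller.replicaCount"
|       value = "2"
|     }
|   }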
| solatic wrote:
| My current employer (BigCo) has Terraform managing both infra
| and deployments, at (ludicrous) scale. It's a
| nightmare. The problem with Terraform is that you _must_ plan
| your workspaces such that you will not exceed the best-practice
| amount of resources per workspace (~100-200) or else plans will
| drastically slow down your time-to-deploy, checking stuff like
| databases and networking that you haven't touched and have no
| desire to touch. In practice this means creating a latticework
| of Terraform workspaces that trigger each other, and there are
| currently no good open-source tools that support it.
|
| Best practice as I can currently see it is to have Terraform
| set up what you need for continuous delivery (e.g. ArgoCD) as
| part of the infrastructure, then use the CD tool to handle day-
| to-day deployments. Most CD tooling then asks you to package
| your deployment in something like Helm.
| chucky_z wrote:
| You can set up dependent stacks in CDKTF. It's far from as
| clean as a standard TF DAG plan/apply but I'm having a lot of
| success with it right now. If I were actively using k8s at
| the moment I would probably set up dependent cluster resources
| using this method, e.g: ensure a clean, finished CSI daemon
| deployment before deploying a deployment using that CSI
| provider :)
| solatic wrote:
| You're right that CDKTF with dependent stacks is probably
| better than nothing, but (a) CDKTF's compatibility with
| OpenTofu depends on a lack of breaking changes in CDKTF,
| since the OpenTofu team didn't fork CDKTF, so this is a
| little hairy for building production infrastructure; (b)
| CDKTF stacks, even when they can run in parallel, still run
| on the same machine that invoked CDKTF. When you have
| (ludicrous scale) X number of "stacks", this isn't a good
| fit. It's something that _should_ be doable in one of the
| managed Terraform services, but the pricing if you try to
| do (ludicrous scale) parallelism gets to be _insane_.
| mnahkies wrote:
| Whilst I'll agree that writing helm charts isn't particularly
| delightful, consuming them can be.
|
| In our case we have a single application/service base helm
| chart that provides sane defaults and all our deployments
| extend from. The amount of helm values config required by the
| consumers is minimal, and there has been very little occasion
| for a consumer to include their own templates - the base chart
| exposes enough knobs to avoid this.
|
| When it comes to third-party charts, many we've been able to
| deploy as is (sometimes with some PRs upstream to add extra
| functionality), and occasionally we've needed to wrap/fork
| them. We've deployed far more third-party charts as-is than not
| though.
|
| One thing probably worth mentioning w.r.t to maintaining our
| custom charts is the use of helm unittest
| (https://github.com/helm-unittest/helm-unittest) - it's been a
| big help to avoid regressions.
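|
| The tests end up looking roughly like this (paths and values
| made up):
|
|   suite: deployment defaults
|   templates:
|     - templates/deployment.yaml
|   tests:
|     - it: sets resource requests by default
|       asserts:
|         - equal:
|             path: spec.template.spec.containers[0].resources.requests.cpu
|             value: 100m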
|
| We do manage a few kubernetes resources through terraform,
| including ArgoCD (via the helm provider, which is rather slow
| when you have a lot of CRDs), but generally we've found helm
| charts deployed through ArgoCD to be much more manageable and
| productive.
| andrewguy9 wrote:
| I look forward to the blog post where they get off K8, in just 18
| months.
| surfingdino wrote:
| ECS makes sense when you are building and breaking stuff. K8s
| makes sense when you are mature (as an org).
| xiwenc wrote:
| I'm baffled to see so many anti-k8s sentiments on HN. Is it
| because most commenters are developers used to services like
| heroku, fly.io, render.com, etc.? Or do they run their apps on VMs?
| elktown wrote:
| I think some are just pretty sick and tired of the explosion of
| needless complexity we've seen in the last decade or so in
| software, and rightly so. This is an industry-wide problem of
| deeply misaligned incentives (& some amount of ZIRP gold rush),
| not specific to this particular case - if this one is even a
| good example of this to begin with.
|
| Honestly, as it stands, I think we'd be seen as pretty useless
| craftsmen in any other field due to an unhealthy obsession of
| our tooling and meta-work - consistently throwing any kind of
| sensible resource usage out of the window in favor of just
| getting to work with certain tooling. It's some kind of a
| "Temporarily embarrassed FAANG engineer" situation.
| cwiggs wrote:
| I agree with this somewhat. The other day I was driving home
| and I saw a sprinkler head that had broken on the side of the road
| and was spraying water everywhere. It made me think, why
| aren't sprinkler systems designed with HA in mind? Why aren't
| there dual water lines with dual sprinkler heads everywhere
| with an electronic component that detects a break in a line
| and automatically switches to the backup water line? It's
| because the downside of having the water spray everywhere or
| the grass become unhealthy or die is less than how much it
| would cost to deploy it HA.
|
| In the software/tech industry it's commonplace to just
| accept that your app can't be down for any amount of time no
| matter what. No one checked to see how much more it would
| cost (engineering time & infra costs) to deploy the app so it
| would be HA, so no one checked to see if it would be worth
| it.
|
| I blame this logic on the low interest rates for a decade. I
| could be wrong.
| loire280 wrote:
| This week we had a few minutes of downtime on an internal
| service because of a node rotation that triggered an alert.
| The responding engineer started to put together a plan to
| make the service HA (which would have tripled the cost to
| serve). I asked how frequently the service went down and
| how many people would be inconvenienced if it did. They
| didn't know, but when we checked the metrics it had single-
| digit minutes of downtime this year and fewer than a dozen
| daily users. We bumped the threshold on the alert to longer
| than it takes for a pod to be re-scheduled and resolved the
| ticket.
| fragmede wrote:
| Why would wanting redundancy be a ZIRP? Is blaming
| everything on ZIRP like Mercury was in retrograde but for
| economics dorks?
| felixgallo wrote:
| Because the company overhired to the point where people
| were sitting around dreaming up useless features just to
| justify their workday.
| consteval wrote:
| It depends on the cost of complexity you're adding.
| Adding another database or whatever is really not that
| complex so yeah sure, go for it.
|
| But a lot of companies are building distributed systems
| purely because they want this ultra-low downtime.
| Distributed systems are HARD. You get an entire set of
| problems you don't get otherwise, and the complexity
| explodes.
|
| Often, in my opinion, this is not justified. Saving a few
| minutes of downtime in exchange for making your
| application orders of magnitude more complex is just not
| worth it.
|
| Distributed systems solve distributed problems. They're
| overkill if you just want better uptime or crisis
| recovery. You can do that with a monolith and a database
| and get 99.99% of the way there. That's good enough.
| addaon wrote:
| Redundancy, like most engineering choices, is a
| cost/benefit tradeoff. If the costs are distorted, the
| result of the tradeoff study will be distorted from the
| decisions that would be made in "more normal" times.
| zerkten wrote:
| You assume that the teams running these systems achieve
| acceptable uptime and companies aren't making refunds for
| missed uptime targets when contracts enforce that, or
| losing customers. There is definitely a vision for HA at
| many companies, but they are struggling with and without
| k8s.
| bobobob420 wrote:
| Any software engineer who thinks K8 is complex shouldn't be a
| software engineer. It's really not that hard to manage.
| LordKeren wrote:
| I think the key word is "needless" in terms of complexity.
| There are a lot of k8 projects that probably could benefit
| from a simpler orchestration system-- especially at smaller
| firms
| fragmede wrote:
| do you have a simpler orchestration system you'd
| recommend?
| qohen wrote:
| Nomad, from Hashicorp, comes to mind.
|
| https://www.nomadproject.io/
|
| https://github.com/hashicorp/nomad
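|
| A whole service is one HCL file, roughly (image and ports made
| up):
|
|   job "web" {
|     datacenters = ["dc1"]
|
|     group "app" {
|       count = 2
|
|       network {
|         port "http" { to = 8080 }
|       }
|
|       task "server" {
|         driver = "docker"
|         config {
|           image = "myorg/web:1.2.3"
|           ports = ["http"]
|         }
|       }
|     }
|   }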
| ahoka wrote:
| How is it more simple?
| Ramiro wrote:
| Every time I read about Nomad, I wonder the same. I swear
| I'm not trolling here, I honestly don't get how running
| Nomad is simpler than Kubernetes. Especially considering
| that there are substantially more resources and help on
| Kubernetes than Nomad.
| javadevmtl wrote:
| For me it was DC/OS with Marathon and Mesos! It worked,
| it was a tank and its model was simple. There were also
| some nice 3rd party open source systems around Mesos that
| were also simple to use. Unfortunately Kube won.
|
| While Nomad can be interesting, again it's a single
| "smallish" vendor pushing an "open" (see the debacle with
| Terraform) source project.
| javadevmtl wrote:
| No, it just looks and feels like enterprisy SOAP XML
| methodical wrote:
| Fair point but I think the key point here is unnecessary
| complexity versus necessary complexity. Are zero-downtime
| deployments and load balancing unnecessary? Perhaps for a
| personal project, but for any company with a consistent
| userbase I'd argue these are a non-negotiable, or should be
| anyways. In a situation where this is the expectation, k8s
| seems like the simplest answer, or near enough to it.
| darby_nine wrote:
| > It's some kind of a "Temporarily embarrassed FAANG
| engineer" situation.
|
| FAANG engineers made the same mistake, too, even though the
| analogy implies comparative competency or value.
| maayank wrote:
| It's one of those technologies where there's merit to use them
| in some situations but are too often cargo culted.
| caniszczyk wrote:
| Hating is a sign of success in some ways :)
|
| In some ways, it's nice to see companies move to use mostly
| open source infrastructure, a lot of it coming from CNCF
| (https://landscape.cncf.io), ASF and other organizations out
| there (on top of the random things on github).
| tryauuum wrote:
| For me it is about VMs. Feel uneasy knowing that any kernel
| vulnerability will allow malicious code to escape the
| container and explore the kubernetes host
|
| There are kata-containers I think, they might solve my angst
| and make me enjoy k8s
|
| Overall... There's just nothing cool in kubernetes to me.
| Containers, load balancers, megabytes of yaml -- I've seen it
| all. Nothing feels interesting enough to try
| stackskipton wrote:
| vs the Application getting hacked and running loose on the VM?
|
| If you have never dealt with "I have to run these 50
| containers plus Nginx/CertBot while figuring out which node
| is best to run them on", yea, I can see you not being thrilled
| about Kubernetes. For the rest of us though, Kubernetes helps
| out with that easily.
| moduspol wrote:
| For me personally, I get a little bit salty about it due to
| imagined, theoretical business needs of being multi-cloud, or
| being able to deploy on-prem someday if needed. It's tough to
| explain just how much longer it'll take, how much more
| expertise is required, how much more fragile it'll be, and how
| much more money it'll take to build out on Kubernetes instead
| of your AWS deployment model of choice (VM images on EC2, or
| Elastic Beanstalk, or ECS / Fargate, or Lambda).
|
| I don't want to set up or maintain my own ELK stack, or
| Prometheus. Or wrestle with CNI plugins. Or Kafka. Or high
| availability Postgres. Or Argo. Or Helm. Or control plane
| upgrades. I can get up and running with the AWS equivalent
| almost immediately, with almost no maintenance, and usually
| with linear costs starting near zero. I can solve business
| problems so, so much faster and more efficiently. It's the
| difference between me being able to blow away expectations and
| my whole team being quarters behind.
|
| That said, when there is a genuine multi-cloud or on-prem
| requirement, I wouldn't want to do it with anything other than
| k8s. And it's probably not as bad if you do actually work at a
| company big enough to have a lot of skilled engineers that
| understand k8s--that just hasn't been the case anywhere I've
| worked.
| drawnwren wrote:
| Genuine question: how are you handling load balancing, log
| aggregation, failure restart + readiness checks, deployment
| pipelines, and machine maintenance schedules with these
| "simple" setups?
|
| Because as annoying as getting the prometheus + loki + tempo
| + promtail stack going on k8s is --- I don't really believe
| that writing it from scratch is easier.
| felixgallo wrote:
| He named the services. Go read about them.
| drawnwren wrote:
| I'm not sure which services you think were named that
| solve the problems I mentioned, but none were. You're
| welcome to go read about them, I do this for a living.
| Bjartr wrote:
| Depending on use case specifics, Elastic Beanstalk can do
| that just fine.
| moduspol wrote:
| * Load balancing is handled pretty well by ALBs, and there
| are integrations with ECS autoscaling for health checks and
| similar
|
| * Log aggregation happens out of the box with CloudWatch
| Logs and CloudWatch Log Insights. It's configurable if you
| want different behavior
|
| * On ECS, you configure a "service" which describes how
| many instances of a "task" you want to keep running at a
| given time. It's the abstraction that handles spinning up
| new tasks when one fails
|
| * ECS supports ready checks, and (as noted above)
| integrates with ALB so that requests don't get sent to
| containers until they pass a readiness check
|
| * Machine maintenance schedules are non-existent if you use
| ECS / Fargate, or at least they're abstracted from you. As
| long as your application is built such that it can spin up
| a new task to replace your old one, it's something that
| will happen automatically when AWS decommissions the
| hardware it's running on. If you're using ECS without
| Fargate, it's as simple as changing the autoscaling group
| to use a newer AMI. By default, this won't replace all of
| the old instances, but will use the new AMI when spinning
| up new instances
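|
| In Terraform terms that "service" bullet is roughly (names made
| up):
|
|   resource "aws_ecs_service" "web" {
|     name            = "web"
|     cluster         = aws_ecs_cluster.main.id
|     task_definition = aws_ecs_task_definition.web.arn
|     desired_count   = 3
|     launch_type     = "FARGATE"
|
|     network_configuration {
|       subnets = var.private_subnet_ids
|     }
|
|     load_balancer {
|       target_group_arn = aws_lb_target_group.web.arn
|       container_name   = "web"
|       container_port   = 8080
|     }
|   }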
|
| But again, though: the biggest selling point is the lack of
| maintenance / babysitting. If you set up your stack using
| ECS / Fargate and an ALB five years ago, it's still
| working, and you've probably done almost nothing to keep it
| that way.
|
| You might be able to do the same with Kubernetes, but your
| control plane will be out of date, your OSes will have many
| missed security updates. Might even need a major version
| update to the next LTS. Prometheus, Loki, Tempo, Promtail
| will be behind. Your helm charts will be revisions behind.
| Newer ones might depend on newer apiVersions that your
| control plane won't support until you update it. And don't
| forget to update your CNI plugin across your cluster, too.
|
| It's at least one full time job just keeping all that stuff
| working and up-to-date. And it takes a lot more know-how
| than just ECS and ALB.
| NewJazz wrote:
| It seems like you are comparing ECS to a self-managed
| Kubernetes cluster. Wouldn't it make more sense to
| compare to EKS or another managed Kubernetes offering?
| Many of your points don't apply in that case, especially
| around updates.
| moduspol wrote:
| A managed Kubernetes offering removes only some of the
| pain, and adds more in other areas. You're still on the
| hook for updating whatever add-ons you're using, though
| yes, it'll depend on how many you're using, and how
| painful it will be varies depending on how well your
| cloud provider handles it.
|
| Most of my managed Kubernetes experience is through
| Amazon's EKS, and the pain I remember included
| frustration from the supported Kubernetes versions being
| behind the upstream versions, lack of visibility for
| troubleshooting control nodes, and having to explain /
| understand delays in NIC and EBS appropriation /
| attachments for pods. Also the ALB ingress controller was
| something I needed to install and maintain independently
| (though that may be different now).
|
| Though that was also without us going neck-deep into
| being vendor agnostic. Using EKS just for the Kubernetes
| abstractions without trying hard to be vendor agnostic is
| valid--it's just not what I was comparing above because
| it was usually that specific business requirement that
| steered us toward Kubernetes in the first place.
|
| If you ARE using EKS with the intention of keeping as
| much as possible vendor agnostic, that's also valid, but
| then now you're including a lot of the stuff I complained
| about in my other comment: your own metrics stack, your
| own logging stack, your own alarm stack, your own CNI
| configuration, etc.
| drawnwren wrote:
| (Apologies for the snark, someone else made a short
| snarky comment that I felt was also wrong and I thought
| this thread was in reply to them before I typed it out --
| thank you for the reply)
|
| - ALBs -- yeah this is correct. However ALBs have much
| longer startup/health check times than Envoy/Traefik
|
| - Cloudwatch - this is true, however the "configurable"
| behavior makes cloudwatch trash out of the box. you get
| i.e. exceptions split across multiple log entries with
| the default configuration
|
| - ECS tasks - yep, but the failure behavior of tasks is
| horrible because there're no notifications out of the box
| (you can configure it)
|
| - Fargate does allow you to avoid maintenance, however it
| has some very hairy edges, e.g. you can't use any
| container that expects to know its own IP address on a
| private VPC without writing a custom script. Networking
| in general is pretty arcane on Fargate and you're going
| to have to write and maintain workarounds for all of this
| yourself
|
| > You might be able to do the same with Kubernetes, but
| your control plane will be out of date, your OSes will
| have many missed security updates. Might even need a
| major version update to the next LTS. Prometheus, Loki,
| Tempo, Promtail will be behind. Your helm charts will be
| revisions behind. Newer ones might depend on newer
| apiVersions that your control plane won't support until
| you update it. And don't forget to update your CNI plugin
| across your cluster, too.
|
| I think maybe you haven't used K8s in years. Karpenter,
| EKS, and a GitOps tool (Flux or Argo) give you the same
| hands-off machine maintenance as ECS, but on K8s and
| without any of the annoyances of dealing with ECS. All
| your app versions can be pinned or set to follow latest as
| you prefer. You get rolling updates each time you switch
| machines (same as ECS, and if you really want to you can
| run on top of Fargate).
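|
| To make "pinned" concrete, a Flux HelmRelease that pins a
| chart version looks roughly like this (a sketch only; the
| names, namespace and chart are placeholders, and the exact
| apiVersion depends on your Flux version):
|
|     apiVersion: helm.toolkit.fluxcd.io/v2
|     kind: HelmRelease
|     metadata:
|       name: my-app
|       namespace: apps
|     spec:
|       interval: 10m
|       chart:
|         spec:
|           chart: my-app
|           version: "1.4.2"   # pinned; bump it via a PR
|           sourceRef:
|             kind: HelmRepository
|             name: my-charts
|             namespace: flux-system
|       values:
|         replicaCount: 3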
|
| By contrast, if your ECS/Fargate service fails -- and you
| haven't mentioned any notifications in your list -- then
| unless you configured and tested alerting correctly, ECS
| could legitimately be stuck on a version of your app code
| that is 3 years old and you might never know, unless you
| inspect the correct part of Amazon's arcane interface.
|
| By the way, you're paying per use for all of this.
|
| At the end of the day, I think modern Kubernetes is
| strictly simpler, cheaper, and better than ECS/Fargate
| out of the box and has the benefit of not needing to rely
| on 20 other AWS specific services that each have their
| own unique ways of failing and running a bill up if you
| forget to do "that one simple thing everyone who uses
| this niche service should know".
| mrgaro wrote:
| ECS+Fargate does give you zero maintenance, both in
| theory and in practice. As someone who runs k8s at home
| and manages two clusters at work, I still recommend our
| teams use ECS+Fargate+ALB if it satisfies their
| requirements for stateless apps, and they all love it
| because it is literally zero maintenance, unlike what you
| just described k8s as requiring.
|
| Sure, there are a lot of great features in k8s which ECS
| cannot match, but when ECS does satisfy the requirements,
| it will require less maintenance, no matter what kind of
| k8s you compare it against.
| angio wrote:
| I think you're just used to AWS services and don't see the
| complexity there. I tried running some stateful services on
| ECS once and it took me hours to have something _not_
| working. In Kubernetes it takes me literally minutes to
| achieve the same task (+ automatic chart updates with
| renovatebot).
| moduspol wrote:
| I'm not saying there's no complexity. It exists, and there
| are skills to be learned, but once you have the skills,
| it's not that hard.
|
| Obviously that part's not different from Kubernetes, but
| here's the part that is: maintenance and upgrades are
| either completely out of my scope or absolutely minimal. On
| ECS, it might involve switching to a more recently built
| AMI every six months or so. AWS is famously good about not
| making backward incompatible changes to their APIs, so for
| the most part, things just keep working.
|
| And don't forget you'll need a lot of those AWS skills to
| run Kubernetes on AWS, too. If you're lucky, you'll get
| simple use cases working without them. But once PVCs aren't
| getting mounted, or pods are stuck waiting because you ran
| out of ENI slots on the box, or requests are timing out
| somewhere between your ALB and your pods, you're going to
| be digging into the layer between AWS and Kubernetes to
| troubleshoot those things.
|
| I run Kubernetes at home for my home lab, and it's not zero
| maintenance. It takes care and feeding, troubleshooting,
| and resolution to keep things working over the long term.
| And that's for my incredibly simple use cases (single node
| clusters with no shared virtualized network, no virtualized
| storage, no centralized logs or metrics). I've been in
| charge of much more involved ones at work and the
| complexity ceiling is almost unbounded. Running a
| distributed, scalable container orchestration platform is a
| lot more involved than piggybacking on ECS (or Lambda).
| mountainriver wrote:
| I hear a lot of comments that sound like people who used K8s
| years ago and not since. The clouds have made K8s management
| stupid simple at this point, you can absolutely get up and
| running immediately with no worry of upgrades on a modern
| provider like GKE
| archenemybuntu wrote:
| Kubernetes itself is built around mostly solid distributed
| system principles.
|
| It's the ecosystem around it which turns things needlessly
| complex.
|
| Just because you have kubernetes, you don't necessarily need
| Istio, Helm, Argo CD, Cilium, and whatever half-baked thing
| the CNCF pushed yesterday.
|
| For example, take a look at Helm. Its templating is atrocious,
| and unless I'm mistaken, it still doesn't have a way to order
| resources properly except hooks. Sometimes resource A (a
| Deployment) depends on resource B (some CRD).
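|
| The usual workaround is hook annotations on whatever has to
| exist first, roughly like this (a sketch; the CRD itself is
| made up, and note that hook resources are not managed as part
| of the release the way normal chart resources are):
|
|     # templates/widget-crd.yaml
|     apiVersion: apiextensions.k8s.io/v1
|     kind: CustomResourceDefinition
|     metadata:
|       name: widgets.example.com
|       annotations:
|         "helm.sh/hook": pre-install,pre-upgrade
|         "helm.sh/hook-weight": "-10"   # lower weights run first
|     spec:
|       group: example.com
|       names:
|         kind: Widget
|         plural: widgets
|       scope: Namespaced
|       versions:
|         - name: v1
|           served: true
|           storage: true
|           schema:
|             openAPIV3Schema:
|               type: object
|               x-kubernetes-preserve-unknown-fields: true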
|
| The culture around kubernetes dictates you bring in everything
| pushed by the CNCF. And most of this stuff is half-baked MVPs.
|
| ---
|
| The word devops has created the expectation that backend
| developers should be fighting kubernetes when something goes
| wrong.
|
| ---
|
| Containerization is done poorly by many orgs, with no care
| about security or image size. That's a rant for another day;
| I suspect it isn't a big reason for the kubernetes hate here.
| solatic wrote:
| I don't get the hate for Kubernetes in this thread. TFA is from
| _Figma_. You can talk all day long about how early startups just
| don't need the kind of management benefits that Kubernetes
| offers, but the article isn't written by someone working for a
| startup, it's written by a company that nearly got sold to Adobe
| for $20 billion.
|
| Y'all really don't think a company like Figma stands to benefit
| from the flexibility that Kubernetes offers?
| BobbyJo wrote:
| Kubernetes isn't even that complicated, and first party support
| from cloud providers often means you're doing something in K8s
| in lieu of doing it in a cloud-specific way (like ingress vs
| cloud specific load balancer setups).
|
| At a certain scale, K8s is the simple option.
|
| I think much of the hate on HN comes from the "ruby on rails is
| all you need" crowd.
| JohnMakin wrote:
| > I think much of the hate on HN comes from the "ruby on
| rails is all you need" crowd.
|
| Maybe - people seem really gungho about serverless solutions
| here too
| dexwiz wrote:
| The hype for serverless cooled after that article about
| Prime Video dropping lambda. No one wants a product that a
| company won't dogfood. I realize Amazon probably uses
| lambda elsewhere, but it was still a bad look.
| LordKeren wrote:
| I think it was much more about one specific use case of
| lambda that was a bad fit for the prime video team's need
| and not a rejection of lambda/serverless. TBH, it kind of
| reflected more poorly on the team than lambda as a
| product
| cmckn wrote:
| > Amazon probably uses lambda elsewhere
|
| Yes, you could say that. :)
| JohnMakin wrote:
| not probably, their lambda service powers much of their
| control plane.
| eutropia wrote:
| I guess the ones who quietly ship dozens of rails apps on k8s
| are too busy getting shit done to stop and share their boring
| opinions about pragmatically choosing the right tool for the
| job :)
| BobbyJo wrote:
| "But you can run your rails app on a single host with
| embedded SQLite, K8s is unnecessary."
| ffsm8 wrote:
| And there is truth to that. _Most_ deployments are at
| that level, and it absolutely is way more performant than
| the alternative. It just comes with several tradeoffs...
| but these tradeoffs are usually worth it for deployments
| with <10k concurrent users. Which Figma certainly _isn't_.
|
| Though you probably could still do it, it's likely more
| trouble than it's worth
|
| (The 10k is just an arbitrary number I made up, there is
| no magic number which makes this approach unviable, it
| all depends on how the users interact with the
| platform/how often and where the data is inserted)
| threeseed wrote:
| Always said by people who haven't spent much time in the
| cloud.
|
| Because single hosts will always go down. Just a question
| of when.
| BossingAround wrote:
| I love k8s, but bringing back up a single app that
| crashed is a very different problem from "our k8s is
| down" - because if you think your k8s won't go down,
| you're in for a surprise.
|
| You can view a single k8s also as a single host, which
| will go down at some point (e.g. a botched upgrade, cloud
| network partition, or something similar). While much less
| frequent, also much more difficult to get out of.
|
| Of course, if you have a multi-cloud setup with automatic
| (and periodically tested!) app migration across clouds,
| well then... Perhaps that's the answer nowadays.. :)
| solatic wrote:
| > if you think your k8s won't go down, you're in for a
| surprise
|
| Kubernetes is a remarkably reliable piece of software.
| I've administered (large X) number of clusters that often
| had several years of cluster lifetime, each, everything
| being upgraded through the relatively frequent Kubernetes
| release lifecycle. We definitely needed some maintenance
| windows sometimes, but well, no, Kubernetes didn't
| unexpectedly crash on us. Maybe I just got lucky, who
| knows. The closest we ever got was the underlying etcd
| cluster having heartbeat timeouts due to insufficient
| hardware, and etcd healed itself when the nodes were
| reprovisioned.
|
| There's definitely a whole lotta stuff in the Kubernetes
| ecosystem that isn't nearly as reliable, but that has to
| be differentiated from Kubernetes itself (and the
| internal etcd dependency).
|
| > You can view a single k8s also as a single host, which
| will go down at some point (e.g. a botched upgrade, cloud
| network partition, or something similar)
|
| The managed Kubernetes services solve the whole "botched
| upgrade" concern. etcd is designed to tolerate cloud
| network partitions and recover.
|
| Comparing this to sudden hardware loss on a single-VM app
| is, quite frankly, insane.
| cyberpunk wrote:
| Even if your entire control plane disappears your nodes
| will keep running and likely for enough time to build an
| entirely new cluster to flip over to.
|
| I don't get it either. It's not hard at all.
| BossingAround wrote:
| Your nodes & containers keep running, but is your
| networking up when your control plane is down?
| __turbobrew__ wrote:
| If you start using more esoteric features the reliability
| of k8s goes down. Guess what happens when you enable the
| in place vertical pod scaling feature gate?
|
| It restarts every single container in the cluster at the
| same time:
| https://github.com/kubernetes/kubernetes/issues/122028
|
| We have also found data races in the statefulset
| controller which only occurs when you have thousands of
| statefulsets.
|
| Overall, if you stay on the beaten path k8s reliability
| is good.
| kayodelycaon wrote:
| I've been working with rails since 1.2 and I've never
| seen anyone actually do this. Every meaningful deployment
| I've seen uses postgres or mysql. (Or god forbid
| mongodb.) It takes very little time to write your SQL
| statements.
|
| You can run rails on a single host using a database on
| the same server. I've done it and it works just fine as
| long as you tune things correctly.
| a_bored_husky wrote:
| > as long as you tune things correctly
|
| Can you elaborate?
| kayodelycaon wrote:
| I don't remember the exact details because it was a long
| time ago, but what I do remember is
|
| - Limiting memory usage and number of connections for
| mysql
|
| - Tracking maximum memory size of rails application
| servers so you didn't run out a memory by running too
| many of them
|
| - Avoid writing unnecessarily memory intensive code (This
| is pretty easy in ruby if you know what you're doing)
|
| - Avoiding using gems unless they were worth the memory
| use
|
| - Configuring the frontend webserver to start dropping
| connections before it ran out of memory (I'm pretty sure
| that was just a guess)
|
| - Using the frontend webserver to handle traffic whenever
| possible (mostly redirects)
|
| - Using IP tables to block traffic before hitting the
| webserver
|
| - Periodically checking memory use and turning off
| unnecessary services and cronjobs
|
| I had the entire application running on a 512mb VPS with
| roughly 70mb to spare. It was a little less spare than I
| wanted but it worked.
|
| Most of this was just rate limiting with extra steps. At
| the time rails couldn't use threads, so there was a hard
| limit on the number of concurrent tasks.
|
| When the site went down it was due to rate limiting and
| not the server locking up. It was possible to ssh in and
| make firewall adjustments instead of a forced restart.
| a_bored_husky wrote:
| Thank you.
| bamboozled wrote:
| Agreed, we're a small team and we benefit greatly from
| managed k8s (EKS). I have to say the whole ecosystem just
| continues to improve as far as I can tell and the developer
| satisfaction is really high with it.
|
| Personally I think k8s is where it's at now. The innovation
| and open source contributions are immense.
|
| I'm glad we made the switch. I understand the frustrations of
| the past, but I think it was much harder to use 4+ years ago.
| Now, I don't see how anyone could mess it up so hard.
| MrDarcy wrote:
| > Kubernetes isn't even that complicated
|
| I've been struggling to square this sentiment as well. I
| spend all day in AWS and k8s and k8s is at least an order of
| magnitude simpler than AWS.
|
| What are all the people who think operating k8s is too
| complicated operating on? Surely not AWS...
| tbrownaw wrote:
| The thing you already know tends to be less complicated
| than the thing you don't know.
| rco8786 wrote:
| I think "k8s is complicated" and "AWS is even more
| complicated" can both be true.
|
| Doing _anything_ in AWS is like pulling teeth.
| brainzap wrote:
| The sum is complex, specially with the custom operators.
| chrischen wrote:
| There are also a lot of cog-in-the-machine engineers here
| that totally do not get the bigger picture or the vantage
| point from another department.
| dorianmariefr wrote:
| ruby on rails is all you need
| logifail wrote:
| > it's written by a company that nearly got sold to Adobe for
| $20 billion
|
| (Apologies if this is a dumb question) but isn't Figma big
| enough to want to do any of their stuff on their own hardware
| yet? Why would they still be paying AWS rates?
|
| Or is it the case that a high-profile blog post about K8S and
| being provider-agnostic gets you sufficient discount on your
| AWS bill to still be value-for-money?
| jeffbee wrote:
| There are a lot of ex-Dropbox people at Figma who might have
| learned firsthand that bringing your stuff on-prem under a
| theory of saving money is an intensely stupid idea.
| logifail wrote:
| > There are a lot of ex-Dropbox people at Figma who might
| have learned firsthand that bringing your stuff on-prem
| under a theory of saving money is an intensely stupid idea
|
| Well, that's one hypothesis.
|
| Another is that "Every maturing company with predictable
| products must be exploring ways to move workloads out of
| the cloud. AWS took your margin and isn't giving it back."
| ( https://news.ycombinator.com/item?id=35235775 )
| ozim wrote:
| They are preparing for the next blog post in a year - "how we
| cut costs by xx% by moving to our own servers".
| hyperbolablabla wrote:
| I work for a company making ~$9B in annual revenue and we use
| AWS for everything. I think a big aspect of that is just
| developer buy-in, as well as reliability guarantees, and
| being able to blame Amazon when things do go down
| st3fan wrote:
| Also, you don't have to worry about half of your stack? The
| shared responsibility model really works.
| consteval wrote:
| No, you still do. You just replace those sysadmins with
| AWS dev ops people. But ultimately your concerns haven't
| gone down, they've changed. It's true you don't have to
| worry about hardware. But, then again, you can use colo
| datacenters or even a VPS.
| NomDePlum wrote:
| Much bigger companies use AWS for very practical well thought
| out reasons.
|
| Not managing procurement of hardware, upgrades, etc, and a
| defined standard operating model with accessible
| documentation and the ability to hire people with experience,
| and have to hire less people as you are doing less is enough
| to build a viable and demonstrable business case.
|
| Scale beyond a certain point is hard without support and
| delegated responsibility.
| tayo42 wrote:
| There must be a prohibitively expensive upfront cost to buy
| enough servers to do this. Plus bringing in all the skill,
| which they don't currently have, to stand up and run
| something like what they would require.
|
| I wonder if, as time goes on, the skill to run your own
| hardware is disappearing. New engineers don't learn it, and
| the ones who did slowly forget. I'm not that sharp on
| anything I haven't done in years, even if it's in a related
| domain.
| j_kao wrote:
| Companies like Netflix with bigger market caps are still on
| AWS.
|
| I can imagine the productivity of spinning up elastic cloud
| resources vs fixed data center resourcing being more
| important, especially considering how frequently a company
| like Figma ships new features.
| manquer wrote:
| A valuation is just a headline number with no operational
| bearing.
|
| Their ARR in 2022 was around $400M-450M. Say the infra budget,
| at a typical 10%, would be $50M. While that is a lot of money,
| it is not build-your-own-hardware money, and not all of it
| would be compute budget. They would also be spending on other
| SaaS apps (say, Snowflake) and on special workloads like GPUs,
| so not all workloads would be ready to bring in-house. I would
| be surprised if their commodity compute/k8s is more than half
| their overall budget.
|
| Focusing on this now is a lot more likely to slow product
| growth, especially since they were/are still growing rapidly.
|
| Larger SaaS companies than them by ARR still find using the
| cloud exclusively more productive/efficient.
| sangnoir wrote:
| > Why would they still be paying AWS rates?
|
| They are almost certainly not paying sticker prices. Above a
| certain size, companies tend to have bespoke prices and SLAs
| that are negotiated in confidence.
| ijidak wrote:
| It's a fair question.
|
| Data centers are wildly expensive to operate if you want
| proper security, redundancy, reliability, recoverability,
| bandwidth, scale elasticity, etc.
|
| And when I say security, I'm not just talking about software
| level security, but literal armed guards are needed at the
| scale of a company like Figma.
|
| Bandwidth at that scale means literally negotiating to buy up
| enough direct fiber and verifying the routes that fiber takes
| between data centers.
|
| At one of the companies I worked at, it was not uncommon to
| lose data center connectivity because a farmer's tractor cut
| a major fiber line we relied on.
|
| Scalability might include tracking square footage available
| for new racks in physical buildings.
|
| As long as your company is profitable, at anything but
| Facebook like scale, it may not be worth the trouble to try
| to run your own data center.
|
| Even if the cloud doesn't save money, it saves mental energy
| and focus.
| mcpherrinm wrote:
| There's a ton of middle ground between a fully managed
| cloud like AWS and building your own hyperscaler datacenter
| like Facebook.
|
| Renting a few hundred cabinets from Equinix or Digital
| Realty is going to potentially be hugely cheaper than AWS,
| but you probably need a team of people to run it. That can
| be worthwhile if your growth is predictable and especially
| if your AWS bandwidth bill is expensive.
|
| But then you're building on bare metal. Gotta deploy your
| own databases, maybe kubernetes for running workloads, or
| something like VMware for VMs. And you don't get any
| managed cloud services, so that's another dozen employees
| you might need.
| shrubble wrote:
| This is a 20-years-ago take. If your datacenter provider
| doesn't have multiple fiber entry into the building with
| multiple carriers, you chose the wrong provider at this
| point.
| cwiggs wrote:
| k8s is complex; if you don't need the following, you probably
| shouldn't use it:
|
| * Service discovery
|
| * Auto bin packing
|
| * Load Balancing
|
| * Automated rollouts and rollbacks
|
| * Horizontal scaling
|
| * Probably more I forgot about
|
| You also have secret and config management built in. If you use
| k8s you also have the added benefit of making it easier to move
| your workloads between clouds and bare metal. As long as you
| have a k8s cluster you can mostly move your app there.
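|
| For a sense of scale, automated rollouts plus horizontal
| scaling for a stateless app is roughly this much YAML (a
| minimal sketch; the image and names are made up, and
| CPU-based autoscaling assumes metrics-server is installed):
|
|     apiVersion: apps/v1
|     kind: Deployment
|     metadata:
|       name: web
|     spec:
|       replicas: 2
|       selector:
|         matchLabels: { app: web }
|       template:
|         metadata:
|           labels: { app: web }
|         spec:
|           containers:
|             - name: web
|               image: registry.example.com/web:1.2.3
|               resources:
|                 requests: { cpu: 100m, memory: 128Mi }
|     ---
|     apiVersion: autoscaling/v2
|     kind: HorizontalPodAutoscaler
|     metadata:
|       name: web
|     spec:
|       scaleTargetRef:
|         apiVersion: apps/v1
|         kind: Deployment
|         name: web
|       minReplicas: 2
|       maxReplicas: 10
|       metrics:
|         - type: Resource
|           resource:
|             name: cpu
|             target:
|               type: Utilization
|               averageUtilization: 70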
|
| Problem is most companies I've worked at in the past 10 years
| needed multiple of the features above, and they decided to roll
| their own solution with Ansible/Chef, Terraform, ASGs, Packer,
| custom scripts, custom apps, etc. The solutions have always
| been worse than what k8s provides, and it's a bespoke tool that
| you can't hire for.
|
| For what k8s provides, it isn't complex, and it's all
| documented very well, AND it's extensible so you can build your
| own apps on top of it.
|
| I think there are more SWEs on HN than
| Infra/Platform/Devops/buzzword engineers. As a result there are
| a lot of people who don't have a lot of experience managing
| infra and think that spinning up their docker container on a VM
| is the same as putting an app in k8s. That's my opinion on why
| k8s gets so much hate on HN.
| Osiris wrote:
| Those all seem important to even moderately sized products.
| worldsayshi wrote:
| As long as your requirements are simple the config doesn't
| need to be complex either. Not much more than docker-
| compose.
|
| But once you start using k8s you probably tend to scope
| creep and find a lot of shiny things to add to your set up.
| doctorpangloss wrote:
| Some ways to tell if someone is a great developer are easy.
| JetBrains IDE? Ample storage space? Solving problems with
| the CLI? Consistently formatted code using the language's
| packaging ecosystem? No comments that look like this:
| # A verbose comment that starts capitalized, followed by a
| single line of code, cuing you that it was written by a
| ChatBot.
|
| Some ways to tell if someone is a great developer are hard.
| You can't tell if someone is a brilliant shipper of
| features, choosing exactly the right concerns to worry
| about at the moment, like doing more website authoring and
| less devops, with a grand plan for how to make everything
| cohere later; or, if the guy just doesn't know what the
| fuck he is doing.
|
| Kubernetes adoption is one of those, hard ones. It isn't a
| strong, bright signal like using PEP 8 and having a
| `pyproject.toml` with dependencies declared. So it may be
| obvious to you, "People adopt Kubernetes over ad-hoc
| decoupled solutions like Terraform because it has, in a
| Darwinian way, found the smallest set of easily
| surmountable concerns that should apply to most good
| applications." But most people just see, "Ahh! Why can't I
| just write the method bodies for Python function signatures
| someone else wrote for me, just like they did in CS50!!!"
| maccard wrote:
| For anyone who thinks this is a laundry list - running two
| instances of your app with a database means you need almost
| all of the above.
|
| The _minute_ you start running containers in the cloud you
| need to think of "what happens if it goes down/how do I
| update it/how does it find the database", and you need an
| orchestrator of some sort, IMO. A managed service (I prefer
| ECS personally as it's just stupidly simple) is the way to go
| here.
| hnav wrote:
| Eh, you can easily deploy containers to EC2/GCE and have an
| autoscaling group/MIG with healthchecks. That's what I'd be
| doing for a first pass or if I had a monolith (a lot of
| business is still deploying a big ball of PHP). K8s really
| comes into its own once you're running lots of
| heterogeneous stuff all built by different teams. Software
| reflects organizational structure so if you don't have a
| centralized infra team you likely don't want container
| orchestration since it's basically your own cloud.
| maccard wrote:
| By containers on EC2 you mean installing docker on AMIs?
| How do you deploy them?
|
| I really do think Google Cloud Run/Azure Container Apps
| (and then in AWS-land ECS-on-fargate) is the right
| solution _especially_ in that case - you just shove a
| container on and tell it the resources you need and
| you're done.
| gunapologist99 wrote:
| From https://stackoverflow.com/questions/24418815/how-do-
| i-instal... , here's an example that you can just paste
| into your load balancing LaunchConfig and never have to
| log into an instance at all (just add your own _runcmd:_
| section -- and, hey, it's even YAML like everyone loves)
|
|     #cloud-config
|     apt:
|       sources:
|         docker.list:
|           source: deb [arch=amd64] https://download.docker.com/linux/ubuntu $RELEASE stable
|           keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
|     packages:
|       - docker-ce
|       - docker-ce-cli
| cwiggs wrote:
| Sure, you can use an AWS ASG, but I assume you also tie
| that into an AWS ALB/NLB. Then you use ACM for certs and
| now you are locked into AWS three times over.
|
| Instead you can do those 3 and more in k8s and it would
| be the same manifests regardless which k8s cluster you
| deploy to, EKS, AKS, GKE, on prem, etc.
|
| Plus you don't get service discovery across VMs, you
| don't get a CSI so good luck if your app is stateful. How
| do you handle secrets, configs? How do you deploy
| everything, Ansible, Chef? The list goes on and on.
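|
| For configs specifically it's a built-in object. A minimal
| sketch (names and image are made up):
|
|     apiVersion: v1
|     kind: ConfigMap
|     metadata:
|       name: web-config
|     data:
|       LOG_LEVEL: info
|       FEATURE_X: "true"
|     ---
|     apiVersion: v1
|     kind: Pod
|     metadata:
|       name: web
|     spec:
|       containers:
|         - name: web
|           image: registry.example.com/web:1.2.3
|           envFrom:
|             - configMapRef:
|                 name: web-config   # each key becomes an env var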
|
| If your app is simple, sure - but I haven't seen a simple app
| in years.
| maccard wrote:
| I've never worked anywhere that has benefitted from
| avoiding lock-in. We would have saved thousands in dev-
| hours if we just used an ALB instead of tweaking nginx
| and/or caddy.
|
| Also, if you can't convert an ALB into an Azure Load
| balancer, then you probably have no business doing any
| sort of software development.
| MrDarcy wrote:
| I did this. It's not easier than k8s, GKE, EKS, etc....
| It's harder cause you have to roll it yourself.
|
| If you do this just use GKE autopilot. It's cheaper and
| done for you.
| gunapologist99 wrote:
| It's worth bearing in mind that, although any of these can be
| accomplished with any number of other products as you point
| out, LB and Horizontal Scaling, in particular, have been
| solved problems for more than 25 years (or longer depending
| on how you count)
|
| For example, even servers (aka instances/vms/vps) with load
| balancers (aka fabric/mesh/istio/traefik/caddy/nginx/ha
| proxy/ATS/ALB/ELB/oh just shoot me) in front existed for apps
| that are LARGER than can fit on a single server (virtually
| the definition of horizontally scalable). These apps are
| typically monoliths or perhaps app tiers that have fallen out
| of style (like the traditional n-tier architecture of app
| server-cache-database, swap out whatever layers you like).
|
| However, K8s is actually more about microservices. Each
| microservice can act like a tiny app on its own, but they are
| often inter-dependent and, especially at the beginning, it's
| often seen as not cost-effective to dedicate their own
| servers to them (along with the associate load balancing,
| redundant and cross-AZ, etc). And you might not even know
| what the scaling pain points for an app are, so this gives you
| a way to easily scale up without dedicating slightly
| expensive instances or support staff to running each cluster;
| your scale point is on the entire k8s cluster itself.
|
| Even though that is ALL true, it's also true that k8s' sweet
| spot is actually pretty narrow, and many apps _and teams_
| probably won't benefit from it that much (or not at all and
| it actually ends up being a net negative, and that's not even
| talking about the much lower security isolation between
| containers compared to instances; yes, of course, k8s can
| schedule/orchestrate VMs as well, but no one really does
| that, unfortunately.)
|
| But, it's always good resume fodder, and it's about the
| closest thing to a standard in the industry right now, since
| everyone has convinced themselves that the standard multi-AZ
| configuration of 2014 is just too expensive or complex to run
| compared to k8s, or something like that.
| drdaeman wrote:
| > For what k8s provides, it isn't complex, and it's all
| documented very well
|
| I had a different experience. Some years ago I wanted to set
| up a toy K8s cluster over an IPv6-only network. It was a
| total mess - documentation did not cover this case (at least
| I have not found it back then) and there was _a lot_ of code
| to dig through to learn that it was not really supported back
| then as some code was hardcoded with AF_INET assumptions (I
| think it's all fixed nowadays). And maybe it's just me, but
| I really had much easier time navigating Linux kernel source
| than digging through K8s and CNI codebases.
|
| This, together with a few very trivial crashes of "normal"
| non-toy clusters that I've seen (like two nodes suddenly
| failing to talk to each other, typically for simple textbook
| reasons like conntrack issues), resulted in an opinion "if
| something about this breaks, I have very limited ideas what
| to do, and it's a huge behemoth to learn". So I believe that
| simple things beat complex contraptions (assuming a simple
| system can do all you want it to do, of course!) in the long
| run because of the maintenance costs. Yeah, deploying K8s and
| running payloads is easy. Long-term maintenance - I'm not
| convinced that it can be easy, for a system of that scale.
|
| I mean, I try to steer away from K8s until I find a use case
| for it, but I've heard that when K8s fails, a lot of people
| just tend to deploy a replacement and migrate all payloads
| there, because it's easier to do so than troubleshoot. (Could
| be just my bubble, of course.)
| YZF wrote:
| There are other out of the box features that are useful:
|
| * Cert manager (see the sketch after this list).
|
| * External-dns.
|
| * Monitoring stack (e.g. Grafana/Prometheus.)
|
| * Overlay network.
|
| * Integration with deployment tooling like ArgoCD or
| Spinnaker.
|
| * Relatively easy to deploy anything that comes with a helm
| chart (your database or search engine or whatnot).
|
| * Persistent volume/storage management.
|
| * High availability.
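|
| For the cert-manager item above, TLS on an Ingress is roughly
| one annotation once it's installed (a sketch; the issuer,
| host and service names are examples):
|
|     apiVersion: networking.k8s.io/v1
|     kind: Ingress
|     metadata:
|       name: web
|       annotations:
|         cert-manager.io/cluster-issuer: letsencrypt-prod
|     spec:
|       tls:
|         - hosts: [app.example.com]
|           secretName: web-tls   # cert-manager creates/renews this
|       rules:
|         - host: app.example.com
|           http:
|             paths:
|               - path: /
|                 pathType: Prefix
|                 backend:
|                   service:
|                     name: web
|                     port: { number: 80 }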
|
| It's also about using containers, which means there's a lot
| less to manage on the hosts.
|
| I'm a fan of k8s. There's a learning curve but there's a huge
| ecosystem and I also find the docs to be good.
|
| But if you don't need any of it - don't use it! It is
| targeting a certain scale and beyond.
| epgui wrote:
| To this I would also add the ability to manage all of your
| infrastructure with k8s manifests (eg.: crossplane).
| kachapopopow wrote:
| I started with kubernetes and have never looked back. Being
| able to bring up a network copy, deploy a clustered
| database, deploy a distributed fs all in 10 minutes
| (including the install of k3s or k8s) has been a game-
| changer for me.
|
| You can run monolithic apps with no-downtime restarts quite
| easily with k8s using a rolling update strategy, which is very
| useful when applications take minutes to start.
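|
| The bits that make the no-downtime restart work are the
| readiness probe and the update strategy; a minimal sketch
| (the paths, ports and image are made up):
|
|     apiVersion: apps/v1
|     kind: Deployment
|     metadata:
|       name: monolith
|     spec:
|       replicas: 2
|       strategy:
|         type: RollingUpdate
|         rollingUpdate:
|           maxUnavailable: 0   # keep old pods until new ones are ready
|           maxSurge: 1
|       selector:
|         matchLabels: { app: monolith }
|       template:
|         metadata:
|           labels: { app: monolith }
|         spec:
|           containers:
|             - name: app
|               image: registry.example.com/monolith:2024.08
|               readinessProbe:   # no traffic until slow startup finishes
|                 httpGet: { path: /healthz, port: 8080 }
|                 initialDelaySeconds: 60
|                 periodSeconds: 10
|
| Then "kubectl rollout restart deployment/monolith" cycles the
| pods one at a time without dropping traffic.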
| BobbyJo wrote:
| 100%
|
| I can bring up a service, connect it to a
| postgres/redis/minio instance, and do almost anything
| locally that I can do in the cloud. It's a massive help
| for iterating.
|
| There is a learning curve, but you learn it and you can
| do _so damn much so damn easily_.
| methodical wrote:
| In the same vein here.
|
| Every time I see one of these posts and the ensuing
| comments I always get a little bit of inverse imposter
| syndrome. All of these people saying "Unless you're at
| 10k users+ scale you don't need k8s". If you're running a
| personal project with a single-digit user count, then
| sure, but only purely out of a cost-to-performance metric
| would I say k8s is unreasonable. Any scale larger,
| however, and I struggle to reconcile this position with
| the reality that anything with a consistent user base
| _should_ have zero-downtime deployments, load balancing,
| etc. Maybe I'm just incredibly OOTL, but when did these
| features, which are simple to implement and essentially free
| from a cost standpoint, become optional? Perhaps I'm just
| misunderstanding the argument, and the argument is that
| you should use a Fly or Vercel-esque platform that
| provides some of these benefits without needing to
| configure k8s. Still, the problem with this mindset is
| that vendor lock-in is a lot harder to correct once a
| platform is in production and being used consistently
| without prolonged downtime.
|
| Personally, I would do early builds with Fly and once I
| saw a consistent userbase I'd switch to k8s for scale,
| but this is purely due to the cost of a minimal k8s
| instance (especially on GKE or EKS). This, in essence,
| allows scaling from ~0 to ~1M+ with the only bottleneck
| being DB scaling (if you're using a single DB like
| CloudSQL).
|
| Still, I wish I could reconcile my personal disconnect
| with the majority of people here who regard k8s as overly
| complicated and unnecessary. Are there really that many
| shops out there who consider the advantages of k8s above
| them or are they just achieving the same result in a
| different manner?
|
| One could certainly learn enough k8s in a weekend to
| deploy a simple cluster. Now I'm not recommending this
| for someone's company's production instance, due to the
| foot guns if improperly configured, but the argument of
| k8s being too complicated to learn seems unfounded.
|
| /rant
| shakiXBT wrote:
| I've been in your shoes for quite a long time. By now
| I've accepted that a lot of folks on HN and other similar
| forums simply don't know / care about the issues that
| Kubernetes resolves, or that someone else in their company
| takes care of those for them
| AndrewKemendo wrote:
| It's actually much simpler than that
|
| k8s makes it easier to build over-engineered
| architectures for applications that don't need that level
| of complexity
|
| So while you are correct that it is not actually that
| difficult to learn and implement K8s, it's also almost
| always completely unnecessary, even at the largest scale.
|
| Given that you can do the largest-scale stuff without it,
| and that you should do most small-scale stuff without it,
| the number of people for whom all of the risks and costs
| balance out is much smaller than the amount of promotion
| and pushing it has received.
|
| And given the fact that orchestration layers are a
| critical part of infrastructure, handing over or changing
| the data environment relationship in a multilayer
| computing environment to such an extent is a non-trivial
| one-way door
| uaas wrote:
| With the simplicity and cost of k3s and alternatives it
| can also make sense for personal projects from day one.
| tbrownaw wrote:
| > _k8s is complex, if you don 't need the following you
| probably shouldn't use it:_
|
| I use it (specifically, the canned k3s distro) for running a
| handful of single-instance things like for example plex on my
| utility server.
|
| Containers are a very nice UX for isolating apps from the
| host system, and k8s is a very nice UX for running things
| made out of containers. Sure it's _designed_ for complex
| distributed apps with lots of separate pieces, but it still
| handles the degenerate case (single instance of a single
| container) just fine.
| st3fan wrote:
| If you don't need any of those things then your use of k8s
| just becomes simpler.
|
| I find k8s an extremely nice platform to deploy simple things
| in that don't need any of the advanced features. All you do
| is package your programs as containers and write a minimal
| manifest and there you go. You need to learn a few new
| things, but the things you do not have to worry about that is
| a really great return.
|
| Nomad is a good contender in that space but I think HashiCorp
| is letting it slowly become EOL and there are basically zero
| Nomad-As-A-Service providers.
| hylaride wrote:
| If you don't need any of those things, going for a
| "serverless" option like fargate or whatever other cloud
| equivalents exist is a far better value prop. Then you
| never have to worry about k8s support or upgrades (of
| course, ECS/fargate is shit in its own ways, in particular
| the deployments being tied to new task definitions...).
| tracerbulletx wrote:
| Even on a small project it's actually better imo than tying
| everything to a platform like netlify or vercel. I have this
| little notepad app that I deploy to a two node cluster in a
| github action and its an excellent workflow. The k8s to get
| everything deployed, provision tls and everything on commit is
| like 150 lines of mostly boilerplate yaml, I could pretty
| easily make it support branch previews or whatever too.
| https://github.com/SteveCastle/modelpad
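|
| The GitHub Action side can be as small as this (a rough
| sketch, not my actual workflow; it assumes a KUBECONFIG
| secret, kubectl on the runner, and manifests under k8s/):
|
|     name: deploy
|     on:
|       push:
|         branches: [main]
|     jobs:
|       deploy:
|         runs-on: ubuntu-latest
|         steps:
|           - uses: actions/checkout@v4
|           - name: Write kubeconfig
|             run: echo "${{ secrets.KUBECONFIG }}" > kubeconfig
|           - name: Apply manifests
|             run: kubectl --kubeconfig kubeconfig apply -f k8s/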
| saaspirant wrote:
| Unrelated: What does _TFA_ mean here? Google and GPT didn't
| help (even with context)
| solatic wrote:
| The Featured Article.
|
| (or, if you read it in a frustrated voice, The F**ing
| Article.)
| Xixi wrote:
| Related acronyms: RTFA (Read The F**ing Article) and RTFM
| (Read The F**ing Manual). The latter was a very common
| answer when struggling with Linux in the early 2000s...
| osigurdson wrote:
| Kubernetes is the most amazing piece of software engineering
| that I have ever seen. Most of the hate is merely being
| directed at the learning curve.
| otabdeveloper4 wrote:
| No, k8s is shit.
|
| It's only useful for the degenerate "run lots of instances of
| webapp servers running slow interpreted languages" use case.
|
| Trying to do anything else in it is madness.
|
| And for the "webapp servers" use case they could have built
| something a thousand times simpler and more robust. Serving
| templated html ain't rocket science. (At least compared to
| e.g. running an OLAP database cluster.)
| shakiXBT wrote:
| Could you please bless us with another way to easily
| orchestrate thousands of containers in a cloud vendor
| agnostic fashion? Thanks!
|
| Oh, and just in case your first rebuttal is "having
| thousands of containers means you've already failed" - not
| everyone works in a mom n pop shop
| otabdeveloper4 wrote:
| Read my post again.
|
| Just because k8s is the only game in town doesn't mean it
| is technically any good.
|
| As a technology it is a total shitshow.
|
| Luckily, the problem it solves ("orchestrating" slow
| webapp containers) is not a problem most professionals
| care about.
|
| Feature creep of k8s into domains it is utterly
| unsuitable for because devops wants a pay raise is a
| different issue.
| shakiXBT wrote:
| > Orchestrating containers is not a problem most
| professionals care about
|
| I truly wish you were right, but maybe it's good job
| security for us professionals!
| osigurdson wrote:
| >> As a technology it is a total shitshow.
|
| What aspects are you referring to?
|
| >> is not a problem most professionals care about
|
| professional as in True Scotsman?
| otabdeveloper4 wrote:
| > professional as in True Scotsman?
|
| No, I mean that Kubernetes solves a super narrow and
| specific problem that most developers do not need to
| solve.
| candiddevmike wrote:
| > Oh, and just in case your first rebuttal is "having
| thousands of containers means you've already failed" -
| not everyone works in a mom n pop shop
|
| The majority of folks, whether or not they admit it,
| probably do...
| osigurdson wrote:
| Does this meet your definition of madness?
|
| https://openai.com/index/scaling-kubernetes-to-7500-nodes/
| otabdeveloper4 wrote:
| Yeah, they basically spent a shitload of effort
| developing their own cluster management platform that
| turns off all the Kubernetes functionality in Kubernetes.
|
| Must be some artifact of hosting on Azure, because I
| can't imagine any other reason to do something this
| contorted.
| spmurrayzzz wrote:
| I agree with respect to admiring it from afar. I've gone
| through large chunks of the source many times and always have
| an appreciation for what it does and how it accomplishes it.
| It has a great, supportive community around it as well (if
| not a tiny bit proselytizing at times, which doesn't bother
| me really).
|
| With all that said, while I have no "hate" for the stack, I
| still have no plans to migrate our container infrastructure
| to it now or in the foreseeable future. I say that precisely
| _because_ I 've seen the source, not in spite of it. The net
| ROI on subsuming that level of complexity for most
| application ecosystems just doesn't strike me as obvious.
| hylaride wrote:
| Not to be rude, but K8s has had some very glaring issues,
| especially early on when the hype was at max.
|
| * Its secrets management was terrible, and for a while it
| stored them in plaintext in etcd.
|
| * The learning curve was real and that's dangerous, as there
| were no "best practice" guides or lessons learned. There are
| lots of horror stories of upgrades gone wrong, bugs, etc.
| Complexity leaves a greater chance of misconfiguration, which
| can cause security or stability problems.
|
| * It was often redundant. If you're in the cloud, you already
| had load balancers, service discovery, etc.
|
| * Upgrades were dangerous and painful in its early days.
|
| * It initially had glaring third-party tooling integration
| issues, which made monitoring or package management harder
| (and led to third-party apps like Helm, etc).
|
| A lot of these have been rectified, but a lot of us have been
| burned by the promise of a tool that google said was used
| internally, which was a bit of a lie as kubernetes was a
| rewrite of Borg.
|
| Kubernetes is powerful, but you can do powerful in simple(r)
| ways, too. If it was truly "the most amazing" it would have
| been designed to be simple by default with as much complexity
| needed as everybody's deployments. It wasn't.
| globular-toast wrote:
| I don't get the hate even if you are a small company. K8s has
| massively simplified our deployments. It used to be each app
| had it's own completely different deployment process. Could
| have been a shell script that SSHed to some VM. Who managed
| said VM? Did it do its own TLS termination? Fuck knows. Maybe
| they used Ansible. Great, but that's another tool to learn and
| do I really need to set up bare metal hosts from scratch for
| every service. No, so there's probably some other Ansible
| config somewhere that sets them up. And the secrets are stored
| where? Etc etc.
|
| People who say "you don't need k8s" never say what you do need.
| K8s gives us a uniform interface that works for everything. We
| just have a few YAML files for each app and it just works. We
| can just chuck new things on there and don't even have to think
| about networking. Just add a Service and it's magically
| available with a name to everything in the cluster. I know how
| to do this stuff from scratch and I do not want to be doing it
| every single time.
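|
| That Service bit really is this small (a sketch; the names
| are made up):
|
|     apiVersion: v1
|     kind: Service
|     metadata:
|       name: billing   # reachable as http://billing in-cluster
|     spec:
|       selector:
|         app: billing
|       ports:
|         - port: 80
|           targetPort: 8080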
| shakiXBT wrote:
| if you don't need High Availability you can even deploy to a
| single node k3s cluster. It's still miles better than having
| to setup systemd services, an Apache/NGINX proxy, etc. etc.
| globular-toast wrote:
| Yep, and you can get far with k3s "fake" load balancer
| (ServiceLB). Then when you need a more "real" cluster
| basically all the concepts are the same you just move to a
| new cluster.
| roydivision wrote:
| One thing I learned when I started learning Kubernetes is that
| it is two disciplines that overlap, but are distinct none the
| less:
|
| - Platform build and management - App build and management
|
| Getting a stable K8s cluster up and running is quite different
| to building and running apps on it. Obviously there is overlap
| in the knowledge required, but there is a world of difference
| between using a cloud based cluster over your own home made
| one.
|
| We are a very small team and opted for cloud managed clusters,
| which really freed me up to concentrate on how to build and
| manage applications running on it.
| smitelli wrote:
| I can only speak for myself, and some of the reasons why K8s
| has left a bad taste in my mouth:
|
| - It can be complex depending on the third-party controllers
| and operators in use. If you're not anticipating how they're
| going to make your resources behave differently than the
| documentation examples suggest they will, it can be exhausting
| to trace down what's making them act that way.
|
| - The cluster owners encounter forced software updates that
| seem to come at the most inopportune times. Yes, staying fresh
| and new is important, but we have other actual business goals
| we have to achieve at the same time and -- especially with the
| current cost-cutting climate -- care and feeding of K8s is
| never an organizational priority.
|
| - A bunch of the controllers we relied on felt like alpha grade
| toy software. We went into each control plane update (see
| previous point) expecting some damn thing to break and require
| more time investment to get the cluster simply working like it
| was before.
|
| - While we (cluster owners) _begrudgingly_ updated, software
| teams that used the cluster absolutely did not. Countless
| support requests for broken deployments, which were all
| resolved by hand-holding the team through a Helm chart update
| that we advised them they 'd need to do months earlier.
|
| - It's really not cheaper than e.g. ECS, again, in my
| experience.
|
| - Maybe this has/will change with time, but I really didn't see
| the "onboarding talent is easier because they already know it."
| They absolutely did not. If you're coming from a shop that used
| Istio/Argo and move to a Linkerd/Flux shop, congratulations,
| now there's a bunch to unlearn and relearn.
|
| - K8s is the first environment where I palpably felt like we as
| an industry reached a point where there were so many layers and
| layers on top of abstractions of abstractions that it became
| un-debuggable _in practice_. This is points #1-3 coming
| together to manifest as weird latency spikes, scaling runaways,
| and oncall runbooks that were tantamount to "turn it off and
| back on."
|
| Were some of these problems organizational? Almost certainly.
| But K8s had always been sold as this miracle technology that
| would relieve so many pain points that we would be better off
| than we had been. In my experience, it did not do that.
| ahoka wrote:
| What would be the alternative?
| breakingcups wrote:
| I feel so out of touch when I read a blog post which casually
| mentions 6 CNCF projects with kool names that I've never heard
| of, for gaining seemingly simple functionality.
|
| I'm really wondering if I'm aging out of professional software
| development.
| renewiltord wrote:
| Nah, there's lots of IC work. It just means that you're
| unfamiliar with one approach to org scaling: abstracting over
| hardware, logging, retrying handled by platform team.
|
| It's not the only approach so you may well be familiar with
| others.
| twodave wrote:
| TL;DR because they already ran everything in containers. Having
| performed a migration where this wasn't the case, the path from
| non-containerized to containerized is way more effort than going
| from containerized non-k8s to k8s.
| _pdp_ wrote:
| In my own experience, AWS Fargate is easier, more secure and way
| more robust than running your own K8s, even with EKS.
| watermelon0 wrote:
| Do you mean ECS Fargate? Because you can use AWS Fargate with
| EKS, with some limitations.
| _pdp_ wrote:
| Yes, ECS Fargate.
| ko_pivot wrote:
| I'm not surprised that the first reason they state for moving off
| of ECS was the lack of support for stateful services. The lack of
| integration between EBS and ECS has always felt really strange to
| me, considering that AWS already built all the logic to integrate
| EKS with EBS in a StatefulSet compliant way.
| datatrashfire wrote:
| https://aws.amazon.com/about-aws/whats-new/2024/01/amazon-ec...
|
| This was actually added beginning of the year. Definitely was
| on my most wanted list for a while. You could technically use
| EFS, but that's a very expensive way to run anything IO
| intensive.
| ko_pivot wrote:
| This adds support for ephemeral EBS volumes. When a task is
| created a volume gets created, and when the task is
| destroyed, for whatever reason, the volume is destroyed too.
| It has no concept of task identity. If the task needs to be
| moved to a new host, the volume is destroyed.
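|
| For contrast, the Kubernetes equivalent keeps the volume tied
| to the pod's identity; a minimal sketch (it assumes a headless
| Service named "db" and a default StorageClass):
|
|     apiVersion: apps/v1
|     kind: StatefulSet
|     metadata:
|       name: db
|     spec:
|       serviceName: db
|       replicas: 1
|       selector:
|         matchLabels: { app: db }
|       template:
|         metadata:
|           labels: { app: db }
|         spec:
|           containers:
|             - name: postgres
|               image: postgres:16
|               env:
|                 - name: POSTGRES_PASSWORD
|                   value: example   # demo only; use a Secret
|               volumeMounts:
|                 - name: data
|                   mountPath: /var/lib/postgresql/data
|       volumeClaimTemplates:   # one PVC per pod, reattached on reschedule
|         - metadata:
|             name: data
|           spec:
|             accessModes: ["ReadWriteOnce"]
|             resources:
|               requests:
|                 storage: 20Gi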
| julienmarie wrote:
| I personally love k8s. I run multiple small but complex custom
| e-commerce shops and handle all the tech on top of marketing,
| finance and customer service.
|
| I was running on dedicated servers before. My stack is quite
| complicated and deploys were a nightmare. In the end the dread of
| deploying was slowing down the little company.
|
| Learning and moving to k8s took me a month. I run around 25
| different services ( front ends, product admins, logistics
| dashboards, delivery routes optimizers, orsm, ERP, recommendation
| engine, search, etc.... ).
|
| It forced me to clean my act and structure things in a repeatable
| way. Having all your cluster config in one place allows you to
| exactly know the state of every service, which version is
| running.
|
| It allowed me to do rolling deploys with no downtime.
|
| Yes it's complex. As programmers we are used to complex. An Nginx
| config file is complex as well.
|
| But the more you dive into it the more you understand the
| architecture of k8s and how it makes sense. It forces you to
| respect the twelve factors to the letter.
|
| And yes, HA is more than nice, especially when your income is
| directly linked to the availability and stability of your stack.
|
| And it's not that expensive. I pay around 400 USD a month in
| hosting.
| maccard wrote:
| Figma were running on ECS before, so they weren't just running
| dedicated servers.
|
| I'm a K8S believer, but it _is_ complicated. It solves hard
| problems. If you're multi-cloud, it's a no brainer. If you're
| doing complex infra that you want a 1:1 mapping of locally, it
| works great.
|
| But if you're less than 100 developers and are deploying
| containers to just AWS, I think you'd be insane to use EKS over
| ECS + Fargate in 2024.
| epgui wrote:
| I don't know if it's just me, but I really don't see how
| kubernetes is more complex than ECS. Even for a one-man show.
| mrgaro wrote:
| Kubernetes needs regular updates, just as everything else
| (unless you carefully freeze your environment and somehow
| manage the vulnerability risks) and that requires manual
| work.
|
| ECS+Fargate however does not. If you are a small team
| managing the entire stack, you need to factor this into
| account. For example, EKS forces you to upgrade the cluster
| to keep up with the main kubernetes release cycle, although
| you can delay it somewhat.
|
| I personally run k8s at home and another two at work and I
| recommend our teams to use ECS+Fargate+ALB if it is enough
| for them.
| metaltyphoon wrote:
| > Kubernetes needs regular updates, just as everything
| else (unless you carefully freeze your environment and
| somehow manage the vulnerability risks) and that requires
| manual work
|
| Just use a managed K8s solution that deals with this?
| AKS, EKS and GKE all do this for you.
| ttymck wrote:
| It doesn't do everything for you. You still need to
| update applications that use deprecated APIs.
|
| This sort of "just" thinking is a great way for teams to
| drown in ops toil.
| metaltyphoon wrote:
| Are you assuming the workloads have to use K8s APIs?
| Where is this coming from? If that's not the case can you
| actually explain with a concrete example?
| Ramiro wrote:
| I agree with @metaltyphoon on this. Even for small teams,
| a managed version of Kubernetes takes away most of the
| pain. I've used both ECS+Fargate and Kubernetes, but
| these days, I prefer Kubernetes mainly because the
| ecosystem is way bigger, both vendor and open source.
| Most of the problems we run into are always one search or
| open source project away.
| mountainriver wrote:
| This just feels like a myth to me at this point. Kubernetes
| isn't hard, the clouds have made it so simple now that it's
| in no way more difficult than ECS and is way more flexible
| davewritescode wrote:
| I'm not saying I agree with the comment above you but
| Kubernetes upgrades and keeping all your addons/vpc stuff
| up to date can be a never ending slog of one-way upgrades
| that, when they go wrong, can cause big issues.
| organsnyder wrote:
| Those are all issues that should be solved by the managed
| provider.
|
| It's been a while since I spun up a k8s instance on AWS,
| Azure, or the like, but when I did I was astounded at how
| many implementation decisions and toil I had to do
| myself. Hosted k8s should be plug-and-play unless you
| have a very specialized use-case.
| xyst wrote:
| Of course there's no mention of performance loss or gain after
| migration.
|
| I remember when microservices architecture was the latest hot
| trend that came off the presses. Small and big firms were racing
| to redesign/reimplement apps. But most forgot they weren't
| Google/Netflix/Facebook.
|
| I remember end user experience ended up being _worse_ after the
| implementation. There was a saturation point where a single micro
| service called by all of the other micro services would cause
| complete system meltdown. There was also the case of an
| "accidental" dependency loop (S1 -> S2 -> S3 -> S1). Company
| didn't have an easy way to trace logs across different services
| (way before distributed tracing was a thing). Turns out only a
| specific condition would trigger the dependency loop (maybe, 1 in
| 100 requests?).
|
| Good times. Also, job safety.
| api wrote:
| This is a very fad driven industry. One of the things you earn
| after being in it for a long time is intuition for spotting
| fads and gratuitous complexity traps.
| makeitdouble wrote:
| I think that aspect is indirectly covered, as one of the main
| motivations was to get on a popular platform that helps
| hiring.
|
| I agree on how it's technically a waste of time to pursue
| fads, but it's also a huge PITA to have a platform that good
| engineers actively try to avoid, as their careers would
| stagnate (even as they themselves know that it's half a fad)
| sangnoir wrote:
| I avoid working at organisations with NIH syndrome - if
| they are below a certain size (i.e. they lack a standing
| dev-eng team to support their homegrown K8s "equivalent").
| Extra red flags if the said homegrown-system was developed
| by _that guy[1]_ who's ostensibly a genius and has very
| strong opinions about his system. Give me k8s' YAML-hell
| any day instead, at least that bloat has transferable
| skills, and I can actually Google common resolutions.
|
| 1. Has been at the org for so long that management condones
| them flouting the rules, like pushing straight to prod. Hates
| the "inefficiency" of open source platforms and purpose-built
| something "suitable for the company" by themselves, with no
| documentation, so you have to ask them to fix issues because
| they don't accept code or suggestions from others. The DSL
| they developed is inconsistent and has no parser/linter.
| fragmede wrote:
| yeah. If you think Kubernetes is too complicated, the
| flip side of that is someone built the simpler thing, but
| then unfortunately it grew and grew, and now you've got
| this mess of a system. you could have just used a hosted
| k8s or k3s system from the start instead of reinventing
| the wheel.
|
| absolutely start as simple as you can, but plan to move to
| a hosted kube something asap instead of writing your own
| base images, unless that's a differentiator for your
| company.
| keybored wrote:
| 1. You have to constantly learn to keep up!
|
| 2. Fad-driven
|
| I wonder why I don't often see (1) critiqued on the basis of
| (2).
| api wrote:
| There's definitely a connection. Some of the change is
| improvement, like memory safe languages, but a ton of it is
| fads and needless complexity invented to provide a reason
| for some business or consultancy or group in a FAANG to
| exist. The rest of the industry cargo cults the big
| companies.
| teleforce wrote:
| > There was also the case of an "accidental" dependency loop
| (S1 -> S2 -> S3 -> S1).
|
| The classic dependency loop example that you thought you'd never
| encounter again for the rest of your life after OS class.
| pram wrote:
| The best part is when it's all architected to depend on
| something that becomes essentially a single point of failure,
| like Kafka.
| bamboozled wrote:
| Not saying you're wrong, but what is your grand plan? I've
| never seen anything perfect.
| intelVISA wrote:
| Shhh, you're ruining the party!
| arctek wrote:
| Isn't this somewhat better? At least when it fails, it fails in
| a single place.
|
| As someone using Kafka, I'd like to know what the (good)
| alternatives are if you have suggestions.
| happymellon wrote:
| It really depends on what your application is.
|
| Where I'm at, most of our Kafka usage adds nothing of note and
| could be replaced with a REST service. It sounds good that
| Kafka makes everything execute in order, but honestly just
| making requests block does the same thing.
|
| At least then I could autoscale, which Kafka prevents.
| NortySpock wrote:
| NATS JetStream seemed to support horizontal scaling (either
| hierarchical via leaf nodes or a flat RAFT quorum) and back
| pressure when I played with it.
|
| I found it easy to get up and running, even as a RAFT
| cluster, but I have not tried to use JetStream mode heavily
| yet.
| lmm wrote:
| At least Kafka can be properly master-master HA. How people
| ever got away with building massively redundant fault-
| tolerant applications that were completely dependent on a
| single SQL server, I'll never understand.
| trog wrote:
| I think because it works pretty close to 100 percent of the
| time with only the most basic maintenance and care (like
| making sure you don't run out of disk space and keeping up
| with security updates). You can go amazingly far with this,
| and adding read only replicas gets you a lot further with
| little extra effort.
| slt2021 wrote:
| Stack Overflow has no problem serving the entire planet from
| just _four_ SQL Servers
| (https://nickcraver.com/blog/2016/02/17/stack-overflow-the-
| ar...)
|
| There is really nothing wrong with a large vertically
| scaled-up SQL server. You need to be either really really
| large scale - or really really UNSKILLED in sql as to
| keep your relational model and working set in SQL so bad
| that you reach its limits
| HideousKojima wrote:
| >or really really UNSKILLED in sql as to keep your
| relational model and working set in SQL so bad that you
| reach its limits
|
| Sadly that's the case at my current job. Zero thought put
| into table design, zero effort into even formatting our
| stored procedures in a remotely readable way, zero
| attempts to cache data on the application side even when
| it's glaringly obvious. We actually brought in a
| consultant to diagnose our SQL Server performance issues
| (I'm sure we paid a small fortune for that) and the DB
| team and all of the other higher ups capable of actually
| enforcing change rejected every last one of his
| suggestions.
| lelanthran wrote:
| > How people ever got away with building massively
| redundant fault-tolerant applications that were completely
| dependent on a single SQL server, I'll never understand.
|
| It works, with a lower cognitive burden than that of
| horizontal scaling.
|
| For the loading concern (i.e. is this enough to handle the
| load):
|
| For most businesses, being able to serve 20k _concurrent_
| requests is way more than they need _anyway_ : an internal
| app used by 500k users typically has fewer than 20k
| concurrent requests in flight at peak.
|
| A cheap VPS running PostgreSQL can easily handle that.[1]
|
| For the "if something breaks" concern:
|
| Each "fault-tolerance" criteria added adds some cost. At
| some point the cost of being resistant to errors exceeds
| the cost of downtime. The mechanisms to reduce downtime
| when the single large SQL server shits the bed (failovers,
| RO followers, whatever) can reduce that downtime to mere
| minutes.
|
| What is the benefit to removing 3 minutes of downtime?
| $100? $1k? $100k? $1m? The business will have to decide
| what those 3 minutes are worth, and if that worth exceeds
| the cost of using something other than a single large SQL
| server.
|
| Until and unless you reach the load and downtime-cost of
| Google, Amazon, Twitter, FB, Netflix, etc, you're simply
| prematurely optimising for a scenario that, even in the
| business's best-case projections, might never exist.
|
| The best thing to do, TBH, is ask the business for their
| best-case projections and build to handle 90% of that.
|
| [1] An expensive VPS running PostgreSQL can handle _a lot
| more_ than you think.
| brazzy wrote:
| > Each "fault-tolerance" criteria added adds some cost.
| At some point the cost of being resistant to errors
| exceeds the cost of downtime.
|
| Not to forget: those costs are not just in money and
| time, but also in complexity. And added complexity comes
| with its own downtime risks. It's not that uncommon for
| systems to go down due to problems with mechanisms or
| components that would not exist in a simpler, "not fault
| tolerant" system.
| zelphirkalt wrote:
| The business can try to decide what those 3min are worth,
| but ultimately the customers vote by either staying or
| leaving that service.
| lelanthran wrote:
| > The business can try to decide what those 3min are
| worth, but ultimately the customers vote by either
| staying or leaving that service.
|
| That's still a business decision.
|
| Customers don't vote with their feet based on what tech
| stack the business chose; they vote based on a range of
| other factors, few, if any, of which are related to 3m
| of downtime.
|
| There are few, if any, services I know of that would
| lose customers over 3m of downtime per week.
|
| IOW, 3m of downtime is mostly an imaginary problem.
| zelphirkalt wrote:
| That's really too broad a generalization.
|
| Services that people might leave because of downtime are,
| for example, a git hoster or a password manager. When
| people cannot push their commits and this happens
| multiple times, they may leave for another git hoster. I
| have seen this very example when gitlab was less stable
| and often unreachable for a few minutes. When people need
| some credentials, but cannot reach their online password
| manager, they cannot work. They cannot trust that service
| to be available in critical moments. Not being able to
| access your credentials leaves a very bad impression.
| Some will look for more reliable ways of storing their
| credentials.
| Bjartr wrote:
| The user experience of "often unreachable" means way more
| than 3m per week in practice.
| skydhash wrote:
| Why does a password manager need to be online? I
| understand the need for synchronization, but being
| exclusively online is a very bad decision. And git
| synchronization is basically ssh, and if you mess that up
| on a regular basis, you have no business being in business
| in the first place. These are examples, but there are a few
| things that do not need to be online unless your computer
| is a thin client or you don't trust it at all.
| scott_w wrote:
| >> What is the benefit to removing 3 minutes of downtime?
|
| > The business can try to decide what those 3min are
| worth, but ultimately the customers vote by either
| staying or leaving that service.
|
| What do you think the business is doing when it evaluates
| what 3 minutes are worth?
| zelphirkalt wrote:
| There is no "the business". Businesses do all kinds of
| f'ed up things and lie to themselves all the time as
| well.
|
| I don't understand what people are arguing about here.
| Are we really arguing about customers making their own
| choice? Since that is all I stated. The business can jump
| up and down all it wants, if the customers decide to
| leave. Is that not very clear?
| lelanthran wrote:
| > The business can jump up and down all it wants, if the
| customers decide to leave.
|
| I think the point is that, for a few minutes of downtime,
| businesses lose so few customers that it's not worth
| avoiding that downtime.
|
| Just now, we had a 5m period where Disney+ stopped
| responding. We aren't going to cut off our toddler from
| Peppa Pig and Bluey for 5m of downtime _per day_,
| nevermind _per week_.
|
| You appeared to be under the impression that 3m
| downtime/week is enough to make people leave. This is
| simply not true, especially for internet services where
| the users are conditioned to simply wait.
| consteval wrote:
| True, but what people should understand about databases
| is they're incredibly mature software. They don't fail,
| they just don't. It's not like the software we're used to
| using where "whoopsie! Something broke!" is common.
|
| I've never, in my life, seen an error in SQL Server
| related to SQL Server. It's always been me, the app code
| developer.
|
| Now, to be fair, the server itself or the hardware CAN
| fail. But having active/passive database configurations
| is simple, tried and tested.
| skydhash wrote:
| And the server itself can be very resilient if you run
| something like Debian or FreeBSD. Even on Arch, I've rarely
| seen things fail unless it's fringe/proprietary code
| (bluetooth, nvidia, the browser and 3d accelerated
| graphics, ...). That presumes you use boring tech
| which is heavily tested by people around the world, not
| something "new" and "hyped" which is still on 0.x.
| consteval wrote:
| I agree 100%. Unfortunately my company is pretty tied to
| windows and windows server, which is a pain. Upgrading
| and sysadmin-type work is still very manual and there's a
| lot of room for human error.
|
| I wish we would use something like Debian and take
| advantage of tech like systemd. But alas, we're still
| using COM and Windows Services and we still need to
| remote desktop in and click around on random GUIs to get
| stuff to work.
|
| Luckily, SQL Server itself is very stable and reliable.
| But even SQL Server runs on Linux.
| ClumsyPilot wrote:
| > For most businesses, being able to serve 20k concurrent
| requests is way more than they need anyway: an internal
| app used by 500k
|
| This is a very simple distinction and I am not sure why
| it is not understood.
|
| For some reason people design public apps the same as
| internal apps.
|
| The largest companies employ circa 1 million people -
| that's Walmart, Amazon, etc. Most giants, like Shell,
| have ~100k tops. That can be handled by 1 beefy server.
|
| Successful consumer-facing apps have hundreds of millions
| to billions of users. That's 3 orders of magnitude of
| difference.
|
| I have seen a company with 5k employees invest in a mega-
| scalable, microservice, event-driven architecture and I
| was thinking - I hope they realise what they are doing
| and that it's just CV-driven development.
| hylaride wrote:
| Because people are beholden to costs and it's often out of
| our hands when to spend money on redundancy (or opportunity
| costs elsewhere).
|
| It's less true today when redundancy is baked into SaaS
| products (like AWS Aurora, where even if you have a single
| database instance, it's easy to spin up a replacement one
| if the hardware on the running one fails).
| ClumsyPilot wrote:
| Oh yeah, I am looking at that problem right now.
| vergessenmir wrote:
| I'm sorry, but I can't tell if you're being serious or not
| since you commented without qualification.
|
| One of the most stable system architectures I've built was
| on Kafka AND it was running with microservices managed by
| teams across multiple geographies and time zones. It was one
| of the most reliable systems in the bank. There are
| situations where it isn't appropriate, which can be said for
| most tech, e.g. K8s vs ECS vs Nomad vs bare metal.
|
| Every system has failure characteristics. Kafka's is defined
| as Consistent and Available and your system architecture
| needs to take that into consideration. Also the
| transactionality of tasks across multiple services and
| process boundaries is important.
|
| Let's not pretend that kubernetes (or the tech of the day) is
| at fault while completely ignoring the complex architectural
| considerations that are being juggled
| pram wrote:
| Basically because people end up engineering their
| microservices as a shim to funnel data into the "magical
| black hole"
|
| From my experience most microservices aren't engineered to
| handle back pressure. If there is a sudden upsurge in
| traffic or data the Kafka cluster is expected to absorb all
| of the throughput. If the cluster starts having IO issues
| then literally everything in your "distributed" application
| is now slowly failing until the consumers/brokers can catch
| up.
| jamesfinlayson wrote:
| > There was a saturation point where a single micro service
| called by all of the other micro services would cause complete
| system meltdown.
|
| Yep - saw that at a company recently - something in AWS was
| running a little slower than usual which cascaded to cause
| massive failures. Dozens of people were trying to get to the
| bottom of it; it mysteriously fixed itself, and no one could
| offer any good explanation.
| spyspy wrote:
| Any company that doesn't have some form of distributed
| tracing in this day and age is acting with pure negligence
| IMO. Literally flying blind.
| bboygravity wrote:
| Distributed tracing?
|
| Is that the same as what Elixir/Erlang call supervision
| trees?
| williamdclt wrote:
| Likely about OpenTelemetry.
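|
| For a rough sense of what that looks like in practice, a
| minimal OpenTelemetry Collector config that receives traces
| over OTLP and just prints them is roughly this (a sketch only;
| exporter names vary a bit between collector versions):
|
|     receivers:
|       otlp:
|         protocols:
|           grpc:
|           http:
|     processors:
|       batch:
|     exporters:
|       debug:        # prints received spans to the console
|     service:
|       pipelines:
|         traces:
|           receivers: [otlp]
|           processors: [batch]
|           exporters: [debug]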
| ahoka wrote:
| Oh, come on! Just make sure your architecture is sound. If
| you need to run an expensive data analysis cluster
| connected to massive streams of collected call information
| to see that you have loops in your architecture, then you
| have a bigger issue.
| hylaride wrote:
| I don't know if you're being sarcastic, which if you are:
| heh.
|
| But to the point, if you're going to build a distributed
| system, you need tools to track problems across the
| distributed system that also work across teams. A poorly
| performing service could be caused by up/downstream
| components, and tracking that down without some kind of
| tracing is hard even if your stack is linear.
|
| The same is true for a giant monolithic app, but the
| sophisticated tools are just different.
| consteval wrote:
| Solution: don't build a distributed system. Just have a
| computer somewhere running .NET or Java or something. If
| you really want data integrity and safety, just make the
| data layer distributed.
|
| There's very little reason to distribute application code.
| It's very, very rare that the limiting factor in an
| application is compute. Typically, it's the data layer,
| which you can change independently of your application.
| MaKey wrote:
| I have yet to personally see an application where
| distribution of its parts was beneficial. For most
| applications a boring monolith works totally fine.
| consteval wrote:
| I'm sure it exists when the problem itself is
| distributed. For example, I can imagine something like
| YouTube would require a complex distributed system.
|
| But I think very few problems fit into that archetype.
| Instead, people build distributed systems for reliability
| and integrity. But it's overkill, because you bring all
| the baggage and complexity of distributed computing. This
| area is incredibly difficult. I view it similar to
| parallelism. If you can avoid it for your problem, then
| avoid it. If you really can't, then take a less complex
| approach. There's no reason to jump to "scale to X
| threads and every thread is unaware of where it's
| running" type solutions, because those are complex.
| jiggawatts wrote:
| Even if the microservices platform is running at 1% capacity,
| it's guaranteed to have worse performance than almost any
| monolith architecture.
|
| It's very rare to meet a developer who has even the vaguest
| notion of what an RPC call costs in terms of microseconds.
|
| Fewer still that know about issues such as head-of-line
| blocking, the effects of load balancer modes such as hash
| versus round-robin, or the CPU overheads of protocol ser/des.
|
| If you have an architecture that involves about 5 hops combined
| with sidecars, envoys, reverse proxies, and multiple zones
| you're almost certainly spending 99% to 99.9% of the wall clock
| time _just waiting_. The useful compute time can rapidly start
| to approach zero.
|
| This is how you end up with apps like Jira taking a solid
| minute to show an empty form.
| throwaway48540 wrote:
| Are you talking about cloud Jira? I use it daily and it's
| very quick, even search results appear immediately...
| nikau wrote:
| Have you ever used an old ticketing system like remedy?
| Ugly as sin, but screens appear instantly.
|
| I think web apps have been around so long now that people have
| forgotten how unresponsive things are vs. old 2-tier stuff.
| BeefWellington wrote:
| There's a definite vibe these days of "this is how it's
| always been" when it really hasn't.
| throwaway48540 wrote:
| The SPA Jira cloud is faster than anything server-rendered
| for me; my connection is shit. In Jira I can at least move
| to static forms quickly and I'm not downloading the entire
| page on every move.
| chuckadams wrote:
| I've used Remedy in three different shops, and the
| experience varied dramatically. The entry screen might
| have popped instantly, but RPC timeouts on submission
| were common. Tab order for controls was the order they
| were added, not position on the screen. Remedy could be
| pleasant, but it was very dependent on a competent admin
| to set up and maintain it.
| abrookewood wrote:
| Maybe you're being honest, but you're using a throwaway
| account and I use Cloud Jira every day. It's slow and
| bloated and drives me crazy.
| kahmeal wrote:
| Has a LOT to do with how it's configured and what plugins
| are installed.
| jiggawatts wrote:
| It _really doesn't_. This is the excuse trotted out by
| Atlassian staff when defending their products in public
| forums, essentially "corporate propaganda". They have a
| history of gaslighting users, either telling them to
| disregard the evidence of their own lying eyes, or that
| it's _their own fault_ for using the product wrong
| somehow.
|
| I tested the Jira cloud service with a new, blank
| account. Zero data, zero customisations, zero security
| rules. _Empty._
|
| Almost all basic operations took tens of seconds to run,
| even when run repeatedly to warm up any internal caches.
| Opening a new issue ticket form was especially bad,
| taking nearly a _minute_.
|
| Other Atlassian excuses included: corporate web proxy
| servers (I have none), slow Internet (gigabit fibre),
| slow PCs (gaming laptop on "high performance" settings),
| browser security plugins (none), etc...
| ffsm8 wrote:
| > Opening a new issue ticket form was especially bad,
| taking nearly a minute.
|
| At that point, something must've been wrong with your
| instance. I'd never call Jira fast, but the new ticket
| dialog on an unconfigured instance opens within <10s
| (which is absolutely horrendous performance, to be clear.
| Anything more than 200-500ms is.)
| jiggawatts wrote:
| Cloud Jira is notably slower than on-prem Jira, which
| takes on the order of 10 seconds like you said.
| ffsm8 wrote:
| That does not mirror my own experience. And it's very
| easy to validate. Just create a free Jira cloud instance,
| which takes about 1 minute (
| https://www.atlassian.com/software/jira/try ) and click
| new issue.
|
| It opens within 1-2 sec (which is still bad performance,
| objectively speaking. It's an empty instance after all,
| and already >1s.)
| jiggawatts wrote:
| Ah, there's a new UI now. The last time I tested this,
| the entire look & feel was different and everything was
| in slow motion.
|
| It's still sluggish compared to a desktop app from the
| 1990s, but it's _much_ faster than just a couple of years
| ago.
| throwaway48540 wrote:
| This is just not true. The create-new-issue form appears
| nearly immediately. I have created two tickets just now
| - in less than a minute, including writing a few
| sentences.
| hadrien01 wrote:
| Atlassian themselves don't use JIRA Cloud. They use the
| datacenter edition (on-premise) for their public bug
| tracker, and it's sooooo much faster than the Cloud
| version: https://jira.atlassian.com/browse/
| lmm wrote:
| With cloud Jira you get thrown on a shared instance with no
| control over who you're sharing with, so it's random
| whether yours will be fast or extremely slow.
| Lucasoato wrote:
| I'm starting to think that these people praising Jira are
| just part of an architected PR campaign that tries to deny
| what's evident to the end users: Jira is slow, bloated in
| many cases almost unusable.
| randomdata wrote:
| _> It's very rare to meet a developer who has even the
| vaguest notion of what an RPC call costs in terms of
| microseconds._
|
| To be fair, small time units are difficult to internalize.
| Just look at what happens when someone finds out that it
| takes tens of nanoseconds to call a C function in Go (gc).
| They regularly conclude that it's completely unusable, and
| not just in a tight loop with an unfathomable number of
| calls, but even for a single call in their program that runs
| once per day. You can flat out tell another developer exactly
| how many microseconds the RPC is going to add and they still
| aren't apt to get it.
|
| It is not rare to find developers who understand that RPC has
| a higher cost than a local function, though, and with enough
| understanding of that to know that there could be a problem
| if overused. Where they often fall down, however, is when the
| tools and frameworks try to hide the complexity by making RPC
| _look_ like a local function. It then becomes easy to miss
| that there is additional overhead to consider. Make the
| complexity explicit and you won't find many developers
| oblivious to it.
| j16sdiz wrote:
| Those time costs need to be contextualized with time budgets
| for each service. Without that, it is always somebody else's
| problem in an RPC world.
| HideousKojima wrote:
| >Just look at what happens when someone finds out that it
| takes tens of nanoseconds to call a C function in Go (gc).
|
| I'm not too familiar with Go but my default assumption is
| that it's just used as a convenient excuse to avoid
| learning how to do FFI.
| lmm wrote:
| > This is how you end up with apps like Jira taking a solid
| minute to show an empty form.
|
| Nah. Jira was horrible and slow long before the microservice
| trend.
| p_l wrote:
| JIRA slowness usually involved under-provisioned server
| resources, in my experience.
| otabdeveloper4 wrote:
| It's actually worse than what you said. In 2024 the network
| is the only resource we can't upgrade on demand. There are
| physical limits we can't change. (I.e., there are only so
| many wires connecting your machines, and any significant
| upgrade involves building your own data center.)
|
| So really eventually we'll all be optimizing around network
| interfaces as the bottleneck.
| jiggawatts wrote:
| _cough_ speed of light _cough_
| beeandapenguin wrote:
| Extremely slow times - from development to production, backend
| to frontend. Depending on how bad things are, you might catch
| the microservice guys complaining over microseconds from a team
| downstream, in front of a FE dev who's spent his week
| optimizing the monotonically-increasing JS bundles with code
| splitting heuristics.
|
| Of course, it was because the client app recently went over the
| 100MB JS budget. Which they decided to make because the last
| time that happened, customers abroad reported seeing "white
| screens". International conversion dropped sharply not long
| after that.
|
| It's pretty silly. So ya, good times indeed. Time to learn k8s.
| dangus wrote:
| This article specifically mentions that they are not running
| microservices and has pretty clearly defined motivations for
| making the migration.
| ec109685 wrote:
| Why would moving to Kubernetes from ECS introduce performance
| issues?
|
| They already had their architecture, largely, and just moved it
| over to K8s.
|
| They even mention they aren't a microservices company.
| trhway wrote:
| >there's no mention of performance loss or gain after migration
|
| To illustrate the performance cost, I usually ask people what
| ping they have, say, from one component/pod/service to another,
| and to compare that value to the ping they'd get between 2
| Linux boxes sitting on that gorgeous 10Gb/40Gb/100Gb or even
| 1000Gb network that they are running their modern microservices
| architecture over.
| ellieh wrote:
| I imagine because the article mentions:
|
| > More broadly, we're not a microservices company, and we don't
| plan to become one
| dgb23 wrote:
| It seems the "micro" implies that services are separated by
| high level business terms, like "payment" or "inventory" with
| each having their own databases instead of computational
| terms like "storage", "load balancing" or "preprocessing"
| etc.
|
| Is this generally correct? How well is this term defined?
|
| If yes, then I'm not surprised this type of design has become
| a target for frustration. State is smeared across the system,
| which implies a lot of messaging and arbitrary connections
| between services.
|
| That type of design is useful if you are an application
| platform (or similar) where you have no say in what the
| individual entities are, and actually have no idea what they
| will be.
|
| But if you have the birds-eye view and implement all of it,
| then why would you do that?
| rco8786 wrote:
| Or cost increase or decrease!
| crossroadsguy wrote:
| I see this in Android. Every few years (sometimes multiple
| times in a year) a new arch. becomes the fad and every TDH dev
| starts hankering one way or the other along the lines of "why
| are you not doing X..". Problem is, at immature firms
| (especially startups) the director- and senior-manager-level
| leaders happily agree to the re-write, i.e. re-archs, because
| they usually leave every season for the next place and they get
| to talk about that new thing, which probably didn't last beyond
| their stay, or might have conked off before that.
|
| And the "test cases" porn! Goodness! People want "coverage" and
| that's it. It's a box to tick, irrespective of whether those
| test cases and the way they are written are actually
| meaningful in any way. Then there's the "let's have
| dependency injection everywhere" charade.
| alexpotato wrote:
| As often, Grug has a great line about microservices:
| grug wonder why big brain take hardest problem, factoring
| system correctly, and introduce network call too
| seem very confusing to grug
|
| https://grugbrain.dev/#grug-on-microservices
| kiesel wrote:
| Thanks for the link to this awesome page!
| Escapado wrote:
| Thanks for that link, I genuinely laughed out loud while
| reading some of those points! Love the presentation, and what
| a wonderful reality check; I couldn't agree more.
| tsss wrote:
| > single micro service called by all of the other micro
| services
|
| So they didn't do microservices correctly. Big surprise.
| zbentley wrote:
| I mean, that pattern is pretty common in the micro service
| world. Services for things like authz, locking,
| logging/tracing, etc. are often centralized SPOFs.
|
| There are certainly ways to mitigate the SPOFiness of each of
| those cases, but that doesn't make having them an
| antipattern.
| jknoepfler wrote:
| One of the points of a microservice architecture (on k8s or
| otherwise) is that you can easily horizontally scale the
| component that was under pressure without having to scale out a
| monolithic application... that just sounds like people being
| clueless, not a failure of microservice architecture...
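|
| For what it's worth, that per-component scaling is only a few
| lines of YAML on k8s - e.g. a HorizontalPodAutoscaler against a
| hypothetical "payments" Deployment, scaling on CPU (a sketch;
| names are made up):
|
|     apiVersion: autoscaling/v2
|     kind: HorizontalPodAutoscaler
|     metadata:
|       name: payments               # hypothetical service
|     spec:
|       scaleTargetRef:
|         apiVersion: apps/v1
|         kind: Deployment
|         name: payments
|       minReplicas: 2
|       maxReplicas: 20
|       metrics:
|       - type: Resource
|         resource:
|           name: cpu
|           target:
|             type: Utilization
|             averageUtilization: 70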
| okr wrote:
| And if it was a single application, how would that have solved
| it? You would still have that loop, no?
|
| Personally, I think it does not have to be hundreds of
| microservices, basically each function a service. But I see it
| more as the web on the internet. Things are sometimes not
| reachable or overloaded. I think that is normal life.
| jmdots wrote:
| Please just use it as a warehouse-scale computer and don't
| make node groups into pets.
| jokethrowaway wrote:
| In which universe migrating from docker containers in ECS to
| Kubernetes is an effort measured in years?
| strivingtobe wrote:
| > At the time we did not auto-scale any of our containerized
| services and were spending a lot of unnecessary money to keep
| services provisioned such that they could always handle peak
| load, even on nights and weekends when our traffic is much lower.
|
| Huh? You've been running on AWS for how long and haven't been
| using auto scaling AT ALL? How was this not priority number one
| for the company to fix? You're just intentionally burning money
| at that point!
|
| > While there is some support for auto-scaling on ECS, the
| Kubernetes ecosystem has robust open source offerings such as
| Keda for auto-scaling. In addition to simple triggers like CPU
| utilization, Keda supports scaling on the length of an AWS Simple
| Queue Service (SQS) queue as well as any custom metrics from
| Datadog.
|
| ECS autoscaling is easy, and supports these things. Fair play if
| you just really wanted to use CNCF projects, but this just seems
| like you didn't really utilize your previous infrastructure very
| well.
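|
| For reference, the Keda setup the article alludes to is roughly
| this much YAML - a ScaledObject scaling a hypothetical "worker"
| Deployment on SQS queue length (a sketch; names, queue URL and
| the auth reference are made up):
|
|     apiVersion: keda.sh/v1alpha1
|     kind: ScaledObject
|     metadata:
|       name: worker-scaler          # hypothetical name
|     spec:
|       scaleTargetRef:
|         name: worker               # hypothetical Deployment
|       minReplicaCount: 1
|       maxReplicaCount: 50
|       triggers:
|       - type: aws-sqs-queue
|         metadata:
|           queueURL: https://sqs.us-east-1.amazonaws.com/1234/jobs
|           queueLength: "100"       # target messages per replica
|           awsRegion: us-east-1
|         authenticationRef:
|           name: keda-aws-creds     # assumed TriggerAuthentication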
| kachapopopow wrote:
| It appears to me that people don't really understand kubernetes
| here.
|
| Kubernetes does not mean microservices, it does not mean
| containerization and isolation, hell, it doesn't even mean
| service discovery most of the time.
|
| The default smallest kubernetes installation provides you two
| things: kubelet (the scheduling agent) and kubeapi.
|
| What do these two allow you to do? KubeApi provides an API to
| interact with kubelet instances by telling them what to do
| via manifests.
|
| That's all, that's all kubernetes is, just a dumb agent with some
| default bootstrap behavior that allows you to interact with a
| backend database.
|
| Now, let's get into kubernetes default extensions:
|
| - CoreDNS - linking service names to service addresses.
|
| - KubeProxy - routing traffic from host to services.
|
| - CNI(many options) - Networking between various service
| resources.
|
| After that, kubernetes is whatever you want it to be. It can be
| something you use to spawn a few test databases. Deploy an
| entire production-certified clustered database. A full
| distributed fs with automatic device discovery. Deploy backend
| services if you want to take advantage of service discovery,
| autoscaling and networking. Or it can be something as small as
| deploying monitoring (such as node-exporter) to every instance.
|
| And as a bonus, it allows you to do it from the comfort of your
| own local computer.
|
| This article says that Figma migrated the necessary services to
| kubernetes to improve developer experience and clearly says that
| things that don't need to be on kubernetes aren't. For all we
| know they still run their services on raw instances and only use
| kubernetes for their storage and databases. And to add to all of
| that, kubernetes doesn't care where it runs, which is a great
| way to increase competition between cloud providers, lowering
| costs for all.
| tbrownaw wrote:
| Is it very common to use it without containers?
| otabdeveloper4 wrote:
| It is impossible.
| lisnake wrote:
| it's possible with virtlet or kubevirt
| darby_nine wrote:
| You can use VMs instead. I don't think the distinction
| matters very much though.
| kachapopopow wrote:
| I run it that way on my Windows machines; the image is
| downloaded and executed directly.
| lmm wrote:
| Kubernetes absolutely means containerisation in practice. There
| is no other supported way of doing things with it. And "fake"
| abstraction where you pretend something is generic but it's
| actually not is one of the easiest ways to overcomplicate
| anything.
| kachapopopow wrote:
| If you disable security policy you and remount to pid 1 you
| escape any encapsulation. Or you can use a k8s implementation
| that just extracts the image and runs it.
| consteval wrote:
| To be fair, at the small scales you're talking about (maybe 1-2
| machines) systemd does the same stuff, just better with less
| complexity. And there are various much simpler ways to automate
| your deployments.
|
| If you don't have a distributed system then personally I think
| k8s makes no sense.
| kachapopopow wrote:
| I did say that with a few machines it can be overkill, but when
| you have more than a dozen of 2-3 machines or 6+ machines it
| gets overwhelming really fast. Kubernetes in its smallest form
| is around 50MiB of memory and 0.1 CPU.
| ec109685 wrote:
| > At high-growth companies, resources are precious
|
| Yeah, at those low growth companies, you have unlimited resources
| /s
| datadeft wrote:
| > Migrating onto Kubernetes can take years
|
| What the heck am I reading? For whom? I am not sure why
| companies even bother with such migrations. Where is the
| business value? Where is the gain for the customer? Is this one
| of those "L'art pour l'art" projects that Figma does just
| because they can?
| xorcist wrote:
| It solves the "we have recently been acquired and have a lot of
| resources that we must put to use" problem.
| kevstev wrote:
| FWIW... I was pretty taken aback by this statement as well -
| and also the "brag" that they moved onto K8s in less than a
| year. At a very well-established firm, ~30 years old and with
| the baggage that came with it, we moved to K8s in far less
| time - though we made zero attempt to move everything to k8s,
| just stuff that could benefit from it. Our pitch was more or
| less: move to k8s, and when we do the planned datacenter move
| at the end of the year, you don't have to do anything aside
| from a checkout. Otherwise you will have to redeploy your apps
| to new
| machines or VMs and deal with all the headache around that. Or
| you could just containerize now if you aren't already and we
| take care of the rest. Most migrated and were very happy with
| the results.
|
| There were plenty of services that were latency-sensitive or in
| the HPC realm where it made no sense to force a migration,
| though, and there was no attempt to shoehorn them in.
| rayrrr wrote:
| Just out of curiosity, is there any other modern system or
| service that anyone here can think of where anyone in their
| right mind would brag about migrating to it in less than a year?
| jjice wrote:
| It's a hard question to answer. Not all systems are equal in
| size, scope, and impact. K8s as a system is often the core of
| your infra, meaning everything running will be impacted. That
| coupled with their team constraints in the article make it
| sound like a year isn't awful.
|
| One system I can think of off the top of my head is when Amazon
| moved away from Oracle to fully Amazon/OSS RDBMSs a while ago,
| but that was multi-year, I think. If they could have done it in
| less than a year, they'd definitely be bragging.
| therealdrag0 wrote:
| I've seen many migrations take over a year. It's less about the
| technology and more about your tech debt, integration
| complexity, and resourcing.
| 05bmckay wrote:
| I don't think this is the flex they think it is...
| ravedave5 wrote:
| Completely left out of this post and most of the conversation is
| that being on K8s makes it much, much easier to go multi-cloud.
| K8s is K8s.
| syngrog66 wrote:
| k8s and "12 months" -> my priors likely confirmed. ha
| sjkoelle wrote:
| The title alone is a teensie bit hilarious
| Ramiro wrote:
| I love reading these "reports from the field"; I always pick up a
| thing or two. Thanks for sharing @ianvonseggern!
___________________________________________________________________
(page generated 2024-08-09 23:02 UTC)