[HN Gopher] Learnings from our 8 years of Kubernetes in production
___________________________________________________________________
Learnings from our 8 years of Kubernetes in production
Author : jonsson101
Score : 54 points
Date : 2024-02-06 10:10 UTC (12 hours ago)
(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)
| nostrebored wrote:
| My question when looking at Kubernetes for small teams is always
| the same. Why?
|
| In the blog, there are multiple days of downtime, a complete
| cluster rebuild, a description of how individual experts have to
| be crowned as the technology is too complex to jump in and out of
| in any real production environment, handling versioning of helm
| and k8s, a description of managing the underlying scripts to
| rebuild for disaster (I'm assuming there's a data
| persistence/backup step here that goes unmentioned!), and on, and
| on and on.
|
| When you're already using cloud primitives, why not use your
| existing expertise there, their serverless offerings, and learn
| the IaC tooling of choice for that provider?
|
| Yes it will be more expensive on your cloud bill. But when you
| measure the TCO, is it really?
| belval wrote:
| Especially considering that the author seems to be using some
| Azure specific features anyway:
|
| > While being vendor-agnostic is a great idea, for us, it came
| with a high opportunity cost. After a while, we decided to go
| all-in on AKS-related Azure products, like the container
| registry, security scanning, auth, etc. For us, this resulted
| in an improved developer experience, simplified security (
| centralized access management with Azure Entra Id), and more,
| which led to faster time-to-market and reduced costs (volume
| benefits).
| teaearlgraycold wrote:
| We're starting to use k8s as a small team because the simpler
| offerings with GPUs available don't meet our needs. It's clear
| they're either built for someone else or are less reliable than
| an EKS cluster would be.
| liveoneggs wrote:
| k8s is half-baked at best but people enjoy copy-paste yaml
| recipes, which half-baked products lend themselves to, so it is
| loved
| auspiv wrote:
| I work for a US subsidiary of a very large oil company. We
| are migrating from Azure to AWS for many things (it is deemed
| "OneCloud"). A very large number of our new EC2 instances,
| and even our EKS instances, were provisioned within the last
| 6 months as T2 instances. Some, if we were lucky, were T3. T3
| was released 10 years ago. Copy + paste indeed.
| menschmanfred wrote:
| Our setup works very very well.
|
| And in smaller setups you would have a shared cluster or a
| fully managed one like GKE, etc.
| cortesoft wrote:
| Do people try to push it that strongly for small teams? Lots of
| us work on bigger teams and enjoy more of the benefits.
|
| However, I also still use Kubernetes for my personal projects,
| because I really appreciate the level of abstraction it
| supplies. Everyone always points out that you can do all the
| things k8s does in other ways, but what I like is that it
| defines a common way to do everything. I don't care that there
| are 50 ways to do it, I just like having one way.
|
| What this allows is for tools to seamlessly work together. It
| is trivial to have all sorts of cool functionality with minimal
| configuration.
| karolist wrote:
| This. It's the "npm install 100 packages and do everything
| with JS" vs Rails argument all over again.
| politelemon wrote:
| > Do people try to push it that strongly for small teams?
|
| Yes. You have to understand that a lot of people without the
| benefit of experience will often base their technology
| choices on blog posts. K8S has a lot of mindshare and blog
| attention, so it gets seen as the only way to run a container
| in a production environment, while all the important aspects
| of it are ignored.
| datadeft wrote:
| > because I really appreciate the level of abstraction it
| supplies
|
| which are?
|
| I am seriously asking. I use docker-compose for some of the
| things I do, but it never occurred to me during my 20 years in
| systems engineering that k8s offers any kind of great
| abstraction. For small systems it is easy to use Docker (for
| example running a database for testing). For larger projects
| there are so many alternatives to k8s that are better,
| including the major cloud vendor offerings, that I have a
| really hard time justifying even considering k8s. After years
| of carnage, seeing failure after failure, I even have
| customers reaching out to me in panic over timeouts or other
| issues that nobody can resolve, after they were sold on the
| idea that k8s has a "great level of abstraction" and put it
| into production.
|
| > I don't care that there are 50 ways to do it, I just like
| having one way.
|
| Seeing everything as a nail...
| cortesoft wrote:
| >> because I really appreciate the level of abstraction it
| supplies
|
| > which are?
|
| When I am creating a new service/application, I just need
| to define in my resource what I need... listening ports,
| persistent storage, CPU, memory, ingress, etc... then I am
| free to change how those are provided without having to
| change the app. If a new, better, storage provider comes
| along, I can switch it out without changing anything on my
| app.
|
| At my work, we have on premise clusters as well as cloud
| clusters, and I can move my workloads between them
| seamlessly. In the cloud, we use EBS backed volumes, but my
| app doesn't need to care. On the on-prem clusters, we use
| longhorn, but again my app doesn't care. In AWS, we use the
| ELB as our ingress, but my app doesn't care... on prem, I
| use metallb, but my app doesn't care.
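|
| Concretely, the app-side contract is about this small (a
| sketch; the storage class names are placeholders - each
| cluster maps the same claim to EBS, Longhorn, or whatever it
| runs):
|
|   apiVersion: v1
|   kind: PersistentVolumeClaim
|   metadata:
|     name: app-data
|   spec:
|     accessModes:
|       - ReadWriteOnce
|     resources:
|       requests:
|         storage: 20Gi
|     # storageClassName: gp3        # cloud cluster (EBS)
|     # storageClassName: longhorn   # on-prem cluster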
|
| I just specify that I need a cert and a URL, and each
| cluster is set up to update DNS and get me a cert. I don't
| have to worry about DNS or certs expiring. When I deploy my
| app to a different cluster, that all gets updated
| automatically.
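|
| For the cert/DNS part, the per-app piece is roughly one
| Ingress; this sketch assumes cert-manager and external-dns are
| installed in the cluster, and the issuer name and hostname are
| placeholders:
|
|   apiVersion: networking.k8s.io/v1
|   kind: Ingress
|   metadata:
|     name: my-app
|     annotations:
|       cert-manager.io/cluster-issuer: letsencrypt
|       external-dns.alpha.kubernetes.io/hostname: app.example.com
|   spec:
|     tls:
|       - hosts: [app.example.com]
|         secretName: my-app-tls    # cert-manager fills this in
|     rules:
|       - host: app.example.com
|         http:
|           paths:
|             - path: /
|               pathType: Prefix
|               backend:
|                 service:
|                   name: my-app
|                   port:
|                     number: 80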
|
| I also get monitoring for free. Prometheus knows how to
| discover my services and gather metrics, no matter where I
| deploy. For log processing, when a new tool comes out, I
| can plug it in with a few lines of configuration.
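|
| The "Prometheus knows how to discover my services" part is, in
| many setups, roughly one small object per app; a sketch
| assuming the prometheus-operator stack, with placeholder
| names:
|
|   apiVersion: monitoring.coreos.com/v1
|   kind: ServiceMonitor
|   metadata:
|     name: my-app
|   spec:
|     selector:
|       matchLabels:
|         app: my-app
|     endpoints:
|       - port: metrics    # named Service port serving /metrics
|         interval: 30s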
|
| The kubernetes resource model provides a standard way to
| define my stuff. Other services know how to read that
| resource model and interact with it. If I need something
| different, I can create my own CRD and controller.
|
| I am able to run a database using a cluster controller with
| my on prem cluster without having to manage individual
| nodes. Anyone who has run a database cluster manually knows
| hardware maintenance or failure is a whole thing... with
| controllers and k8s nodes, I just need to use node drain
| and my controller will know how to move the cluster members
| to different nodes. I can update and upgrade the hardware
| without having to do anything special. Hardware patching is
| way easier.
|
| The k8s model forces you to specify how your service should
| handle node failure, and nodes coming in or out are built
| into the model from the beginning. It forces you to think
| about horizontal scaling, failover, and maintenance from
| the beginning, and gives a standard way for it to work.
| When you do a node drain, every single app deployed to the
| cluster knows what to do, and the maintainer doesn't have
| to think about it.
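|
| The "specify how your service should handle node failure" bit
| is largely one small object; a sketch with placeholder names:
|
|   apiVersion: policy/v1
|   kind: PodDisruptionBudget
|   metadata:
|     name: my-app-pdb
|   spec:
|     minAvailable: 2     # a drain waits until eviction is safe
|     selector:
|       matchLabels:
|         app: my-app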
|
| >> I don't care that there are 50 ways to do it, I just
| like having one way.
|
| > Seeing everything as a nail...
|
| I don't think that is a fair comparison, because you can
| create CRDs if your model doesn't fit any existing
| resource. However, even when you create a CRD, it is still
| a standard resource that hooks into all of the k8s
| lifecycle management, and you become part of that
| ecosystem.
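|
| And the entry cost of a CRD is fairly small; a minimal sketch
| with a made-up Backup resource:
|
|   apiVersion: apiextensions.k8s.io/v1
|   kind: CustomResourceDefinition
|   metadata:
|     name: backups.example.com
|   spec:
|     group: example.com
|     scope: Namespaced
|     names:
|       plural: backups
|       singular: backup
|       kind: Backup
|     versions:
|       - name: v1
|         served: true
|         storage: true
|         schema:
|           openAPIV3Schema:
|             type: object
|             properties:
|               spec:
|                 type: object
|                 properties:
|                   schedule: { type: string }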
| dilyevsky wrote:
| I would think it's more dependent on technology requirements
| than on the size of the team. If all you need is some
| variation of the LAMP stack, then you'd probably be better off
| with a PaaS like Render, Fly.io, or the like.
| datadeft wrote:
| Same exact question I ask every single time. We just decided
| against k8s, again, in 2024. We are going to go with AWS ECS
| and Azure Container Apps (the infra has to exist in both
| clouds).
|
| ECS and Container Apps provide all the benefits of k8s without
| the cons. What we want is to be able to execute container
| (Docker) images with autoscaling and control which group of
| instances can talk to each other. What we do not want to do:
|
| - learn all of the error modes of k8s
|
| - learn all the network modes of k8s
|
| - learn the tooling of k8s (and the pitfalls)
|
| - learn how to embed yaml into yaml the right way (I have seen
| some of the tools doing this; see the sketch at the end of
| this comment)
|
| - do upgrades of k8s and figure out what has changed in a
| backward-incompatible way
|
| - learn how to manage certificates for k8s the right way
|
| - learn how to debug DNS issues in a distributed system
| (https://github.com/kubernetes/kubernetes/issues/110550 and
| many more)
|
| I could go on and on but many people and companies figured out
| the hard way that k8s complexity is not justified.
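|
| To make the yaml-in-yaml bullet above concrete, here is a
| rough sketch of a typical Helm template (not from the article;
| getting the nindent value right is exactly the fiddly part):
|
|   # templates/deployment.yaml
|   apiVersion: apps/v1
|   kind: Deployment
|   metadata:
|     name: {{ .Release.Name }}
|   spec:
|     replicas: {{ .Values.replicaCount }}
|     selector:
|       matchLabels:
|         app: {{ .Release.Name }}
|     template:
|       metadata:
|         labels:
|           app: {{ .Release.Name }}
|       spec:
|         containers:
|           - name: app
|             image: {{ .Values.image }}
|             resources:
|               {{- toYaml .Values.resources | nindent 12 }}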
| ManBeardPc wrote:
| My experience with Kubernetes has been mostly bad. I always see
| an explosion of complexity and there is something that needs
| fixing all the time. The knowledge required comes on top of the
| existing stack.
|
| Maybe I'm biased and just have the wrong kind of projects, but
| so far everything I encountered could be built with a simple
| tech stack on virtual or native hardware. A reverse
| proxy/webserver, some frontend library/framework, a backend,
| database, maybe some queues/logs/caching solutions on any
| server Linux distribution. Maintenance is minimal, dirt cheap,
| no vendor lock-in and easy to teach. Is everyone building the
| next Amazon/Netflix/Google and needs to scale to infinity? I
| feel there is such a huge amount of software and companies that
| will never require or benefit from Kubernetes.
| betaby wrote:
| In the 2000s we were saying that `snowflake` servers are bad.
| The new generation is re-learning the same thing with k8s,
| which can be summarized as 'snowflake k8s clusters are bad'.
| Fundamentally it's the same problem.
| menschmanfred wrote:
| It's not.
|
| The control plane is HA and you can upgrade its nodes one
| after the other, independently of your workers.
|
| With workers you can do that too.
|
| That feels much less like a snowflake and more like snow.
| betaby wrote:
| The control plane is only as HA as your certificates, which
| had expired in the article.
| menschmanfred wrote:
| K8s introduced auto-rotation surprisingly late.
|
| Even we ran into that issue 5 years ago.
|
| But k8s is still very young, and the problem has been solved
| for 5-6 years now.
| dilyevsky wrote:
| are snowflake products bad too?
| zellyn wrote:
| I'm curious: what do you do for developer environments? Do you
| have a need to spin up a partial subgraph of microservices, and
| have them talk to each other while developing against a slice of
| the full stack?
| Moto7451 wrote:
| Can't speak for everyone but I have worked in this environment.
| It can work fine if you allocate a sub-slice of CPU time (0.1
| CPU for example) and small amounts of (overcommitted) memory,
| and explicitly avoid using it for things that are more easily
| managed by cloud provider sub-accounts and managed services,
| i.e. don't force your devs to manage ownCloud or a similar
| stand-in for S3 - use something first-party, or S3 itself.
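|
| A rough sketch of what that thin slice looks like per workload
| (numbers and image are purely illustrative; keeping requests
| low is what lets the scheduler pack dev pods densely):
|
|   apiVersion: v1
|   kind: Pod
|   metadata:
|     name: dev-api
|   spec:
|     containers:
|       - name: api
|         image: registry.example.com/api:dev
|         resources:
|           requests:
|             cpu: 100m        # the "0.1 CPU" slice
|             memory: 128Mi
|           limits:
|             memory: 512Mi    # allow bursts; memory overcommit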
|
| This doesn't always work and the failure mode of committing to
| this can be doubling your hosting bill if it won't run locally
| and densely packed small instances can't handle your app.
| sdwr wrote:
| That's how we do it - microservices run locally in Tilt,
| pointed at staging services / DB for whatever isn't local.
|
| When it works it's great.
| mieubrisse wrote:
| Can you clarify more about the "when it works"? What are the
| pain points you're seeing?
| jscheel wrote:
| It's worked great for us. Every developer runs a dev cluster on
| their own machines. Services like s3 are transparently replaced
| with mock versions. We have two builds that can be run, which
| really just determines which set of helm charts to deploy: the
| full stack or a lightweight one with just the bare necessities.
| mieubrisse wrote:
| What did you use for mock versions? Localstack?
| dilyevsky wrote:
| I would recommend Tilt + kind clusters (via
| https://github.com/tilt-dev/ctlptl) - minimum headache setup by
| a large margin and runs well on linux _and_ macs
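|
| For reference, the ctlptl side is a small declarative file,
| something like this (from memory of the project's docs, so
| double-check against the repo's examples):
|
|   apiVersion: ctlptl.dev/v1alpha1
|   kind: Cluster
|   product: kind
|   registry: ctlptl-registry    # local image registry for Tilt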
| Bassilisk wrote:
| Not a native English speaker, but when exactly did "lessons" get
| replaced by "learnings"?
|
| To me the latter always sounds very unsophisticated.
| ojbyrne wrote:
| As a native English speaker, in my opinion it's incredibly
| pretentious.
| teaearlgraycold wrote:
| Native English speaker - I refuse to use "learnings". It's a
| ridiculous office-speak word.
| PaulStatezny wrote:
| Same - just like "asks". "Here's the ask" versus "here's the
| request".
| dartos wrote:
| Native English speaker.
|
| I don't think they got replaced. Colloquially they mean the
| same thing.
|
| Maybe learnings sounds a little more casual and lessons more
| academic or formal.
| pmontra wrote:
| Apparently it's a 21st century thing
| https://en.wiktionary.org/wiki/learnings
| OJFord wrote:
| Ha, some time this century for sure. To me it's not
| 'unsophisticated' _exactly_, but it's definitely a certain
| sort of person - it's the '_Hi team_ - just sharing some
| _learnings_ - please do _reach out_ if you have any questions'
| sort of corporate speak.
| JasonSage wrote:
| Not to discredit your experience, but I'm a native English
| speaker and I've never had the perception that it's
| unsophisticated. I think they can have a very slightly
| different connotation from one another, but in a lot of usage I
| think they're interchangeable.
| ibejoeb wrote:
| It's corporate-speak. There are all sorts of these things.
|
| Lessons/learnings
|
| Requests/asks
|
| Solutions/solves
|
| Agreement/alignments
|
| It definitely sounds weird if you don't spend a lot of time
| in that world. It's like they replace the actual noun forms
| with an oddly cased verb form, i.e., nominalization.
|
| Oh, one of my most hated:
|
| Thoughts/ideations
|
| Jeez...
| packetlost wrote:
| This is basically saying that it's the opposite of
| "unsophisticated" but instead corporate _formal_ speak.
| throwaway11460 wrote:
| And lessons is academy/state school-speak. Can't stand the
| word. Take your lessons home Ms Teacher, this is a place of
| business.
| danielvaughn wrote:
| I'm a native English speaker and I agree, though it could be a
| regional/cultural thing. It sounds pretty odd to me.
| doctor_eval wrote:
| I'm a native English speaker and don't use the phrase, but I've
| always thought that a _lesson_ is something taught, but a
| _learning_ is something learned. The former does not always
| imply the latter.
| radicalbyte wrote:
| I've always assumed that "learnings" was the American English
| version of "lessons" in English English.
| burkaman wrote:
| I think it's more Corporate English. I've never heard anyone
| say it outside of a work meeting.
| radicalbyte wrote:
| Those are really American though.. like "co-worker", that
| isn't a word which was used in England. We'd use
| "colleague". It came from American English as part of the
| corporate lingo.
| karlshea wrote:
| It's very much just bro corporate speak, if I heard someone use
| "learnings" instead of "lessons" irl they would definitely fall
| into the slot for a specific type of person in my head. Very
| LinkedIn.
| niam wrote:
| I feel like the word unambiguously describes exactly what it
| is, which is all I can really ask for from a word.
|
| "Lesson" by itself might connote a more concrete transmission
| of knowledge (like a school lesson). Which is a meaningful
| distinction if the goal of the article is merely to muse about
| lessons _they've_ learned rather than imply that this is a
| lesson from the writers to the audience. "Lesson learned"
| could imply the same thing, but is longer to say ¯\\_(ツ)_/¯
|
| I get what the comments here are saying about it sounding
| corporate, but I think this is a unique situation where this
| word actually makes sense.
| youngtaff wrote:
| It's a bloody American thing... Lessons FTW... uses less
| characters too
| louwrentius wrote:
| What I really miss in articles like this - and I understand
| why to some degree - is what the actual numbers would be.
|
| Admitting that you need at least two full-time engineers
| working on Kubernetes, I wonder how that kind of investment
| pays for itself, especially because of all the added
| complexity.
|
| I desperately would like to rebuild their environment on
| regular VMs, maybe not even containerized, and understand what
| the infrastructure cost would have been - and what the
| maintenance burden would have been compared to Kubernetes.
|
| Maybe it's not about pure infrastructure cost but about the
| development-to-production pipeline. But still.
|
| There is just so much context that seems relevant to
| understanding whether an investment in Kubernetes is warranted
| or not.
| karolist wrote:
| k8s is simply a set of bulletproof ideas for running
| production-grade services, forcing "hope is not a strategy" as
| much as possible. It standardises things like rollouts,
| rolling restarts, canary deployments, failover, etc. You can
| replicate it with a zoo of loosely coupled products, but a
| monolith you can hire for, with an impeccable production
| record and industry certs, will always be preferable to orgs.
| It's Google's way of fighting cloud vendor lock-in when they
| saw they were losing market share to AWS. Only large companies
| really need it; a small 5-person startup will do just fine on
| a DigitalOcean VPS with some S3 for blob storage and a CDN
| cache.
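|
| For example, the rollout/rolling-restart piece it standardises
| is a few declarative lines rather than a bespoke script (image
| and numbers are placeholders):
|
|   apiVersion: apps/v1
|   kind: Deployment
|   metadata:
|     name: web
|   spec:
|     replicas: 4
|     strategy:
|       type: RollingUpdate
|       rollingUpdate:
|         maxUnavailable: 1   # never drop below 3 ready replicas
|         maxSurge: 1         # add at most one extra while rolling
|     selector:
|       matchLabels:
|         app: web
|     template:
|       metadata:
|         labels:
|           app: web
|       spec:
|         containers:
|           - name: web
|             image: registry.example.com/web:1.2.3
|             readinessProbe:
|               httpGet: { path: /healthz, port: 8080 }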
| jurschreuder wrote:
| This is always my exact thought with k8s.
|
| Why not just have auto-scaling servers with a CI/CD pipeline.
|
| Seems so much easier and more convenient.
|
| But I guess developers are just always drawn to complexity.
|
| It's in their nature that's why they became developers in the
| first place.
| biggestlou wrote:
| Can we please put the term "learnings" to rest?
| chunha wrote:
| I don't see the problem with it tbh
| holmb wrote:
| The OP is Swedish.
| geodel wrote:
| I see no problem with leveraging best-of-breed terms like
| learning.
| 0xbadcafebee wrote:
| You aren't a real K8s admin until your self-managed cluster
| crashes hard and you have to spend 3 days trying to
| recover/rebuild it. Just dealing with the certs once they start
| expiring is a nightmare.
|
| To avoid chicken-and-egg, your critical services (Drone, Vault,
| Bind) need to live outside of K8s in something stupid simple,
| like an ASG or a hot/cold EC2 pair.
|
| I've mostly come to think of K8s as a development tool. It makes
| it quick and easy for devs to mock up a software architecture and
| run it anywhere, compared to trying to adopt a single cloud
| vendor's SaaS tools, and giving devs all the Cloud access needed
| to control it. Give them access to a semi-locked-down K8s cluster
| instead and they can build pretty much whatever they need without
| asking anyone for anything.
|
| For production, it's kind of crap, but usable. It doesn't have
| any of the operational intelligence you'd want a resilient
| production system to have, doesn't have real version control,
| isn't immutable, and makes it very hard to identify and fix
| problems. A production alternative to K8s should be much more
| stripped-down, like Fargate, with more useful operational
| features, and other aspects handled by external projects.
| throwboatyface wrote:
| Honestly in this day and age rolling your own k8s cluster is
| negligent. I've worked at multiple companies using EKS, AKS,
| GKE, and we haven't had 10% of the issues I see people
| complaining about.
| jauntywundrkind wrote:
| Once your team has upgrades down, everything is pretty rote.
| This submission (Urbit, lol) seemed particularly incompetent
| at managing cert rotation.
|
| The other capital lesson here? Have backups. The team couldn't
| restore a bunch of their services effectively, because they
| didn't have the manifests. Sure, a managed provider may have
| fewer disruptions/avoid some fuckups, but the whole point of
| Kubernetes is Promise Theory, is Desired State Management. If
| you can re-state your asks, put the manifests back, most shit
| should just work again, easy as that. The team had seemingly
| no operational system so their whole cluster was a vast
| special pet. They fucked up. Don't do that.
| dilyevsky wrote:
| I've had my fair share of outages on managed k8s
| solutions. The difference there is once it's hosed, your fate
| is 100% in the hands of cloud support and well... good luck
| with that one.
| vundercind wrote:
| In the bad old days of self-managing some servers with a few
| libvirt VMs and such, I'd have considered a 3-day outage such a
| shockingly bad outcome that I'd have totally reconsidered what
| I was doing.
|
| And k8s is supposed to make that situation _better_, but these
| multi-day outage stories are... common? Why are we adding all
| this complexity and cost if the result is consumer-PC-tower-
| in-a-closet-with-no-IaC uptime (or worse)?
| datadeft wrote:
| This is insane:
|
| > The Root CA certificate, etcd certificate, and API server
| certificate expired, which caused the cluster to stop working
| and prevented our management of it. The support to resolve
| this, at that time, in kube-aws was limited. We brought in an
| expert, but in the end, we had to rebuild the entire cluster
| from scratch.
|
| I can't even imagine how I could explain such an outage to any
| of my customers.
| bdangubic wrote:
| "us-east-1 was down" :)
| datadeft wrote:
| If most infra I worked on was a single region one, sure. :)
| DR is so much easier in the cloud. You can have ECS scale to
| 0 in the DR site and when us-east-1 goes down just move the
| traffic there. We did that with amazon.com before AWS even
| existed. With AWS it became easier. There are still some
| challenges, like having a replica of the main SQL db if you
| run a traditional stack for example.
| dilyevsky wrote:
| Just in the last couple of years I can recall DataDog being
| down for most of the day and Roblox taking something like a
| 72h outage. If huge public companies managed, you probably can
| too. I'd
| argue that unless real monetary damage was done it's actually
| worse for the customer to experience many small-scale outages
| than a very occasional big outage.
| geodel wrote:
| Well the industry analysts and consultants who develop
| _metrics_ have decided that multiple outages are the way to
| go, as they keep people on their toes more often. And
| management likes busy people, as they are earning their keep.
| dilyevsky wrote:
| > During our self-managed time on AWS, we experienced a massive
| cluster crash that resulted in the majority of our systems and
| products going down. The Root CA certificate, etcd certificate,
| and API server certificate expired, which caused the cluster to
| stop working and prevented our management of it. The support to
| resolve this, at that time, in kube-aws was limited. We brought
| in an expert, but in the end, we had to rebuild the entire
| cluster from scratch.
|
| That's crazy, I've personally recovered 1.11-ish kops clusters
| from this exact fault and it's not that hard when you really
| understand how it works. Sounds like a case of bad "expert"
| advice.
| therealfiona wrote:
| If anyone has any tips on keeping up with control plane upgrades,
| please share them. We're having trouble keeping up with EKS
| upgrades. But, I think it's self-inflicted and we've got a lot of
| work to remove the knives that keep us from moving faster.
|
| Things on my team's todo list (aka: correct the sins that
| occurred before therealfiona was hired):
|
| - Change manifest files over to Helm. (Managing thousands of
| lines of yaml sucks, don't do it, use Helm or a similar tool
| we have not discovered yet.)
|
| - Set up Renovate to help keep Helm chart versions up to date
| (sketch below).
|
| - Continue improving our process, because there was none as of
| 2 years ago.
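|
| The Renovate bit can key off an umbrella chart's pinned
| dependencies; a hedged sketch (chart names and versions are
| only illustrative, assuming Renovate's helmv3 manager is
| enabled):
|
|   # Chart.yaml
|   apiVersion: v2
|   name: platform
|   version: 0.1.0
|   dependencies:
|     - name: ingress-nginx
|       version: 4.9.1
|       repository: https://kubernetes.github.io/ingress-nginx
|     - name: cert-manager
|       version: v1.14.2
|       repository: https://charts.jetstack.io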
| throwboatyface wrote:
| IME EKS version upgrades are pretty painless - AWS even has a
| tool that tells you if any of your resources would be affected
| by an upcoming change.
| raffraffraff wrote:
| It's not the EKS upgrade part that's a pain, it's the
| deprecated K8S resources that you mention. Layers of
| terraform, fluxcd, helm charts getting sifted through and
| upgraded before the EKS upgrade. You get all your clusters
| safely upgraded, and in the blink of an eye you have to do it
| all over again.
| catherinecodes wrote:
| This is definitely a hard problem.
|
| One technique is to never upgrade clusters. Instead, create a
| new cluster, apply manifests, then point your DNS or load
| balancers to the new one.
|
| That technique won't work with every kind of architecture, but
| it works with those that are designed with the "immutable
| infrastructure" approach in mind.
|
| There's a good comment in this thread about not having your
| essential services like vault inside of kubernetes.
| jauntywundrkind wrote:
| This indeed seems like _The Way_ but I have no idea how it
| works when storage is involved. How do Rook or any other
| storage providers deal with this?
|
| If Kubernetes is _only_ for stateless services, well, that's
| much less useful for the org to invest in.
| doctor_eval wrote:
| I'm in the very unusual situation of being tasked to set up a
| self-sufficient, local development team for a significant
| national enterprise in a developing country. We don't have AWS,
| Google or any other cloud service here, so getting something
| running locally, that they can deploy code to, is part of my job.
| I also want to ensure that my local team is learning about modern
| engineering environments. And there is a large mix of unrelated
| applications to build, so a monolith of some sort is out of the
| question; there will be a mix of applications and languages and
| different reliability requirements.
|
| In a nutshell, I'm looking for a general way to provide compute
| and storage to future, modern, applications and engineers, while
| at the same time training them to manage this themselves. It's a
| medium-long term thing. The scale is already there - one of our
| goals is to replace an application with millions of existing
| users.
|
| Importantly, the company wants us to be self sufficient. So a
| RedHat contract to manage an OpenShift cluster won't fly
| (although maybe openshift itself will?)
|
| For the specific goals that we have, the broad features of
| Kubernetes fit the bill - in terms of our ability to launch a set
| of containers or features into a cluster, run CICD, run tests,
| provide storage, host long- and short lived applications, etc.
| But I'm worried about the complexity and durability of such a
| complex system in our environment - in the medium term, they need
| to be able to do this without me, that's the whole point. This
| article hasn't helped me feel better about k8s!
|
| I personally avoided using k8s until the managed flavours came
| about, and I'm really concerned about the complexity of deploying
| this, but I think some kind of cluster management system is
| critical; I don't want us to go back to manually installing
| software on individual machines (using either packaging or just
| plain docker). I want there to be a bunch of resources that we
| can consume or grow as we become more proficient.
|
| I've previously used Nomad in production, which was much simpler
| than K8s, and I was wondering if this or something else might be
| a better choice? How hard is k8s to set up _today_? What is the
| risk of the kind of failures these guys hit, _today_?
|
| Are there any other environments where I can manage a set of
| applications on a cluster of say 10 compute VMs? Any other
| suggestions?
|
| Without knowing a lot about their systems, I suspect something
| like Oxide might be the best bet for us - but I doubt we have the
| budget for a machine like that. But any other thoughts or ideas
| would be welcome.
| geodel wrote:
| Well, the Amazon CEO himself said there is no shortcut to
| experience. I am sure gaining experience in developing
| infrastructure solutions will give you a respectable return in
| the long term. Of course, cloud vendors will be happy to sell
| you turnkey solutions.
| steveklabnik wrote:
| (I work at Oxide)
|
| > I doubt we have the budget for a machine like that.
|
| Before even thinking about budget,
|
| > for a significant national enterprise in a developing
| country.
|
| I suspect we just aren't ready to sell in your country,
| whatever it is, for very normal "gotta get the product
| certified through safety regulations" kinds of reasons. We will
| get there eventually.
|
| buuuuut also,
|
| > Are there any other environments where I can manage a set of
| applications on a cluster of say 10 compute VMs? Any other
| suggestions?
|
| Oxide would give you those VMs, but if you want orchestration
| with them, you'd be running kubes or whatever else, yourself,
| on top of it. So I don't think our current product would give
| you exactly what you want anyway, or at least, you'd be in the
| same spot you are now with regards to the orchestration layer.
| aguacaterojo wrote:
| Very similar story for my team, incl. the 2x cert expiry cluster
| disasters early on requiring a rebuild. We migrated from
| Kubespray to kOps (with almost no deviations from a default
| install) and it's been quite smooth for 4 or 5 years now.
|
| I traded ELK for Clickhouse & we use Fluentbit to relay logs,
| mostly created by our homegrown opentelemetry-like lib. We still
| use Helm, Quay & Drone.
|
| Software architecture is mostly stateless replicas of ~12x mini
| services with a primary monolith. DBs etc sit off cluster. Full
| cluster rebuild and switchover takes about 60min-90min, we do it
| about 1-2x a year and have 3 developers in a team of 5 that can
| do it (thanks to good documentation, automation and keeping our
| use simple).
|
| We have a single cloud dev environment, local dev is just running
| the parts of the system you need to affect.
|
| Some tradeoffs and yes burned time to get there, but it's great.
___________________________________________________________________
(page generated 2024-02-06 23:00 UTC)