[HN Gopher] Treat Kubernetes clusters as cattle, not pets
___________________________________________________________________
Treat Kubernetes clusters as cattle, not pets
Author : mffap
Score : 110 points
Date : 2021-06-30 13:24 UTC (9 hours ago)
(HTM) web link (zitadel.ch)
(TXT) w3m dump (zitadel.ch)
| mrweasel wrote:
| While I don't disagree, this is also a reminder that debugging
| Kubernetes can be terribly complicated.
| ulzeraj wrote:
| I want to go back to the time when naming your servers after
| X-Men characters or Dune houses was a thing. I'm not a big fan
| of this brave new DevOps world.
| exdsq wrote:
| My first job, tech support, had fantasy server names. That was
| fun :)
| psanford wrote:
| > It means that we'd rather just replace a "sick" instance by a
| new healthy one than taking it to a doctor.
|
| This analogy really bothers me. Cattle are expensive. They are an
| investment. You don't put down an investment just because it got
| sick.
|
| If you have a sick cow, you will in fact call your local
| large-animal vet to come and treat it.
| ska wrote:
| Yes, that statement is wrong. I guess the real difference in
| the pet/cow calculus is that for cattle you probably won't pay
| more for treatment than the cow is worth; with pets, people do
| this all the time.
| psanford wrote:
| Yup. If the argument is "don't fall in love with your
| servers", I'm in agreement.
|
| However, the idea that whenever anything weird happens you
| should just kill your server/cluster and move on without doing
| any sort of investigation seems like a recipe for disaster.
| That's a great way to mask bugs that may in fact be systemic
| in nature and that are already causing, or will eventually
| cause, service degradation for your customers.
|
| I would hate to work in an environment where bugs are ignored
| and worked around instead of understood and fixed.
| lamontcg wrote:
| I like the idea of killing it and moving on without paging
| someone at 2 AM.
|
| Ideally that goes into an async queue of issues though and
| someone finds the root cause and that goes into a queue of
| issues to fix, which is actually burned down.
|
| I suspect what is happening more often is that the whole
| stack has so many layers, and the SREs responsible for it
| don't have the visibility into the stack they need to debug
| it all, so they use their SLOs as a club to ignore issues as
| long as they're meeting their metrics, until it becomes a
| firefighting drill.
|
| A pile of cargo-culted best practices and SLOs replacing
| hands-on debugging.
| pyuser583 wrote:
| Don't let facts get in the way of a good metaphor.
|
| There's nothing remotely disgusting about making sausages, but
| people say "Laws are like sausages - it's better not to see
| them being made."
|
| I stopped correcting them long ago.
| coding123 wrote:
| AWS, in my mind, can quickly lose the Kubernetes war amongst
| cloud providers. This is every cloud provider's chance: EKS on
| AWS is so damn tied into a bunch of other AWS products that
| it's literally impossible to just delete a cluster now. I
| tried. It's tied into VPCs and subnets and EC2 and load
| balancers and a bunch of other products that no longer make
| sense now that K8s won.
|
| In my opinion it needs to be re-engineered completely into a
| super slim product that is not tied to all these crazy things.
| ffo wrote:
| You mean EKS needs re-engineering?
| kinghajj wrote:
| Also, the fact that a new EKS cluster takes at least 20
| minutes to come up and be ready makes AWS's offering the
| weakest among the Big Three cloud vendors.
| jen20 wrote:
| This isn't true. I've provisioned 53 EKS clusters this week,
| and every one of them has been up in under 11 minutes with
| all of the accoutrements. I understand it has been
| substantially slower in the past.
| nickjj wrote:
| 11 minutes is still a long time IMO.
|
| You can spin up a local multi-node cluster using kind[0] in
| 1 minute on 6+ year old hardware. I know it's not the same,
| but I really have to imagine there are ways to speed this up
| in the cloud. I haven't spun up a cluster on DO or Linode in
| a while; does anyone know how long it takes there, for
| comparison?
|
| [0]: https://kind.sigs.k8s.io/
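|
| For anyone curious, a minimal multi-node kind config is roughly
| the sketch below (the file name and node counts are just an
| example; you'd pass it with `kind create cluster --config
| kind.yaml`):
|
|   # kind.yaml: one control-plane node plus two workers
|   kind: Cluster
|   apiVersion: kind.x-k8s.io/v1alpha4
|   nodes:
|     - role: control-plane
|     - role: worker
|     - role: worker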
| mlnj wrote:
| I think it is unfair to compare kind with an actual
| Kubernetes cluster, which comes with load balancers,
| external IP addresses, etc.
|
| My Terraform scripts get an HA K3s cluster in Google Cloud
| VMs in less than 9 minutes, which in my opinion is
| fantastic.
| misiti3780 wrote:
| Are you using Terraform?
| hodgesrm wrote:
| It really depends on the use case. We use Kubernetes for
| hosting managed data warehouses. EKS cluster spin-up time is
| a one-time cost to start a new environment for data
| warehouses. It's insignificant compared to the time required
| to load data. Other issues like auto-scaling performance are
| more important, at least in our case.
| hughrr wrote:
| Also the base cluster cost of $0.10 an hour really hurts for
| test and experimentation.
| jrockway wrote:
| GCP charges the same now (though your first one is free). I
| don't think it's an unreasonable cost, but it's certainly
| not competitive with places like Digital Ocean which are
| still free.
| atmosx wrote:
| Not sure. I'm using DigitalOcean's managed version and have
| experience with EKS. There are pros and cons on both sides.
| What I like about EKS is the support for API-server and k8s
| audit logs (accounting/security); that, coupled with the
| CloudTrail and IAM integration for RBAC/user management, is
| bliss.
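|
| If I recall correctly, the IAM-to-RBAC mapping goes through
| the aws-auth ConfigMap in kube-system, roughly like the sketch
| below (the role ARN and group name are made up); regular RBAC
| RoleBindings then grant that group whatever access it needs:
|
|   apiVersion: v1
|   kind: ConfigMap
|   metadata:
|     name: aws-auth
|     namespace: kube-system
|   data:
|     mapRoles: |
|       # hypothetical IAM role mapped to a Kubernetes group
|       - rolearn: arn:aws:iam::111122223333:role/dev-team
|         username: dev-team
|         groups:
|           - developers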
| misiti3780 wrote:
| I am about to move our infrastructure from ECS to EKS. What
| other AWS products is EKS tied to?
| k__ wrote:
| I don't have the impression AWS even sees it as a war worth
| fighting.
|
| And I can't blame them.
| unfunco wrote:
| It's not literally impossible to delete a cluster; I do this
| many times daily, fully automated, with no issues. An EKS
| cluster is not tied to a VPC or subnets: you can spin them up
| independently, and you can delete your clusters without
| affecting the VPC or subnets in any way.
|
| An Ingress or a Service of type LoadBalancer will create a
| load balancer in AWS that's tied to your cluster, but that's
| the whole point of Kubernetes: it'll spin up the equivalent
| resources in Azure or GCP or DigitalOcean.
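|
| As a rough sketch (the name and ports are placeholders), the
| same manifest works on any of them; the provider's cloud
| controller does the actual provisioning:
|
|   apiVersion: v1
|   kind: Service
|   metadata:
|     name: web   # placeholder name
|   spec:
|     type: LoadBalancer   # provisions an ELB/NLB on AWS, or the
|                          # provider's equivalent elsewhere
|     selector:
|       app: web
|     ports:
|       - port: 80
|         targetPort: 8080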
| aliswe wrote:
| > ... the GitOps pattern is gaining adoption for Kubernetes
| workload deployment.
|
| Is it really, though? I for one am glad I didn't jump on the
| bandwagon early. A lot of the articles popping up nowadays
| mentioning the downsides of GitOps make a lot of sense.
| tw600040 wrote:
| This idea that cattle are meant to be slaughtered and can't be
| pets is extremely offensive.
| sweetheart wrote:
| Preach. It's a metaphor that reinforces harmful beliefs, like
| master/slave terminology.
|
| Our language defines our world, so we should aim to shy away
| from language that amplifies a harmful message, even in
| radically different contexts.
| zachrose wrote:
| I'd like to see a vegan alternative to the pets/cattle meme.
| Zababa wrote:
| Fungible vs. non-fungible seems to be the concept hiding
| behind pets/cattle. You can substitute a dollar for another;
| you can't substitute your favorite rock for another.
| sweetheart wrote:
| Screen printing, not painting.
| mfer wrote:
| If I've read the Google papers on Borg right (Kubernetes is
| conceptually Borg v3, with Omega being v2), this is different
| from how Google runs things.
|
| They'll do warehouse-scale computing with Borg operating large
| clusters. Borg is at the bottom.
|
| The workloads spanning dev, test, and prod then run on these
| clusters. By having large clusters with lots of things running on
| them they get high utilization of the hardware and need less
| hardware.
|
| It's amusing to see k8s used in such a different way, one that
| often uses a lot more hardware while driving up costs, built
| on concepts Google used to lower costs.
|
| Or, maybe I read the papers and book wrong.
|
| I like the idea of higher utilization and better efficiency,
| because it uses fewer resources, which is greener.
| closeparen wrote:
| Exactly. We practice this with Mesos. The point is that a
| central infrastructure team maintains the (regional) clusters
| and provides an interface for application teams to submit their
| services. Each application team maintaining its own cluster or
| dealing directly with the full power of the cluster scheduler
| ecosystem would defeat the purpose.
| atmosx wrote:
| > It's amusing to see k8s used in such a different way and one
| that often uses a lot more hardware while driving up costs.
|
| To be fair, Google runs its own datacenters, with teams doing
| research & optimisations on all levels of the stack (hardware
| and software) and, most importantly, an amount of engineering
| resources converging to infinity.
|
| The rest are stuck with VMs that share network interfaces and
| have to monitor CPU steal, understand complex pricing models,
| etc. Engineering resources are scarce, so most companies will
| over-provision just to be safe, and because profiling the
| application and fixing that API call that takes too long is
| expensive, they'll just spin up another 50 pods.
| bambambazooka wrote:
| Could you please name the papers and the book?
| q3k wrote:
| Not GP, but:
|
| https://research.google/pubs/pub43438/
|
| https://sre.google/sre-book/production-environment/
|
| are the best public documentation entrypoints that I know of.
| q3k wrote:
| > Or, maybe I read the papers and book wrong.
|
| No, that's exactly how it works. You have clusters spanning a
| datacenter failure domain (~= an AZ), and everything from prod
| to dev workloads runs on there, with low priority batch jobs
| bringing up the resource utilization to a sensible level.
|
| You can do the same thing with k8s, you just have to trust its
| multitenancy support. You have RBAC, priority, quotas,
| preemption, pod security policies, network policies... Use
| them. You can even force some workloads to use gVisor or
| separate prod and dev workloads on different worker machines.
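|
| As a rough per-tenant sketch (names and numbers are arbitrary),
| each namespace gets a quota and a default-deny network policy,
| and you open up only what's needed from there:
|
|   # cap what one team can request in its namespace
|   apiVersion: v1
|   kind: ResourceQuota
|   metadata:
|     name: team-a-quota
|     namespace: team-a
|   spec:
|     hard:
|       requests.cpu: "20"
|       requests.memory: 64Gi
|       limits.cpu: "40"
|       limits.memory: 128Gi
|       pods: "200"
|   ---
|   # deny all ingress to the namespace by default
|   apiVersion: networking.k8s.io/v1
|   kind: NetworkPolicy
|   metadata:
|     name: default-deny-ingress
|     namespace: team-a
|   spec:
|     podSelector: {}
|     policyTypes:
|       - Ingress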
| conradev wrote:
| I've also seen Kubernetes being used this way, with one cluster
| per data center for company-wide utilization and segmentation
| at the _namespace_ layer. The priority class system is heavily
| utilized to make sure production workloads are always running,
| and other workloads are preempted as needed.
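|
| Roughly (the name and value are made up), prod gets a
| high-value PriorityClass that pods reference via
| priorityClassName, and the scheduler preempts lower-priority
| batch/dev pods when capacity runs out:
|
|   apiVersion: scheduling.k8s.io/v1
|   kind: PriorityClass
|   metadata:
|     name: production-critical   # made-up name
|   value: 1000000   # higher value wins; may preempt lower-priority pods
|   globalDefault: false
|   description: "Production workloads; preempts batch and dev pods."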
| pyuser583 wrote:
| Sounds like a Mesos-style arrangement.
| fulafel wrote:
| How do they do version upgrades? Isn't that the traditional
| Achilles' heel of K8s that leads people to want to frequently
| recreate clusters from scratch and/or do blue/green?
| mfer wrote:
| First, if I understand it right... Google does some smart
| things in upgrades. They do things like tests, and then
| upgrade their equivalent of AZs in a data center. I'm sure
| there have been upgrades gone bad that they've had to fix.
|
| Kubernetes can be upgraded. I watched nicely done upgrades
| happening 4 or 5 years ago. I've watched simple upgrades
| happen more recently. It's not unheard of. Even in public
| clouds I've upgraded Kubernetes through many minor versions
| without issue.
|
| I would argue it's more work to create more clusters. You
| need to migrate workloads and anything pointing to them. It
| also would cost more as you have to run more hardware.
| [deleted]
| dharmab wrote:
| We have active clusters that have been continually updated
| since 1.13 or so.
|
| Of course it is the cluster of Theseus since every single
| bit of compute has been entirely replaced :)
| moondev wrote:
| Highly available kubeadm clusters are designed to be upgraded
| in place. The Kubernetes api-server is also designed to
| function with a minor version skew (for example v1.19.x and
| v1.20.x) that would happen during an in-place upgrade.
|
| Cluster API takes the above and can in-place upgrade clusters
| for you. It's pretty awesome to see first hand. Bumping the
| Kubernetes version string and machine image reference can
| upgrade and even swap distros of a running cluster with no
| downtime.
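|
| Very roughly, and with the caveat that API versions and field
| names vary between Cluster API releases, the control-plane
| object carries the version, and editing it triggers a rolling
| replacement of the control-plane machines (the machine/infra
| template fields are omitted here):
|
|   apiVersion: controlplane.cluster.x-k8s.io/v1alpha4  # release-dependent
|   kind: KubeadmControlPlane
|   metadata:
|     name: my-cluster-control-plane   # placeholder
|   spec:
|     replicas: 3
|     version: v1.21.2   # bump this (plus the machine image ref,
|                        # not shown) to roll the cluster forward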
| dharmab wrote:
| One caveat is that downgrading the apiserver is not
| guaranteed to be possible, since the schema of some types
| in the API may have been migrated to newer versions in the
| etcd database that the previous version may not be able to
| read. There are tools such as Velero (https://velero.io)
| which can restore a previous snapshot, but you will likely
| incur downtime and lose any changes since the snapshot.
| lokar wrote:
| Nothing important is single-homed to one cluster. Much
| emphasis is placed on preventing correlated failures of
| clusters.
| jeffbee wrote:
| The borg concepts and interfaces have been the same for ages.
| The borglet and borgmaster get released and pushed out very
| frequently (daily or weekly) and it doesn't break anything
| among the workloads. There are no maintenance windows for
| these changes because containers can run while the borglet
| restarts, and borgmasters are protocol/format-compatible so
| the new release binary simply joins the quorum. Machines also
| get rotated out of the borg cell regularly for kernel
| upgrades, way more often than I've seen outside of Google.
|
| I think an important thing to know about K8s is that Omega
| failed to replace Borg, and then the Omega people created K8s.
| So K8s does not necessarily descend from Borg, and not all of
| Borg's desirable attributes made it into K8s.
| dekhn wrote:
| The omega folks were on the borgmaster and borglet teams
| when they were building omega (I was on the borg team at
| the same time, but working on a different project). It's
| fair to say that k8s is an intellectual inheritor of the
| parts of borg that are required for it to be useful on the
| outside.
|
| Borglet/borgmaster releases definitely break workloads. I
| recall one where something was rolled out to all the test
| clusters, passed all the tests (except one of ours), and was
| about to be promoted to increasing percentages of the prod
| clusters. While debugging why our test (which was not part
| of the feature rollout) broke, we realized that if this had
| rolled out to prod, it would have broken all TensorFlow
| jobs, and would have been a major OMG.
|
| But yeah, most of the time, the release process for borglet
| and borgmaster is fairly fast and fairly reliable.
| jeffbee wrote:
| > required for it to be useful on the outside.
|
| A big part of this is that outside Google, the number of
| people who have to _operate_ k8s as a fraction of all k8s
| users is way higher than the fraction of borg users who
| have to operate borg, so there's a lot of stuff in k8s
| that is 'end user experience' comforts and affordances
| for operators.
| q3k wrote:
| If you don't depend on clusters being 100% available all the
| time and design your applications to handle cluster-wide
| outages (which you need to do at Google scale anyway), then
| simply doing progressive rollouts across clusters is good
| enough. If a cluster rollout fails and knocks some workloads
| offline, so be it. Just revert the rollout and figure out
| what went wrong.
|
| You can also throw in some smaller clusters that replicate
| the 'main' clusters' software stack and have some workloads
| whose downtime does not impact production jobs (not just dev
| workloads, but replicas of prod workloads!). These can be the
| earliest in your rollout pipeline and serve as an early stage
| canary.
| lifty wrote:
| You're right, vanilla Kubernetes has a level of complexity
| that only starts paying off at a certain scale. But the wide
| adoption of K8s also shows that people love the
| standardization and the API it offers for orchestrating
| workloads, even if they don't take advantage of its scaling
| capabilities.
|
| My hope is that projects like k3s will manage to cover that
| small scale spectrum of the market.
| void_mint wrote:
| I've intentionally ignored Kubernetes for a few years. Is it
| worth looking into K3s instead?
| antonvs wrote:
| It depends what you're looking for.
|
| One attraction of k3s is you can get a "real" production-
| grade K8s running on a single machine, or a small cluster,
| very quickly and easily. That can be great for learning,
| development and testing, certainly.
|
| K3s is actually targeted at "edge" scenarios where you
| aren't running in a cloud, and don't have the ability
| and/or desire to dynamically adjust the cluster size.
|
| You can scale a k3s cluster easily enough manually, adding
| or deleting nodes, but to get cluster autoscaling you'd
| need to do environment-dependent work to make it automatic.
|
| If you do want that, then things will likely be quite a bit
| easier with a managed cluster like Google's GKE, or even
| possibly a self-installed cluster using a tool like kops,
| kubespray, etc. (It's been a while since I've used any of
| those, so I'm not sure which is best.) AWS EKS is another
| choice, although its setup is a bit more complex and I
| wouldn't really recommend it to someone getting started.
|
| And for production scenarios, integration with cloud load
| balancers, network environment etc. is similarly going to
| be easier with a provider-managed cluster. It's all
| possible to do with k3s, but it's more work and a steeper
| learning curve.
| koeng wrote:
| I've really enjoyed developing on K3s, would recommend.
| andrewstuart2 wrote:
| And furthermore, the primary problem with scaling out the
| number of clusters is that it hamstrings one of the primary
| value propositions of Kubernetes: increased utilization. The
| scheduler can't do its job (without yet another scheduler on
| top) if you spread your workloads outside of its sphere of
| influence.
| deeblering4 wrote:
| All this does is make me want to go vegan and avoid maintaining
| the entire k8s farm.
|
| Truly, if your software team headcount is under 500, why are
| you running k8s?
| exdsq wrote:
| I'm in a team of 1 but using it because my product is based on
| spinning up dedicated services for users on demand, so it works
| well
| mschuster91 wrote:
| It gives you automatic failover and decent-ish management (at
| least when coupled with Rancher; naked k8s is nuts) compared
| to a couple of manually (or Puppet-) managed servers.
|
| A well-implemented mini cluster can and will save you so much
| time in later maintenance and deployments.
| LAC-Tech wrote:
| Raising cattle is a lot of work. You have to weigh them
| regularly, apply treatments for intestinal worms and lice, and
| move them from pasture to pasture so they don't overgraze.
| It's a full-time job.
|
| Also, if a cow dies, people don't just buy a new one. It
| represents quite a loss of profit. It also represents a big
| potential problem on the farm that people will want to
| resolve: they're your money makers, and if they're dying it's
| an issue.
| [deleted]
| [deleted]
| rllin wrote:
| Really, you should treat your entire cloud deployment (sans
| state, where that's impossible) as cattle.
| jeffbee wrote:
| Running a separate cluster for every service ensures high
| overhead and poor utilization. Fine if you can afford it, but
| be aware that you are paying for it.
| q3k wrote:
| Yeah, especially in production bare metal clusters. If you want
| N+2 redundancy that's at least 5 physical machines for just the
| control plane (etcd & apiservers), more if you don't want to
| colocate worker nodes with that...
|
| Even if you have full bare-metal automation and thousands of
| machines, that seems like unnecessary waste.
| dharmab wrote:
| Add the cost of administrative services: DNS resolvers,
| ingress controllers, log forwarders, monitoring (e.g.
| Prometheus, some exporters), autoscalers, tracing
| infrastructure, admission controllers, backups/disaster
| recovery tools (e.g. Velero)...
|
| It can add up to millions per year if you aren't auditing
| your costs.
| dijit wrote:
| Yeah, that's insane as a concept. One of the larger selling
| points of Kubernetes was bin packing. Removing that selling
| point leaves you with...
|
| * Orchestration of jobs (restart, start); this can be achieved
| easily without the complexity of k8s
|
| * Sidecar loading; Literally the easiest thing to do with
| normal VMs.
|
| and...?
| dolni wrote:
| Packing everything together is a "selling point" until you
| find that a service can fill up ephemeral storage and take
| down other services, or consume bandwidth without limit.
|
| Let's not forget the potential security implications of not
| keeping things properly isolated.
|
| People who were around when provisioning on bare-metal was
| still a thing already learned all these lessons. Somehow it
| seems they have been forgotten by all the people driving hype
| around Kubernetes.
| q3k wrote:
| > Packing everything together is a "selling point" until
| you find that a service can fill up ephemeral storage and
| take down other services, or consume bandwidth without
| limit.
|
| Ephemeral storage has resource requests/limits in pods.
|
| Traffic shaping/limiting can be accomplished using the
| kubernetes.io/{ingress,egress}-bandwidth annotations. It's
| not as nice as resources (because there are no quotas or
| capacity planning, and it's generally very simplistic), but
| you can still easily build on this.
|
| Pods can also have priorities and higher priority workloads
| can and will preempt lower priority workloads.
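|
| In pod terms that's roughly the following (the image name,
| numbers, and priority class are placeholders):
|
|   apiVersion: v1
|   kind: Pod
|   metadata:
|     name: example
|     annotations:
|       # honored by the bandwidth CNI plugin, if it's in your CNI chain
|       kubernetes.io/ingress-bandwidth: 10M
|       kubernetes.io/egress-bandwidth: 10M
|   spec:
|     priorityClassName: production-critical   # placeholder class
|     containers:
|       - name: app
|         image: registry.example.com/app:1.0   # placeholder image
|         resources:
|           requests:
|             ephemeral-storage: 1Gi
|           limits:
|             ephemeral-storage: 2Gi   # exceeding this evicts the pod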
|
| > Let's not forget the potential security implications of
| not keeping things properly isolated.
|
| For stronger isolation, you can use gVisor or even Kata
| Containers.
|
| > People who were around when provisioning on bare-metal
| was still a thing already learned all these lessons.
| Somehow it seems they have been forgotten by all the people
| driving hype around Kubernetes.
|
| Kubernetes explicitly aims to solve resource isolation. It
| was built by people who have decades of experience solving
| this exact problem in production, on bare metal, at scale.
| Effectively, Kubernetes resource isolation is one of the
| best solutions out there to easily, predictably and
| strongly isolate resources between workloads _and_ maximize
| utilization at the same time.
| packetlost wrote:
| Does ephemeral storage not have configurable limits? That
| seems like quite the oversight if not.
| dijit wrote:
| Kubernetes has the concept of limits, especially on
| ephemeral storage; additionally, if your node becomes
| unhealthy, the workloads will be rescheduled on another node.
|
| I'm super not hypey about kubernetes, mostly because the
| complexity surrounding networking is opaque and built on a
| foundation of sand... But let's not argue things that
| aren't true.
| dolni wrote:
| > Kubernetes has the concept of limits, especially on
| ephemeral storage
|
| So... why are these issues open?
|
| https://github.com/kubernetes/enhancements/issues/1029
| https://github.com/kubernetes/enhancements/issues/361
| https://github.com/kubernetes/kubernetes/issues/54384
|
| > additionally: if your node becomes unhealthy then the
| workloads would be rescheduled on another node.
|
| Well of course, but you're going to run into that issue
| (likely) on all of the nodes where the offending service
| lives.
|
| > But let's not argue things that aren't true.
|
| If what I've said is untrue, looking at open GitHub
| issues and the Kubernetes documentation is certainly no
| indication. That's a massive problem all by itself.
| q3k wrote:
| The first issue you've linked concerns quota support for
| ephemeral storage requests/limits - which is not about the
| limits themselves, but the ability to set limit quotas per
| tenant/namespace. E.g., team A cannot use more than 100G of
| ephemeral storage in total in the cluster. EDIT: No, sorry,
| it's about using underlying filesystem quotas for limiting
| ephemeral storage, vs. the current implementation; see the
| third point below. Also
| see KEP: https://github.com/kubernetes/enhancements/tree/
| master/keps/...
|
| The second is a tracking issue for a KEP that has been
| implemented but is still in alpha/beta. This will be
| closed when all the related features are stable. There's
| also some discussion about related functionality that
| might be added as part of this KEP/design.
|
| The third issue is about integrating Docker storage
| quotas with Kubernetes ephemeral quotas - i.e.,
| translating ephemeral storage limits into disk quotas
| (which would result in -ENOSPC to workloads), vs. the
| standard kubelet implementation which just kills/evicts
| workloads that run past their limit.
|
| I agree these are difficult to understand if you're not
| familiar with the k8s development/design process. I also
| had to spend a few minutes on each one of them to
| understand what the actual state of the issues is.
| However, they're in a development issue tracker, and the
| end-user k8s documentation clearly states that Ephemeral
| Storage requests/limits work, how they work, and what their
| limitations are:
| https://kubernetes.io/docs/concepts/configuration/manage-
| res...
| fshbbdssbbgdd wrote:
| Wait till you hear that AWS is renting you instances that
| are running on the same metal as other customers.
| dolni wrote:
| You seem to be asserting that the ability of Linux
| containers to isolate workloads is on par with virtual
| machines.
|
| That's just not the case.
| Spooky23 wrote:
| Remember, in big companies internal politics rules the day.
| It's cheaper to buy more computers than to become the
| overlord of computing.
| ffo wrote:
| You don't exactly need to run a cluster per service ;-)
| Instead you can choose to colocate services that belong
| together and form a "domain". But don't go down the route of
| building the one almighty Kubernetes cluster where all your
| domains run in a single place.
| eliodorro wrote:
| Decreases utilization, but also decreases coordination between
| teams (no man-bear-pigs). Also weigh the long-term costs of
| poorly maintained platforms and infrastructure in disaster
| cases, security issues, or when migrating to other providers.
|
| High overhead can be automated away; google ORBOS.
| yongjik wrote:
| At this point, why not just take it to its logical
| conclusion? Treat your business model as cattle, not pets.
| Customers leaving? Fire up another business until capital
| runs out, and if it does, no worries, just hop to another
| job!
|
| Sorry, but I feel like I landed in crazy-land. Kubernetes is
| already an exercise in how many layers you can insert with nobody
| understanding the whole picture. Ostensibly, it's so that you can
| _isolate_ those fucking jobs so that different teams can run
| different tasks in the same cluster without interfering with each
| other. Hence namespaces, services, resource requirements, port
| translation, autoscalers, and all those yaml files.
|
| It boggles my mind that people look at something like Kubernetes
| and decide "You know what? We need more layers. On top of this."
| tyingq wrote:
| _" You know what? We need more layers. On top of this."_
|
| Heh. Service Mesh!
| rantwasp wrote:
| K8s on K8s! K64s!
| voidfunc wrote:
| But can it run on an N64?
| yongjik wrote:
| Kuberneteuberneteuberneteuberneteuberneteuberneteuberneteub
| ernetes is a perfect name for the beautiful system you are
| about to architect: it's an ancient Greek word for
| helmsman-man-man-man-man-man-man-man, who is in charge of
| helmsman-man-man-man-man-man-man, who is in charge of ...
|
| (...sorry =_=)
| spondyl wrote:
| I mean, depending on the business, employees aren't trusted to
| understand the whole picture regardless. Many employees at
| traditional businesses don't even have administrator access on
| their laptops, depending on their position, so it's not
| logically inconsistent with how things seem to operate. With
| that lack of big-picture overview, it's hard to scrutinise
| things, since you're only seeing one sliver of an
| implementation (i.e. "It saves money" without seeing, let
| alone understanding, the technical requirements, and vice
| versa).
| aynyc wrote:
| Limited admin practice isn't just to save money. It's a
| security practice, and it's a good one. In a large business,
| no one understands the whole business, except maybe the legal
| department and the CEO.
| MuffinFlavored wrote:
| > In a large business, no one understand the whole business
| maybe except legal department and CEO.
|
| lol... what?
|
| A CEO of a 20,000+ person company literally sees projects
| as revenue sources with a deadline, nothing more.
|
| To get a project delivered, he/she walks down the chain of
| managers until they get answers. It might as well be a
| black hole.
| rq1 wrote:
| > It means that we'd rather just replace a "sick" instance by a
| new healthy one than taking it to a doctor.
|
| Oh god! Please treat your cattle better!
| tnisonoff wrote:
| When I worked at Asana, we created a small framework that allowed
| for blue-green deployments of Kubernetes Clusters (and the apps
| that lived on top of them) called KubeApps[0].
|
| It worked out great for us -- upgrading Kubernetes was easy and
| testable, never worried about code drift, etc.
|
| [0] https://blog.asana.com/2021/02/kubernetes-at-asana/ (Not
| written by me).
| AtNightWeCode wrote:
| Resource utilization is the main reason I would run a cluster in
| the first place. Immutable infrastructure is also expensive to
| build and maintain.
| tnisonoff wrote:
| For larger companies, I think a huge benefit of Kubernetes is
| the shared language for defining and operating services, as
| well as the well-thought-out abstractions for how these
| services interact.
|
| Costs are generally less of a concern, but having one way of
| running, operating, and writing services allowed our dev team
| to move faster, share knowledge, etc.
| ffo wrote:
| True, cost is not the biggest issue. Separation of teams with
| different velocities and needs, on the other hand, is.
|
| One API as abstraction with shared processes eases the pain
| for the people relying on a platform.
| jzelinskie wrote:
| I like the idea of "Building on Quicksand" as the analogy for
| Distributed Systems, but also maintaining your software
| dependencies. This article basically recommends trying to
| minimize your dependencies to keep reproducibility/portability
| high. I generally agree with this, but also carry an "all things
| within reason" mentality. But just as the article describes
| coworkers growing into their cluster, the complexity of what they
| run in their cluster will also grow over time and eventually
| they'll realize they've just built up their own "distribution". A
| few years ago, I've written a post asking people to think
| critically when they hear someone mention "Vanilla"
| Kubernetes[0].
|
| The real problem they suffered is actually that Kubernetes
| isn't fundamentally designed for multi-tenancy. Instead,
| you're forced to make separate clusters to isolate different
| domains. Google themselves run multiple Borg clusters to
| isolate different domains, so it's natural that Kubernetes
| ended up with a similar design.
|
| [0]: https://jzelinskie.com/posts/youre-not-running-vanilla-
| kuber...
|
| Disclosure: I worked as an engineer and product manager on CoreOS
| Tectonic, the (now defunct) Kubernetes used in the post.
| GauntletWizard wrote:
| You're just wrong, unfortunately - Google runs dev and test and
| prod all on the same clusters. Kubernetes multi-tenancy works
| just fine, but the conventional definition of multi-tenancy
| includes things like "network isolation" that are misguided.
| Multi-tenancy should be set up (and is within Google) by
| understanding what is and isn't shared with the environment,
| and through cryptographic assertion of who you're speaking to.
| If you want to see the latter part nicely integrated, come to a
| SIGAUTH meeting and help me argue for it.
| jzelinskie wrote:
| All I said was that they use separate clusters to "isolate
| domains" which is a pretty vague description on purpose -- I
| did not intend to claim they do it for different deployment
| environments as you've described.
|
| It's fairly subjective what types of isolation define
| "multi-tenancy", which is why there hasn't been progress made
| despite SIG and WG efforts in the past. While you do not
| believe network isolation should be included, there are
| plenty of developers working on OpenShift Online who may
| disagree. OSO lets anyone on the internet sign up for free
| and instantly be given their own namespace on a shared
| cluster full of untrusted actors.
| q3k wrote:
| > If you want to see the latter part nicely integrated, come
| to a SIGAUTH meeting and help me argue for it.
|
| Invitation accepted! :) I've been dying to see an ALTS-like
| [1] thing that works with Kubernetes. I really should be able
| to talk encrypted and authenticated gRPC-to-gRPC without ever
| having to set up secrets or manually provision certificates,
| dammit.
|
| [1] - https://cloud.google.com/security/encryption-in-
| transit/appl...
| ffo wrote:
| Hey, we used Tectonic ;-) It was a great tool at that time.
| Tectonic did influence some of the concepts around ORBOS. Just
| think of Tectonic combined with GitOps, minus the iPXE part.
|
| Disclaimer: I am working with ORBOS
___________________________________________________________________
(page generated 2021-06-30 23:00 UTC)