The container orchestrator landscape

August 23, 2022

This article was contributed by Jordan Webb

Docker and other container engines can greatly simplify many aspects of deploying a server-side application, but numerous applications consist of more than one container. Managing a group of containers only gets harder as additional applications and services are deployed; this has led to the development of a class of tools called container orchestrators. The best-known of these by far is Kubernetes; the history of container orchestration can be divided into what came before it and what came after.

The convenience offered by containers comes with some trade-offs; someone who adheres strictly to Docker's idea that each service should have its own container will end up running a large number of them. Even a simple web interface to a database might require running separate containers for the database server and the application; it might also include a separate container for a web server to handle serving static files, a proxy server to terminate SSL/TLS connections, a key-value store to serve as a cache, or even a second application container to handle background jobs and scheduled tasks. An administrator who is responsible for several such applications will quickly find themselves wishing for a tool to make their job easier; this is where container orchestrators step in.

A container orchestrator is a tool that can manage a group of multiple containers as a single unit. Instead of operating on a single server, orchestrators allow combining multiple servers into a cluster, and automatically distribute container workloads among the cluster nodes.

Docker Compose and Swarm

Docker Compose is not quite an orchestrator, but it was Docker's first attempt to create a tool to make it easier to manage applications that are made out of several containers. It consumes a YAML-formatted file, which is almost always named docker-compose.yml. Compose reads this file and uses the Docker API to create the resources that it declares; Compose also adds labels to all of the resources, so that they can be managed as a group after they are created. In effect, it is an alternative to the Docker command-line interface (CLI) that operates on groups of containers. Three types of resources can be defined in a Compose file:

* services contains declarations of containers to be launched. Each entry in services is equivalent to a docker run command.

* networks declares networks that can be attached to the containers defined in the Compose file. Each entry in networks is equivalent to a docker network create command.

* volumes defines named volumes that can be attached to the containers. In Docker parlance, a volume is persistent storage that is mounted into the container. Named volumes are managed by the Docker daemon. Each entry in volumes is equivalent to a docker volume create command.
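As a rough illustration of the format, a minimal docker-compose.yml might look something like the sketch below; the service names, image names, and paths are illustrative placeholders rather than anything from a real deployment:

    services:
      app:
        image: example/webapp:latest      # hypothetical application image
        ports:
          - "8080:8080"                   # expose the application on the host
        networks:
          - backend
        volumes:
          - appdata:/var/lib/app          # mount a named volume into the container

      db:
        image: postgres:14
        networks:
          - backend                       # both containers share this network
        volumes:
          - dbdata:/var/lib/postgresql/data

    networks:
      backend: {}                         # equivalent to "docker network create"

    volumes:
      appdata: {}                         # equivalent to "docker volume create"
      dbdata: {}

Running docker-compose up (or, in newer releases, docker compose up) against a file like this creates the network, the volumes, and both containers as a group.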
Networks and volumes can be directly connected to networks and filesystems on the host that Docker is running on, or they can be provided by a plugin. Network plugins allow things like connecting containers to VPNs; a volume plugin might allow storing a volume on an NFS server or an object storage service.

Compose provides a much more convenient way to manage an application that consists of multiple containers, but, at least in its original incarnation, it only worked with a single host; all of the containers that it created were run on the same machine. To extend its reach across multiple hosts, Docker introduced Swarm mode in 2016. This is actually the second product from Docker to bear the name "Swarm" -- a product from 2014 implemented a completely different approach to running containers across multiple hosts, but it is no longer maintained. It was replaced by SwarmKit, which provides the underpinnings of the current version of Docker Swarm.

Swarm mode is included in Docker; no additional software is required. Creating a cluster is a simple matter of running docker swarm init on an initial node, and then docker swarm join on each additional node to be added. Swarm clusters contain two types of nodes. Manager nodes provide an API to launch containers on the cluster, and communicate with each other using a protocol based on the Raft Consensus Algorithm in order to synchronize the state of the cluster across all managers. Worker nodes do the actual work of running containers. It is unclear how large these clusters can be; Docker's documentation says that a cluster should have no more than seven manager nodes, but does not specify a limit on the number of worker nodes. Bridging container networks across nodes is built in, but sharing storage between nodes is not; third-party volume plugins need to be used to provide shared persistent storage across nodes.

Services are deployed on a swarm using Compose files. Swarm extended the Compose format by adding a deploy key to each service that specifies how many instances of the service should be running and which nodes they should run on. Unfortunately, this led to a divergence between Compose and Swarm, which caused some confusion because options like CPU and memory quotas needed to be specified in different ways depending on which tool was being used. During this period of divergence, a file intended for Swarm was referred to as a "stack file" instead of a Compose file in an attempt to disambiguate the two; thankfully, these differences appear to have been smoothed over in the current versions of Swarm and Compose, and any references to a stack file being distinct from a Compose file seem to have largely been scoured from the Internet. The Compose format now has an open specification and its own GitHub organization providing reference implementations.
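As a sketch of what the deploy key looks like in practice (the service name, image, and numbers below are illustrative), a service entry in a stack might be extended along these lines:

    services:
      app:
        image: example/webapp:latest
        deploy:
          replicas: 3                     # run three instances across the swarm
          placement:
            constraints:
              - node.role == worker       # only schedule onto worker nodes
          resources:
            limits:
              cpus: "0.50"                # per-container CPU and memory quotas
              memory: 256M

A stack described this way would typically be launched on the cluster with docker stack deploy.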
There is some level of uncertainty about the future of Swarm. It once formed the backbone of a service called Docker Cloud, but the service was suddenly shut down in 2018. It was also touted as a key feature of Docker's Enterprise Edition, but that product has since been sold to another company and is now marketed as Mirantis Kubernetes Engine. Meanwhile, recent versions of Compose have gained the ability to deploy containers to services hosted by Amazon and Microsoft. There has been no deprecation announcement, but there also hasn't been any announcement of any other type in recent memory; searching for the word "Swarm" on Docker's website only turns up passing mentions.

Kubernetes

Kubernetes (sometimes known as k8s) is a project inspired by an internal Google tool called Borg. Kubernetes manages resources and coordinates running workloads on clusters of up to thousands of nodes; it dominates container orchestration like Google dominates search. Google wanted to collaborate with Docker on Kubernetes development in 2014, but Docker decided to go its own way with Swarm. Instead, Kubernetes grew up under the auspices of the Cloud Native Computing Foundation (CNCF). By 2017, Kubernetes had grown so popular that Docker announced that it would be integrated into Docker's own product.

Aside from its popularity, Kubernetes is primarily known for its complexity. Setting up a new cluster by hand is an involved task, which requires the administrator to select and configure several third-party components in addition to Kubernetes itself. Much like the Linux kernel needs to be combined with additional software to make a complete operating system, Kubernetes is only an orchestrator and needs to be combined with additional software to make a complete cluster. It needs a container engine to run its containers; it also needs plugins for networking and persistent volumes.

Kubernetes distributions exist to fill this gap. Like a Linux distribution, a Kubernetes distribution bundles Kubernetes with an installer and a curated selection of third-party components. Different distributions exist to fill different niches; seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises. The minikube project offers an easier on-ramp for developers looking for a local environment to experiment with. Unlike their Linux counterparts, Kubernetes distributions are certified for conformance by the CNCF; each distribution must implement the same baseline of functionality in order to obtain the certification, which allows them to use the "Certified Kubernetes" badge.

A Kubernetes cluster contains several software components. Every node in the cluster runs an agent called the kubelet, which maintains the node's membership in the cluster and accepts work from it, along with a container engine and kube-proxy, which enables network communication with containers running on other nodes. The components that maintain the state of the cluster and make decisions about resource allocations are collectively referred to as the control plane -- these include a distributed key-value store called etcd, a scheduler that assigns work to cluster nodes, and one or more controller processes that react to changes in the state of the cluster and trigger any actions needed to make the actual state match the desired state. Users and cluster nodes interact with the control plane through the Kubernetes API server. To effect changes, users set the desired state of the cluster through the API server, while the kubelet reports the actual state of each cluster node to the controller processes.

Kubernetes runs containers inside an abstraction called a pod, which can contain one or more containers, although running containers for more than one service in a pod is discouraged. Instead, a pod will generally have a single main container that provides a service, and possibly one or more "sidecar" containers that collect metrics or logs from the service running in the main container. All of the containers in a pod will be scheduled together on the same machine, and will share a network namespace -- containers running within the same pod can communicate with each other over the loopback interface. Each pod receives its own unique IP address within the cluster. Containers running in different pods can communicate with each other using their cluster IP addresses.
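To illustrate the pod abstraction, a pod with a main container and a logging sidecar might be declared along the lines of the sketch below; the names and images are hypothetical:

    apiVersion: v1
    kind: Pod
    metadata:
      name: webapp                        # hypothetical pod name
    spec:
      volumes:
        - name: logs
          emptyDir: {}                    # scratch space shared by the two containers
      containers:
        - name: app                       # main container providing the service
          image: example/webapp:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: logs
              mountPath: /var/log/app
        - name: log-forwarder             # "sidecar" that ships the logs written by "app"
          image: example/log-forwarder:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log/app

Both containers would be scheduled onto the same node, share the pod's IP address, and be able to reach each other over the loopback interface.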
A pod specifies a set of containers to run, but the definition of a pod says nothing about where to run those containers, or how long to run them for -- without this information, Kubernetes will start the containers somewhere on the cluster, but will not restart them when they exit, and may abruptly terminate them if the control plane decides the resources they are using are needed by another workload. For this reason, pods are rarely used alone; instead, the definition of a pod is usually wrapped in a Deployment object, which is used to define a persistent service.

Like Compose and Swarm, the objects managed by Kubernetes are declared in YAML; for Kubernetes, the YAML declarations are submitted to the cluster using the kubectl tool. In addition to pods and Deployments, Kubernetes can manage many other types of objects, like load balancers and authorization policies. The list of supported APIs is continually evolving, and will vary depending on which version of Kubernetes and which distribution a cluster is running. Custom resources can be used to add APIs to a cluster to manage additional types of objects. KubeVirt adds APIs to enable Kubernetes to run virtual machines, for example. The complete list of APIs supported by a particular cluster can be discovered with the kubectl api-versions command.

Unlike Compose, each of these objects is declared in a separate YAML document, although multiple YAML documents can be inlined in the same file by separating them with "---", as seen in the Kubernetes documentation. A complex application might consist of many objects with their definitions spread across multiple files; keeping all of these definitions in sync with each other when maintaining such an application can be quite a chore. In order to make this easier, some Kubernetes administrators have turned to templating tools like Jsonnet.

Helm takes the templating approach a step further. Like Kubernetes, development of Helm takes place under the aegis of the CNCF; it is billed as "the package manager for Kubernetes". Helm generates YAML configurations for Kubernetes from a collection of templates and variable declarations called a chart. Its template language is distinct from the Jinja templates used by Ansible but looks fairly similar to them; people who are familiar with Ansible roles will likely feel at home with Helm charts. Collections of Helm charts can be published in Helm repositories; Artifact Hub provides a large directory of public Helm repositories. Administrators can add these repositories to their Helm configuration and use the ready-made Helm charts to deploy prepackaged versions of popular applications to their cluster. Recent versions of Helm also support pushing and pulling charts to and from container registries, giving administrators the option to store charts in the same place that they store container images.
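As a rough sketch of how a chart fits together (the names and values here are illustrative, not taken from any published chart), a template for a Deployment and its accompanying values file might look like this:

    # templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}-web       # filled in from the release name
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}-web
      template:
        metadata:
          labels:
            app: {{ .Release.Name }}-web
        spec:
          containers:
            - name: web
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

    # values.yaml
    replicaCount: 2
    image:
      repository: example/webapp          # placeholder image
      tag: latest

Installing the chart with helm install renders the templates against the values (which can be overridden on the command line) and submits the resulting YAML to the cluster.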
Kubernetes shows no signs of losing momentum any time soon. It is designed to manage any type of resource; this flexibility, as demonstrated by the KubeVirt virtual-machine controller, gives it the potential to remain relevant even if containerized workloads should eventually fall out of favor. Development proceeds at a healthy clip and new major releases come out regularly. Releases are supported for a year; there doesn't seem to be a long-term support version available. Upgrading a cluster is supported, but some prefer to bring up a new cluster and migrate their services over to it.

Nomad

Nomad is an orchestrator from HashiCorp that is marketed as a simpler alternative to Kubernetes. Like Docker and Kubernetes, Nomad is an open-source project. It consists of a single binary called nomad, which can be used to start a daemon called the agent and also serves as a CLI to communicate with an agent. Depending on how it is configured, the agent process can run in one of two modes. Agents running in server mode accept jobs and allocate cluster resources for them. Agents running in client mode contact the servers to receive jobs, run them, and report their status back to the servers. The agent can also run in development mode, where it takes on the role of both client and server to form a single-node cluster that can be used for testing purposes.

Creating a Nomad cluster can be quite simple. In Nomad's most basic mode of operation, the initial server agent must be started, then additional nodes can be added to the cluster using the nomad server join command. HashiCorp also provides Consul, which is a general-purpose service mesh and discovery tool. While it can be used standalone, Nomad is probably at its best when used in combination with Consul. The Nomad agent can use Consul to automatically discover and join a cluster, and can also perform health checks, serve DNS records, and provide HTTPS proxies to services running on the cluster.

Nomad supports complex cluster topologies. Each cluster is divided into one or more "data centers". Like Swarm, server agents within a single data center communicate with each other using a protocol based on Raft; this protocol has tight latency requirements, but multiple data centers may be linked together using a gossip protocol that allows information to propagate through the cluster without each server having to maintain a direct connection to every other. Data centers linked together in this way can act as one cluster from a user's perspective. This architecture gives Nomad an advantage when scaled up to enormous clusters. Kubernetes officially supports up to 5,000 nodes and 300,000 containers, whereas Nomad's documentation cites examples of clusters containing over 10,000 nodes and 2,000,000 containers.

Like Kubernetes, Nomad doesn't include a container engine or runtime. It uses task drivers to run jobs. Task drivers that use Docker and Podman to run containers are included; community-supported drivers are available for other container engines. Also like Kubernetes, Nomad's ambitions are not limited to containers; there are also task drivers for other types of workloads, including a fork/exec driver that simply runs a command on the host, a QEMU driver for running virtual machines, and a Java driver for launching Java applications. Community-supported task drivers connect Nomad to other types of workloads.

Unlike Docker or Kubernetes, Nomad eschews YAML in favor of the HashiCorp Configuration Language (HCL), which was originally created for Terraform, another HashiCorp project that provisions cloud resources. HCL is used across the HashiCorp product line, although it has limited adoption elsewhere. Documents written in HCL can easily be converted to JSON, but HCL aims to provide a syntax that is more finger-friendly than JSON and less error-prone than YAML.
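To give a feel for the format (the job, group, and task names and the container image below are placeholders), a simple service job using the bundled Docker task driver might be written in HCL along these lines:

    job "webapp" {
      datacenters = ["dc1"]
      type        = "service"

      group "web" {
        count = 2                  # run two instances of this group

        task "server" {
          driver = "docker"        # use the bundled Docker task driver

          config {
            image = "example/webapp:latest"   # placeholder container image
          }

          resources {
            cpu    = 200           # MHz
            memory = 128           # MB
          }
        }
      }
    }

A job like this would be submitted to the cluster with the nomad job run command.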
HashiCorp's equivalent to Helm is called Nomad Pack. Like Helm, Nomad Pack processes a directory full of templates and variable declarations to generate job configurations. Nomad also has a community registry of pre-packaged applications, but the selection is much smaller than what is available for Helm at Artifact Hub.

Nomad does not have the same level of popularity as Kubernetes. Like Swarm, its development appears to be primarily driven by its creators; although it has been deployed by many large companies, HashiCorp is still very much the center of the community around Nomad. At this point, it seems unlikely that the project has gained enough momentum to have a life independent from its corporate parent. Users can perhaps find assurance in the fact that HashiCorp is much more clearly committed to the development and promotion of Nomad than Docker is to Swarm.

Conclusion

Swarm, Kubernetes, and Nomad are not the only container orchestrators, but they are the three most viable. Apache Mesos can also be used to run containers, but it was nearly mothballed in 2021; DC/OS is based on Mesos, but much like Docker Enterprise Edition, the company that backed its development is now focused on Kubernetes. Most "other" container orchestration projects, like OpenShift and Rancher, are actually just enhanced (and certified) Kubernetes distributions, even if they don't have Kubernetes in their name.

Despite (or perhaps, because of) its complexity, Kubernetes currently enjoys the most popularity by far, but HashiCorp's successes with Nomad show that there is still room for alternatives. Some users remain loyal to the simplicity of Docker Swarm, but its future is uncertain. Other alternatives appear to be largely abandoned at this point. It would seem that the landscape has largely settled around these three players, but container orchestration is still a relatively immature area. Ten years ago, very little of this technology even existed, and things are still evolving quickly. There are likely many exciting new ideas and developments in container orchestration that are still to come.

[Special thanks to Guinevere Saenger for educating me with regard to some of the finer points of Kubernetes and providing some important corrections for this article.]


The container orchestrator landscape
Posted Aug 23, 2022 19:08 UTC (Tue) by NYKevin (subscriber, #129325)

Speaking as a Google SRE, looking at k8s "from the other side" as it were, my main problem with it is actually not complexity, but terminology. It's *almost* 1:1 equivalent to Borg, except everything has a different name and it's differently opinionated about certain things. For example, in k8s you have pods which own or manage containers, and then you have a ReplicaSet which owns or manages one or more pods. This is all very sensible and reasonable.
In Borg, we would say that you have alloc instances (pods) which own or manage tasks (containers), and then you have an alloc (ReplicaSet) which describes one or more alloc instances. The problem is that Borg also describes the set of tasks as a "job," and this has no direct equivalent in k8s.* Worse, Borg lets you dispense with the alloc altogether, so you can just have a "naked" job that consists of a bunch of tasks (containers) with no pod-like abstraction over them. On the one hand, this means that we don't have to configure the alloc if we don't want or need to. On the other hand, it means that we have two ways of doing things.

But the real problem is that, when we're talking about Borg informally, we often say "job" instead of "job or alloc" - which is the one Borg term that doesn't really have a clean equivalent in k8s. The reverse also happens. Borg doesn't let you submit individual alloc instances (pods) or tasks (containers) without wrapping them up in an alloc (ReplicaSet) or job (see above), so if you just want one copy of something, you have to give Borg a template and say "make one copy of it" instead of submitting the individual object directly, and so in practice we mostly speak of "jobs and allocs" rather than "tasks and alloc instances." But in k8s, you can configure one pod at a time if you really want to.

(For more specifics on how Borg works, read the paper: https://research.google/pubs/pub43438/)

* k8s also defines something called a "job," but it's a completely different thing, not relevant here.

The container orchestrator landscape
Posted Aug 23, 2022 19:12 UTC (Tue) by jordan (subscriber, #110573)

Could you tell us how widely deployed Borg is vs k8s inside of Google? While I was doing research for this piece, I found some folks saying that Google was mostly still on Borg internally, and k8s was only used for a few Google Cloud offerings, but the most recent thing I could find about that was from 2018 and it seemed too out-of-date and questionably sourced to include in the article.

The container orchestrator landscape
Posted Aug 23, 2022 20:39 UTC (Tue) by NYKevin (subscriber, #129325)

If it's not public, then somebody probably decided that it shouldn't be public, so I don't think I can go into much detail here without asking the mothership for permission. In general however, I would say that we still use Borg for a lot of things. Beyond that, I'm afraid I would have to refer you to Google's rather meager public k8s documentation.

The container orchestrator landscape
Posted Aug 23, 2022 20:43 UTC (Tue) by NYKevin (subscriber, #129325)

(Note also that journalists can email press@google.com - I have no idea whether they would be willing to answer questions of this nature.)

The container orchestrator landscape
Posted Aug 23, 2022 20:23 UTC (Tue) by dw (subscriber, #12017)

My memory is a little faint after 15 years, but I seem to recall a handful of binaries (no more than 4) that could be manually provisioned on a desktop over lunch using a few kilobyte-sized argvs. That has not been my experience of k8s at all; Borg was a much tidier system that had (at least at that stage) not yet succumbed to enterprization or excessive modularity.
The container orchestrator landscape
Posted Aug 23, 2022 19:21 UTC (Tue) by zyga (subscriber, #81533)

While somewhat of a different breed, does anyone here have experience with using juju, either with k8s or without it?

ECS is worth a mention
Posted Aug 23, 2022 20:18 UTC (Tue) by dw (subscriber, #12017)

There are a few of us out there who'd prefer ECS at all costs after experiencing the alternatives. Much simpler control plane, AWS-grade backwards compatibility, no inscrutable hypermodularity, and you can run it on your own infra so long as you're happy forking over $5/mo. per node for the managed control plane. I stopped looking at or caring for alternatives; ECS has just the right level of complexity, and it's a real shame nobody has found the time to do a free software clone of its control plane.

ECS is worth a mention
Posted Aug 23, 2022 21:22 UTC (Tue) by beagnach (subscriber, #32987)

Agreed. I feel our team dodged a bullet by opting for ECS over K8S for our fairly straightforward web application.

ECS is worth a mention
Posted Aug 23, 2022 22:34 UTC (Tue) by k8to (subscriber, #15413)

This is a rough one. Getting locked into the Amazon ecosystem, which is full of overcomplicated and difficult services, could hurt in the long term. But container orchestration is often also a huge tarpit of wasted time struggling with overcomplexity.

It's funny: when "open source" meant Linux and Samba to me, it seemed like a world of down-to-earth implementations that might be clunky in some ways but were focused on comprehensible goals. Now, in a world of Kubernetes, Spark, and Solr, I associate it more with engineer-created balls of hair that you have to take care of with specialists to keep them working. More necessary evils than amplifying enablers.

ECS is worth a mention
Posted Aug 23, 2022 23:14 UTC (Tue) by dw (subscriber, #12017)

"Open source" stratified long ago to incorporate most of what we used to consider enterprise crapware as the default style of project that gets any exposure. They're still the same teams of 9-5s pumping out garbage, it's just that the marketing and licenses changed substantially. Getting paid to "work on open source" might have had some edge 20 years ago, but I can only think of 1 or 2 companies today doing what I'd consider that to have meant in the early 2000s.

As for ECS lock-in, the time saved on a 1-liner SSM deploy of on-prem nodes easily covers the risk at some future date of having to port container definitions to pretty much any other system. Optimistically, assuming 3 days of one person's time to set up a local k8s, ECS offers about 450 node-months before reaching breakeven (450 node-months across a 5-node cluster is 90 months, much longer than many projects last before reaching the scrapheap). Of course ECS setup isn't completely free, but relatively speaking it may as well be considered free.

ECS is worth a mention
Posted Aug 24, 2022 1:12 UTC (Wed) by rjones (subscriber, #159862)

If you don't want to be joined at the hip with AWS, a possibly better solution is k0s. One of the problems with self-hosting Kubernetes is that the typical approach naively mixes the Kubernetes API components (API server/scheduler/etcd/etc.) with infrastructure components (networking/storage/ingress controllers/etc.) and with applications, all on the same set of nodes.
So you have all these containers operating at different levels all mixing together, which means that your "blast radius" for the cluster is very bad. If you mess up a network controller configuration, you can take your Kubernetes cluster offline. If an application freaks out, it can take your cluster offline. Memory resources could be exhausted by a bad deploy or misbehaving application, which then takes out your storage, etc. etc. This makes upgrades irritating and difficult and full of pitfalls, and the cluster very vulnerable to misconfigurations.

You can mitigate these issues by separating out 'admin' nodes from 'etcd', 'storage', and 'worker' nodes. This greatly reduces the chances of outages and makes management easier, but it also adds a lot of extra complexity and setup. This is a lot of configuring and messing around if you are interested in just hosting a 1-5 node Kubernetes cluster for a personal lab or specific project or whatever.

With k0s (and similar approaches with k3s and RancherOS) you have a single Unix-style service that provides the Kubernetes API components. You can cluster it if you want, but the simplest setup just uses SQLite as the backend, which works fine for small or single-use clusters. This runs in a separate VM or small machine from the rest of the cluster. Even if it's a single point of failure, it's not too bad; the cluster will happily hum right along as you reboot your k0s controller node. In this way, managing the cluster is much more like how an AWS EKS or Azure AKS cluster works. With those, the API services are managed by the cloud provider, separate from what you manage.

This is a massive improvement over what you may have experienced with something like OpenShift, Kubespray, or even really simple kubeadm-based deploys, and most other approaches. It may not seem like a big deal, but for what most people are interested in, in terms of self-hosted Kubernetes clusters, I think it is. Also, I think that having numerous smaller k8s clusters is preferable to having very large multi-tenant clusters. Just having things split up solves a lot of potential issues.

ECS is worth a mention
Posted Aug 24, 2022 6:30 UTC (Wed) by dw (subscriber, #12017)

I've spent enough time de-wtfing k3s installs that hadn't been rebooted just long enough for something inscrutable to break that I'd assume k0s was a non-starter for much the same reason. You can't really fix a stupid design by jamming all the stupid components together more tightly, although I admit it at least improves the sense of manageability.

The problem with Kubernetes starts and ends with its design; it's horrible to work with in concept, never mind any particular implementation.

ECS is worth a mention
Posted Aug 24, 2022 16:05 UTC (Wed) by sbheinlein (guest, #160469)

> "seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises"

That's enough of a mention for me.

Apache YARN
Posted Aug 23, 2022 20:55 UTC (Tue) by cry_regarder (subscriber, #50545)

Are folks here using YARN? https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-...

IIRC it is the default orchestrator for things like Samza. https://samza.apache.org/learn/documentation/latest/deplo...
The container orchestrator landscape
Posted Aug 23, 2022 22:12 UTC (Tue) by onlyben (subscriber, #132784)

Does anyone have deep experience using both k8s and Nomad, and can provide some assessment of the differences or the use cases you'd use them for? I try and avoid Kubernetes if I can. I do appreciate the problem it solves and think it is extremely useful, but it hasn't quite captured the magic for me that other tools have (including original docker, and probably even docker-compose). I'd be curious to know what people feel about Nomad.

The container orchestrator landscape
Posted Aug 24, 2022 3:27 UTC (Wed) by Cyberax (supporter, #52523)

K8s is more flexible compared to Nomad, but at the cost of complexity. There's a nice page with a description here: https://www.nomadproject.io/docs/nomad-vs-kubernetes

I personally would avoid Nomad right now. It's an "open core" system, with the "enterprise" version locking in some very useful features like multi-region support.

With K8s you can also use EKS on AWS or AKS on Azure to offload running the control plane to AWS/Azure. It's still very heavy on infrastructure that you need to configure, but at least it's straightforward and needs to be done once.

The container orchestrator landscape
Posted Aug 24, 2022 16:42 UTC (Wed) by schmichael (subscriber, #160476)

> the "enterprise" version locking in some very useful features like multi-region support.

Quick point of clarification: multi-region federation is open source. You can federate Nomad clusters to create a single global control plane. Multi-region deployments (where you can deploy a single job to multiple regions) are enterprise. Single-region jobs and deployments are open source.

Disclaimer: I'm the HashiCorp Nomad Engineering Team Lead

The container orchestrator landscape
Posted Aug 23, 2022 23:19 UTC (Tue) by bartoc (subscriber, #124262)

systemd also has a built-in container runtime and (sorta) orchestrator, which is a lot simpler than k8s and makes it easier to have tasks that are just normal systemd units.

One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real IPv6 addresses that are routable from everywhere, and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container. The whole process just felt like figuring out exactly how the k8s people had reimplemented any number of existing utilities. It all gave me the impression the whole thing was abstraction for abstraction's sake. I feel the same way about stuff like ansible, so maybe I just really care about what code is actually executing on my servers more than most people.

I found HashiCorp's offerings (in general tbh, not just Nomad) to be a lot of shiny websites on top of very basic tools that ended up adding relatively little value compared to just using whatever it was they were abstracting over.

The container orchestrator landscape
Posted Aug 24, 2022 1:10 UTC (Wed) by jordan (subscriber, #110573)

I find myself mourning fleet with some regularity.
The container orchestrator landscape
Posted Aug 24, 2022 1:28 UTC (Wed) by rjones (subscriber, #159862)

> The whole process just felt like figuring out exactly how the k8s people had reimplemented any number of existing utilities. It all gave me the impression the whole thing was abstraction for abstraction's sake.

Kubernetes and the complexity of its networking stack are the result of the original target for these sorts of clusters. The idea is that you needed to have a way for Kubernetes to easily adapt to a wide variety of different cloud architectures. The people that are running them don't have control over the addresses they get, addresses are very expensive, and they don't have control over any of the network infrastructure. IPv6 isn't even close to an option for most of these types of setup.

So it makes a lot of sense to take advantage of tunnelling over TCP for the internal networking. This way it works completely independently of any physical or logical network configuration that the Kubernetes cluster might be hosted on. You can even make it work between multiple cloud providers if you want.

> One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real IPv6 addresses that are routable from everywhere, and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container.

You don't have to use the tunneling network approach if you don't want to. For example, if you have physical servers with multiple network ports, you can just use those separate LANs instead. Generally speaking you'll want to have three LANs: one for the pod network, one for the service network, and one for the external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more than that. I don't know how mature K8s IPv6 support is nowadays, but I can see why that would be preferable.

> It all gave me the impression the whole thing was abstraction for abstraction's sake. I feel the same way about stuff like ansible, so maybe I just really care about what code is actually executing on my servers more than most people.

It could be that a lot of people are not in a position to micro-manage things on that level and must depend on the expertise of other people to accomplish things in a reasonable manner.

The container orchestrator landscape
Posted Aug 24, 2022 6:55 UTC (Wed) by dw (subscriber, #12017)

I don't mean to keep jumping into your replies, but I feel I can see what stage in the cycle you're at with Kubernetes, and it's probably worth pointing out something that might not immediately be obvious: in all the rush to absorb the design complexity of the system, it's very easy to forget that there are numerous ways to achieve the flexibility it offers, and the variants Kubernetes chose to bake in are only one instantiation, and IMHO usually far from the right one.

Take the network abstraction as a simple example; it's maybe 20%+ of the whole Kubernetes conceptual overhead. K8s more or less mandates some kind of mapping at the IP and naming layers, so you usually have at a minimum some variation of a custom DNS server and a few hundred ip/nf/xdp rules or whatnot to implement routing. Docker's solution to the same problem was simply a convention for dumping network addresses into environment variables.
No custom DNS, no networking nonsense. It's one of a thousand baked-in choices made in k8s that really didn't need to be that way. The design itself is bad.

No conversation of Kubernetes complexity is complete without mention of their obsolescent-by-design approach to API contracts. We've just entered a period where Ingresses went from marked beta, to stable, to about-to-be-deprecated by gateways. How many million lines of YAML toil across all k8s users needed trivial updates when the interface became stable, and how many million more will be wasted by the time gateways are fashionable? How long will gateways survive? That's a meta-design problem, and a huge red flag. Once you see it in a team you can expect it time and time again. Not only is it overcomplicated by design, it's also quicksand, and nothing you build on it can be expected to have any permanence.

The container orchestrator landscape
Posted Aug 24, 2022 7:58 UTC (Wed) by bartoc (subscriber, #124262)

> The idea is that you needed to have a way for Kubernetes to easily adapt to a wide variety of different cloud architectures. The people that are running them don't have control over the addresses they get, addresses are very expensive, and they don't have control over any of the network infrastructure. IPv6 isn't even close to an option for most of these types of setup.

Well, I don't care about any cloud architectures except mine :). More seriously though, the people running clouds absolutely do have control over the addresses they get! And tunneling works just as well if you want to provide access to the IPv6 internet on container hosts that only have IPv4, except in that situation you have some hope of getting rid of the tunnels once you no longer need IPv4.

> Generally speaking you'll want to have three LANs: one for the pod network, one for the service network, and one for the external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more than that.

IMO this is _nuts_, I want _ONE_ network and I want that network to be the internet (with stateful firewalls, obviously).

The container orchestrator landscape
Posted Aug 24, 2022 0:26 UTC (Wed) by denton (subscriber, #159595)

Does Nomad enable you to extend the control plane with your own custom types? One thing that k8s got right was that it gave users the ability to define new resource types, via CustomResourceDefinitions (CRDs). So for example, if you wanted a Postgres database in your k8s cluster, you could install a CRD + Postgres controller and have access to that new API. It's led to a large number of Operators that can enable advanced functionality in the cluster, without the user needing to understand how they work. This is similar to managed services on cloud providers, like Aurora or RDS in AWS. I'm wondering if Nomad has similar functionality?

The container orchestrator landscape
Posted Aug 24, 2022 1:11 UTC (Wed) by jordan (subscriber, #110573)

As far as I can tell, Nomad does not have a direct equivalent; I think the closest you could get is a custom task driver.

The container orchestrator landscape
Posted Aug 24, 2022 17:09 UTC (Wed) by schmichael (subscriber, #160476)

> I'm wondering if Nomad has similar functionality?

No, Nomad has chosen not to implement CRDs/Controllers/Operators as seen in Kubernetes.
Many users use the Nomad API to build their own service control planes, and the Nomad Autoscaler - https://github.com/hashicorp/nomad-autoscaler/ - is an example of a generic version of this: it's a completely external project and service that runs in your Nomad cluster to provide autoscaling of your other Nomad-managed services and their infrastructure. Projects like Patroni also work with Nomad, so projects similar to controllers do exist: https://github.com/ccakes/nomad-pgsql-patroni

The reason (pros) for this decision is largely that it lets Nomad focus on core scheduling problems. Many of our users build a platform on top of Nomad and appreciate the clear distinction between Nomad placing workloads and their higher-level platform tooling managing the specific orchestration needs of their systems using Nomad's APIs. This should feel similar to the programming principles of encapsulation and composition.

The cons we've observed are: (1) you likely have to manage state for your control plane ... somewhere ... this makes it difficult to write generic open source controllers, and (2) your API will be distinct from Nomad's and require its own security, discovery, UI, etc. I don't want to diminish the pain of forcing our users to solve those themselves.

I could absolutely see Nomad gaining CRD-like capabilities someday, but in the short term you should plan on having to manage controller state and APIs yourself.

Disclaimer: I am the HashiCorp Nomad Engineering Team Lead

The container orchestrator landscape
Posted Aug 24, 2022 11:15 UTC (Wed) by jezuch (subscriber, #52988)

My favorite "orchestrator" is actually testcontainers. It turns integration tests from a horrible nightmare into something that's almost pleasant ;) The biggest downside is that they're usually somewhat slow to start, but everyone at my $DAYJOB is more than willing to pay that cost (which is also monetary, since those tests are executed in CI in the cloud).

The container orchestrator landscape
Posted Aug 24, 2022 17:40 UTC (Wed) by jordan (subscriber, #110573)

It's worth noting that, while Docker's website is largely devoid of any mention of Swarm, Mirantis reaffirmed their commitment to Swarm in April of this year. It seems like it will continue to be supported in Mirantis's product, but it's unclear to me what that might mean for users of the freely-available version of Docker, which is developed and distributed by an entirely different company.

Yikes...
Posted Aug 24, 2022 17:58 UTC (Wed) by dskoll (subscriber, #1630)

I don't have much to add, but reading this hurt my brain and I now understand a second meaning of the term "Cluster****"

I am so glad I'm nearing the end of my career and not starting out in tech today.