The container orchestrator landscape

August 23, 2022

This article was contributed by Jordan Webb

Docker and other container engines can greatly simplify many aspects of deploying a server-side application, but numerous applications consist of more than one container. Managing a group of containers only gets harder as additional applications and services are deployed; this has led to the development of a class of tools called container orchestrators. The best-known of these by far is Kubernetes; the history of container orchestration can be divided into what came before it and what came after.

The convenience offered by containers comes with some trade-offs; someone who adheres strictly to Docker's idea that each service should have its own container will end up running a large number of them. Even a simple web interface to a database might require running separate containers for the database server and the application; it might also include a separate container for a web server to handle serving static files, a proxy server to terminate SSL/TLS connections, a key-value store to serve as a cache, or even a second application container to handle background jobs and scheduled tasks. An administrator who is responsible for several such applications will quickly find themselves wishing for a tool to make their job easier; this is where container orchestrators step in.

A container orchestrator is a tool that can manage a group of multiple containers as a single unit. Instead of operating on a single server, orchestrators allow combining multiple servers into a cluster, and automatically distribute container workloads among the cluster nodes.

Docker Compose and Swarm

Docker Compose is not quite an orchestrator, but it was Docker's first attempt to create a tool to make it easier to manage applications that are made out of several containers. It consumes a YAML-formatted file, which is almost always named docker-compose.yml. Compose reads this file and uses the Docker API to create the resources that it declares; Compose also adds labels to all of the resources, so that they can be managed as a group after they are created. In effect, it is an alternative to the Docker command-line interface (CLI) that operates on groups of containers. Three types of resources can be defined in a Compose file:

* services contains declarations of containers to be launched. Each entry in services is equivalent to a docker run command.

* networks declares networks that can be attached to the containers defined in the Compose file. Each entry in networks is equivalent to a docker network create command.

* volumes defines named volumes that can be attached to the containers. In Docker parlance, a volume is persistent storage that is mounted into the container. Named volumes are managed by the Docker daemon. Each entry in volumes is equivalent to a docker volume create command.
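As a rough illustration of the format, a minimal docker-compose.yml might look something like the sketch below; the service names, image names, and paths are illustrative placeholders rather than anything from a real deployment:

    services:
      app:
        image: example/webapp:latest      # hypothetical application image
        ports:
          - "8080:8080"                   # expose the application on the host
        networks:
          - backend
        volumes:
          - appdata:/var/lib/app          # mount a named volume into the container

      db:
        image: postgres:14
        networks:
          - backend                       # both containers share this network
        volumes:
          - dbdata:/var/lib/postgresql/data

    networks:
      backend: {}                         # equivalent to "docker network create"

    volumes:
      appdata: {}                         # equivalent to "docker volume create"
      dbdata: {}

Running docker-compose up (or, in newer releases, docker compose up) against a file like this creates the network, the volumes, and both containers as a group.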
Networks and volumes can be directly connected to networks and filesystems on the host that Docker is running on, or they can be provided by a plugin. Network plugins allow things like connecting containers to VPNs; a volume plugin might allow storing a volume on an NFS server or an object storage service.

Compose provides a much more convenient way to manage an application that consists of multiple containers, but, at least in its original incarnation, it only worked with a single host; all of the containers that it created were run on the same machine. To extend its reach across multiple hosts, Docker introduced Swarm mode in 2016. This is actually the second product from Docker to bear the name "Swarm" -- a product from 2014 implemented a completely different approach to running containers across multiple hosts, but it is no longer maintained. It was replaced by SwarmKit, which provides the underpinnings of the current version of Docker Swarm.

Swarm mode is included in Docker; no additional software is required. Creating a cluster is a simple matter of running docker swarm init on an initial node, and then docker swarm join on each additional node to be added. Swarm clusters contain two types of nodes. Manager nodes provide an API to launch containers on the cluster, and communicate with each other using a protocol based on the Raft Consensus Algorithm in order to synchronize the state of the cluster across all managers. Worker nodes do the actual work of running containers. It is unclear how large these clusters can be; Docker's documentation says that a cluster should have no more than seven manager nodes, but does not specify a limit on the number of worker nodes. Bridging container networks across nodes is built in, but sharing storage between nodes is not; third-party volume plugins need to be used to provide shared persistent storage across nodes.

Services are deployed on a swarm using Compose files. Swarm extended the Compose format by adding a deploy key to each service that specifies how many instances of the service should be running and which nodes they should run on. Unfortunately, this led to a divergence between Compose and Swarm, which caused some confusion because options like CPU and memory quotas needed to be specified in different ways depending on which tool was being used. During this period of divergence, a file intended for Swarm was referred to as a "stack file" instead of a Compose file in an attempt to disambiguate the two; thankfully, these differences appear to have been smoothed over in the current versions of Swarm and Compose, and any references to a stack file being distinct from a Compose file seem to have largely been scoured from the Internet. The Compose format now has an open specification and its own GitHub organization providing reference implementations.
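As a sketch of what the deploy key looks like in practice (the service name, image, and numbers below are illustrative), a service entry in a stack might be extended along these lines:

    services:
      app:
        image: example/webapp:latest
        deploy:
          replicas: 3                     # run three instances across the swarm
          placement:
            constraints:
              - node.role == worker       # only schedule onto worker nodes
          resources:
            limits:
              cpus: "0.50"                # per-container CPU and memory quotas
              memory: 256M

A stack described this way would typically be launched on the cluster with docker stack deploy.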
There is some level of uncertainty about the future of Swarm. It once formed the backbone of a service called Docker Cloud, but the service was suddenly shut down in 2018. It was also touted as a key feature of Docker's Enterprise Edition, but that product has since been sold to another company and is now marketed as Mirantis Kubernetes Engine. Meanwhile, recent versions of Compose have gained the ability to deploy containers to services hosted by Amazon and Microsoft. There has been no deprecation announcement, but there also hasn't been any announcement of any other type in recent memory; searching for the word "Swarm" on Docker's website only turns up passing mentions.

Kubernetes

Kubernetes (sometimes known as k8s) is a project inspired by an internal Google tool called Borg. Kubernetes manages resources and coordinates running workloads on clusters of up to thousands of nodes; it dominates container orchestration like Google dominates search. Google wanted to collaborate with Docker on Kubernetes development in 2014, but Docker decided to go its own way with Swarm. Instead, Kubernetes grew up under the auspices of the Cloud Native Computing Foundation (CNCF). By 2017, Kubernetes had grown so popular that Docker announced that it would be integrated into Docker's own product.

Aside from its popularity, Kubernetes is primarily known for its complexity. Setting up a new cluster by hand is an involved task, which requires the administrator to select and configure several third-party components in addition to Kubernetes itself. Much like the Linux kernel needs to be combined with additional software to make a complete operating system, Kubernetes is only an orchestrator and needs to be combined with additional software to make a complete cluster. It needs a container engine to run its containers; it also needs plugins for networking and persistent volumes.

Kubernetes distributions exist to fill this gap. Like a Linux distribution, a Kubernetes distribution bundles Kubernetes with an installer and a curated selection of third-party components. Different distributions exist to fill different niches; seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises. The minikube project offers an easier on-ramp for developers looking for a local environment to experiment with. Unlike their Linux counterparts, Kubernetes distributions are certified for conformance by the CNCF; each distribution must implement the same baseline of functionality in order to obtain the certification, which allows them to use the "Certified Kubernetes" badge.

A Kubernetes cluster contains several software components. Every node in the cluster runs an agent called the kubelet, which maintains the node's membership in the cluster and accepts work from it, along with a container engine and kube-proxy, which enables network communication with containers running on other nodes. The components that maintain the state of the cluster and make decisions about resource allocations are collectively referred to as the control plane -- these include a distributed key-value store called etcd, a scheduler that assigns work to cluster nodes, and one or more controller processes that react to changes in the state of the cluster and trigger any actions needed to make the actual state match the desired state. Users and cluster nodes interact with the control plane through the Kubernetes API server. To effect changes, users set the desired state of the cluster through the API server, while the kubelet reports the actual state of each cluster node to the controller processes.

Kubernetes runs containers inside an abstraction called a pod, which can contain one or more containers, although running containers for more than one service in a pod is discouraged. Instead, a pod will generally have a single main container that provides a service, and possibly one or more "sidecar" containers that collect metrics or logs from the service running in the main container. All of the containers in a pod will be scheduled together on the same machine, and will share a network namespace -- containers running within the same pod can communicate with each other over the loopback interface. Each pod receives its own unique IP address within the cluster. Containers running in different pods can communicate with each other using their cluster IP addresses.
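To illustrate the pod abstraction, a pod with a main container and a logging sidecar might be declared along the lines of the sketch below; the names and images are hypothetical:

    apiVersion: v1
    kind: Pod
    metadata:
      name: webapp                        # hypothetical pod name
    spec:
      volumes:
        - name: logs
          emptyDir: {}                    # scratch space shared by the two containers
      containers:
        - name: app                       # main container providing the service
          image: example/webapp:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: logs
              mountPath: /var/log/app
        - name: log-forwarder             # "sidecar" that ships the logs written by "app"
          image: example/log-forwarder:latest
          volumeMounts:
            - name: logs
              mountPath: /var/log/app

Both containers would be scheduled onto the same node, share the pod's IP address, and be able to reach each other over the loopback interface.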
A pod specifies a set of containers to run, but the definition of a pod says nothing about where to run those containers, or how long to run them for -- without this information, Kubernetes will start the containers somewhere on the cluster, but will not restart them when they exit, and may abruptly terminate them if the control plane decides the resources they are using are needed by another workload. For this reason, pods are rarely used alone; instead, the definition of a pod is usually wrapped in a Deployment object, which is used to define a persistent service.

Like Compose and Swarm, the objects managed by Kubernetes are declared in YAML; for Kubernetes, the YAML declarations are submitted to the cluster using the kubectl tool. In addition to pods and Deployments, Kubernetes can manage many other types of objects, like load balancers and authorization policies. The list of supported APIs is continually evolving, and will vary depending on which version of Kubernetes and which distribution a cluster is running. Custom resources can be used to add APIs to a cluster to manage additional types of objects. KubeVirt adds APIs to enable Kubernetes to run virtual machines, for example. The complete list of APIs supported by a particular cluster can be discovered with the kubectl api-versions command.

Unlike Compose, each of these objects is declared in a separate YAML document, although multiple YAML documents can be inlined in the same file by separating them with "---", as seen in the Kubernetes documentation. A complex application might consist of many objects with their definitions spread across multiple files; keeping all of these definitions in sync with each other when maintaining such an application can be quite a chore. In order to make this easier, some Kubernetes administrators have turned to templating tools like Jsonnet.

Helm takes the templating approach a step further. Like Kubernetes, development of Helm takes place under the aegis of the CNCF; it is billed as "the package manager for Kubernetes". Helm generates YAML configurations for Kubernetes from a collection of templates and variable declarations called a chart. Its template language is distinct from the Jinja templates used by Ansible but looks fairly similar to them; people who are familiar with Ansible roles will likely feel at home with Helm charts. Collections of Helm charts can be published in Helm repositories; Artifact Hub provides a large directory of public Helm repositories. Administrators can add these repositories to their Helm configuration and use the ready-made Helm charts to deploy prepackaged versions of popular applications to their cluster. Recent versions of Helm also support pushing and pulling charts to and from container registries, giving administrators the option to store charts in the same place that they store container images.
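As a rough sketch of how a chart fits together (the names and values here are illustrative, not taken from any published chart), a template for a Deployment and its accompanying values file might look like this:

    # templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ .Release.Name }}-web       # filled in from the release name
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app: {{ .Release.Name }}-web
      template:
        metadata:
          labels:
            app: {{ .Release.Name }}-web
        spec:
          containers:
            - name: web
              image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

    # values.yaml
    replicaCount: 2
    image:
      repository: example/webapp          # placeholder image
      tag: latest

Installing the chart with helm install renders the templates against the values (which can be overridden on the command line) and submits the resulting YAML to the cluster.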
Kubernetes shows no signs of losing momentum any time soon. It is designed to manage any type of resource; this flexibility, as demonstrated by the KubeVirt virtual-machine controller, gives it the potential to remain relevant even if containerized workloads should eventually fall out of favor. Development proceeds at a healthy clip and new major releases come out regularly. Releases are supported for a year; there doesn't seem to be a long-term support version available. Upgrading a cluster is supported, but some prefer to bring up a new cluster and migrate their services over to it.

Nomad

Nomad is an orchestrator from HashiCorp that is marketed as a simpler alternative to Kubernetes. Like Docker and Kubernetes, Nomad is an open-source project. It consists of a single binary called nomad, which can be used to start a daemon called the agent and also serves as a CLI to communicate with an agent. Depending on how it is configured, the agent process can run in one of two modes. Agents running in server mode accept jobs and allocate cluster resources for them. Agents running in client mode contact the servers to receive jobs, run them, and report their status back to the servers. The agent can also run in development mode, where it takes on the role of both client and server to form a single-node cluster that can be used for testing purposes.

Creating a Nomad cluster can be quite simple. In Nomad's most basic mode of operation, the initial server agent must be started, then additional nodes can be added to the cluster using the nomad server join command. HashiCorp also provides Consul, which is a general-purpose service mesh and discovery tool. While it can be used standalone, Nomad is probably at its best when used in combination with Consul. The Nomad agent can use Consul to automatically discover and join a cluster, and can also perform health checks, serve DNS records, and provide HTTPS proxies to services running on the cluster.

Nomad supports complex cluster topologies. Each cluster is divided into one or more "data centers". Like Swarm, server agents within a single data center communicate with each other using a protocol based on Raft; this protocol has tight latency requirements, but multiple data centers may be linked together using a gossip protocol that allows information to propagate through the cluster without each server having to maintain a direct connection to every other. Data centers linked together in this way can act as one cluster from a user's perspective. This architecture gives Nomad an advantage when scaled up to enormous clusters. Kubernetes officially supports up to 5,000 nodes and 300,000 containers, whereas Nomad's documentation cites examples of clusters containing over 10,000 nodes and 2,000,000 containers.

Like Kubernetes, Nomad doesn't include a container engine or runtime. It uses task drivers to run jobs. Task drivers that use Docker and Podman to run containers are included; community-supported drivers are available for other container engines. Also like Kubernetes, Nomad's ambitions are not limited to containers; there are also task drivers for other types of workloads, including a fork/exec driver that simply runs a command on the host, a QEMU driver for running virtual machines, and a Java driver for launching Java applications. Community-supported task drivers connect Nomad to other types of workloads.

Unlike Docker or Kubernetes, Nomad eschews YAML in favor of the HashiCorp Configuration Language (HCL), which was originally created for Terraform, another HashiCorp project that provisions cloud resources. HCL is used across the HashiCorp product line, although it has limited adoption elsewhere. Documents written in HCL can easily be converted to JSON, but HCL aims to provide a syntax that is more finger-friendly than JSON and less error-prone than YAML.
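To give a feel for the format (the job, group, and task names and the container image below are placeholders), a simple service job using the bundled Docker task driver might be written in HCL along these lines:

    job "webapp" {
      datacenters = ["dc1"]
      type        = "service"

      group "web" {
        count = 2                  # run two instances of this group

        task "server" {
          driver = "docker"        # use the bundled Docker task driver

          config {
            image = "example/webapp:latest"   # placeholder container image
          }

          resources {
            cpu    = 200           # MHz
            memory = 128           # MB
          }
        }
      }
    }

A job like this would be submitted to the cluster with the nomad job run command.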
HashiCorp's equivalent to Helm is called Nomad Pack. Like Helm, Nomad Pack processes a directory full of templates and variable declarations to generate job configurations. Nomad also has a community registry of pre-packaged applications, but the selection is much smaller than what is available for Helm at Artifact Hub.

Nomad does not have the same level of popularity as Kubernetes. Like Swarm, its development appears to be primarily driven by its creators; although it has been deployed by many large companies, HashiCorp is still very much the center of the community around Nomad. At this point, it seems unlikely that the project has gained enough momentum to have a life independent from its corporate parent. Users can perhaps find assurance in the fact that HashiCorp is much more clearly committed to the development and promotion of Nomad than Docker is to Swarm.

Conclusion

Swarm, Kubernetes, and Nomad are not the only container orchestrators, but they are the three most viable. Apache Mesos can also be used to run containers, but it was nearly mothballed in 2021; DC/OS is based on Mesos, but much like Docker Enterprise Edition, the company that backed its development is now focused on Kubernetes. Most "other" container orchestration projects, like OpenShift and Rancher, are actually just enhanced (and certified) Kubernetes distributions, even if they don't have Kubernetes in their name.

Despite (or perhaps, because of) its complexity, Kubernetes currently enjoys the most popularity by far, but HashiCorp's successes with Nomad show that there is still room for alternatives. Some users remain loyal to the simplicity of Docker Swarm, but its future is uncertain. Other alternatives appear to be largely abandoned at this point. It would seem that the landscape has largely settled around these three players, but container orchestration is still a relatively immature area. Ten years ago, very little of this technology even existed, and things are still evolving quickly. There are likely many exciting new ideas and developments in container orchestration that are still to come.

[Special thanks to Guinevere Saenger for educating me with regard to some of the finer points of Kubernetes and providing some important corrections for this article.]


The container orchestrator landscape
Posted Aug 23, 2022 19:08 UTC (Tue) by NYKevin (subscriber, #129325)

Speaking as a Google SRE, looking at k8s "from the other side" as it were, my main problem with it is actually not complexity, but terminology. It's *almost* 1:1 equivalent to Borg, except everything has a different name and it's differently opinionated about certain things. For example, in k8s you have pods which own or manage containers, and then you have a ReplicaSet which owns or manages one or more pods. This is all very sensible and reasonable.
In Borg, we would say that you have alloc instances (pods) which own or manage tasks (containers), and then you have an alloc (ReplicaSet) which describes one or more alloc instances. The problem is that Borg also describes the set of tasks as a "job," and this has no direct equivalent in k8s.* Worse, Borg lets you dispense with the alloc altogether, so you can just have a "naked" job that consists of a bunch of tasks (containers) with no pod-like abstraction over them. On the one hand, this means that we don't have to configure the alloc if we don't want or need to. On the other hand, it means that we have two ways of doing things.

But the real problem is that, when we're talking about Borg informally, we often say "job" instead of "job or alloc" - which is the one Borg term that doesn't really have a clean equivalent in k8s. The reverse also happens. Borg doesn't let you submit individual alloc instances (pods) or tasks (containers) without wrapping them up in an alloc (ReplicaSet) or job (see above), so if you just want one copy of something, you have to give Borg a template and say "make one copy of it" instead of submitting the individual object directly, and so in practice we mostly speak of "jobs and allocs" rather than "tasks and alloc instances." But in k8s, you can configure one pod at a time if you really want to.

(For more specifics on how Borg works, read the paper: https://research.google/pubs/pub43438/)

* k8s also defines something called a "job," but it's a completely different thing, not relevant here.

The container orchestrator landscape
Posted Aug 23, 2022 19:12 UTC (Tue) by jordan (subscriber, #110573)

Could you tell us how widely deployed Borg is vs k8s inside of Google? While I was doing research for this piece, I found some folks saying that Google was mostly still on Borg internally, and k8s was only used for a few Google Cloud offerings, but the most recent thing I could find about that was from 2018 and it seemed too out-of-date and questionably sourced to include in the article.

The container orchestrator landscape
Posted Aug 23, 2022 20:39 UTC (Tue) by NYKevin (subscriber, #129325)

If it's not public, then somebody probably decided that it shouldn't be public, so I don't think I can go into much detail here without asking the mothership for permission. In general however, I would say that we still use Borg for a lot of things. Beyond that, I'm afraid I would have to refer you to Google's rather meager public k8s documentation.

The container orchestrator landscape
Posted Aug 23, 2022 20:43 UTC (Tue) by NYKevin (subscriber, #129325)

(Note also that journalists can email press@google.com - I have no idea whether they would be willing to answer questions of this nature.)

The container orchestrator landscape
Posted Aug 23, 2022 20:23 UTC (Tue) by dw (subscriber, #12017)

My memory is a little faint after 15 years, but I seem to recall a handful of binaries (no more than 4) that could be manually provisioned on a desktop over lunch using a few kilobyte-sized argvs. That has not been my experience of k8s at all; Borg was a much tidier system that had (at least at that stage) not yet succumbed to enterprization or excessive modularity.
The container orchestrator landscape
Posted Aug 23, 2022 19:21 UTC (Tue) by zyga (subscriber, #81533)

While somewhat of a different breed, does anyone here have experience with using juju, either with k8s or without it?

ECS is worth a mention
Posted Aug 23, 2022 20:18 UTC (Tue) by dw (subscriber, #12017)

There are a few of us out there who'd prefer ECS at all costs after experiencing the alternatives. Much simpler control plane, AWS-grade backwards compatibility, no inscrutable hypermodularity, and you can run it on your own infra so long as you're happy forking over $5/mo. per node for the managed control plane. I stopped looking at or caring for alternatives; ECS has just the right level of complexity, and it's a real shame nobody has found the time to do a free software clone of its control plane.

ECS is worth a mention
Posted Aug 23, 2022 21:22 UTC (Tue) by beagnach (subscriber, #32987)

Agreed. I feel our team dodged a bullet by opting for ECS over K8S for our fairly straightforward web application.

ECS is worth a mention
Posted Aug 23, 2022 22:34 UTC (Tue) by k8to (subscriber, #15413)

This is a rough one. Getting locked into the Amazon ecosystem, which is full of overcomplicated and difficult services, could hurt in the long term. But container orchestration is often also a huge tarpit of wasted time struggling with overcomplexity.

It's funny: when "open source" meant Linux and Samba to me, it seemed like a world of down-to-earth implementations that might be clunky in some ways but were focused on comprehensible goals. Now, in a world of Kubernetes, Spark, and Solr, I associate it more with engineer-created balls of hair that you have to take care of with specialists to keep them working. More necessary evils than amplifying enablers.

ECS is worth a mention
Posted Aug 23, 2022 23:14 UTC (Tue) by dw (subscriber, #12017)

"Open source" stratified long ago to incorporate most of what we used to consider enterprise crapware as the default style of project that gets any exposure. They're still the same teams of 9-5s pumping out garbage, it's just that the marketing and licenses changed substantially. Getting paid to "work on open source" might have had some edge 20 years ago, but I can only think of 1 or 2 companies today doing what I'd consider that to have meant in the early 2000s.

As for ECS lock-in, the time saved on a 1-liner SSM deploy of on-prem nodes easily covers the risk at some future date of having to port container definitions to pretty much any other system. Optimistically, assuming 3 days of one person's time to set up a local k8s, ECS offers about 450 node-months before reaching breakeven (450 node-months across a 5-node cluster is 90 months, much longer than many projects last before reaching the scrapheap). Of course ECS setup isn't completely free, but relatively speaking it may as well be considered free.

ECS is worth a mention
Posted Aug 24, 2022 1:12 UTC (Wed) by rjones (subscriber, #159862)

If you don't want to be joined at the hip with AWS, a possibly better solution is k0s. One of the problems with self-hosting Kubernetes is that the typical approach naively mixes the Kubernetes API components (API server/scheduler/etcd/etc.) with infrastructure components (networking/storage/ingress controllers/etc.) and with applications, all on the same set of nodes.
So you have all these containers operating at different levels all mixing together, which means that your "blast radius" for the cluster is very bad. If you mess up a network controller configuration, you can take your Kubernetes cluster offline. If an application freaks out, it can take your cluster offline. Memory resources could be exhausted by a bad deploy or misbehaving application, which then takes out your storage, etc. etc. This makes upgrades irritating and difficult and full of pitfalls, and the cluster very vulnerable to misconfigurations.

You can mitigate these issues by separating out 'admin' nodes from 'etcd', 'storage', and 'worker' nodes. This greatly reduces the chances of outages and makes management easier, but it also adds a lot of extra complexity and setup. This is a lot of configuring and messing around if you are interested in just hosting a 1-5 node Kubernetes cluster for a personal lab or specific project or whatever.

With k0s (and similar approaches with k3s and RancherOS) you have a single Unix-style service that provides the Kubernetes API components. You can cluster it if you want, but the simplest setup just uses SQLite as the backend, which works fine for small or single-use clusters. This runs in a separate VM or small machine from the rest of the cluster. Even if it's a single point of failure, it's not too bad; the cluster will happily hum right along as you reboot your k0s controller node. In this way, managing the cluster is much more like how an AWS EKS or Azure AKS cluster works. With those, the API services are managed by the cloud provider, separate from what you manage.

This is a massive improvement over what you may have experienced with something like OpenShift, Kubespray, or even really simple kubeadm-based deploys, and most other approaches. It may not seem like a big deal, but for what most people are interested in, in terms of self-hosted Kubernetes clusters, I think it is. Also, I think that having numerous smaller k8s clusters is preferable to having very large multi-tenant clusters. Just having things split up solves a lot of potential issues.

ECS is worth a mention
Posted Aug 24, 2022 6:30 UTC (Wed) by dw (subscriber, #12017)

I've spent enough time de-wtfing k3s installs that hadn't been rebooted just long enough for something inscrutable to break that I'd assume k0s was a non-starter for much the same reason. You can't really fix a stupid design by jamming all the stupid components together more tightly, although I admit it at least improves the sense of manageability.

The problem with Kubernetes starts and ends with its design; it's horrible to work with in concept, never mind any particular implementation.

ECS is worth a mention
Posted Aug 24, 2022 16:05 UTC (Wed) by sbheinlein (guest, #160469)

> "seemingly every tech company of a certain size has its own distribution and/or hosted offering to cater to enterprises"

That's enough of a mention for me.

Apache YARN
Posted Aug 23, 2022 20:55 UTC (Tue) by cry_regarder (subscriber, #50545)

Are folks here using YARN? https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-...

IIRC it is the default orchestrator for things like Samza. https://samza.apache.org/learn/documentation/latest/deplo...
The container orchestrator landscape
Posted Aug 23, 2022 22:12 UTC (Tue) by onlyben (subscriber, #132784)

Does anyone have deep experience using both k8s and Nomad, and can provide some assessment of the differences or the use cases you'd use them for? I try and avoid Kubernetes if I can. I do appreciate the problem it solves and think it is extremely useful, but it hasn't quite captured the magic for me that other tools have (including original docker, and probably even docker-compose). I'd be curious to know what people feel about Nomad.

The container orchestrator landscape
Posted Aug 24, 2022 3:27 UTC (Wed) by Cyberax (supporter, #52523)

K8s is more flexible compared to Nomad, but at the cost of complexity. There's a nice page with a description here: https://www.nomadproject.io/docs/nomad-vs-kubernetes

I personally would avoid Nomad right now. It's an "open core" system, with the "enterprise" version locking in some very useful features like multi-region support.

With K8s you can also use EKS on AWS or AKS on Azure to offload running the control plane to AWS/Azure. It's still very heavy on infrastructure that you need to configure, but at least it's straightforward and needs to be done once.

The container orchestrator landscape
Posted Aug 24, 2022 16:42 UTC (Wed) by schmichael (subscriber, #160476)

> the "enterprise" version locking in some very useful features like multi-region support.

Quick point of clarification: multi-region federation is open source. You can federate Nomad clusters to create a single global control plane. Multi-region deployments (where you can deploy a single job to multiple regions) are enterprise. Single-region jobs and deployments are open source.

Disclaimer: I'm the HashiCorp Nomad Engineering Team Lead

The container orchestrator landscape
Posted Aug 23, 2022 23:19 UTC (Tue) by bartoc (subscriber, #124262)

systemd also has a built-in container runtime and (sorta) orchestrator, which is a lot simpler than k8s and makes it easier to have tasks that are just normal systemd units.

One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real IPv6 addresses that are routable from everywhere, and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container. The whole process just felt like figuring out exactly how the k8s people had reimplemented any number of existing utilities. It all gave me the impression the whole thing was abstraction for abstraction's sake. I feel the same way about stuff like ansible, so maybe I just really care about what code is actually executing on my servers more than most people.

I found HashiCorp's offerings (in general tbh, not just Nomad) to be a lot of shiny websites on top of very basic tools that ended up adding relatively little value compared to just using whatever it was they were abstracting over.

The container orchestrator landscape
Posted Aug 24, 2022 1:10 UTC (Wed) by jordan (subscriber, #110573)

I find myself mourning fleet with some regularity.
The container orchestrator landscape
Posted Aug 24, 2022 1:28 UTC (Wed) by rjones (subscriber, #159862)

> The whole process just felt like figuring out exactly how the k8s people had reimplemented any number of existing utilities. It all gave me the impression the whole thing was abstraction for abstraction's sake.

Kubernetes and the complexity of its networking stack are the result of the original target for these sorts of clusters. The idea is that you needed to have a way for Kubernetes to easily adapt to a wide variety of different cloud architectures. The people that are running them don't have control over the addresses they get, addresses are very expensive, and they don't have control over any of the network infrastructure. IPv6 isn't even close to an option for most of these types of setup.

So it makes a lot of sense to take advantage of tunnelling over TCP for the internal networking. This way it works completely independently of any physical or logical network configuration that the Kubernetes cluster might be hosted on. You can even make it work between multiple cloud providers if you want.

> One thing that always really annoyed me about k8s is the whole networking stack and networking requirements. My servers have real IPv6 addresses that are routable from everywhere, and I really, really do not want to deal with some insane BGP overlay. Each host can good and well get (at least) a /60 that can be further subdivided for each container.

You don't have to use the tunneling network approach if you don't want to. For example, if you have physical servers with multiple network ports, you can just use those separate LANs instead. Generally speaking you'll want to have three LANs: one for the pod network, one for the service network, and one for the external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more than that. I don't know how mature K8s IPv6 support is nowadays, but I can see why that would be preferable.

> It all gave me the impression the whole thing was abstraction for abstraction's sake. I feel the same way about stuff like ansible, so maybe I just really care about what code is actually executing on my servers more than most people.

It could be that a lot of people are not in a position to micro-manage things on that level and must depend on the expertise of other people to accomplish things in a reasonable manner.

The container orchestrator landscape
Posted Aug 24, 2022 6:55 UTC (Wed) by dw (subscriber, #12017)

I don't mean to keep jumping into your replies, but I feel I can see what stage in the cycle you're at with Kubernetes, and it's probably worth pointing out something that might not immediately be obvious: in all the rush to absorb the design complexity of the system, it's very easy to forget that there are numerous ways to achieve the flexibility it offers, and the variants Kubernetes chose to bake in are only one instantiation, and IMHO usually far from the right one.

Take the network abstraction as a simple example; it's maybe 20%+ of the whole Kubernetes conceptual overhead. K8s more or less mandates some kind of mapping at the IP and naming layers, so you usually have at a minimum some variation of a custom DNS server and a few hundred ip/nf/xdp rules or whatnot to implement routing. Docker's solution to the same problem was simply a convention for dumping network addresses into environment variables.
No custom DNS, no networking nonsense. It's one of a thousand baked-in choices made in k8s that really didn't need to be that way. The design itself is bad.

No conversation of Kubernetes complexity is complete without mention of their obsolescent-by-design approach to API contracts. We've just entered a period where Ingresses went from marked beta, to stable, to about-to-be-deprecated by gateways. How many million lines of YAML toil across all k8s users needed trivial updates when the interface became stable, and how many million more will be wasted by the time gateways are fashionable? How long will gateways survive? That's a meta-design problem, and a huge red flag. Once you see it in a team you can expect it time and time again. Not only is it overcomplicated by design, it's also quicksand, and nothing you build on it can be expected to have any permanence.

The container orchestrator landscape
Posted Aug 24, 2022 7:58 UTC (Wed) by bartoc (subscriber, #124262)

> The idea is that you needed to have a way for Kubernetes to easily adapt to a wide variety of different cloud architectures. The people that are running them don't have control over the addresses they get, addresses are very expensive, and they don't have control over any of the network infrastructure. IPv6 isn't even close to an option for most of these types of setup.

Well, I don't care about any cloud architectures except mine :). More seriously though, the people running clouds absolutely do have control over the addresses they get! And tunneling works just as well if you want to provide access to the IPv6 internet on container hosts that only have IPv4, except in that situation you have some hope of getting rid of the tunnels once you no longer need IPv4.

> Generally speaking you'll want to have three LANs: one for the pod network, one for the service network, and one for the external network. More sophisticated setups might want to have a dedicated network for storage on top of that, and I am sure that people can find uses for even more than that.

IMO this is _nuts_, I want _ONE_ network and I want that network to be the internet (with stateful firewalls, obviously).

The container orchestrator landscape
Posted Aug 24, 2022 0:26 UTC (Wed) by denton (subscriber, #159595)

Does Nomad enable you to extend the control plane with your own custom types? One thing that k8s got right was that it gave users the ability to define new resource types, via CustomResourceDefinitions (CRDs). So for example, if you wanted a Postgres database in your k8s cluster, you could install a CRD + Postgres controller and have access to that new API. It's led to a large number of Operators that can enable advanced functionality in the cluster, without the user needing to understand how they work. This is similar to managed services on cloud providers, like Aurora or RDS in AWS. I'm wondering if Nomad has similar functionality?

The container orchestrator landscape
Posted Aug 24, 2022 1:11 UTC (Wed) by jordan (subscriber, #110573)

As far as I can tell, Nomad does not have a direct equivalent; I think the closest you could get is a custom task driver.

The container orchestrator landscape
Posted Aug 24, 2022 17:09 UTC (Wed) by schmichael (subscriber, #160476)

> I'm wondering if Nomad has similar functionality?

No, Nomad has chosen not to implement CRDs/Controllers/Operators as seen in Kubernetes.
Many users use the Nomad API to build their own service control planes, and the Nomad Autoscaler - https://github.com/hashicorp/nomad-autoscaler/ - is an example of a generic version of this: it's a completely external project and service that runs in your Nomad cluster to provide autoscaling of your other Nomad-managed services and their infrastructure. Projects like Patroni also work with Nomad, so projects similar to controllers do exist: https://github.com/ccakes/nomad-pgsql-patroni

The reason (pros) for this decision is largely that it lets Nomad focus on core scheduling problems. Many of our users build a platform on top of Nomad and appreciate the clear distinction between Nomad placing workloads and their higher-level platform tooling managing the specific orchestration needs of their systems using Nomad's APIs. This should feel similar to the programming principles of encapsulation and composition.

The cons we've observed are: (1) you likely have to manage state for your control plane ... somewhere ... this makes it difficult to write generic open source controllers, and (2) your API will be distinct from Nomad's and require its own security, discovery, UI, etc. I don't want to diminish the pain of forcing our users to solve those themselves.

I could absolutely see Nomad gaining CRD-like capabilities someday, but in the short term you should plan on having to manage controller state and APIs yourself.

Disclaimer: I am the HashiCorp Nomad Engineering Team Lead

The container orchestrator landscape
Posted Aug 24, 2022 11:15 UTC (Wed) by jezuch (subscriber, #52988)

My favorite "orchestrator" is actually testcontainers. It turns integration tests from a horrible nightmare into something that's almost pleasant ;) The biggest downside is that they're usually somewhat slow to start, but everyone at my $DAYJOB is more than willing to pay that cost (which is also monetary, since those tests are executed in CI in the cloud).

The container orchestrator landscape
Posted Aug 24, 2022 17:40 UTC (Wed) by jordan (subscriber, #110573)

It's worth noting that, while Docker's website is largely devoid of any mention of Swarm, Mirantis reaffirmed their commitment to Swarm in April of this year. It seems like it will continue to be supported in Mirantis's product, but it's unclear to me what that might mean for users of the freely-available version of Docker, which is developed and distributed by an entirely different company.

Yikes...
Posted Aug 24, 2022 17:58 UTC (Wed) by dskoll (subscriber, #1630)

I don't have much to add, but reading this hurt my brain and I now understand a second meaning of the term "Cluster****"

I am so glad I'm nearing the end of my career and not starting out in tech today.