[HN Gopher] Cloudflare Uses HashiCorp Nomad (2020)
       ___________________________________________________________________
        
       Cloudflare Uses HashiCorp Nomad (2020)
        
       Author : saranshk
       Score  : 125 points
       Date   : 2021-10-15 16:14 UTC (6 hours ago)
        
 (HTM) web link (blog.cloudflare.com)
 (TXT) w3m dump (blog.cloudflare.com)
        
       | ridruejo wrote:
        | Well-written article and a great reminder that Kubernetes is
        | not the answer to everything.
        
       | [deleted]
        
       | eska wrote:
        | The developers at Riot Games are also always swooning over
        | Nomad and have published quite a few talks on YouTube as well
        | as articles on their dev blog.
       | 
        | I've looked into Kubernetes, but after I got it all going I
        | decided it's not worth the complexity. Same experience as
        | installing Gentoo :-)
        
         | villasv wrote:
          | I decided not to learn k8s because I really think something
          | should be coming soon to replace it (or hide it). Kind of
          | like how DistBelief became TensorFlow, but now we have Keras
          | and other friendly NN libraries on top.
          | 
          | In that analogy, Borg was DistBelief, Kubernetes is
          | TensorFlow, and I'm waiting for the Keras. Helm isn't there
          | yet IMHO, so I'm still waiting.
        
       | travisd wrote:
        | A (2020) indicator would be great.
       | 
        | I recently looked closer into my billing data for GKE (Google's
        | managed Kubernetes) and was astounded by the overhead taken up
        | just by the Kubernetes internals: something like 10-20%. It
        | might be better with bigger node types.
       | 
        | How does Nomad compare on this front?
        
         | AaronFriel wrote:
          | Having run Kubernetes (GKE) in production and AKS/EKS in
          | trials, I can say that overhead is a fixed per-node cost, not
          | one that grows with node size. When running a small cluster
          | with 1-3 nodes, there are reasons to scale out first
          | (reliability), but your margins will improve more quickly by
          | scaling up nodes.
         | 
          | GCP's pricing model for CPU and memory conveniently makes it
          | easy to do this incrementally.
        
           | hinkley wrote:
           | For capacity planning you have to look at worst case
           | behaviors, and we have a lot of Pollyannas running around
           | crowing about best and average case behaviors.
           | 
            | A consensus protocol has O(log n) behavior on any network
            | that displays any of the 8 fallacies of distributed
            | computing. But the larger the cluster, the more fallacies
            | you're likely to have to deal with in a given day, and
            | O(log n) is too optimistic. If the cost is ever 'marginal',
            | it's in little
           | islands of stability that won't last long.
           | 
           | What I see over and over again is people expending huge
           | opportunity costs trying to keep their brittle system in one
           | of these local maxima as long as they can. I think because
           | they fear that once they slip out of that comfy spot people
           | will see they aren't some miracle worker, they're just
           | slightly above average and really good at story telling.
        
             | marcinzm wrote:
             | I'm honestly not sure what your point is. The cloud
             | provider literally charges a fixed cost irrespective of the
             | number of nodes for managed kubernetes. Sure they may need
             | to deal with scaling issues and costs but that's not the
             | concern of a user who pays a flat fee.
        
               | hinkley wrote:
               | If the hosting dollar figure for K8s is the only cost you
               | can think of, you're down at the bottom of a well of
               | magic thinking that I'm not qualified to hoist you out
               | of.
        
               | marcinzm wrote:
               | Huh? This is a thread discussing the hosting costs of
               | kubernetes. I get it, you've got an axe to grind and will
               | apparently do so at any opportunity but I don't see how
               | it's relevant to the discussion.
        
               | hinkley wrote:
               | The only axe I have in this story is people who downplay
               | the costs of solutions as part of a discussion of pros
               | and cons.
               | 
               | We regularly call out pharmaceutical and petrochemical
               | companies here for doing that. I don't know why you would
               | expect tech to get a free pass.
               | 
               | The most important thing is that you don't fool yourself,
               | and you are the easiest person to fool.
        
               | sealeck wrote:
               | https://m.youtube.com/watch?v=qTiQOaxR1VU
        
             | a1369209993 wrote:
              | To be fair, average case is useful. It's the reason
              | probabilistic algorithms like quicksort (worst case
              | O(N^2)) and hash tables (worst case O(N) per lookup) can
              | work at all in practice. The catch is that you have to
              | know what distribution your average is being taken over.
              | (Hash tables can be assumed to be a uniform distribution
              | if your hash function is good enough; quicksort _can_ be
              | a uniform distribution, but no one writes it that way
              | until after it blows up in their face, if then.)
             | 
             | You do have to know about worst case behaviors.
             | 
              | There are no circumstances where best-case behavior is
              | worth even calculating for anything other than curiosity,
             | unless you're trying to deny said behavior to a
             | cryptographic adversary (so it's actually _your_ worst
             | case).
        
             | AaronFriel wrote:
             | Hey, I think there may have been a misunderstanding.
             | Definitely agree with you on the complexity of weighing O
             | notation costs for distributed systems.
             | 
             | What I'm talking about, and I think the OP I replied to is
             | referring to, is the monetary and compute cost in CPU and
              | memory overhead of the kubelet and its associated
             | processes on real-world deployments. There are plenty of
             | other costs associated with Kubernetes, of course.
             | 
             | GKE charges a fixed cost to operate the consensus protocol
             | (etcd) and control plane (kube-apiserver) on their own
             | systems.
             | 
             | On the nodes the user operates, the costs then are
             | relatively fixed, that is, there is some amount of CPU time
             | and memory spent per node on logging, metrics, and so on,
             | but as the node gets larger, that quantity does not
             | increase. (It's definitely sublinear.)
             | 
              | Or in other words: you gain more usable capacity by
              | doubling node size (CPU/memory) than by doubling the
              | number of nodes. Scaling up increases gross capacity by
              | 100% while increasing overhead by a few percent at most;
              | scaling out gives the same increase in gross capacity but
              | also increases the amount spent on node overhead by 100%.
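              | 
              | A back-of-the-envelope sketch of that trade-off in Python
              | (the per-node overhead figures below are assumptions for
              | illustration, not GKE's actual reservations):
              | 
              |     # Assumed fixed per-node overhead for the kubelet,
              |     # logging and metrics agents: 0.5 vCPU and 1 GiB.
              |     NODE_OVERHEAD_CPU = 0.5   # vCPU per node (assumed)
              |     NODE_OVERHEAD_MEM = 1.0   # GiB per node (assumed)
              | 
              |     def usable(nodes, cpu_per_node, mem_per_node):
              |         """Capacity left for workloads after overhead."""
              |         cpu = nodes * (cpu_per_node - NODE_OVERHEAD_CPU)
              |         mem = nodes * (mem_per_node - NODE_OVERHEAD_MEM)
              |         return cpu, mem
              | 
              |     # Baseline: 3 nodes of 4 vCPU / 16 GiB.
              |     print(usable(3, 4, 16))   # (10.5, 45.0)
              |     # Scale out to 6 small nodes: overhead doubles too.
              |     print(usable(6, 4, 16))   # (21.0, 90.0)
              |     # Scale up to 3 nodes of 8 vCPU / 32 GiB: overhead is
              |     # unchanged, so more capacity reaches workloads.
              |     print(usable(3, 8, 32))   # (22.5, 93.0)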
        
         | vngzs wrote:
          | Nomad is very lightweight. It also does nowhere near as much
          | as Kubernetes does. But if all you need is to schedule
          | containers (or other binaries), it does so quite nicely.
         | 
         | You generally want to pair it with Consul (as Cloudflare has
         | done) to get service discovery. Consul is also quite
         | lightweight.
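          | 
          | As a rough illustration of what that service discovery looks
          | like from a client's side (a sketch, assuming a local Consul
          | agent on its default HTTP port 8500 and a hypothetical
          | service registered as "web"):
          | 
          |     # Query the local Consul agent for healthy instances of
          |     # a service via the HTTP health API.
          |     import json
          |     import urllib.request
          | 
          |     CONSUL = "http://127.0.0.1:8500"
          | 
          |     def healthy_instances(service):
          |         url = f"{CONSUL}/v1/health/service/{service}?passing=true"
          |         with urllib.request.urlopen(url) as resp:
          |             entries = json.load(resp)
          |         # Use the registered service address, falling back to
          |         # the node address when none was set.
          |         return [
          |             (e["Service"]["Address"] or e["Node"]["Address"],
          |              e["Service"]["Port"])
          |             for e in entries
          |         ]
          | 
          |     print(healthy_instances("web"))  # e.g. [('10.0.0.12', 8080)]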
        
         | Spivak wrote:
          | For work I can totally understand this trade-off, but it
          | makes k8s pretty much a non-starter on side projects or
          | personal sites: running a small cluster on small nodes leaves
          | you so little headroom for your own code that it's just not
          | worth it. Swarm/Nomad/Pacemaker all sip resources by
          | comparison.
        
           | hinkley wrote:
           | I got all set up with a Pi-clone cluster and then K3s decided
           | that an extra 100MB of memory savings was not worth
           | maintaining a separate implementation of part of the
           | services.
           | 
           | On a 16GB machine that's not a big deal. On a 2GB machine
           | that's a significant fraction of your disk cache, for a very
           | slow disk. On a 1GB machine that's just make or break.
           | 
            | ETA: And you may think these toys don't matter, but people
            | have to learn and buy in somewhere. The fact of the matter
            | is that I have production services right now that mostly
            | need bandwidth, not CPU or memory, and once I tune them to
            | the most appropriate AWS instance for that workload, the
            | Kubernetes overhead would become the plurality of resource
            | usage on those boxes (if I were using k8s, which we are not
            | at present). K8s only scales by getting into bin packing,
            | where this latency-sensitive service would run with
            | substantially more potentially noisy neighbors.
        
           | p_l wrote:
           | The history of that is actually a bit funny.
           | 
           | You see, there used to be no such "fixed overhead" in early
           | versions (including post 1.0) of k8s.
           | 
            | This turned out to be a worse idea for less involved or
            | experienced operators than a fixed overhead, because people
            | would run k8s nodes way underpowered, load them with a ton
            | of workload, then have lots of outages as they starved
            | critical system components of resources.
            | 
            | Because of that, a (settable) "overhead minimum" was added
            | to the calculation of available resources, iirc originally
            | reserving 0.9 of a CPU core and I don't recall how much
            | memory. This still allowed running a 2-core node (though
            | it's really not recommended) for experimentation, while
            | _greatly_ lowering the chances of issues that are obscure
            | to newbies. It didn't prevent them (PSA: if another team
            | provisions your cluster, check what resources they
            | provisioned...) but it makes it much harder to fail.
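            | 
            | The arithmetic behind that reservation, roughly (the
            | reserved values below are illustrative assumptions, not
            | anyone's actual defaults):
            | 
            |     # Node allocatable = capacity minus the reserved slices
            |     # (kube-reserved, system-reserved, eviction threshold);
            |     # only the remainder is offered to the scheduler.
            |     def allocatable(capacity, kube, system, eviction):
            |         return {
            |             r: capacity[r]
            |                - kube.get(r, 0)
            |                - system.get(r, 0)
            |                - eviction.get(r, 0)
            |             for r in capacity
            |         }
            | 
            |     node = {"cpu_millicores": 2000, "memory_mib": 2048}
            |     print(allocatable(
            |         node,
            |         kube={"cpu_millicores": 900, "memory_mib": 256},
            |         system={"cpu_millicores": 100, "memory_mib": 256},
            |         eviction={"memory_mib": 100},
            |     ))
            |     # {'cpu_millicores': 1000, 'memory_mib': 1436}
            |     # i.e. half the CPU of a 2-core node is already spoken for.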
        
           | egberts1 wrote:
           | ^this^
        
           | hobofan wrote:
            | I mean, for GKE there is a free tier that covers the cost
            | of an Autopilot cluster[0], which means basically no
            | overhead cost for running on GKE instead of a Compute
            | Engine node.
           | 
           | [0]: https://cloud.google.com/kubernetes-
           | engine/pricing#cluster_m...
        
       | historynops wrote:
       | Cloudflare also did a talk on this:
       | https://www.hashicorp.com/resources/how-nomad-and-consul-are...
        
       | tombert wrote:
       | Out of curiosity, has anyone here used Nomad for a _home_ server?
        | Right now my home "server" is six NVIDIA Jetsons running Docker
       | Swarm, which works but I feel like Swarm isn't getting the same
       | priority from Docker as some of the other orchestration things
       | out there.
       | 
       | I tried using Kubernetes, but that was a lot of work for a home
       | server, and so I went back to Swarm. Is Nomad appreciably easier
       | to set up?
        
         | DenseComet wrote:
          | I attempted to use Nomad for a couple of months before going
          | back to Kubernetes using k3s. The main issue I ran into was
          | that there is no out-of-the-box solution for ingress, and it
          | ended up being quite difficult to get something working
          | reliably. With k8s, I was able to hook it up to Cloudflare
          | Tunnel using ingress-nginx very easily, and it was all
          | integrated into the standard way of doing things.
        
         | vngzs wrote:
          | Yes, you can be up and running with a multi-machine lab
          | install in a couple of hours, even if you're rolling the
          | component installations and configuration yourself. For K8s
          | production cloud workloads, this is usually offloaded to
          | GKE/EKS/kops, but the equivalent manual Kubernetes
          | installation is appreciably more involved.
        
       | NicoJuicy wrote:
        | This is exactly the reason why I'm having such a tedious time
        | needing to learn k8s for my work :(
        
       ___________________________________________________________________
       (page generated 2021-10-15 23:02 UTC)