[HN Gopher] Bare-Metal Kubernetes, Part I: Talos on Hetzner
___________________________________________________________________
Bare-Metal Kubernetes, Part I: Talos on Hetzner
Author : MathiasPius
Score : 186 points
Date : 2023-09-09 08:44 UTC (14 hours ago)
(HTM) web link (datavirke.dk)
(TXT) w3m dump (datavirke.dk)
| xelxebar wrote:
| Speaking of k8s, anyone here know of ready-made solutions for
| getting Xcode (i.e. xcodebuild) running in pods? As far as I'm
| aware, there are no good solutions for getting Xcode running on
| Linux, so at the moment I'm just futzing about with a virtual-
| kubelet[0] implementation that spawns macOS VMs. This works just
| fine, but the problem seems like such an obvious one that I
| expect there to be some existing solution(s) I just missed.
|
| [0]: https://github.com/virtual-kubelet/virtual-kubelet/
| doctorpangloss wrote:
| There are no good ready-made solutions.
|
| Someone has submitted patches to containerd and authored "rund"
| (d for darwin) to run HostProcess containers on macOS.
|
| The underlying problem is poor familiarity with Kubernetes on
| Windows among Kubernetes maintainers and users. Windows is
| where all similar problems have been solved, but the journey is
| long.
| yjftsjthsd-h wrote:
| https://blog.darlinghq.org/2023/08/21/progress-report-q2-202...
| talks about running darling in flatpak, so it's not too much of
| a stretch to imagine it in a pod someday, but I don't think
| it's there today.
| dhess wrote:
| What performance numbers are you seeing on pods with Ceph PVs?
| e.g., what does `rados bench` give?
| MathiasPius wrote:
| I haven't had an excuse to test it yet, but since it's only 6
| OSDs across 3 nodes and all of them are spinning rust, I'd be
| surprised if performance was amazing.
|
| I'm definitely curious to find out though, so I'll run some
| tests and get back to you!
| MathiasPius wrote:
| I ran rados benchmarks and it seems writes are about 74MB/s,
| whereas both random and sequential reads are running at about
| 130MB/s, which is about wire speed given the 1Gbit/s NICs.
|
| Complete results are here:
| https://gist.github.com/MathiasPius/cda8ae32ebab031deb054054...
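|
| For reference, 1 Gbit/s works out to roughly 125 MB/s before
| protocol overhead, so ~130MB/s of reads plausibly saturates the
| link, with part of the data served from local OSDs. A quick
| sketch of the arithmetic (the ~6% framing overhead is just a
| rough assumption):
|
|     link_gbits = 1.0
|     raw_mb_per_s = link_gbits * 1000 / 8   # 125 MB/s line rate
|     overhead = 0.06  # rough assumption: Ethernet/IP/TCP framing
|     usable_mb_per_s = raw_mb_per_s * (1 - overhead)
|     print(f"raw: {raw_mb_per_s:.0f} MB/s, "
|           f"usable: ~{usable_mb_per_s:.0f} MB/s")
|     # Sequential reads of ~130 MB/s can slightly exceed a single
|     # 1 Gbit/s link because some reads are served by local OSDs.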
| mythz wrote:
| Thankfully we've never had the need for such complexity and are
| happy with our current GitHub Actions > Docker Compose > GCR >
| SSH solution [1] we're using to deploy 50+ Docker Containers.
|
| It requires no infrastructure dependencies: the stateless
| deployment scripts are checked into the same repo as the
| project, and once the GitHub Organization is set up (4 secrets)
| and the deployment server has Docker Compose + nginx-proxy
| installed, deploying an app only requires 1 GitHub Actions
| secret. It doesn't get any simpler for us, and we'll continue
| to use this approach for as long as we can.
|
| [1] https://servicestack.net/posts/kubernetes_not_required
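|
| For anyone curious what the SSH step of such a pipeline can look
| like, here's a minimal sketch (host, directory and compose file
| are placeholders, not the actual scripts from the linked post):
|
|     # Minimal sketch of an "SSH + Docker Compose" deploy step,
|     # as a CI job might run it after pushing a new image.
|     import subprocess
|
|     HOST = "deploy@app.example.org"  # placeholder server
|     APP_DIR = "/opt/myapp"           # placeholder project dir
|
|     def ssh(command: str) -> None:
|         """Run a single command on the deploy server over SSH."""
|         subprocess.run(["ssh", HOST, command], check=True)
|
|     # Pull the freshly pushed image and restart the stack;
|     # nginx-proxy picks the container up via its environment
|     # variables, so no further wiring is needed.
|     ssh(f"cd {APP_DIR} && docker compose pull "
|         "&& docker compose up -d")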
| seabrookmx wrote:
| I used to do something similar at a previous company and this
| works well if you don't have to worry about scaling. YAGNI
| principle and all that. When you run hundreds of containers for
| different workloads, k8s bin packing and autoscaling (both on
| the pod and node level) tips the balance in my experience.
| mythz wrote:
| Yeah if we ever need to autoscale then I can see Kubernetes
| being useful, but I'd be surprised if this is a problem most
| companies face.
|
| Even when working at StackOverflow (serving 1B+ pages,
| 55TB/mo [1]) we didn't need any autoscaling solution; it ran
| great on a handful of fixed servers. Although they were
| fairly beefy bare-metal servers, which I suspect would
| require significantly more VMs if it were to run in the
| Cloud.
|
| [1] https://stackexchange.com/performance
| swozey wrote:
| I've been a k8s contributor since 2015, version 1.1. I even
| worked at Rancher and Google Cloud. If you don't need
| absolutely granular control over a PaaS/SaaS (complex
| networking w/
| circuit breaking yadda yadda, deep stack tracing, vms
| controlled by k8s (kubevirt etc), multi-tenancy in cpu or
| gpu) you don't need k8s and will absolutely flourish using
| a container solution like ECS. Use fargate and arm64
| containers and you will save an absolute fortune. I dropped
| our AWS bill from $350k/mo to around $250k converting our
| largest apps to arm from x86.
|
| GKE is IMO the best k8s PaaS solution that exists, but
| quite frankly few companies need that much control and
| granularity in their infrastructure.
|
| My entire infrastructure now is AWS ECS and it autoscales
| and I literally never, ever, ever have had to troubleshoot
| it outside of my own configuration mishaps. I NEVER get on
| call alerts. I'm the Staff SRE at my corp.
| MathiasPius wrote:
| I recently rebuilt my Kubernetes cluster running across three
| dedicated servers hosted by Hetzner and decided to document the
| process. It turned into a (so far) 8-part series covering
| everything from bootstrapping and firewalls to setting up
| persistent storage with Ceph.
|
| Part I: Talos on Hetzner https://datavirke.dk/posts/bare-metal-
| kubernetes-part-1-talo...
|
| Part II: Cilium CNI & Firewalls https://datavirke.dk/posts/bare-
| metal-kubernetes-part-2-cili...
|
| Part III: Encrypted GitOps with FluxCD
| https://datavirke.dk/posts/bare-metal-kubernetes-part-3-encr...
|
| Part IV: Ingress, DNS and Certificates
| https://datavirke.dk/posts/bare-metal-kubernetes-part-4-ingr...
|
| Part V: Scaling Out https://datavirke.dk/posts/bare-metal-
| kubernetes-part-5-scal...
|
| Part VI: Persistent Storage with Rook Ceph
| https://datavirke.dk/posts/bare-metal-kubernetes-part-6-pers...
|
| Part VII: Private Registry with Harbor
| https://datavirke.dk/posts/bare-metal-kubernetes-part-7-priv...
|
| Part VIII: Containerizing our Work Environment
| https://datavirke.dk/posts/bare-metal-kubernetes-part-8-cont...
|
| And of course, when it all falls apart: Bare-metal Kubernetes:
| First Incident https://datavirke.dk/posts/bare-metal-kubernetes-
| first-incid...
|
| Source code repository (set up in Part III) for node
| configuration and deployed services is available at
| https://github.com/MathiasPius/kronform
|
| While the documentation was initially intended more as a future
| reference for myself as well as a log of decisions made, and why
| I made them, I've received some really good feedback and ideas
| already, and figured it might be interesting to the hacker
| community :)
| baz00 wrote:
| Ah man just looking at that list makes me glad for EKS. But
| thanks for the effort, I will read to learn more.
| MathiasPius wrote:
| Absolutely! If at all possible, go managed, preferably with a
| cloud provider that handles all the hard things for you like
| load balancing and so on.
|
| *Sometimes* however, you want or need full control, either
| for compliance or economic reasons, and that's what I set out
| to explore :)
| js4ever wrote:
| Agreed, this is probably the best ad for managed k8s, this
| and horror stories about self-managed k8s clusters falling
| apart.
| AndrewKemendo wrote:
| Thank you for the amazing write up!
| cjr wrote:
| Great write-up, and what I especially enjoyed was how you kept
| the bits where you ran into the classic sort of issues,
| diagnosed them and fixed them. The flow felt very familiar to
| whenever I do anything dev-opsy.
|
| I'd be interested to read about how you might configure cluster
| auto scaling with bare metal machines. I noticed that the IP
| addresses of each node are kinda hard-coded into firewall and
| network policy rules, so that would have to be automated
| somehow. Similarly with automatically spawning a load-balancer
| from declaring a k8s Service. I realise these things are very
| cloud provider specific but would be interested to see if any
| folks are doing this with bare metal. For me, the ease of
| autoscaling is one of the primary benefits of k8s for my
| specific workload.
|
| I also just read about Sidero Omni [1] from the makers of Talos
| which looks like a SaaS to install Talos/Kubernetes across any
| kind of hardware sourced from pretty much any provider -- cloud
| VM, bare metal etc. Perhaps it could make the initial bootstrap
| phase and future upgrades to these parts a little easier?
|
| [1]: https://www.siderolabs.com/platform/saas-for-kubernetes/
| MathiasPius wrote:
| When it comes to load balancing, I think the hcloud-cloud-
| controller-manager[1] is probably your best bet, and although
| I haven't tested it, I'm sure it can be coerced into some
| kind of working configuration with the vSwitch/Cloud Network
| coupling, even if none of cluster nodes are actually Cloud-
| based.
|
| I haven't used Sidero Omni yet, but if it's as well
| architected as Talos is, I'm sure it's an excellent solution.
| It still leaves open the question of ordering and
| provisioning the servers themselves. For simpler use-cases it
| wouldn't be too difficult to hack together a script to
| interact with the Hetzner Robot API to achieve this goal, but
| if I wanted any level of robustness, and if you'll excuse the
| shameless plug, I think I'd write a custom operator in Rust
| using my hrobot-rs[2] library :)
|
| As far as the hard-coded IP addresses go, I think I would
| simply move that one rule into a separate
| ClusterWideNetworkPolicy which is created per-node during
| onboarding and deleted again after. The hard-coded IP
| addresses are only used _before_ the node is joined to the
| cluster, so technically the rule becomes obsoleted by the
| generic "remote-node" one immediately after joining the
| cluster.[3]
|
| [1] https://github.com/hetznercloud/hcloud-cloud-controller-
| mana...
|
| [2] https://github.com/MathiasPius/hrobot-rs
|
| [3] https://github.com/MathiasPius/kronform/blob/main/manifes
| ts/...
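|
| To give an idea of what such a script could look like, here's a
| rough sketch of listing dedicated servers via the Hetzner Robot
| webservice (endpoint and response shape are from memory of the
| Robot API docs, so treat the details as assumptions):
|
|     import requests
|
|     ROBOT_API = "https://robot-ws.your-server.de"
|     # Placeholder Robot webservice credentials (HTTP Basic auth).
|     AUTH = ("webservice-user", "webservice-password")
|
|     def list_servers():
|         """Fetch all dedicated servers on the account."""
|         r = requests.get(f"{ROBOT_API}/server", auth=AUTH,
|                          timeout=30)
|         r.raise_for_status()
|         return r.json()
|
|     for entry in list_servers():
|         # As far as I recall, items are wrapped in a "server" key.
|         server = entry.get("server", entry)
|         print(server.get("server_number"),
|               server.get("server_ip"),
|               server.get("status"))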
| smartbit wrote:
| Have you tried KubeOne? It also has the benefit of machine
| deployments. Works like a charm. I didn't go through your blog
| posts, but KubeOne on Hetzner [0] seems easier than your
| deployment. And yes, it's also open source, with German
| support available.
|
| [0]
| https://docs.kubermatic.com/kubeone/main/architecture/suppor...
| MathiasPius wrote:
| Hetzner Cloud is officially supported, but that means setting
| up VPSs in Hetzner's Cloud offering, whereas this project was
| intended as a more or less independent pure bare-metal
| cluster. I see they offer Bare Metal support as well, but I
| haven't dived too deep into it.
|
| I haven't used KubeOne, but I have previously used Syself's
| https://github.com/syself/cluster-api-provider-hetzner which
| I believe works in a similar fashion. I think the approach is
| very interesting and plays right into the Kubernetes Operator
| playbook and its self-healing ambitions.
|
| That being said, the complexity of the approach, probably in
| trying to span and resolve inconsistencies across such a wide
| landscape of providers, caused me quite a bit of grief. I
| eventually abandoned this approach after having some operator
| _somewhere_ consistently attempt and fail to spin up a
| secondary control plane VPS against my wishes. After poring
| over loads of documentation and half a dozen CRDs in an
| attempt to resolve it, I threw in the towel.
|
| Of course, Kubermatic is not Syself, and this was about a
| year ago, so it is entirely possible that both projects are
| absolutely superb solutions to the problem at this point.
| ralala wrote:
| Interesting read. I have just set up a very similar cluster this
| week: a 3-node bare-metal cluster in a 10G mesh network. I went
| with Debian, RKE2, Calico and Longhorn. Encryption is done using
| LUKS FDE. For Load Balancing I am using the HCloud Load
| Balancer (in TCP mode). At first I had some problems with the
| mesh network as the CNI would only bind to a single interface.
| Finally solved it using a bridge, veth and isolated ports.
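|
| (For anyone hitting the same single-interface limitation: the
| general shape of a bridge with isolated ports and a veth pair
| looks roughly like the sketch below. Interface names are
| placeholders and this is a reconstruction, not necessarily the
| exact setup described above.)
|
|     import subprocess
|
|     def ip(*args: str) -> None:
|         subprocess.run(["ip", *args], check=True)
|
|     def bridge(*args: str) -> None:
|         subprocess.run(["bridge", *args], check=True)
|
|     # The two direct 10G links to the other mesh nodes.
|     MESH_PORTS = ["enp2s0", "enp3s0"]
|
|     ip("link", "add", "name", "br-mesh", "type", "bridge")
|     ip("link", "set", "br-mesh", "up")
|
|     for port in MESH_PORTS:
|         ip("link", "set", "dev", port, "master", "br-mesh")
|         # Isolated ports can't forward to each other, which keeps
|         # this node from bridging traffic between the two links.
|         bridge("link", "set", "dev", port, "isolated", "on")
|         ip("link", "set", port, "up")
|
|     # One veth pair: the bridge end stays on br-mesh, the other
|     # end is the single interface the host (and CNI) binds to.
|     ip("link", "add", "veth-cni", "type", "veth",
|        "peer", "name", "veth-br")
|     ip("link", "set", "veth-br", "master", "br-mesh")
|     ip("link", "set", "veth-br", "up")
|     ip("link", "set", "veth-cni", "up")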
| fireflash38 wrote:
| Using containerd I assume? I've been trying to get RKE2 or
| k3s to play nicely with CRI-O and it's been a long exercise in
| frustration.
| KyleSanderson wrote:
| which distro? it should just work out of the box.
| wg0 wrote:
| I've come to the conclusion (after trying kops, kubespray,
| kubeadm, kubeone, GKE, EKS) that if you're looking for < 100 node
| cluster, docker swarm should suffice. Easier to set up, maintain
| and upgrade.
|
| Docker swarm is to Kubernetes what SQLite is to PostgreSQL. To
| some extent.
| nyljasdfw342 wrote:
| The number of nodes is a poor basis for the decision... it
| should be the features and requirements you need for the
| cluster.
|
| If Docker Swarm satisfies, then yes.
| hn_user82179 wrote:
| I've been at a company running swarm in prod for a few years.
| There have been several nasty bugs that are fun to debug but
| we've accumulated several layers of slapped-on bandaids trying to
| handle swarm's deficiencies. I can't say I'd pick it again, nor
| would I recommend it for anyone else.
| linuxdude314 wrote:
| Node count driven infrastructure decisions make little sense.
|
| A better approach is to translate business requirements to
| systems capabilities and evaluate which tool best satisfies
| those requirements given the other constraints within your
| organization.
|
| Managed Kubernetes solutions like GKE require pretty minimal
| operational overhead at this point.
| MathiasPius wrote:
| I haven't had much opportunity to work with Docker Swarm, but
| the one time I did, we hit certificate expiration and other
| issues constantly, and it was not always obvious what was going
| on. It soured my perception of it a bit, but like I said I
| hadn't had much prior experience with it, so it might have been
| on me.
| stavros wrote:
| I was of the same opinion, so I rolled my own thin layer over
| Compose:
|
| https://harbormaster.readthedocs.io/
| GordonS wrote:
| This looks really nice, but the main feature of Docker Swarm
| rather than, Docker Compose, is the ability to run on a
| cluster of servers, not just a single node.
| stavros wrote:
| Ah, you're right, brain fart, sorry. Hm, I wonder how
| easily I could change Harbormaster to deploy on Swarm
| instead of using Compose...
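|
| (Mechanically it's mostly swapping the compose invocation for a
| stack deploy; a toy sketch, not Harbormaster's actual code, with
| the stack name and file as placeholders:)
|
|     import subprocess
|
|     def deploy(compose_file: str, name: str, swarm: bool) -> None:
|         """Deploy an app via Compose or as a Swarm stack."""
|         if swarm:
|             cmd = ["docker", "stack", "deploy",
|                    "--compose-file", compose_file, name]
|         else:
|             cmd = ["docker", "compose", "-f", compose_file,
|                    "-p", name, "up", "-d"]
|         subprocess.run(cmd, check=True)
|
|     deploy("docker-compose.yml", "myapp", swarm=True)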
| doctorpangloss wrote:
| Any sufficiently complicated Docker Swarm, Heroku, Elastic
| Beanstalk, Nomad or other program contains an ad hoc,
| informally-specified, bug-ridden, slow implementation of half
| of vanilla Kubernetes.
| ori_b wrote:
| Unfortunately, the above statement also applies to
| kubernetes.
| amazingman wrote:
| A pithy response to be sure, but is it true? Every
| Kubernetes object type exists within a well-specified
| hierarchy, has a well-defined specification, an API
| version, and documentation. Most of the object families'
| evolution is managed by a formal SIG. Not sure how any of
| that qualifies as ad-hoc or informal.
| ori_b wrote:
| "It's not a mess! it was designed by committee!"
|
| I'm not sure what to say here. The kubernetes docs and
| code speak for themselves. If you actually think that
| it's clean, simple, well designed, and easy to operate,
| with smooth interop between the parts, I can't change
| your mind. But in practice, I have found it very
| unpleasant. It seems this is common, and the usual
| suggestion is to pay someone else to operate it.
| amazingman wrote:
| First you were complaining that it was ad hoc and
| informal. Now you seem to be complaining that it's too
| formal and designed by committee.
|
| Also I never said Kubernetes was well-designed, easy, or
| simple.
| ori_b wrote:
| You say that as though bureaucracy is equivalent to
| formalism. It's not.
| mardifoufs wrote:
| Kubernetes is anything but ad hoc. That's the best thing
| about it, but it can also be the most annoying part.
| wg0 wrote:
| Most smaller teams do not need a full-fledged Kubernetes
| anyway.
|
| There's no one-size-fits-all approach. There are trade-offs.
| The Kubernetes tractor needs lots of oiling and whatnot for
| all the bells and whistles.
|
| Trade-offs are the keyword here.
| blowski wrote:
| I agree in part - the features and simplicity of Docker Swarm
| are very appealing over k8s, but it also feels so
| neglected that I'd be waiting every day for the EOL
| announcement.
| wg0 wrote:
| It's built from another separate project called SwarmKit. So
| if it ever comes to the point where it is abandoned, forks
| would be out in the wild soon enough.
|
| I see more risk of Docker Engine as a whole pulling a
| Terraform/Elasticsearch licensing move someday as investors
| get desperate to cash out.
| linuxdude314 wrote:
| Docker is largely irrelevant in modern container
| orchestration platforms. Kubernetes dropped built-in Docker
| support (dockershim) as of 1.24 in favor of CRI runtimes
| like containerd and CRI-O.
|
| Docker is just one of many implementations of the Open
| Container Initiative (OCI) specifications. It's not even
| fully open source at this point.
|
| Under the hood Docker leverages containerd, which in turn
| leverages runc, which leverages libcontainer for spawning
| processes.
|
| Linux containers at this point will exist perfectly fine if
| Docker as a corporate entity disappears. The biggest impact
| would be Docker Hub being shut down.
|
| They also sort of already pulled something like HashiCorp's
| move with their Docker Desktop product for macOS.
|
| That's a little different than if Docker disappeared
| completely, but one could easily switch to Podman (which
| has a superset of the docker syntax).
| yjftsjthsd-h wrote:
| > Docker is just one of many implantations of the Open
| Container Initiative (OCI) specifications. It's not even
| fully open source at this point.
|
| How so? I know Docker Desktop wraps its own stuff around
| docker, but AFAIK docker itself is FOSS.
| husarcik wrote:
| The docker swarm ecosystem is very poor as far as tooling goes.
| You're better off using docker-compose (? maybe docker swarm)
| and then migrating to k3s if you need a cluster.
|
| My docker swarm config files are nearly the same craziness as
| my k3s config files so I figured I might as well benefit from
| the tooling in Kubernetes.
|
| Edit for more random thoughts: being able to use helm to deploy
| services helped me switch to k3s from swarm.
| amazingman wrote:
| This is almost exactly my experience with Docker Compose,
| which is lionized by commenters in nearly every Kubernetes
| thread I read on HN. It's great and super simple and easy ...
| until you want to wire multiple applications together, you
| want to preserve state across workload lifecycles for
| stateful applications, and/or you need to stand up multiple
| configurations of the same application. The more you want to
| run applications that are part of a distributed system, the
| uglier your compose files get. Indeed, the original elegant
| Docker Compose syntax just couldn't do a bunch of things and
| had to be extended.
|
| IMO a sufficiently advanced Docker Compose stack is not
| appreciably simpler than the Kubernetes manifests would be,
| and you don't get the benefits of Kubernetes' objects and
| their controllers because Docker Compose is basically just
| stringing low-level concepts together with light automation.
| wg0 wrote:
| Then again, Helm and layers of Kustomize are not easy to
| reason about either.
|
| That's system configuration and that'll become tedious for
| sure.
| doctorpangloss wrote:
| Helm and Kustomize are low-budget custom resource
| definitions. They serve their purpose well and they have
| few limitations considering how much they can achieve
| before you write your own controllers.
|
| In my opinion, the complexity is symptomatic of success:
| once you make a piece of some kind of seemingly narrowly
| focused software that people actually use, you wind up
| also creating a platform, if not a platform-of-platforms,
| in order to satisfy growth. Kubernetes can scale for that
| business case in ways Docker Swarm, ELB, etc. do not.
|
| Is system configuration avoidable? In order to use AWS,
| you have to know how a VPC works. That is the worst kind
| of configuration. I suppose you can ignore that stuff for
| a very long time, you'll be paying ridiculous amounts of
| money for the privilege - almost the same in bandwidth
| costs, transiting NAT gateways and all your load
| balancers, whatever mistakes you made, as you do in
| compute usage. Once you learn that bullshit, you know,
| Kubernetes isn't so tedious after all.
| vbezhenar wrote:
| I didn't try anything but kubeadm and it worked just fine for
| me for my 1 node cluster.
| wg0 wrote:
| Besides my local VirtualBox cluster, I have tried Kubernetes
| on three clouds with at least a dozen different
| installers/distributions, and my gut feeling has always been
| that operational pain would be a factor going forward.
|
| That's where the author also has the following to say:
|
| >My conclusion at this point is that if you can afford it,
| both in terms of privacy/GDPR and dollarinos then managed is
| the way to go.
|
| And I agree. Managed Kubernetes is also really hard for those
| offering it, who have to manage it for you behind the
| scenes. [0]
|
| [0]. https://blog.dave.tf/post/new-kubernetes/
| KronisLV wrote:
| > I've come to the conclusion (after trying kops, kubespray,
| kubeadm, kubeone, GKE, EKS) that if you're looking for < 100
| node cluster, docker swarm should suffice. Easier to setup,
| maintain and upgrade.
|
| Personally, I'd also consider throwing Portainer in there,
| which gives you both a nice way to interact with the cluster,
| as well as things like webhooks: https://www.portainer.io/
|
| With something like Apache, Nginx, Caddy or something else
| acting as your "ingress" (taking care of TLS, reverse proxy,
| headers, rate limits, sometimes mTLS etc.) it's a surprisingly
| simple setup, at least for simple architectures.
|
| If/when you need to look past that, K3s is probably worth a
| look, as some other comments pointed out. Maybe some other of
| Rancher's offerings as well, depending on how you like to
| interact with clusters (the K9s tool is nice too).
| bionsystem wrote:
| When I was deploying swarm clusters I would have a default
| stack.yml file with portainer for admin, traefik for reverse-
| proxying, and prometheus, grafana, alertmanager, unsee,
| cadvisor for monitoring and metrics gathering. All were
| running on their own docker network completely separated from
| the app and were only accessible by ops (and dev if
| requested, but not end users). It was quite easy to deploy
| with HEAT+ansible or terraform+ansible and the hard part was
| the ci/cd for every app each in its tenant, but it worked
| really really well.
| Patrickmi wrote:
| I was using Docker Swarm because of the simplicity and easy
| setup, but the one feature that I really needed was to be able
| to specify which runtime to use: either I use runsc everywhere
| (and Docker plugins don't work with runsc) or runc as the
| default, and it was too inefficient to have groups of nodes
| with a certain runtime. I really do like Swarm, but it's
| missing too many features that are important.
| riku_iki wrote:
| > Docker swarm is to Kubernetes what SQLite is to PostgreSQL.
| To some extent.
|
| Curious what you mean? To me PostgreSQL doesn't have
| disadvantages over SQLite; everything is just better.
| mulmen wrote:
| PostgreSQL is more complex to use and operate and requires
| more setup than SQLite. If you don't need the capabilities of
| PostgreSQL then you can avoid paying the setup and
| maintenance costs by using the simpler SQLite.
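|
| A toy illustration of that setup difference (connection details
| are placeholders; psycopg2 is just one common driver):
|
|     import sqlite3
|
|     # SQLite: a file plus the stdlib module, nothing to run or
|     # administer.
|     lite = sqlite3.connect("app.db")
|     lite.execute(
|         "CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY)")
|     lite.close()
|
|     import psycopg2  # requires: pip install psycopg2-binary
|
|     # PostgreSQL: a server must already be installed, running and
|     # provisioned with a database, a user and a password.
|     pg = psycopg2.connect(host="localhost", dbname="app",
|                           user="app", password="secret")
|     with pg, pg.cursor() as cur:
|         cur.execute(
|             "CREATE TABLE IF NOT EXISTS t (id SERIAL PRIMARY KEY)")
|     pg.close()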
| riku_iki wrote:
| In the simplest case, you do sudo apt install ... in both
| cases, connect to the database, and do your work.
| lemper wrote:
| I thought it was about Talos, the POWER9 system. Intrigued by
| Kubernetes on them.
| zkirill wrote:
| Me too. That would be very cool and I'm surprised nobody is
| offering this as a service.
| mkagenius wrote:
| If this gives people the idea that they should get a bare-metal
| server on Hetzner and try it: don't. They will probably reject
| you; they are very picky.
|
| And if you are from a developing country like India, don't even
| think about it.
| wiktor-k wrote:
| Very nice write-up!
|
| I wonder if it's possible to combine the custom ISO with cloud
| init [0] to automate the initial node installation?
|
| [0]: https://github.com/tech-otaku/hetzner-cloud-init
| MathiasPius wrote:
| I believe the recommended[1] way to deploy Talos to Hetzner
| Cloud (not bare metal) is to use the rescue system and
| HashiCorp Packer to upload the Talos ISO, deploy your VPS
| using this image, and then configure Talos using the standard
| bootstrapping procedure.
|
| This post series is specifically aimed at deploying a pure-
| metal cluster.
|
| [1] https://www.talos.dev/v1.5/talos-guides/install/cloud-
| platfo...
| wiktor-k wrote:
| Ah, I see. Thanks for the explanation!
| InvaderFizz wrote:
| I'm going through your series now. Very well done.
|
| I thought I would mention that age is now built into SOPS, so it
| needs no external dependencies and is faster and easier than gpg.
| MathiasPius wrote:
| Have seen age pop up here and there, but haven't spent the
| cycles to see where it fits in yet, so I just went with what I
| knew.
|
| Will definitely take a look though, thanks!
| mulmen wrote:
| Just finished reading part one and wow, what an excellently
| written and presented post. This is exactly the series I needed
| to get started with Kubernetes in earnest. It's like it was
| written for me personally. Thanks for the submission MathiasPius!
___________________________________________________________________
(page generated 2023-09-09 23:01 UTC)