[HN Gopher] Kubernetes on Hetzner: cutting my infra bill by 75%
       ___________________________________________________________________
        
       Kubernetes on Hetzner: cutting my infra bill by 75%
        
       Author : BillFranklin
       Score  : 141 points
       Date   : 2024-12-01 15:43 UTC (7 hours ago)
        
 (HTM) web link (bilbof.com)
 (TXT) w3m dump (bilbof.com)
        
       | dvfjsdhgfv wrote:
       | > Hetzner volumes are, in my experience, too slow for a
       | production database. While you may in the past have had a good
       | experience running customer-facing databases on AWS EBS, with
       | Hetzner's volumes we were seeing >50ms of IOWAIT with very low
       | IOPS.
       | 
       | There is a surprisingly easy way to address this issue: use
       | (ridiculously cheap) Hetzner metal machines as nodes. The ones
       | with nvme storage offer excellent performance for dbs and often
       | have generous amounts of RAM. I'd go as far as to say you'd be
        | better off investing in two or more beefy bare metal machines
        | for a master-replica(s) setup rather than running the db on k8s.
       | 
       | If you don't want to be bothered with the setup, you can use one
       | of many modern packages such as Pigsty: https://pigsty.cc/ (not
       | affiliated but a huge fan).
        
         | BillFranklin wrote:
          | Thanks, hadn't heard of Pigsty. As you say, I had to use NVMe
          | SSDs for the dbs; the performance is pretty good, so I didn't
          | look into getting metal nodes.
        
         | threeseed wrote:
         | There are plenty of options for running a database on
         | Kubernetes whilst using local NVMe storage.
         | 
          | These range from simply pinning the database pods to specific
          | nodes and using a LocalPathProvisioner, to distributed
          | solutions like JuiceFS, OpenEBS, etc.
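          | 
          | A minimal sketch of the pinning half with the Python client
          | (names here are made up; it assumes the local-path
          | provisioner is installed):
          | 
          |   from kubernetes import client, config
          | 
          |   config.load_kube_config()
          |   apps = client.AppsV1Api()
          | 
          |   # Pin the database StatefulSet to the node that has the
          |   # local NVMe disk; its volumeClaimTemplates would point at
          |   # the "local-path" storage class.
          |   patch = {"spec": {"template": {"spec": {
          |       "nodeSelector": {
          |           "kubernetes.io/hostname": "euc1pwkr1"}}}}}
          |   apps.patch_namespaced_stateful_set(
          |       name="postgres", namespace="default", body=patch)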
        
       | cjr wrote:
       | What about cluster autoscaling?
        
         | BillFranklin wrote:
         | I didn't touch on that in the article, but essentially it's a
         | one line change to add a worker node (or nodes) to the cluster,
         | then it's automatically enrolled.
         | 
         | We don't have such bursty requirements fortunately so I have
         | not needed to automate this.
        
       | postepowanieadm wrote:
       | Lovely website.
        
       | segmondy wrote:
       | Great write up Bill!
        
       | aliasxneo wrote:
       | I'm planning on doing something similar but want to use Talos
        | with bare metal machines. I expect to see similar price
        | reductions compared to our current EKS bill.
        
         | threeseed wrote:
         | Depending on your cluster size I highly recommend Omni:
         | https://omni.siderolabs.com
         | 
          | It took minutes to set up a cluster and I love having a UI to
         | see what is happening.
         | 
         | I wish there were more products like this as I suspect there
         | will be a trend towards more self-managed Kubernetes clusters
         | given how expensive the cloud is becoming.
        
         | MathiasPius wrote:
         | I set up a Talos bare metal cluster about a year ago, and
         | documented the whole process on my website. Feel free to reach
         | out if you have any questions!
        
       | Volundr wrote:
       | I haven't used it personally, but https://github.com/kube-
       | hetzner/terraform-hcloud-kube-hetzne... looks amazing as a way to
        | set up and manage Kubernetes on Hetzner. At the moment I'm on
       | Oracle free tier, but I keep thinking about switching to it to
       | get off... Well Oracle.
        
         | not_elodin wrote:
         | I've used this to set up a cluster to host a dogfooded
         | journalling site.
         | 
         | In one evening I had a cluster working.
         | 
          | It works pretty well. I had one small problem where the auto-
          | update wouldn't run on ARM nodes, which stopped the single
          | node I had running at that point (the control plane taint was
          | blocking the update pod from running on it).
        
         | mkreis wrote:
          | I'm running two clusters on it, one for production and one for
          | dev. Works pretty well, with a schedule to reboot machines
          | every Sunday for automatic security updates (SuSE Micro OS).
          | I've also expanded machines for increased workloads. You have
          | to make sure to inspect every change Terraform wants to make,
          | but then you're pretty safe. The only downside is that every
          | node needs a public IP, even though they are behind a
          | firewall. But that is being worked on.
        
         | maestrae wrote:
          | I recently read an article about running k8s on the Oracle free
          | tier and was looking to try it. I'm curious, are there any
         | specific pain points that are making you think of switching?
        
       | chipdart wrote:
       | I loved the article. Insightful, and packed with real world
       | applications. What a gem.
       | 
       | I have a side-question pertaining to cost-cutting with
       | Kubernetes. I've been musing over the idea of setting up
       | Kubernetes clusters similar to these ones but mixing on-premises
       | nodes with nodes from the cloud provider. The setup would be
       | something like:
       | 
       | - vCPUs for bursty workloads,
       | 
       | - bare metal nodes for the performance-oriented workloads
       | required as base-loads,
       | 
       | - on-premises nodes for spiky performance-oriented workloads, and
       | dirt-cheap on-demand scaling.
       | 
       | What I believe will be the primary unknown is egress costs.
       | 
       | Has anyone ever toyed around with the idea?
        
         | oblio wrote:
         | I'm a bit sad the aggressive comment by the new account was
         | deleted :-(
         | 
         | The comment was making fun of the wishful thinking and the
         | realities of networking.
         | 
         | It was a funny comment :-(
        
           | rad_gruchalski wrote:
            | It wasn't funny. I can still see it. The answer was a VPN.
            | If you want to go fancy you can do Istio with VMs.
        
             | ffsm8 wrote:
             | And if you wanna be lazy, there is a tailscale integration
             | to run the cluster communication over it.
             | 
             | https://tailscale.com/kb/1236/kubernetes-operator
             | 
             | They've even improved it, so you can now actually resolve
             | the services etc via the tailnet dns
             | 
             | https://tailscale.com/learn/managing-access-to-kubernetes-
             | wi...
             | 
             | I haven't tried that second part though, only read about
             | it.
        
               | rad_gruchalski wrote:
               | Okay, vpn it is.
        
               | ffsm8 wrote:
                | I just wanted to provide the link in case someone was
                | interested, I know you already mentioned it. *_*
               | 
               | (Setting up a k8s cluster over software VPN was kinda
               | annoying the last time I tried it manually, but super
               | easy with the tailscale integration)
        
               | juiyhtybr wrote:
               | yes, like i said, throw an overlay on that motherfucker
               | and ignore the fact that when a customer request enters
               | the network it does so at the cloud provider, then is
               | proxied off to the final destination, possibly with
               | multiple hops along the way.
               | 
               | you can't just slap an overlay on and expect everything
               | to work in a reliable and performant manner. yes, it will
               | work for your initial tests, but then shit gets real when
               | you find that the route from datacenter a to datacenter b
               | is asymmetric and/or shifts between providers, altering
               | site to site performance on a regular basis.
               | 
               | the concept of bursting into on-prem is the most
               | offensive bit about the original comment. when your site
               | traffic is at its highest, you're going to add an extra
               | network hop and proxy into the mix with a subset of your
               | traffic getting shipped off to another datacenter over
               | internet quality links.
        
               | rad_gruchalski wrote:
                | Nobody said "do it guerilla-style". Put some thought
                | into it.
        
               | chipdart wrote:
               | > yes, like i said, (...)
               | 
               | I'm sorry, you said absolutely nothing. You just sounded
               | like you were confused and for a moment thought you were
               | posting on 4chan.
        
           | bdcravens wrote:
           | Enable "showdead" on your profile and you can see it.
        
         | mhuffman wrote:
         | For dedicated they say this:
         | 
         | >All root servers have a dedicated 1 GBit uplink by default and
         | with it unlimited traffic.
         | 
         | >Inclusive monthly traffic for servers with 10G uplink is 20TB.
         | There is no bandwidth limitation. We will charge EUR 1/TB for
         | overusage.
         | 
         | So it sounds like it depends. I have used them for (I'm
         | guessing) 20 years and have never had a network problem with
         | them or a surprise charge. Of course I mostly worked in the low
          | double digit terabytes. But I have had servers with them that
         | handled millions of requests per day with zero problems.
        
           | pdpi wrote:
            | 1 Gbit/s is 1/8 GB/s, and 1/8 * 3600 * 24 * 30 = 324,000 GB,
            | so that 1 GBit/s server could conceivably push 324 TB of
            | traffic per month "for free". It obviously won't, but even
            | a tenth of that is more than the 20 TB included with the
            | 10G link.
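            | 
            | As a quick sanity check on that arithmetic (plain numbers,
            | nothing provider-specific assumed):
            | 
            |   gb_per_s = 1 / 8          # 1 Gbit/s = 0.125 GB/s
            |   month = 3600 * 24 * 30    # seconds in 30 days
            |   print(gb_per_s * month)   # 324000.0 GB, i.e. 324 TB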
        
             | jorams wrote:
             | They do have a fair use policy on the 1GBit uplink. I know
             | of one report[1] of someone using over 250TB per month
             | getting an email telling them to reduce their traffic
             | usage.
             | 
             | The 10GBit uplink is something you need to explicitly
             | request, and presumably it is more limited because if you
             | go through the trouble of requesting it, you likely intend
             | to saturate it fairly consistently, and that server's
             | traffic usage is much more likely to be an outlier.
             | 
             | [1]: https://lowendtalk.com/discussion/180504/hetzner-
             | traffic-use...
        
           | lyu07282 wrote:
           | 20TB egress on AWS runs you almost $2,000 btw. one of the
           | biggest benefits of Hetzner
        
           | chipdart wrote:
           | > We will charge EUR 1/TB for overusage.
           | 
           | It sounds like a good tradeoff. The monthly cost of a small
           | vCPU is equivalent to a few TB of bandwidth.
        
         | threeseed wrote:
         | > Has anyone ever toyed around with the idea?
         | 
         | Sidero Omni have done this: https://omni.siderolabs.com
         | 
         | They run a Wireguard network between the nodes so you can have
         | a mix of on-premise and cloud within one cluster. Works really
         | well but unfortunately is a commercial product with a pricing
         | model that is a little inflexible.
         | 
         | But at least it shows it's technically possible so maybe open
         | source options exist.
        
           | sneak wrote:
           | Slack's Nebula does something similar, and it is open source.
        
           | SOLAR_FIELDS wrote:
           | You could make a mesh with something like Netmaker to achieve
           | similar using FOSS. Note I haven't used Netmaker in years but
           | I was able to achieve this in some of their earlier releases.
           | I found it to be a bit buggy and unstable at the time due to
           | it being such young software but it may have matured enough
           | now that it could work in an enterprise grade setup.
           | 
            | The sibling comment's recommendation, Nebula, does something
           | similar with a slightly different approach.
        
           | chipdart wrote:
           | > They run a Wireguard network between the nodes so you can
           | have a mix of on-premise and cloud within one cluster.
           | 
           | Interesting.
           | 
           | A quick search shows that some people already toyed with the
           | idea of rolling out something similar.
           | 
           | https://github.com/ivanmorenoj/k8s-wireguard
        
       | slillibri wrote:
       | When I worked in web hosting (more than 10 years ago), we would
        | constantly be blackholing Hetzner IPs due to bad behavior. Same
       | with every other budget/cheap vm provider. For us, it had nothing
       | to do with geo databases, just behavior.
       | 
       | You get what you pay for, and all that.
        
         | oblio wrote:
         | They could put the backend on Hetzner, if it makes sense (for
         | example queues or batch processors).
        
         | SoftTalker wrote:
         | Yep I had the same problem years ago when I tried to use
         | Mailgun's free tier. Not picking on them, I loved the features
          | of their product, but the free tier IPs had a horrible
          | reputation and mail just would not get accepted, especially
          | by Hotmail or Yahoo.
         | 
         | Any free hosting service will be overwhelmed by spammers and
         | fraudsters. Cheap services the same but less so, and the more
         | expensive they are the less they will be used for scams and
         | spams.
        
           | thwarted wrote:
           | Tragedy of the Commons Ruins Everything Around Me.
        
         | Keyframe wrote:
         | depending on the prices, maybe a valid strategy would be to
         | have servers at hetzner and then tunnel ingress/egress
         | somewhere more prominent. Maybe adding the network traffic to
         | the calculation still makes financial sense?
        
       | hipadev23 wrote:
       | Be careful with Hetzner, they null routed my game server on
       | launch day due to false positives from their abuse system, and
       | then took 3 days for their support team to re-enable traffic.
       | 
       | By that point I had already moved to a different provider of
       | course.
        
         | teitoklien wrote:
          | Where did you move? Asking to keep a list of options for my
          | game servers; I'm using OVH game servers atm.
        
           | hipadev23 wrote:
           | I went back to AWS. Expensive but reliable and support I can
           | get ahold of. I'd still like to explore OVH someday though.
        
         | ronsor wrote:
         | Reading comments from the past few days makes it seem like
          | dealing with Hetzner is a pain (and as far as I can tell,
          | they aren't really that much cheaper than the competitors).
        
           | gurchik wrote:
            | > (and as far as I can tell, they aren't really that much
            | cheaper than the competitors)
           | 
           | Can you say more? Their Cloud instances, for example, are
           | less than half the cost of OVH's, and less than a fifth of
           | the cost of a comparable AWS EC2 instance.
        
             | lurking_swe wrote:
              | Even free servers are of no use if they're not usable
              | during a product launch. :) You get what you pay for, I
              | guess.
             | 
              | But I do agree, it is much cheaper.
        
           | jjeaff wrote:
           | What competitors are similar to Hetzner in pricing? Last I
           | checked, they seemed quite a bit cheaper than most.
        
           | jgalt212 wrote:
            | > they aren't really that much cheaper than the competitors
           | 
           | This is demonstrably false.
        
           | victorbjorklund wrote:
            | I don't think so. We see the outliers. Those happen at
            | Linode, Digital Ocean, etc. also. And yes, even at Google Cloud
           | and AWS you sometimes get either unlucky or unfairly treated.
        
         | danpalmer wrote:
         | Digital Ocean did this to my previous company. They said we'd
         | been the target of a DOS attack (no evidence we could see).
         | They re-enabled the traffic, then did it again the next day,
         | and then again. When we asked them to stop doing that they said
         | we should use Cloudflare to prevent DOS attacks... all the box
         | did was store backups that we transferred over SSH. Nothing
         | that could go behind Cloudflare, no web server running,
         | literally only one port open.
        
       | Scotrix wrote:
       | Very nicely written article. I'm also running a k8s cluster but
        | on bare metal and qemu-kvms for the base load. I wonder why you
        | would choose VMs instead of bare metal if you're looking for
        | cost optimisation (additional overhead maybe?). Could you share
        | more about this, or did I miss it?
        
         | BillFranklin wrote:
         | Thank you! The cloud servers are sufficiently cheap for us that
         | we could afford the extra flexibility we get from them. Hetzner
          | can move VMs around without us noticing, but in contrast they
          | have been rebooting a number of metal machines for maintenance
          | for a while now, which would have been disruptive, especially
          | during the migration. I might have another look at metal next
          | year, but I'm happy with the cloud VMs currently.
        
           | karussell wrote:
           | Note, they usually do not reboot or touch your servers. But
           | yes, the current maintenance of their metal routers (rare,
           | like once every 2 years) requires you to juggle a bit with
           | different machines in different datacenters.
        
       | kakoni wrote:
       | Anybody running k3s/k8s on Hetzner using cax servers? How's that
       | working?
        
       | MuffinFlavored wrote:
       | https://github.com/puppetlabs/puppetlabs-kubernetes
       | 
       | What do the fine people of HN think about the size/scope/amount
       | of technology of this repo?
       | 
       | It is referenced in the article here:
       | https://github.com/puppetlabs/puppetlabs-kubernetes/compare/...
        
       | ArtTimeInvestor wrote:
       | Can anybody speak to the pros and cons of Hetzner vs OVH?
       | 
       | There ain't many large European cloud companies, and I would like
       | to understand how they differentiate.
       | 
       | Ionos is another European one. Currently, it looks like their
       | cloud business is stagnating, though.
        
         | thenaturalist wrote:
         | I'd say stay clear of Ionos.
         | 
         | Bonkers first experience in the last two weeks.
         | 
         | Graphical "Data center designer", no ability to open multiple
         | tabs, instead always rerouting to the main landing page.
         | 
         | Attached 3 IGWs to a box, all public IPs, GUI shows "no active
         | firewall rules".
         | 
         | IGW 1: 100% packet loss over 1 minute.
         | 
         | IGW 2: 85% packet loss over 1 minute.
         | 
          | IGW 3: 95% packet loss over 1 minute.
         | 
         | Turns out "no active Firewall rules" just wasn't the case and
         | explicit whitelisting is absolutely required.
         | 
         | But wait, there's more!
         | 
         | Created a hosted PostgreSQL instance, assigned a private subnet
         | for creation.
         | 
         | SSH into my server, ping the URL of the created Postgres
         | instance: The DB's IP is outside the CIDR range of the assigned
         | subnet and unreachable.
         | 
         | What?
         | 
         | Deleted the instance, created another one, exact same settings.
         | Worked this time around.
         | 
          | Support quality also varies wildly.
         | 
         | Out of 3 encounters, I had a competent person once.
         | 
         | Other two straight out said they have no idea what's going on.
        
       | usrme wrote:
       | This is probably out of left field, but what is the benefit of
       | having a naming scheme for nodes without any delimiters? Reading
       | at a glance and not knowing the region name convention of a given
       | provider (i.e. Hetzner), I'm at a loss to quickly decipher the
       | "<region><zone><environment><role><number>" to "euc1pmgr1". I
       | feel like I'm missing something because having delimiters would
       | make all sorts of automated parsing much easier.
        
         | BillFranklin wrote:
         | Quicker to type and scan! Though I admit this is preference,
         | delimiters would work fine too.
         | 
         | Parsing works the same but is based on a simple regex rather
         | than splitting on a hyphen.
         | 
         | euc=eu central; 1=zone/dc; p=production; wkr=worker; 1=node id
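          | 
          | Something along these lines would do it (the exact widths are
          | my guess from the example: 3-letter region, 1-digit zone,
          | 1-letter environment, 3-letter role, numeric node id):
          | 
          |   import re
          | 
          |   pat = re.compile(
          |       r"(?P<region>[a-z]{3})(?P<zone>\d)(?P<env>[a-z])"
          |       r"(?P<role>[a-z]{3})(?P<id>\d+)")
          |   print(pat.match("euc1pwkr1").groupdict())
          |   # {'region': 'euc', 'zone': '1', 'env': 'p',
          |   #  'role': 'wkr', 'id': '1'}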
        
           | usrme wrote:
           | Thanks for getting back to me! Now that you've written it
           | out, it's plainly obvious, but for me the readability and
           | flexibility of delimiters beats the speed of typing and
            | scanning. Many a time I've been grateful that I added
            | delimiters, because then I was no longer hamstrung by any
           | potential changes to the length of any particular segment
           | within the name.
        
         | o11c wrote:
         | You can treat the numeric parts as self-delimiting ... that
         | leaves only the assumption that "environment" is a single
         | letter.
        
       | devops000 wrote:
       | Did you try Cloud66 for deploy?
        
       | aravindputrevu wrote:
       | Do you know that they are cutting their free tier bandwidth? Did
        | not read too much into it, but heard a few friends were worried
        | about it.
       | 
        | End of the day, they are a business!
        
       | tutfbhuf wrote:
       | I have experience running Kubernetes clusters on Hetzner
       | dedicated servers, as well as working with a range of fully or
       | highly managed services like Aurora, S3, and ECS Fargate.
       | 
       | From my experience, the cloud bill on Hetzner can sometimes be as
       | low as 20% of an equivalent AWS bill. However, this cost
       | advantage comes with significant trade-offs.
       | 
       | On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe
       | storage, MariaDB operators, Cilium for networking, and ArgoCD for
       | deploying Helm charts. We had to handle Kubernetes cluster
       | updates ourselves, which included facing a complete cluster
       | failure at one point. We also encountered various bugs in both
       | Kubernetes and Ceph, many of which were documented in GitHub
       | issues and Ceph trackers. The list of tasks to manage and monitor
       | was endless. Depending on the number of workloads and the overall
       | complexity of the environment, maintaining such a setup can
       | quickly become a full-time job for a DevOps team.
       | 
       | In contrast, using AWS or other major cloud providers allows for
       | a more hands-off setup. With managed services, maintenance often
       | requires significantly less effort, reducing the operational
       | burden on your team.
       | 
       | In essence, with AWS, your DevOps workload is reduced by a
       | significant factor, while on Hetzner, your cloud bill is
       | significantly lower.
       | 
       | Determining which option is more cost-effective requires a
       | thorough TCO (Total Cost of Ownership) analysis. While Hetzner
       | may seem cheaper upfront, the additional hours required for
       | DevOps work can offset those savings.
        
         | spwa4 wrote:
         | > Determining which option is more cost-effective requires a
         | thorough TCO (Total Cost of Ownership) analysis. While Hetzner
         | may seem cheaper upfront, the additional hours required for
         | DevOps work can offset those savings.
         | 
         | Sure, but the TLDR is going to be that if you employ n or more
          | sysadmins, the cost savings will dominate, with 2 < n < 7. So
          | past a certain company size Hetzner will start being cheaper,
          | and the gap becomes more extreme the bigger you go.
         | 
          | Second, if you have a "big" cost, whatever it is, bandwidth,
          | disk space (essentially anything but compute), the cost
          | savings will dominate faster.
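          | 
          | A toy break-even model (every number below is made up; plug
          | in your own):
          | 
          |   aws_bill = 20_000        # $/month on a managed cloud
          |   savings_rate = 0.75      # the article's headline cut
          |   extra_ops_hours = 80     # added self-managed k8s work
          |   hourly_cost = 100        # loaded $/hour for that work
          | 
          |   saved = aws_bill * savings_rate
          |   extra = extra_ops_hours * hourly_cost
          |   print(saved - extra)     # positive => Hetzner wins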
        
       | jonas21 wrote:
       | This is an interesting writeup, but I feel like it's missing a
       | description of the cluster and the workload that's running on it.
       | 
       | How many nodes are there, how much traffic does it receive, what
       | are the uptime and latency requirements?
       | 
       | And what's the absolute cost savings? Saving 75% of $100K/mo is
       | very different from saving 75% of $100/mo.
        
       | sureglymop wrote:
        | I went Hetzner bare metal, set up a Proxmox cluster over it,
        | and then have Kubernetes on top. Gives me a lot of flexibility,
        | I find.
        
       | Iwan-Zotow wrote:
       | this is good
       | 
       | well, running on bare metal would be even better
        
       ___________________________________________________________________
       (page generated 2024-12-01 23:01 UTC)