[HN Gopher] Kubernetes on Hetzner: cutting my infra bill by 75%
___________________________________________________________________
Kubernetes on Hetzner: cutting my infra bill by 75%
Author : BillFranklin
Score : 141 points
Date : 2024-12-01 15:43 UTC (7 hours ago)
(HTM) web link (bilbof.com)
(TXT) w3m dump (bilbof.com)
| dvfjsdhgfv wrote:
| > Hetzner volumes are, in my experience, too slow for a
| production database. While you may in the past have had a good
| experience running customer-facing databases on AWS EBS, with
| Hetzner's volumes we were seeing >50ms of IOWAIT with very low
| IOPS.
|
| There is a surprisingly easy way to address this issue: use
| (ridiculously cheap) Hetzner metal machines as nodes. The ones
| with NVMe storage offer excellent performance for DBs and often
| have generous amounts of RAM. I'd go as far as to say you'd be
| better off investing in two or more beefy bare metal machines
| for a master-replica(s) setup rather than running the DB on k8s.
|
| If you don't want to be bothered with the setup, you can use one
| of many modern packages such as Pigsty: https://pigsty.cc/ (not
| affiliated but a huge fan).
| BillFranklin wrote:
| Thanks, I hadn't heard of Pigsty. As you say, I had to use NVMe
| SSDs for the DBs; the performance is pretty good, so I didn't
| look into getting metal nodes.
| threeseed wrote:
| There are plenty of options for running a database on
| Kubernetes whilst using local NVMe storage.
|
| Options range from simply pinning the database pods to specific
| nodes and using a LocalPathProvisioner, to distributed solutions
| like JuiceFS, OpenEBS, etc.
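|
| As a rough sketch (assuming the official kubernetes Python
| client, a StatefulSet named "db" in the "default" namespace, and
| nodes labelled storage=nvme; all of those names are hypothetical),
| the pinning part could look something like this:
|
|     from kubernetes import client, config
|
|     # Use load_incluster_config() when running inside the cluster
|     config.load_kube_config()
|     apps = client.AppsV1Api()
|
|     # Pin the db pods to nodes labelled storage=nvme
|     patch = {"spec": {"template": {"spec": {
|         "nodeSelector": {"storage": "nvme"}}}}}
|     apps.patch_namespaced_stateful_set(
|         name="db", namespace="default", body=patch)
|
| The local NVMe itself is then exposed to the pods through
| whichever provisioner you pick (LocalPathProvisioner, OpenEBS,
| etc.) via a StorageClass on the PVCs.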
| cjr wrote:
| What about cluster autoscaling?
| BillFranklin wrote:
| I didn't touch on that in the article, but essentially it's a
| one-line change to add a worker node (or nodes) to the cluster,
| and then it's automatically enrolled.
|
| Fortunately we don't have such bursty requirements, so I haven't
| needed to automate this.
| postepowanieadm wrote:
| Lovely website.
| segmondy wrote:
| Great write up Bill!
| aliasxneo wrote:
| I'm planning on doing something similar but want to use Talos
| with bare metal machines. I expect to see similar price
| reductions from our current EKS bill.
| threeseed wrote:
| Depending on your cluster size I highly recommend Omni:
| https://omni.siderolabs.com
|
| It took minutes to set up a cluster, and I love having a UI to
| see what is happening.
|
| I wish there were more products like this as I suspect there
| will be a trend towards more self-managed Kubernetes clusters
| given how expensive the cloud is becoming.
| MathiasPius wrote:
| I set up a Talos bare metal cluster about a year ago, and
| documented the whole process on my website. Feel free to reach
| out if you have any questions!
| Volundr wrote:
| I haven't used it personally, but
| https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne...
| looks amazing as a way to set up and manage Kubernetes on
| Hetzner. At the moment I'm on
| Oracle free tier, but I keep thinking about switching to it to
| get off... Well Oracle.
| not_elodin wrote:
| I've used this to set up a cluster to host a dogfooded
| journalling site.
|
| In one evening I had a cluster working.
|
| It works pretty well. I had one small problem where the
| auto-update wouldn't run on ARM nodes, which stopped the single
| node I had running at that point (the control-plane taint
| blocked the update pod from being scheduled on it).
| mkreis wrote:
| I'm running two clusters on it, one for production and one for
| dev. Works pretty well, with a schedule to reboot machines
| every Sunday for automatic security updates (openSUSE MicroOS).
| I've also expanded machines for increased workloads. You have
| to make sure to inspect every change Terraform wants to make,
| but then you're pretty safe. The only downside is that every
| node needs a public IP, even though they are behind a firewall.
| But that is being worked on.
| maestrae wrote:
| I recently read an article about running k8s on the Oracle free
| tier and was looking to try it. I'm curious, are there any
| specific pain points that are making you think of switching?
| chipdart wrote:
| I loved the article. Insightful, and packed with real world
| applications. What a gem.
|
| I have a side-question pertaining to cost-cutting with
| Kubernetes. I've been musing over the idea of setting up
| Kubernetes clusters similar to these ones but mixing on-premises
| nodes with nodes from the cloud provider. The setup would be
| something like:
|
| - vCPUs for bursty workloads,
|
| - bare metal nodes for the performance-oriented workloads that
| form the base load,
|
| - on-premises nodes for spiky performance-oriented workloads, and
| dirt-cheap on-demand scaling.
|
| What I believe will be the primary unknown is egress costs.
|
| Has anyone ever toyed around with the idea?
| oblio wrote:
| I'm a bit sad the aggressive comment by the new account was
| deleted :-(
|
| The comment was making fun of the wishful thinking and the
| realities of networking.
|
| It was a funny comment :-(
| rad_gruchalski wrote:
| It wasn't funny. I can still see it. The answer was a VPN. If
| you want to get fancy you can do Istio with VMs.
| ffsm8 wrote:
| And if you want to be lazy, there is a Tailscale integration to
| run the cluster communication over it.
|
| https://tailscale.com/kb/1236/kubernetes-operator
|
| They've even improved it, so you can now actually resolve the
| services etc. via the tailnet DNS:
|
| https://tailscale.com/learn/managing-access-to-kubernetes-
| wi...
|
| I haven't tried that second part though, only read about
| it.
| rad_gruchalski wrote:
| Okay, VPN it is.
| ffsm8 wrote:
| I just wanted to provide the link in case someone was
| interested; I know you already mentioned it. *_*
|
| (Setting up a k8s cluster over a software VPN was kinda
| annoying the last time I tried it manually, but super
| easy with the Tailscale integration.)
| juiyhtybr wrote:
| Yes, like I said, throw an overlay on that motherfucker
| and ignore the fact that when a customer request enters
| the network it does so at the cloud provider, then is
| proxied off to the final destination, possibly with
| multiple hops along the way.
|
| You can't just slap an overlay on and expect everything
| to work in a reliable and performant manner. Yes, it will
| work for your initial tests, but then shit gets real when
| you find that the route from datacenter A to datacenter B
| is asymmetric and/or shifts between providers, altering
| site-to-site performance on a regular basis.
|
| The concept of bursting into on-prem is the most
| offensive bit about the original comment. When your site
| traffic is at its highest, you're going to add an extra
| network hop and proxy into the mix, with a subset of your
| traffic getting shipped off to another datacenter over
| internet-quality links.
| rad_gruchalski wrote:
| Nobody said "do it guerrilla-style". Put some thought
| into it.
| chipdart wrote:
| > yes, like i said, (...)
|
| I'm sorry, you said absolutely nothing. You just sounded
| like you were confused and for a moment thought you were
| posting on 4chan.
| bdcravens wrote:
| Enable "showdead" on your profile and you can see it.
| mhuffman wrote:
| For dedicated servers they say this:
|
| >All root servers have a dedicated 1 GBit uplink by default and
| with it unlimited traffic.
|
| >Inclusive monthly traffic for servers with 10G uplink is 20TB.
| There is no bandwidth limitation. We will charge EUR 1/TB for
| overusage.
|
| So it sounds like it depends. I have used them for (I'm
| guessing) 20 years and have never had a network problem with
| them or a surprise charge. Of course I mostly worked in the low
| double-digit terabytes. But I have had servers with them that
| handled millions of requests per day with zero problems.
| pdpi wrote:
| 1/8 GB/s x 3600 x 24 x 30 = 324,000 GB, so that 1 Gbit/s server
| could conceivably get 324 TB of traffic per month "for free". It
| obviously won't, but even a tenth of that is more than the
| traffic included with the 10G link.
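|
| As a quick sanity check, the same back-of-the-envelope sum in
| Python (assuming, unrealistically, that the link is saturated
| around the clock):
|
|     gb_per_second = 1 / 8                  # 1 Gbit/s -> GB/s
|     seconds_per_month = 3600 * 24 * 30
|     tb_per_month = gb_per_second * seconds_per_month / 1000
|     print(f"{tb_per_month:.0f} TB/month")  # -> 324 TB/month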
| jorams wrote:
| They do have a fair use policy on the 1GBit uplink. I know
| of one report[1] of someone using over 250TB per month
| getting an email telling them to reduce their traffic
| usage.
|
| The 10GBit uplink is something you need to explicitly
| request, and presumably it is more limited because if you
| go through the trouble of requesting it, you likely intend
| to saturate it fairly consistently, and that server's
| traffic usage is much more likely to be an outlier.
|
| [1]: https://lowendtalk.com/discussion/180504/hetzner-
| traffic-use...
| lyu07282 wrote:
| 20TB of egress on AWS runs you almost $2,000, btw. One of the
| biggest benefits of Hetzner.
| chipdart wrote:
| > We will charge EUR 1/TB for overusage.
|
| It sounds like a good tradeoff. The monthly cost of a small
| vCPU is equivalent to a few TB of bandwidth.
| threeseed wrote:
| > Has anyone ever toyed around with the idea?
|
| Sidero's Omni has done this: https://omni.siderolabs.com
|
| They run a Wireguard network between the nodes so you can have
| a mix of on-premise and cloud within one cluster. Works really
| well but unfortunately is a commercial product with a pricing
| model that is a little inflexible.
|
| But at least it shows it's technically possible so maybe open
| source options exist.
| sneak wrote:
| Slack's Nebula does something similar, and it is open source.
| SOLAR_FIELDS wrote:
| You could make a mesh with something like Netmaker to achieve
| something similar using FOSS. Note I haven't used Netmaker in
| years, but I was able to achieve this in some of their earlier
| releases. I found it to be a bit buggy and unstable at the time,
| due to it being such young software, but it may have matured
| enough by now that it could work in an enterprise-grade setup.
|
| The sibling comment's recommendation, Nebula, does something
| similar with a slightly different approach.
| chipdart wrote:
| > They run a Wireguard network between the nodes so you can
| have a mix of on-premise and cloud within one cluster.
|
| Interesting.
|
| A quick search shows that some people already toyed with the
| idea of rolling out something similar.
|
| https://github.com/ivanmorenoj/k8s-wireguard
| slillibri wrote:
| When I worked in web hosting (more than 10 years ago), we would
| constantly be blackholing Hetzner IPs due to bad behavior. Same
| with every other budget/cheap vm provider. For us, it had nothing
| to do with geo databases, just behavior.
|
| You get what you pay for, and all that.
| oblio wrote:
| They could put the backend on Hetzner, if it makes sense (for
| example queues or batch processors).
| SoftTalker wrote:
| Yep, I had the same problem years ago when I tried to use
| Mailgun's free tier. Not picking on them, I loved the features
| of their product, but the free tier IPs had a horrible
| reputation and mail just would not get accepted, especially by
| Hotmail or Yahoo.
|
| Any free hosting service will be overwhelmed by spammers and
| fraudsters. Cheap services the same but less so, and the more
| expensive they are, the less they will be used for scams and
| spam.
| thwarted wrote:
| Tragedy of the Commons Ruins Everything Around Me.
| Keyframe wrote:
| Depending on the prices, maybe a valid strategy would be to
| have servers at Hetzner and then tunnel ingress/egress through
| somewhere more prominent. Maybe adding the network traffic to
| the calculation still makes financial sense?
| hipadev23 wrote:
| Be careful with Hetzner: they null-routed my game server on
| launch day due to false positives from their abuse system, and
| then it took 3 days for their support team to re-enable traffic.
|
| By that point I had already moved to a different provider of
| course.
| teitoklien wrote:
| Where did you move? Asking to keep a list of options for my
| game servers; I'm using OVH game servers atm.
| hipadev23 wrote:
| I went back to AWS. Expensive but reliable and support I can
| get ahold of. I'd still like to explore OVH someday though.
| ronsor wrote:
| Reading comments from the past few days makes it seem like
| dealing with Hetzner is a pain (and as far as I can tell, they
| aren't really that much cheaper than the competitors).
| gurchik wrote:
| > (and as far as I can tell, they aren't really that much
| cheaper than the competitors)
|
| Can you say more? Their Cloud instances, for example, are
| less than half the cost of OVH's, and less than a fifth of
| the cost of a comparable AWS EC2 instance.
| lurking_swe wrote:
| Even free servers are of no use if they're not usable during a
| product launch. :) You get what you pay for, I guess.
|
| But I do agree, it is much cheaper.
| jjeaff wrote:
| What competitors are similar to Hetzner in pricing? Last I
| checked, they seemed quite a bit cheaper than most.
| jgalt212 wrote:
| > they aren't really that much cheaper than the competitors
|
| This is demonstrably false.
| victorbjorklund wrote:
| I don't think so. We see the outliers. Those happen at
| Linode, DigitalOcean, etc. also. And yes, even at Google Cloud
| and AWS you sometimes get either unlucky or unfairly treated.
| danpalmer wrote:
| Digital Ocean did this to my previous company. They said we'd
| been the target of a DoS attack (no evidence we could see).
| They re-enabled the traffic, then did it again the next day,
| and then again. When we asked them to stop doing that they said
| we should use Cloudflare to prevent DoS attacks... all the box
| did was store backups that we transferred over SSH. Nothing
| that could go behind Cloudflare, no web server running,
| literally only one port open.
| Scotrix wrote:
| Very nicely written article. I'm also running a k8s cluster, but
| on bare metal and qemu-kvm VMs for the base load. I wonder why
| you would choose VMs instead of bare metal if you're looking for
| cost optimisation (additional overhead maybe?). Could you share
| more about this, or did I miss it?
| BillFranklin wrote:
| Thank you! The cloud servers are sufficiently cheap for us that
| we could afford the extra flexibility we get from them. Hetzner
| can move VMs around without us noticing; in contrast, they have
| been rebooting a number of metal machines for maintenance lately,
| which would have been disruptive, especially during the
| migration. I might have another look at metal next year, but I'm
| happy with the cloud VMs currently.
| karussell wrote:
| Note, they usually do not reboot or touch your servers. But
| yes, the current maintenance of their metal routers (rare,
| like once every 2 years) requires you to juggle a bit with
| different machines in different datacenters.
| kakoni wrote:
| Anybody running k3s/k8s on Hetzner using CAX servers? How's that
| working out?
| MuffinFlavored wrote:
| https://github.com/puppetlabs/puppetlabs-kubernetes
|
| What do the fine people of HN think about the size, scope, and
| amount of technology in this repo?
|
| It is referenced in the article here:
| https://github.com/puppetlabs/puppetlabs-kubernetes/compare/...
| ArtTimeInvestor wrote:
| Can anybody speak to the pros and cons of Hetzner vs OVH?
|
| There ain't many large European cloud companies, and I would like
| to understand how they differentiate.
|
| Ionos is another European one. Currently, it looks like their
| cloud business is stagnating, though.
| thenaturalist wrote:
| I'd say stay clear of Ionos.
|
| Bonkers first experience in the last two weeks.
|
| Graphical "Data center designer", no ability to open multiple
| tabs, instead always rerouting to the main landing page.
|
| Attached 3 IGWs to a box, all public IPs, GUI shows "no active
| firewall rules".
|
| IGW 1: 100% packet loss over 1 minute.
|
| IGW 2: 85% packet loss over 1 minute.
|
| IGW 3: 95% packet loss over 1 minute.
|
| Turns out "no active Firewall rules" just wasn't the case and
| explicit whitelisting is absolutely required.
|
| But wait, there's more!
|
| Created a hosted PostgreSQL instance, assigned a private subnet
| for creation.
|
| SSH into my server, ping the URL of the created Postgres
| instance: The DB's IP is outside the CIDR range of the assigned
| subnet and unreachable.
|
| What?
|
| Deleted the instance, created another one, exact same settings.
| Worked this time around.
|
| Support quality also varies wildly.
|
| Out of 3 encounters, I had a competent person once.
|
| The other two straight-up said they had no idea what was going on.
| usrme wrote:
| This is probably out of left field, but what is the benefit of
| having a naming scheme for nodes without any delimiters? Reading
| at a glance and not knowing the region name convention of a given
| provider (i.e. Hetzner), I'm at a loss to quickly decipher the
| "<region><zone><environment><role><number>" to "euc1pmgr1". I
| feel like I'm missing something because having delimiters would
| make all sorts of automated parsing much easier.
| BillFranklin wrote:
| Quicker to type and scan! Though I admit this is a preference;
| delimiters would work fine too.
|
| Parsing works the same but is based on a simple regex rather
| than splitting on a hyphen.
|
| euc=eu central; 1=zone/dc; p=production; wkr=worker; 1=node id
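|
| For illustration, a minimal sketch of that kind of regex in
| Python (not necessarily the exact one we use):
|
|     import re
|
|     # <region><zone><environment><role><number>, e.g. euc1pwkr1
|     NODE_NAME = re.compile(
|         r"^(?P<region>[a-z]+)(?P<zone>\d+)(?P<env>[a-z])"
|         r"(?P<role>[a-z]+)(?P<num>\d+)$")
|
|     print(NODE_NAME.match("euc1pwkr1").groupdict())
|     # {'region': 'euc', 'zone': '1', 'env': 'p',
|     #  'role': 'wkr', 'num': '1'}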
| usrme wrote:
| Thanks for getting back to me! Now that you've written it
| out, it's plainly obvious, but for me the readability and
| flexibility of delimiters beats the speed of typing and
| scanning. Many a time I've been grateful that I added
| delimiters, because then I was no longer hamstrung by any
| potential changes to the length of any particular segment
| within the name.
| o11c wrote:
| You can treat the numeric parts as self-delimiting ... that
| leaves only the assumption that "environment" is a single
| letter.
| devops000 wrote:
| Did you try Cloud66 for deploy?
| aravindputrevu wrote:
| Do you know that they are cutting their free tier bandwidth? I
| didn't read too much into it, but heard a few friends were
| worried about it.
|
| At the end of the day, they are a business!
| tutfbhuf wrote:
| I have experience running Kubernetes clusters on Hetzner
| dedicated servers, as well as working with a range of fully or
| highly managed services like Aurora, S3, and ECS Fargate.
|
| From my experience, the cloud bill on Hetzner can sometimes be as
| low as 20% of an equivalent AWS bill. However, this cost
| advantage comes with significant trade-offs.
|
| On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe
| storage, MariaDB operators, Cilium for networking, and ArgoCD for
| deploying Helm charts. We had to handle Kubernetes cluster
| updates ourselves, which included facing a complete cluster
| failure at one point. We also encountered various bugs in both
| Kubernetes and Ceph, many of which were documented in GitHub
| issues and Ceph trackers. The list of tasks to manage and monitor
| was endless. Depending on the number of workloads and the overall
| complexity of the environment, maintaining such a setup can
| quickly become a full-time job for a DevOps team.
|
| In contrast, using AWS or other major cloud providers allows for
| a more hands-off setup. With managed services, maintenance often
| requires significantly less effort, reducing the operational
| burden on your team.
|
| In essence, with AWS, your DevOps workload is reduced by a
| significant factor, while on Hetzner, your cloud bill is
| significantly lower.
|
| Determining which option is more cost-effective requires a
| thorough TCO (Total Cost of Ownership) analysis. While Hetzner
| may seem cheaper upfront, the additional hours required for
| DevOps work can offset those savings.
| spwa4 wrote:
| > Determining which option is more cost-effective requires a
| thorough TCO (Total Cost of Ownership) analysis. While Hetzner
| may seem cheaper upfront, the additional hours required for
| DevOps work can offset those savings.
|
| Sure, but the TL;DR is going to be that if you employ n or more
| sysadmins, the cost savings will dominate, with 2 < n < 7. So
| for a given company size, Hetzner will start being cheaper at
| some point, and the effect becomes more extreme the bigger you
| go.
|
| Second, if you have a "big" cost, whatever it is (bandwidth,
| disk space, essentially anything but compute), the cost savings
| will dominate faster.
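|
| A toy break-even calculation in Python (every number here is a
| hypothetical placeholder; plug in your own):
|
|     aws_monthly = 100_000     # managed-cloud bill, USD/month
|     hetzner_monthly = 25_000  # same workload on Hetzner (~75% less)
|     extra_ops = 2             # extra engineers needed to self-manage
|     cost_per_eng = 12_000     # fully loaded, USD/month
|
|     hetzner_tco = hetzner_monthly + extra_ops * cost_per_eng
|     print(f"AWS: ${aws_monthly:,}  Hetzner TCO: ${hetzner_tco:,}")
|     print("Hetzner wins" if hetzner_tco < aws_monthly else "AWS wins")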
| jonas21 wrote:
| This is an interesting writeup, but I feel like it's missing a
| description of the cluster and the workload that's running on it.
|
| How many nodes are there, how much traffic does it receive, what
| are the uptime and latency requirements?
|
| And what's the absolute cost savings? Saving 75% of $100K/mo is
| very different from saving 75% of $100/mo.
| sureglymop wrote:
| I went with Hetzner bare metal, set up a Proxmox cluster on it,
| and then run Kubernetes on top. Gives me a lot of flexibility, I
| find.
| Iwan-Zotow wrote:
| This is good.
|
| Well, running on bare metal would be even better.
___________________________________________________________________
(page generated 2024-12-01 23:01 UTC)