[HN Gopher] How I learned to stop worrying and love userspace ne...
___________________________________________________________________
How I learned to stop worrying and love userspace networking
Author : todsacerdoti
Score : 131 points
Date : 2024-08-29 13:06 UTC (9 hours ago)
(HTM) web link (friendshipcastle.zip)
(TXT) w3m dump (friendshipcastle.zip)
| jauntywundrkind wrote:
| Really love seeing such a straightforward example of starting
| with some desire or need and ending up DIY'ing one's own
| operator.
|
| A lot of the pushback against Kubernetes revolves around whether
| you 'really need it' or whether to do something else. Seeing
| someone go beyond just running containers like this highlights
| the extensibility, and shows the core of Kubernetes as a pattern
| & paradigm for building any kind of platform.
|
| It's neat seeing that done so quickly & readily here. That we can
| add and manage anything, consistently, quickly, is a promise that
| I love seeing fulfilled.
| Etheryte wrote:
| While I see what you're getting at, I find the comment funny
| given that the first thing the article leads with is that
| certain things are a pain in Kubernetes.
| beeboobaa3 wrote:
| It's a pain because kubernetes is designed to run multiple
| workloads on multiple servers. So if you want to access the
| VPN from _some_ kubernetes containers, you're going to have
| to figure something out.
|
| But nothing is stopping you from just joining all your hosts
| into the VPN, just like a traditional deployment, or setting
| it up on your network gateway. That would make it available
| to all of your containers. Great. You're done.
|
| But if that's not what you want, you'll need to figure
| something out.
| Etheryte wrote:
| I'm not sure if I follow. You said Kubernetes is great for
| building any kind of platform, but then when someone
| wants controlled access to a VPN it suddenly turns into a
| "no, not like that"? Giving only certain parts of your
| architecture access to certain capabilities is far from a
| niche use case.
| themgt wrote:
| _Giving only certain parts of your architecture access to
| certain capabilities is far from a niche use case_
|
| What is the "normal" best practice here then? I would
| just spin up multiple single-node k3s VM clusters and
| hook the AI k3s VM to the VPN and the others not.
| nonameiguess wrote:
| Sort of. What the author is trying to do would actually be
| quite easy if the entire setup were Kubernetes. k3s allows you
| to connect nodes to your cluster over a Tailscale VPN. If you
| do that, you don't need to expose your remote AI server to
| any network at all except the Kubernetes internal network.
| I'm guessing fly.io doesn't just give you bare servers you
| can run k3s on, though. The only real difficulty here is
| hooking up totally different abstraction engines that aren't
| designed to work with each other. Putting the blame
| specifically on one of them and not the other doesn't make
| sense.
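|
| (If memory serves, recent k3s ships this as an experimental
| --vpn-auth flag; a sketch, with the server URL, token, and
| Tailscale auth key all placeholders:
|
|     k3s agent --server https://<server>:6443 \
|       --token <token> \
|       --vpn-auth="name=tailscale,joinKey=<ts-auth-key>"
|
| With that, the node's cluster traffic rides the tailnet.)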
| candiddevmike wrote:
| At its heart, Kubernetes is a workload scheduler, and a fairly
| opinionated one at that. IMO, it's too complicated for folks
| that just need an "extensible platform" when all they really
| need is a dumb scheduler API without all the Kubernetes bells
| and whistles that have been bolted on over the years.
|
| Kubernetes tries to be everything to everyone, which makes the
| entire thing too complicated. Breaking Kubernetes up into
| smaller, more purpose-built components and letting folks pick,
| choose, or swap them could be helpful, at the risk of it
| becoming OpenStack.
|
| There's an alternate reality where Fleet is the de facto
| scheduler and cats and dogs live together in harmony.
| mrgaro wrote:
| I'd say the opposite instead: we need Kubernetes
| distributions, just like Linux needs distributions. Nobody
| wants to build their kernel from scratch and hand-pick
| various userspace programs.
|
| Same for Kubernetes: distributions which pack everything you
| need in an opinionated way, so that it's easy to use. Right
| now it's kinda build-your-own-Kubernetes on every platform:
| kubeadm, EKS, etc. all require you to install various add-on
| components before you have a fully usable cluster.
| p-o wrote:
| I think the Operator pattern will grow to become exactly
| what you describe. We're still at the early stage of that,
| but I can see that a group of operators could become a
| "distribution", in your example.
| koito17 wrote:
| > At its heart, Kubernetes is a workload scheduler
|
| Wouldn't it be more accurate to say that Kubernetes, at its
| core, is a distributed control plane?
|
| You want something that inspects the state of a distributed
| system and mutates it as necessary to reach some "desired
| state". That's the control plane bit. For fault tolerance,
| the control plane itself needs to be distributed. This is the
| exact reason for Kubernetes' existence.
| treflop wrote:
| What OP did could be done with any containerization platform
| that abstracts away networking, which should be all of them.
|
| Personally I think you should always be using containerization,
| because even single node Docker is easy. If you are running
| something for real, then definitely use Kubernetes.
|
| If you are using containerization, setting up Tailscale is
| trivial.
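|
| Concretely, something like this is usually all it takes (a
| sketch; the auth key is a placeholder). Fittingly for this
| article, the official tailscale/tailscale image even defaults
| to userspace networking, so it doesn't need /dev/net/tun:
|
|     docker run -d --name=tailscale \
|       -e TS_AUTHKEY=<your-tailscale-auth-key> \
|       -e TS_STATE_DIR=/var/lib/tailscale \
|       -v tailscale-state:/var/lib/tailscale \
|       tailscale/tailscale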
|
| Whenever you abstract something away, you can swap out core
| components like networking willy-nilly. Of course, you can
| always over-abstract, but a healthy amount is wonderful for
| most use cases short of very high performance ones.
| sigmonsays wrote:
| "how I learned to ____ and love ____" titles seem so
| editorialized I don't want to read the article but it ended up
| being interesting.
|
| however, i'm not sure yet how it's useful outside of fly.io. It
| seems odd to say "I like to self host" and then yield to fly.io.
| ElevenLathe wrote:
| If you weren't aware of the reference, this comes from the full
| title of the 1964 movie _Dr. Strangelove_:
| https://en.wikipedia.org/wiki/Dr._Strangelove
| cpach wrote:
| For those that haven't seen it: Do yourself a favor and watch
| it. At least two times. It's such a strange (haha) movie.
| Quite unique IMO, can't think of any other movie that is
| similar.
| xiande04 wrote:
| The first blank is "stop worrying". It's always "stop
| worrying".
| tptacek wrote:
| I don't think this has anything to do with us? You can do
| userspace TCP over WireGuard anywhere you can run a WireGuard
| connection.
| croisillon wrote:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
| morning-coffee wrote:
| The post didn't really convey where the magic happens to get
| whatever frames are produced by the userspace TCP/IP stack to
| whatever mechanism puts those frames on some wire... isn't it
| still making some syscall? IOW, where's the I/O happening?
| tptacek wrote:
| The simplest way to do this, and the way I think this post is
| referring to, is to bring up a WireGuard session. WireGuard is
| essentially just a UDP protocol (so driveable from userland
| SOCK_DGRAM) that encapsulates whole IP packets, so you can run
| a TCP stack in userland over it.
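|
| For a concrete sketch of what that looks like: wireguard-go
| ships a netstack package that does exactly this. Roughly
| (untested; keys, addresses, and the endpoint are placeholders):
|
|     package main
|
|     import (
|         "io"
|         "log"
|         "net/http"
|         "net/netip"
|
|         "golang.zx2c4.com/wireguard/conn"
|         "golang.zx2c4.com/wireguard/device"
|         "golang.zx2c4.com/wireguard/tun/netstack"
|     )
|
|     func main() {
|         // A virtual interface plus a TCP/IP stack, living
|         // entirely inside this process.
|         tun, tnet, err := netstack.CreateNetTUN(
|             []netip.Addr{netip.MustParseAddr("192.168.4.29")},
|             []netip.Addr{netip.MustParseAddr("8.8.8.8")},
|             1420)
|         if err != nil {
|             log.Fatal(err)
|         }
|
|         // WireGuard itself is driven over an ordinary UDP
|         // socket (SOCK_DGRAM), so no privileges are needed.
|         dev := device.NewDevice(tun, conn.NewDefaultBind(),
|             device.NewLogger(device.LogLevelError, ""))
|         err = dev.IpcSet("private_key=<hex>\n" +
|             "public_key=<hex>\n" +
|             "endpoint=203.0.113.1:51820\n" +
|             "allowed_ip=0.0.0.0/0\n")
|         if err != nil {
|             log.Fatal(err)
|         }
|         if err := dev.Up(); err != nil {
|             log.Fatal(err)
|         }
|
|         // TCP handshakes, retransmits, and congestion control
|         // all happen in userland; the kernel only ever sees
|         // encrypted UDP datagrams.
|         client := http.Client{
|             Transport: &http.Transport{
|                 DialContext: tnet.DialContext,
|             },
|         }
|         resp, err := client.Get("http://192.168.4.1/")
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer resp.Body.Close()
|         body, _ := io.ReadAll(resp.Body)
|         log.Printf("%s", body)
|     }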
| morning-coffee wrote:
| Thanks. I read a bit about WireGuard... seems to create a
| tunnel over UDP as you say. Where I got off track is the
| following comment in the article:
|
| > Who says you need to do networking in the kernel though?
|
| I thought it implied they were bypassing the kernel entirely.
| This isn't true at all if it's still using a UDP socket to
| send... there's still a _ton_ of code in the kernel to handle
| sending and receiving UDP. It's also a helluva lot less
| efficient in terms of CPU consumed, unless one is using
| sendmmsg, because it's a syscall per tiny datagram between
| user space and kernel space to get the data onto the UDP
| socket...
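|
| For what it's worth, Go exposes that batching through
| golang.org/x/net/ipv4 (WriteBatch uses sendmmsg on Linux);
| a rough sketch with a placeholder peer:
|
|     package main
|
|     import (
|         "log"
|         "net"
|
|         "golang.org/x/net/ipv4"
|     )
|
|     func main() {
|         c, err := net.ListenPacket("udp4", ":0")
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer c.Close()
|
|         dst, _ := net.ResolveUDPAddr("udp4",
|             "203.0.113.1:51820") // placeholder peer
|
|         // Queue up many datagrams and flush them with a
|         // single sendmmsg(2) call, instead of paying one
|         // sendto(2) syscall per tiny packet.
|         msgs := make([]ipv4.Message, 16)
|         for i := range msgs {
|             msgs[i] = ipv4.Message{
|                 Buffers: [][]byte{[]byte("one datagram")},
|                 Addr:    dst,
|             }
|         }
|         n, err := ipv4.NewPacketConn(c).WriteBatch(msgs, 0)
|         if err != nil {
|             log.Fatal(err)
|         }
|         log.Printf("flushed %d datagrams in one batch", n)
|     }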
| tptacek wrote:
| When people talk about bypassing the kernel, they mean
| taking control of TCP/IP (UDP, IPv6, ICMP, whatever) in
| userland; for instance, doing your own congestion control
| algorithm, or binding to every port on an address all at
| once, or doing your own routing --- all things the kernel
| doesn't expose interfaces for. They usually don't mean
| you're literally not using the kernel for anything.
|
| There are also near-total bypass solutions, where ultra-high
| performance is the goal and you wire userland up to get
| packets directly to and from the card's DMA rings; the bulk
| of that code is the same as you'd use with this WireGuard
| bypass. That's a possible thing to do too, but you basically
| need to build the whole system around what you're doing. The
| WireGuard thing you can do as an unprivileged user --- that
| is, you can ship an application to your users that does it,
| without them having to care.
|
| Long story short: sometimes this is about perf, here though
| not so much.
| akira2501 wrote:
| To me the key thing with any TCP implementation is the
| timers. Just sending a single TCP packet causes the
| kernel to have to track a lot of state and then operate
| asynchronously from your program to manage this state,
| which also requires the kernel to manage packet buffering
| and queuing for you.
| xxpor wrote:
| If you're interested in the latter technique (hitting the
| HW directly from userspace), take a look at
| https://www.dpdk.org/. That's the framework that 95% of
| things that do that use (number pulled directly out of my
| rear end).
| RandomThoughts3 wrote:
| I haven't followed for a long time, but last time I checked
| the state of the art was getting your packets in and out via
| an eBPF program attached at the XDP hook, actually bypassing
| the kernel network stack (that's not what the article is
| doing). You'd then be talking to the NIC directly through
| that program, which is akin to putting packets straight onto
| the wire.
| sweeter wrote:
| I'm always wondering how people afford kubernetes clusters and
| all that to run stuff like ollama.
| skinkestek wrote:
| The author is using https://k3s.io/, not full k8s, so it
| doesn't have to be extremely expensive.
| suprjami wrote:
| You can run a local LLM on a $100 minipc with Vulkan GPU
| acceleration and get a usable token generation rate.
___________________________________________________________________
(page generated 2024-08-29 23:00 UTC)