[HN Gopher] How I learned to stop worrying and love userspace networking
       ___________________________________________________________________
        
       How I learned to stop worrying and love userspace networking
        
       Author : todsacerdoti
       Score  : 131 points
       Date   : 2024-08-29 13:06 UTC (9 hours ago)
        
 (HTM) web link (friendshipcastle.zip)
 (TXT) w3m dump (friendshipcastle.zip)
        
       | jauntywundrkind wrote:
        | Really love seeing such a straightforward example of starting
        | with some desire or need and ending up DIY'ing one's own
        | operator.
       | 
        | A lot of the pushback against Kubernetes revolves around whether
        | you 'really need it' or whether to do something else. Seeing
        | someone go past just running containers like this highlights the
        | extensibility, and shows the core of Kubernetes as a pattern &
        | paradigm for building any kind of platform.
       | 
       | It's neat seeing that done so quickly & readily here. That we can
       | add and manage anything, consistently, quickly, is a promise that
       | I love seeing fulfilled.
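        | 
        | As a rough sketch of how small the core of a DIY operator can
        | be, here's the shape of one built on controller-runtime. The
        | PodLabeler type and what it reconciles are made up for
        | illustration, and error handling is elided:
        | 
        |   package main
        | 
        |   import (
        |       "context"
        | 
        |       corev1 "k8s.io/api/core/v1"
        |       ctrl "sigs.k8s.io/controller-runtime"
        |       "sigs.k8s.io/controller-runtime/pkg/client"
        |   )
        | 
        |   // PodLabeler is a toy reconciler: it watches Pods and can
        |   // nudge them toward whatever desired state you define.
        |   type PodLabeler struct{ client.Client }
        | 
        |   func (r *PodLabeler) Reconcile(ctx context.Context,
        |       req ctrl.Request) (ctrl.Result, error) {
        |       var pod corev1.Pod
        |       err := r.Get(ctx, req.NamespacedName, &pod)
        |       if err != nil {
        |           return ctrl.Result{}, client.IgnoreNotFound(err)
        |       }
        |       // Compare observed state to desired state, patch here.
        |       return ctrl.Result{}, nil
        |   }
        | 
        |   func main() {
        |       mgr, _ := ctrl.NewManager(ctrl.GetConfigOrDie(),
        |           ctrl.Options{})
        |       _ = ctrl.NewControllerManagedBy(mgr).
        |           For(&corev1.Pod{}).
        |           Complete(&PodLabeler{mgr.GetClient()})
        |       _ = mgr.Start(ctrl.SetupSignalHandler())
        |   }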
        
         | Etheryte wrote:
          | While I see what you're getting at, I find the comment funny
          | given that the first thing the article leads with is that
          | certain things are a pain in Kubernetes.
        
           | beeboobaa3 wrote:
            | It's a pain because Kubernetes is designed to run multiple
            | workloads on multiple servers. So if you want to access the
            | VPN from _some_ Kubernetes containers, you're going to have
            | to figure something out.
           | 
            | But nothing is stopping you from just joining all your hosts
            | into the VPN, just like a traditional deployment, or setting
            | it up on your network gateway. This would make it available
            | to all of your containers. Great. You're done.
           | 
           | But if that's not what you want, you'll need to figure
           | something out.
        
             | Etheryte wrote:
              | I'm not sure I follow. You said Kubernetes is great for
              | building any kind of platform, but then when someone wants
              | controlled access to a VPN it suddenly turns into "no, not
              | like that"? Giving only certain parts of your architecture
              | access to certain capabilities is far from a niche use
              | case.
        
               | themgt wrote:
               | _Giving only certain parts of your architecture access to
               | certain capabilities is far from a niche use case_
               | 
               | What is the "normal" best practice here then? I would
               | just spin up multiple single-node k3s VM clusters and
               | hook the AI k3s VM to the VPN and the others not.
        
           | nonameiguess wrote:
           | Sort of. What the author is trying to do is actually quite
           | easy if the entire setup had been Kubernetes. k3s allows you
           | to connect nodes to your cluster over a Tailscale VPN. If you
           | do that, you don't need to expose your remote AI server to
           | any network at all except the Kubernetes internal network.
           | I'm guessing fly.io doesn't just give you bare servers you
           | can run k3s on, though. The only real difficulty here is
           | hooking up totally different abstraction engines that aren't
           | designed to work with each other. Putting the blame
           | specifically on one of them and not the other doesn't make
           | sense.
        
         | candiddevmike wrote:
          | At its heart, Kubernetes is a workload scheduler, and a fairly
          | opinionated one at that. IMO, it's too complicated for folks
          | who just want an "extensible platform", when all they really
          | need is a dumb scheduler API without all the Kubernetes bells
          | and whistles that have been bolted on over the years.
         | 
          | Kubernetes tries to be everything to everyone, which makes the
          | entire thing too complicated. Breaking Kubernetes up into
          | smaller, more purpose-built components and letting folks pick
          | and choose or swap them could be helpful, at the risk of it
          | becoming OpenStack.
         | 
          | There's an alternate reality where Fleet is the de facto
         | scheduler and cats and dogs live together in harmony.
        
           | mrgaro wrote:
            | I'd say the opposite: we need Kubernetes distributions, just
            | like Linux needs distributions. Nobody wants to build their
            | kernel from scratch and hand-pick various userspace
            | programs.
           | 
            | Same for Kubernetes: distributions which pack everything you
            | need in an opinionated way, so that it's easy to use. Right
            | now it's kinda build-your-own-Kubernetes on every platform:
            | kubeadm, EKS, etc. all require you to install various add-on
            | components before you have a fully usable cluster.
        
             | p-o wrote:
              | I think the Operator pattern will grow to become exactly
              | what you describe. We're still at an early stage of that,
              | but I can see a group of operators becoming a
              | "distribution" in your sense.
        
           | koito17 wrote:
           | > At its heart, Kubernetes is a workload scheduler
           | 
            | Wouldn't it be more accurate to say that Kubernetes, at its
            | core, is a distributed control plane?
           | 
           | You want something that inspects the state of a distributed
           | system and mutates it as necessary to reach some "desired
           | state". That's the control plane bit. For fault tolerance,
           | the control plane itself needs to be distributed. This is the
           | exact reason for Kubernetes' existence.
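            | 
            | The loop itself is a simple idea. A minimal sketch of the
            | shape in Go (the names and the Replicas example are
            | illustrative, not any real Kubernetes API):
            | 
            |   package main
            | 
            |   import (
            |       "fmt"
            |       "time"
            |   )
            | 
            |   // State is whatever you can observe about the system.
            |   type State struct{ Replicas int }
            | 
            |   // Observe, compare to desired, act, repeat. Kubernetes
            |   // controllers are elaborations on this loop.
            |   func controlLoop(desired State, observe func() State,
            |       act func(current, desired State)) {
            |       for range time.Tick(5 * time.Second) {
            |           if current := observe(); current != desired {
            |               act(current, desired)
            |           }
            |       }
            |   }
            | 
            |   func main() {
            |       observe := func() State { return State{Replicas: 2} }
            |       act := func(cur, want State) {
            |           fmt.Printf("scale %d -> %d\n", cur.Replicas,
            |               want.Replicas)
            |       }
            |       controlLoop(State{Replicas: 3}, observe, act)
            |   }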
        
         | treflop wrote:
          | What OP did could be done with any containerization platform
          | that abstracts away networking, which should be all of them.
         | 
          | Personally I think you should always be using containerization,
          | because even single-node Docker is easy. If you are running
          | something for real, then definitely use Kubernetes.
         | 
         | If you are using containerization, setting up Tailscale is
         | trivial.
         | 
          | Whenever you abstract something away, you can swap out core
          | components like networking willy-nilly. Of course, you can
          | always over-abstract, but a healthy amount is wonderful for
          | most use cases short of very high performance ones.
        
       | sigmonsays wrote:
       | "how I learned to ____ and love ____" titles seem so
       | editorialized I don't want to read the article but it ended up
       | being interesting.
       | 
        | However, I'm not sure yet how it's useful outside of fly.io. It
        | seems odd to say "I like to self-host" and then yield to fly.io.
        
         | ElevenLathe wrote:
         | If you weren't aware of the reference, this comes from the full
         | title of the 1964 movie _Dr. Strangelove_:
         | https://en.wikipedia.org/wiki/Dr._Strangelove
        
           | cpach wrote:
           | For those that haven't seen it: Do yourself a favor and watch
           | it. At least two times. It's such a strange (haha) movie.
           | Quite unique IMO, can't think of any other movie that is
           | similar.
        
         | xiande04 wrote:
         | The first blank is "stop worrying". It's always "stop
         | worrying".
        
         | tptacek wrote:
         | I don't think this has anything to do with us? You can do
         | userspace TCP over WireGuard anywhere you can run a WireGuard
          | connection.
        
         | croisillon wrote:
         | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
        
       | morning-coffee wrote:
        | The article didn't really convey where the magic happens to get
        | the frames produced by the userspace TCP/IP stack to whatever
        | mechanism puts those frames on some wire... isn't it still
        | making some syscall? IOW, where's the I/O happening?
        
         | tptacek wrote:
         | The simplest way to do this, and the way I think this post is
         | referring to, is to bring up a WireGuard session. WireGuard is
         | essentially just a UDP protocol (so driveable from userland
         | SOCK_DGRAM) that encapsulates whole IP packets, so you can run
         | a TCP stack in userland over it.
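          | 
          | A minimal sketch of what that looks like with wireguard-go's
          | netstack package. The keys, addresses, and endpoint below are
          | placeholders; treat this as the shape of the thing, not a
          | drop-in program:
          | 
          |   package main
          | 
          |   import (
          |       "io"
          |       "log"
          |       "net/http"
          |       "net/netip"
          | 
          |       "golang.zx2c4.com/wireguard/conn"
          |       "golang.zx2c4.com/wireguard/device"
          |       "golang.zx2c4.com/wireguard/tun/netstack"
          |   )
          | 
          |   func main() {
          |       // A TUN device backed by a userspace TCP/IP stack
          |       // (gVisor's netstack) instead of the kernel's.
          |       tun, tnet, err := netstack.CreateNetTUN(
          |           []netip.Addr{netip.MustParseAddr("10.0.0.2")},
          |           []netip.Addr{netip.MustParseAddr("8.8.8.8")},
          |           1420)
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          | 
          |       // WireGuard itself is driven from a plain UDP socket.
          |       dev := device.NewDevice(tun, conn.NewDefaultBind(),
          |           device.NewLogger(device.LogLevelError, ""))
          |       // UAPI config; hex keys and endpoint are placeholders.
          |       if err := dev.IpcSet("private_key=<hex>\n" +
          |           "public_key=<hex>\n" +
          |           "endpoint=<host>:51820\n" +
          |           "allowed_ip=0.0.0.0/0\n"); err != nil {
          |           log.Fatal(err)
          |       }
          |       if err := dev.Up(); err != nil {
          |           log.Fatal(err)
          |       }
          | 
          |       // An HTTP client whose TCP runs entirely in-process.
          |       client := http.Client{Transport: &http.Transport{
          |           DialContext: tnet.DialContext,
          |       }}
          |       resp, err := client.Get("http://10.0.0.1/")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       body, _ := io.ReadAll(resp.Body)
          |       log.Println(string(body))
          |   }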
        
           | morning-coffee wrote:
           | Thanks. I read a bit about WireGuard... seems to create a
           | tunnel over UDP as you say. Where I got off track is the
           | following comment in the article:
           | 
           | > Who says you need to do networking in the kernel though?
           | 
           | I thought it implied they were bypassing the kernel entirely.
           | This isn't true at all if it's still using a UDP socket to
           | send... there's still a _ton_ of code in the kernel to handle
            | sending and receiving UDP. It's also a helluva lot less
            | efficient in terms of CPU consumed, unless one is using
            | sendmmsg, because otherwise it's a syscall per tiny datagram
            | between user space and kernel space to get the data onto the
            | UDP socket...
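            | 
            | For what it's worth, Go exposes sendmmsg-style batching via
            | golang.org/x/net. A rough sketch (the destination and
            | payloads are made up):
            | 
            |   package main
            | 
            |   import (
            |       "log"
            |       "net"
            | 
            |       "golang.org/x/net/ipv4"
            |   )
            | 
            |   func main() {
            |       c, err := net.ListenPacket("udp4", ":0")
            |       if err != nil {
            |           log.Fatal(err)
            |       }
            |       dst, err := net.ResolveUDPAddr("udp4",
            |           "203.0.113.1:51820")
            |       if err != nil {
            |           log.Fatal(err)
            |       }
            | 
            |       // One sendmmsg syscall for many datagrams (on Linux),
            |       // instead of one sendto per packet.
            |       p := ipv4.NewPacketConn(c)
            |       msgs := make([]ipv4.Message, 8)
            |       for i := range msgs {
            |           msgs[i] = ipv4.Message{
            |               Buffers: [][]byte{[]byte("payload")},
            |               Addr:    dst,
            |           }
            |       }
            |       if _, err := p.WriteBatch(msgs, 0); err != nil {
            |           log.Fatal(err)
            |       }
            |   }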
        
             | tptacek wrote:
             | When people talk about bypassing the kernel, they mean
             | taking control of TCP/IP (UDP, IPv6, ICMP, whatever) in
             | userland; for instance, doing your own congestion control
             | algorithm, or binding to every port on an address all at
             | once, or doing your own routing --- all things the kernel
             | doesn't expose interfaces for. They usually don't mean
             | you're literally not using the kernel for anything.
             | 
             | There are near-total bypass solutions, in which the bulk of
             | the code is the same as you'd use with this WireGuard
             | bypass, where ultra-high performance is the goal, and
             | you're wiring userland up to get packets directly to and
             | from the card DMA rings. That's a possible thing to do too,
             | but you basically need to build the whole system around
             | what you're doing. The WireGuard thing you can do as an
             | unprivileged user --- that is, you can ship an application
             | to your users that does it, without them having to care.
             | 
             | Long story short: sometimes this is about perf, here though
             | not so much.
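              | 
              | To make the unprivileged part concrete: with the netstack
              | setup from the sketch upthread, the same process can also
              | accept connections, no root or TUN device required. The
              | serve function below is illustrative:
              | 
              |   package wgserve
              | 
              |   import (
              |       "log"
              |       "net"
              | 
              |       "golang.zx2c4.com/wireguard/tun/netstack"
              |   )
              | 
              |   // serve listens on the userspace stack; tnet comes
              |   // from netstack.CreateNetTUN as shown upthread.
              |   func serve(tnet *netstack.Net) {
              |       ln, err := tnet.ListenTCP(&net.TCPAddr{Port: 80})
              |       if err != nil {
              |           log.Fatal(err)
              |       }
              |       for {
              |           c, err := ln.Accept()
              |           if err != nil {
              |               log.Fatal(err)
              |           }
              |           go func(c net.Conn) {
              |               defer c.Close()
              |               c.Write([]byte("hello from userland\n"))
              |           }(c)
              |       }
              |   }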
        
               | akira2501 wrote:
               | To me the key thing with any TCP implementation is the
               | timers. Just sending a single TCP packet causes the
               | kernel to have to track a lot of state and then operate
               | asynchronously from your program to manage this state,
               | which also requires the kernel to manage packet buffering
               | and queuing for you.
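                | 
                | A sketch of the smallest piece of that state machine,
                | a per-segment retransmission timer (illustrative, not
                | any real stack's code):
                | 
                |   package tcpsketch
                | 
                |   import "time"
                | 
                |   // A sender must buffer each unacked segment and arm
                |   // a timer that fires independently of the app.
                |   type segment struct {
                |       seq   uint32
                |       data  []byte
                |       timer *time.Timer
                |   }
                | 
                |   func send(s *segment, transmit func([]byte),
                |       rto time.Duration) {
                |       transmit(s.data)
                |       s.timer = time.AfterFunc(rto, func() {
                |           // No ACK in time: retransmit with backoff.
                |           send(s, transmit, 2*rto)
                |       })
                |   }
                | 
                |   // On ACK, the stack frees the buffer and disarms
                |   // the timer: s.timer.Stop().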
        
               | xxpor wrote:
               | If you're interested in the latter technique (hitting the
               | HW directly from userspace), take a look at
               | https://www.dpdk.org/. That's the framework that 95% of
               | things that do that use (number pulled directly out of my
               | rear end).
        
         | RandomThoughts3 wrote:
          | I haven't followed this space for a long time, but last time I
          | checked the state of the art was getting your packets in and
          | out via an eBPF program attached at the XDP hook, actually
          | bypassing the kernel network stack (that's not what the
          | article is doing). You would then be talking to the NIC almost
          | directly through the program, which is akin to putting packets
          | straight onto the wire.
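          | 
          | From Go, the usual way to attach such a program is the
          | cilium/ebpf library. A hedged sketch; the object file, program
          | name, and interface are placeholders, and the XDP program
          | itself would be written in C and compiled separately:
          | 
          |   package main
          | 
          |   import (
          |       "log"
          |       "net"
          | 
          |       "github.com/cilium/ebpf"
          |       "github.com/cilium/ebpf/link"
          |   )
          | 
          |   func main() {
          |       // Load a pre-compiled eBPF object (placeholder name).
          |       coll, err := ebpf.LoadCollection("xdp_prog.o")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       defer coll.Close()
          | 
          |       iface, err := net.InterfaceByName("eth0")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          | 
          |       // Attach at the XDP hook: packets reach this program
          |       // before the kernel network stack ever sees them.
          |       l, err := link.AttachXDP(link.XDPOptions{
          |           Program:   coll.Programs["xdp_main"],
          |           Interface: iface.Index,
          |       })
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       defer l.Close()
          | 
          |       select {} // keep it attached until killed
          |   }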
        
       | sweeter wrote:
        | I'm always wondering how people are affording Kubernetes clusters
        | and all that to run stuff like Ollama.
        
         | skinkestek wrote:
          | The author is using https://k3s.io/, not full k8s, so it
          | doesn't have to be extremely expensive.
        
         | suprjami wrote:
         | You can run a local LLM on a $100 minipc with Vulkan GPU
          | acceleration and get a usable token generation rate.
        
       ___________________________________________________________________
       (page generated 2024-08-29 23:00 UTC)