[HN Gopher] We're Leaving Kubernetes
___________________________________________________________________
We're Leaving Kubernetes
Author : filiptronicek
Score : 242 points
Date : 2024-11-04 14:41 UTC (8 hours ago)
(HTM) web link (www.gitpod.io)
(TXT) w3m dump (www.gitpod.io)
| lolinder wrote:
| > This is not a story of whether or not to use Kubernetes for
| production workloads; that's a whole separate conversation. As is
| the topic of how to build a comprehensive soup-to-nuts developer
| experience for shipping applications on Kubernetes.
|
| > This is the story of how (not) to build development
| environments in the cloud.
|
| I'd like to request that the comment thread not turn into a bunch
| of generic k8s complaints. This is a legitimately interesting
| article about complicated engineering trade-offs faced by an
| organization with a very unique workload. Let's talk about that
| instead of talking about the title!
| kitd wrote:
| Agreed. It's actually a very interesting use case and I can
| easily see that K8s wouldn't be the answer. My dev env is very
| definitely my "pet", thank you very much!
| ethbr1 wrote:
| It'd be nice to editorialize the title a bit with "... (for
| dev envs)" for clarity.
|
| Super useful negative example, and the lengths they pursued
| to make it fit! And no knock on the initial choice or
| impressive engineering, as many of the k8s problems they hit
| likely weren't understood gaps at the time they chose k8s.
|
| Which makes sense, given k8s roots in (a) not being a
| security isolation tool & (b) targeting up-front
| configurability over runtime flexibility.
|
| Neither of which mesh well with the co-hosted dev environment
| use case.
| preommr wrote:
| Can someone clarify if they mean development environments, or
| if they're talking about a service that they sell that's
| related to development environments.
|
| Because I don't understand most of the article if it's the
| former. How are things like performance a concern for
| internal development environments? And why are so many things
| stateful - ideally there should be some kind of
| configuration/secret management solution so that deployments
| are consistent.
|
| If it's the latter, then this is incredibly niche and maybe
| interesting, but unlikely to be applicable to anyone else.
| clvx wrote:
| I tried doing a dev environment on Kubernetes, but the fact that
| you have to deal with a set of containers that could change if
| the base layer changed meant instability in certain cases, which
| threw me off.
|
| I ended up with a mix of Nix and its VM build system, which is
| based on QEMU. The issue is that it's too tied to NixOS, and all
| services run in the same place, which forces you to manage ports
| and other things.
|
| How I wish it could work: a flake that defines certain services,
| and these services could or could not run in different uVMs
| sharing an isolated Linux network layer. Your flake could define
| your versions and your commands to interact with and manage the
| lifecycle of those uVMs. As the Nix store can be cached/shared,
| it can provide fast and reproducible builds after the first
| build.
| candiddevmike wrote:
| > the fact you have to be dealing with a set of containers that
| could change if the base layer changed meant instability
|
| Can you expand on this? Are you talking about containers you
| create?
| eptcyka wrote:
| Have you tried https://github.com/astro/microvm.nix ? You can
| use the same NixOS module for both declarative VMs and
| imperatively configured and spawned VMs.
| bhouston wrote:
| I also recently left Kubernetes. It was a huge waste of time and
| money. I've replaced it with just a series of services on Google
| Cloud Run and then using Google's Cloud Run Tasks services for
| longer running tasks.
|
| The infrastructure is now incredibly understandable, simple, and
| cost effective.
|
| Kubernetes unnecessarily cost us over $1 million in both DevOps
| time and actual Google Cloud costs, and even worse it cost us
| time to market. Stay off of Kubernetes as long as you can in your
| company, unless you are basically forced onto it. You should view
| it as an unnecessary evil that comes with massive downsides in
| terms of complexity and cost.
| elcomet wrote:
| Aren't you afraid of being now stuck with GCP?
| bhouston wrote:
| It is just a bunch of docker containers. Some run in tasks
| and some run as auto-scaling services. Would probably take a
| week to switch to AWS as there are equivalent managed
| services there.
|
| But this is really a spurious concern. I myself used to care
| about it years ago. But in practice, people rarely switch
| between cloud providers because the incremental benefits are
| minor; the providers are nearly equivalent, and there is not
| much to be gained by moving from one to the other unless
| politics are involved (e.g. someone high up wants a specific
| provider).
| spwa4 wrote:
| How does the orchestration work? How do you share storage?
| How do the docker containers know how to find each other?
| How does security work?
|
| I feel like Kubernetes' downfall, for me, is the number of
| "enterprise" features it got convinced into supporting,
| and enterprise features doing what they do best: turning
| the simplest of operations into a disaster.
| bhouston wrote:
| > How does the orchestration work?
|
| Github Actions CI. Take this, add a few more
| dependencies and a matrix strategy, and you are good to
| go: https://github.com/bhouston/template-typescript-
| monorepo/blo... For dev environments, you can add
| suffixes to the services based on branches.
|
| > How do you share storage?
|
| I use managed DBs and Cloud Storage for shared storage. I
| think that provisioning your own SSDs/HDs to the cloud is
| indicative of an anti-pattern in your architecture.
|
| > How do the docker containers know how to find each
| other?
|
| I try to avoid too much direct communication between
| services; rather, I try to go through pub-sub or similar.
| But you can set up each service with a domain name and
| access them that way. With https://web3dsurvey.com, I
| have an api on https://api.web3dsurvey.com and then a
| review environment (connected to the main branch) with
| https://preview.web3dsurvey.com /
| https://api.preview.web3dsurvey.com.
|
| > How does security work?
|
| You can configure Cloud Run services to be internal only
| and not to accept outside connections. Otherwise one can
| just use JWT or whatever is normal on your routes in your
| web server.
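|
| On the pub-sub point above, a minimal sketch of the idea in
| Python (the stack discussed here is TypeScript, but the shape
| is the same; the project and topic names are hypothetical and
| it assumes the google-cloud-pubsub client is installed):
|
| from google.cloud import pubsub_v1
|
| # Producer side: publish a job instead of calling the other
| # service directly; the consumer subscribes to the topic.
| publisher = pubsub_v1.PublisherClient()
| topic_path = publisher.topic_path("my-project", "render-jobs")
| future = publisher.publish(topic_path, b"payload", job_id="123")
| print(future.result())  # message ID once the publish is acked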
| sofixa wrote:
| One of Cloud Run's main advantages is that it's literally
| just telling it how to run containers. You could run those
| same containers in OpenFaaS, Lambda, etc relatively easily.
| rglover wrote:
| What stack are you deploying?
| bhouston wrote:
| Stuff like this, just at larger scale:
|
| https://github.com/bhouston/template-typescript-monorepo
|
| This is my living template of best practices.
| rglover wrote:
| I'd investigate getting a build out to Node.js (looks like
| you already have this) and then just doing a simple SCP of
| the build to a VPS. From there, just use a systemd script
| to handle startup/restart on errors. For logging, something
| like the Winston package does the trick.
|
| If you want some guidance, shoot me an email (in profile).
| You can run most stuff for peanuts.
| Imustaskforhelp wrote:
| Yeah, I have the same thoughts. Also, if possible, Bun can
| reduce memory usage in very basic scenarios:
| https://www.youtube.com/watch?v=yJmyYosyDDM
|
| Or just https://github.com/mightymoud/sidekick or coolify
| or dokku or dockify; there are a million such things. Oh, I
| just remembered Kamal deploy from DHH, and docker swarm IIRC
| (though people seem to have forgotten docker swarm!)
|
| I like this idea very much!
| Imustaskforhelp wrote:
| There was a recent HN post which showed that they didn't
| even use Docker; there was some other mechanism instead,
| and it was so, so simple. I really enjoyed that article.
| bhouston wrote:
| > I'd investigate getting a build out to Node.js (looks
| like you already have this) and then just doing a simple
| SCP of the build to a VPS. From there, just use a systemd
| script to handle startup/restart on errors. For logging,
| something like the Winston package does the trick. If you
| want some guidance, shoot me an email (in profile). You
| can run most stuff for peanuts.
|
| I appreciate the offer! But it is not as robust and it is
| more expensive and misses a lot of benefits.
|
| Back in the 1990s I did FTP my website to a VPS after I
| graduated from Geocities.
|
| Google Cloud charges based on CPU used. Thus when my
| servers have no traffic, they cost less than $1/month. If
| they have traffic, they are still cost effective.
| https://web3dsurvey.com has about 500,000 hits per month
| and it costs me $4/month to run both the Remix web server
| and the Fastify API server. Details here:
| https://x.com/benhouston3d/status/1840811854911668641
|
| Also it will autoscale under load. Thus when one of my
| posts was briefly the top story on Hacker News last
| month, Google Cloud Run added more instances to my server
| to handle the load (I do not run my personal site behind
| a CDN; it costs too much, and I prefer to pay $1/month
| for hosting).
|
| Also deploying Docker containers that build on Github
| Actions CI in a few minutes is a great automated
| experience.
|
| I do also use Google services like Cloud Storage,
| Firestore, BigQuery etc. And it is easier to just run it
| on GCP infrastructure for speed.
|
| I also have to version various tools that get installed
| in the Docker image, like Blender, Chromium, etc. This is
| the perfect use case for Docker.
|
| I feel this is pretty close to optimal. Fast, cheap,
| scalable, automated and robust.
| candiddevmike wrote:
| You know that Cloud Run is effectively a Kubernetes PaaS,
| right?
| richards wrote:
| Google employee here. Not the case. Cloud Run doesn't run on
| Kubernetes. It supports the Knative interface which is an OSS
| project for Kubernetes-based serverless. But Cloud Run is a
| fully managed service that sits directly atop Borg
| (https://cloud.google.com/run/docs/securing/security).
| bhouston wrote:
| > You know that Cloud Run is a Kubernetes PaaS, right?
|
| Yup. Isn't it Knative Serving or a home grown Google
| alternative to it? https://knative.dev/docs/serving/
|
| The key is I am not managing Kubernetes and I am not paying
| for it - it is a fool's errand, and incredibly rarely needed.
| Who cares what is underneath the simple Cloud Run developer
| UX? What matters for me is cost, simplicity, speed and
| understandability. You get that with Cloud Run, and you don't
| with Kubernetes.
| chanux wrote:
| I guess the point is that for the OP, Kubernetes is now
| someone else's problem.
| kbolino wrote:
| As far as I can tell, there actually is no AWS equivalent to
| GCP Cloud Run. The closest equivalents I know of are ECS on
| Fargate, which is more like managed Kubernetes except without
| Kubernetes compatibility or modern features, or App Runner,
| which is closer in concept but also sorely lacking in
| comparable features.
| Imustaskforhelp wrote:
| Wow, very interesting. I think we could discuss it for
| hours.
|
| 1.) What would you think of things like hetzner / linode /
| digitalocean (if stable work exists)
|
| 2.) What do you think of https://sst.dev/ or
| https://encore.dev/ ? (They support rather easier migration)
|
| 3.) Could you please indicate the split of that $1 million
| between DevOps time and unnecessary Google Cloud costs? And
| were there outliers (like "oops, our intern didn't add this
| specific variable, misconfigured the cloud, and wasted 10k on
| GCloud"), or was it that bandwidth costs that much more on
| GCloud? (I don't think the latter is the case though.)
|
| Looking forward to chatting with you!
| ensignavenger wrote:
| The article does a great job of explaining the challenges they
| ran into with Kubernetes, and some of the things they tried...
| but I feel like it drops the ball at the end by not telling us at
| least a little what they chose instead. The article mentions they
| call their new solution "Gitpod Flex" but there is nothing about
| what Gitpod Flex is. They said they tried microVMs and decided
| against them, and of course Kubernetes, the focus of the article.
| So is Gitpod Flex based on full VMs? Docker? Some other container
| runtime??
|
| Perhaps a followup article will go into detail about their
| replacement.
| loujaybee wrote:
| Yeah, that's fair. The blog was getting quite long, so we need
| to do some deeper dives in follow-ups.
|
| Gitpod Flex is runner-based. The runner interface is
| intentionally generic so that we can support different clouds,
| on-prem, or just Linux in the future.
|
| The first implemented runner is built around AWS primitives
| like EC2, EBS and ECS. But because of the more generic
| interface, Gitpod now supports local / desktop environments on
| macOS. And again, future OS support will come.
|
| There's a bit more information in the docs, but we will do some
| follow ups!
|
| - https://www.gitpod.io/docs/flex/runners/aws/setup-aws-
| runner... - https://www.gitpod.io/docs/flex/gitpod-desktop
|
| (I work at Gitpod)
| nickstinemates wrote:
| Echoing the parent you're replying to: you built up all of
| the context and missed the payoff.
| ethbr1 wrote:
| I thought it was fair.
|
| >> _We'll be posting a lot more about Gitpod Flex
| architecture in the coming weeks or months._
|
| Cramming more detail into this post would have exceeded the
| average user read time ceiling.
| Bombthecat wrote:
| Still no idea what you did technically... Maybe a second
| post?
|
| Did you use consul?
| ensignavenger wrote:
| Awesome, looking forward to hearing more. I only recently
| began testing out Theia and OpenVSCodeServer, I really
| appreciate Gitpod's contributions to open source!
| datadeft wrote:
| The original k8s paper mentioned that the only use case was a
| combination of low-latency and high-latency workflows, with
| resource allocation based on that. The general idea is that you
| can easily move low-latency work between nodes and there are no
| serious repercussions when a high-latency job fails.
|
| Based on this information, it is hard to justify even
| considering k8s for the problem that Gitpod has.
| xyst wrote:
| I do agree with the points in article that k8s is not a good fit
| for development environments.
|
| In my opinion, k8s is great for stable and consistent
| deployment/orchestration of applications. Dev environments by
| default are in a constant state of flux.
|
| I don't understand the need for "cloud development environments"
| though. Isn't the point of containerized apps to avoid the
| need for synchronizing dev envs amongst teams?
|
| Or maybe this product is supposed to decrease onboarding
| friction?
| sofixa wrote:
| It's to ensure a consistent environment for all developers,
| with the resources required. E.g. they mention GPUs, for
| developers working with GPU-intensive workloads. You can ship
| all developers gaming laptops with 64GB RAM and proper GPUs,
| and have them fight the environment to get the correct
| libraries as you have in prod (even with containers that's not
| trivial), or you can ship them MacBook Airs and similar, and
| have them run consistent (the same) dev environments remotely
| (you can self-host gitpod, it's not only a cloud service, it's
| more the API/environment to get consistent remote dev
| environments).
| loujaybee wrote:
| Yeah, exactly. Containers locally are a basic foundation. But
| usually those containers or services need to talk to one
| another, they need some form of auth and credentials, they
| need some networking setup. There's a lot of configuration in
| all of that. The more devs swap projects or the more complex
| the thing you're working on the more the challenge grows.
| Automating dependencies, secret access, and ensuring projects
| have the right memory, CPU, GPU etc. Also security: moving
| source code off your laptop and devices and standardizing your
| setups helps if you need to do a lot of audit and compliance
| work, as you can automate it.
| dikei wrote:
| Sarcastically, a CDE is one way to move cost from CAPEX (get
| your developer a MacBook Pro) to OPEX (a monthly subscription
| that you only need to pay as long as the dev has not been laid
| off).
|
| It's also much cheaper to hire contractors and give them a
| CDE that can be terminated at a moment's notice.
| roshbhatia wrote:
| In my experience, the case where this becomes really valuable
| is if your team needs access to either different kinds of
| hardware or really expensive hardware that changes relatively
| quickly (i.e. GPUs). At a previous small startup I set up
| https://devpod.sh/ (similar to gitpod) for our MLE/Data team.
| It was a big pro to leverage our existing k8s setup w/ little
| configuration needed to get these developer envs up and running
| as-needed, and we could piggyback off of our existing cost
| tracking tooling to measure usage, but I do feel like we
| already had infra conducive to running dev envs on k8s before
| making this decision -- we had cost tracking tooling, we had a
| dedicated k8s cluster for tooling, we had already been
| supporting GPU based workloads in k8s, and our platform team
| that managed all the k8s infra also were the SMEs for anything
| devenv related. In a world where we started fresh and
| absolutely needed ephemeral devenvs, I think the native
| devcontainer functionality in vscode or something like github
| codespaces would have been our go to, but even then I'd push
| for a docker-compose based workflow prior to touching any of
| these other tools.
|
| The rest of our eng team just did dev on their laptops though.
| I do think there was a level of batteries-included-ness that
| came with the ephemeral dev envs which our less technical data
| scientists appreciated, but the rest of our developers did not.
| Just my 2c
| lmeyerov wrote:
| I was intrigued because the development environment problem is
| similar to the data scientist one - data gravity, GPU sharing,
| etc - but I'm confused on the solution?
|
| Oddly, I left with a funny alternate takeaway: One by one, their
| clever inhouse tweaks & scheduling preferences were recognized by
| the community and turned into standard k8s knobs
|
| So I'm back to the original question... What is fundamentally
| left? It sounds like one part is maintaining a clean container
| path to simplify a local deploy, which a lot of k8s teams do (ex:
| most of our enterprise customers prefer our docker compose & AMIs
| over k8s). But more importantly, is there something fundamental
| architecturally about how envs run that k8s cannot do, which they
| do not identify?
| thenaturalist wrote:
| > We'll be posting a lot more about Gitpod Flex architecture in
| the coming weeks or months. I'd love to invite you on November
| the 6th to a virtual event where I'll be giving a demo of
| Gitpod Flex and I'll deep-dive into the architecture and
| security model at length.
|
| Bottom of the post.
| csweichel wrote:
| OP here. The Kubernetes community has been fantastic at
| evolving the platform, and we've greatly enjoyed being in the
| middle of it. Indeed, many of the things we had to build next
| to Kubernetes have now become part of k8s itself.
|
| Still, some of the core challenges remain:
|
| - the flexibility Kubernetes affords makes it hard to build and
| distribute a product with such specific requirements across the
| broad swath of differently set up Kubernetes installations.
| Managed Kubernetes services help, but come with their own
| restrictions (e.g. Kernel versions on GKE).
|
| - state handling and storage remains unsolved. PVCs are not
| reliable enough, subject to a lot of variance (see point above),
| and depending on the backing storage have vastly different
| behaviour. Local disks (which we use to this day) make workspace
| startup and backup expensive from a resource perspective and
| hard to predict timing-wise.
|
| - user namespaces have come a long way in Kubernetes, but by
| themselves are not enough. /proc is still masked, FUSE is still
| not usable.
|
| - startup times, specifically container pulls and backup
| restoration, are hard to optimize because they depend on a lot
| of factors outside of our control (image homogeneity, cluster
| configuration)
|
| Fundamentally, Kubernetes simply isn't the right choice here.
| It's possible to make it work, but at some point the ROI of
| running on Kubernetes simply isn't there.
| lmeyerov wrote:
| Thanks!
|
| AFAICT, a lot of that comes down to storage abstractions,
| which I'll be curious to see the answer on! Pinned
| localstorage <> cloud native is frustrating.
|
| I sense another big chunk is the fast, secure start problem
| that Firecracker (noted in the blog post) solves but k8s is not
| currently equipped for. Our team has been puzzling over that one
| for a while, and part of our guess is incentives. It's been 5+
| years since Firecracker came out, so it has likewise been
| frustrating to watch.
| debarshri wrote:
| Phew, it is absolutely true. Building dev environments on k8s
| becomes wasteful. To add to this complexity, if you are building
| a product that is self-hosted on customers' infrastructure,
| debugging and support also become non-homogeneous and difficult.
|
| What we have seen work, especially when you are building a
| developer-centric product, is exposing these native issues around
| network, memory, compute and storage to engineers; they are more
| willing to work around them. Abstracting those issues away shifts
| responsibility onto the product.
|
| Having said that, I still think k8s is an upgrade when you have a
| large team.
| horsawlarway wrote:
| Personally - just let the developer own the machine they use for
| development.
|
| If you _really_ need consistency for the environment - Let them
| own the machine, and then give them a stable base VM image, and
| pay for decent virtualization tooling that they run... on their
| own machine.
|
| I have seen several attempts to move dev environments to a remote
| host. They _invariably_ suck.
|
| Yes - that means you need to pay for decent hardware for your
| devs, it's usually cheaper than remote resources (for a lot of
| reasons).
|
| Yes - that means you need to support running your stack locally.
| This is a good constraint (and a place where containers are your
| friend for consistency).
|
| Yes - that means you need data generation tooling to populate a
| local env. This can be automated relatively well, and it's
| something you need with a remote env anyways.
|
| ---
|
| The only real downside is data control (ie - the company has less
| control over how a developer manages assets like source code).
| In my experience, the vast majority of companies should worry
| less about this - your value as a company isn't your source code
| in 99.5% of cases, it's the team that executes that source code
| in production.
|
| If you're in the 0.5% of other cases... you know it and you
| should be in an air-gapped closed room anyways (and I've worked
| in those too...)
| shriek wrote:
| And the reason they suck is that the feedback loop is just too
| slow compared to running things locally. You have to jump through
| hoops to debug/troubleshoot your code or any issues that you
| come across between your code and its output. And it's
| almost impossible to work on things when you have spotty
| internet. I haven't worked on extremely sensitive data but for
| PII data from prod to dev, scrubbing is a good practice to
| follow. This will vary based on the project/team you're on of
| course.
| ethbr1 wrote:
| Aka 'if a developer knew beforehand everything they needed,
| it wouldn't be development'
| binary132 wrote:
| Hello. Currently debugging my kubernetes-based dev pod and not
| getting anything else done. What fun!
| haolez wrote:
| Sometimes I don't even use virtual envs when developing locally
| in Python. I just install everything that I need with pip
| --user and be done with it. Never had any conflicts with system
| packages whatsoever. If I somehow break my --user environment,
| I simply delete it and start again. Never had any major version
| mismatch in dependencies between my machine and what was
| running in production. At least not anything that would impact
| the actual task that I was working on.
|
| I'm not recommending this as a best practice. I just believe
| that we, as developers, end up creating some myths to ourselves
| of what works and what doesn't. It's good to re-evaluate these
| beliefs now and then.
| ctippett wrote:
| I'm not going to second-guess what works for you, but Python
| makes it so easy to work with an ephemeral environment.
| python -m venv .venv
| haolez wrote:
| Yeah, I know. But then you have to make sure that your IDE
| is using the correct environment, that the notebook is
| using the correct environment, that the debugger is using
| the correct environment.
|
| It's trivial to set up a venv, but sometimes it's just not
| worth it for me.
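|
| A quick sanity check for any of those contexts, sketched with
| nothing but the Python standard library (run it from the IDE,
| the notebook, and the debugger and compare the output):
|
| import sys
|
| # The interpreter actually executing this code.
| print(sys.executable)
|
| # True only when running inside a venv (sys.prefix then points
| # into the venv rather than the base installation).
| print(sys.prefix != sys.base_prefix)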
| zo1 wrote:
| This is one of the main reasons I tell people _not_ to
| use VSCode. The people most likely to use it are juniors
| and people new to python specifically, and they 're the
| most likely to fall victim to 'but my "IDE" says it's
| running 3.8 with everything installed, but when I run it
| from my terminal it's a different python 3.8'
|
| I watched it last week. With 4 (I hope junior) Devs in a
| "pair programming" session that forced me to figure out
| how VSCode does virtual envs, and _still_ I had to tell
| them like 3 times "stop opening a damn new terminal,
| it's obviously not set up with our python version, run the
| command inside the one that has the virtual env
| activated".
| fastasucan wrote:
| Weird, in my experience vscode makes it very clear by
| making you explicitly choose a .venv when running or
| debugging.
|
| When it comes to opening a new terminal, you would have
| the exact same problem by... running commands in a
| terminal; I can't see how that is VSCode-related.
| ok_computer wrote:
| The only time I've had version issues running python code is
| that someone prior was referencing a deprecated library API
| or using an obscure package that shouldn't see the light of
| day in a long lived project.
|
| If you stick to the tried and true libs and update your
| function kwargs or method names when you get deprecation
| warnings, I've had pretty rock-steady reproducibility even
| with an unversioned "python -m pip install -r
| requirements.txt" experience.
|
| I could also be a slob or just not working at the bleeding
| edge of python lib deployment tho so take it with a grain of
| salt.
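|
| One cheap way to make those deprecation warnings impossible to
| miss, sketched in plain Python (standard library only; e.g. put
| it at the top of a test entry point):
|
| import warnings
|
| # Turn deprecated-API usage into hard failures instead of
| # easily ignored log noise.
| warnings.simplefilter("error", DeprecationWarning)
|
| # Equivalent from the shell: python -W error::DeprecationWarning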
| __MatrixMan__ wrote:
| When doing this re-evaluation, please consider that others
| might be quietly working very hard to discover and recreate
| locally whatever secret sauce you and production share.
| csweichel wrote:
| OP here. There definitely is a place for running things on your
| local machine. Exactly as you say: one can get a great deal of
| consistency using VMs.
|
| One of the benefits of moving away from Kubernetes, to a
| runner-based architecture, is that we can now seamlessly
| support cloud-based and local environments
| (https://www.gitpod.io/blog/introducing-gitpod-desktop).
|
| What's really nice about this is that with this kind of
| integration there's very little difference in setting up a dev
| env in the cloud or locally. The behaviour and qualities of
| those environments can differ vastly though (network bandwidth,
| latency, GPU, RAM, CPUs, ARM/x86).
| master_crab wrote:
| Hi Christian. We just deployed Gitpod EKS at our company in
| NY. Can we get some details on the replacement architecture?
| I'm sure it's great but the devil is always in the details.
| michaelt wrote:
| _> The behaviour and qualities of those environments can
| differ vastly though (network bandwidth, latency, GPU, RAM,
| CPUs, ARM /x86)._
|
| For example, when you're running on your local machine you've
| actually got the amount of RAM and CPU advertised :)
| sethammons wrote:
| "Hm, why does my Go service on a pod with 2.2 cpu's think
| it has 6k? Oh, it thinks it has the whole cluster. Nice;
| that is why scheduling has been an issue"
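|
| The same mismatch is easy to demonstrate in any language; here
| is a minimal sketch in Python that compares what the runtime
| reports with the pod's actual quota (it assumes cgroup v2
| mounted at /sys/fs/cgroup, which is the common case on recent
| nodes):
|
| import os
|
| def cgroup_cpu_limit(path="/sys/fs/cgroup/cpu.max"):
|     # cpu.max holds "<quota> <period>" or "max <period>".
|     try:
|         quota, period = open(path).read().split()
|     except (FileNotFoundError, ValueError):
|         return None
|     return None if quota == "max" else int(quota) / int(period)
|
| print("runtime sees:", os.cpu_count())      # host CPUs, e.g. 64
| print("cgroup quota:", cgroup_cpu_limit())  # pod limit, e.g. 2.2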
| lotharcable wrote:
| I strongly recommend just switching the Dev environment over to
| Linux and taking advantage of tools like "distrobox" and
| "toolbx".
|
| https://github.com/89luca89/distrobox
|
| https://containertoolbx.org/
|
| It is sorta like Vagrant, but instead of using virtualbox
| virtual machines you use podman containers. This way you get to
| use OCI images for your "dev environment" that integrates
| directly into your desktop.
|
| https://podman.io/
|
| There are some challenges related to usermode networking for
| non-root-managed controllers, and desktop integration has some
| additional complications. But besides that it has almost no
| overhead and you can have unfettered access to things like
| GPUs.
|
| Also it is usually pretty easy to convert your normal docker or
| kubernetes containers over to something you can run on your
| desktop.
|
| Also it is possible to use things like Kubernetes pod
| definitions to deploy sets of containers with podman and manage
| them with systemd and such things. So you can have "clouds of
| containers" that your dev container needs access to locally.
|
| If there is a corporate need for Windows-specific applications
| then running Windows VMs or doing remote applications over RDP
| is a possible work around.
|
| If everything you are targeting as a deployment is going to be
| Linux-everything then it doesn't make a lot of sense to jump
| through a bunch of hoops and cause a bunch of headaches just to
| avoid having it as your workstation OS.
| trog wrote:
| If you're doing this, there are many cases where you might as
| well just spin up a decent Linux server and give your
| developers accounts on that? With some pretty basic setup
| everyone can just run their own stuff within their own user
| account.
|
| You'll run into occasional issues (e.g. if everyone is trying
| to run default node.js on default port) but with some basic
| guardrails it feels like it should be OK?
|
| I'm remembering back to when my old company ran a lot of PHP
| projects. Each user just had their own development
| environment and their own Apache vhost. They wrote their code
| and tested it in their own vhost. Then we'd merge to a single
| separate vhost for further testing.
|
| I am trying to remember anything about what was painful about
| it but it all basically Just Worked. Everyone had remote
| access via VPN; the worst case scenario for them was they'd
| have to work from home with a bit of extra latency.
| idunnoman1222 wrote:
| Sounds like you are not using a lot of hardware - RFID, POS,
| top-spec video cards, etc.
| 0xbadcafebee wrote:
| In most teams/products I have been involved in, the stack always
| grows to the point that a dev can no longer test it on their
| own machine, regardless of how big the machine is. And having a
| different development machine than production leads to
| completely predictable and unavoidable problems. Devs need to
| create the software tooling to make remote dev less painful. I
| mean, they're devs... making software is kind of their whole
| thing.
| hosh wrote:
| I have used remote dev machines just fine, but my workflow
| vastly differs from many of my coworkers: terminal-only
| spacemacs + tmux + mosh. I have a lot of CLI and TUI tools,
| and I do not use VScode at all. The main GUI app I run is a
| browser, and that runs locally.
|
| I have worked on developing VMs for other developers that
| rely on a local IDE. The main sticking point is syncing
| and schlepping source code (something my setup avoids because
| the source code and editor are on the remote machine). I have
| tried a number of approaches, and I sympathize with the
| article author. So, in response to "Devs need to create the
| software tooling to make remote dev less painful. I mean,
| they're devs... making software is kind of their whole
| thing." <-- syncing and schlepping source code is by no means
| a solved problem.
|
| I can also say that, my spacemacs config is very vanilla.
| Like my phone, I don't want to be messing with it when I want
| to code. Writing tooling for my editor environment is a
| sideshow for the work I am trying to finish.
| kgeist wrote:
| We have a project which spawns around 80 Docker containers
| and runs pretty OK on a 5-year-old Dell laptop with 16GB RAM.
| The fans go crazy and the laptop is always very hot, but I
| haven't noticed considerable lag, even with IntelliJ
| running. Most services are written in Go though and are
| pretty lightweight.
| speedisavirus wrote:
| That's fine for some, but it's not always that. I wrote an
| entire site on my iPad in my spare time with Gitpod. Maybe you
| are at a small company with a small team, so if things get
| critical you are likely to get a call. Do you say F'it, do you
| carry your laptop, or do you carry your iPad like you already
| are, knowing you can still at least do triage if needed because
| you have a perfectly configured Gitpod to use?
| 2075 wrote:
| I think nowadays source code is rarely a more valuable asset
| than the data being processed. Also, I would prefer to give my
| devs just a second machine to run workloads and eventually pull
| in data or mock the data so they get moving more easily.
| neilv wrote:
| > _The only real downside is data control (ie - the company has
| less control over how a developer manages assets like source
| code). In my experience, the vast majority of companies
| should worry less about this [...]_
|
| I once had to burn a ton of political capital (including some
| on credit), because someone who didn't understand software
| thought that cutting-edge tech startup software developers,
| even including systems programmers working close to metal,
| could work effectively using only virtual remote desktops...
| with a terrible VM configuration... from servers literally
| halfway around the world... through a very dodgy firewall and
| VPN... of 10Mb/s total bandwidth... for the entire office of
| dozens of developers.
|
| (And no other Internet access from the VMs. Administrators
| would copy whatever files from the Internet that are needed for
| work. And there was a bureaucratic form for a human process, if
| you wanted to request any code/data to go in or out. And the
| laptops/workstations used only as thin-clients for the remote
| VMs would have to be Windows and run this ridiculous obscure
| 'endpoint security' software that had changed hands from its
| ancient developer, and hadn't even updated the marketing
| materials (e.g., a top bulletpoint was keeping your employees
| from wasting time on a Web site that famously was wiped out
| over a decade earlier), and presumably was littered with
| introduced vulnerabilities and instabilities.)
|
| Note that this was _not_ something like DoD, nor HIPAA, nor
| finance. Just cutting-edge tech on which (ironically) we wanted
| first-mover advantage.
|
| This escalated to the other top-titled software engineer and I
| together doing a presentation to C-suite, on why not only would
| this kill working productivity (especially in a startup that
| needed to do creative work fast!), but the bad actors someone
| was paranoid about could easily circumvent it anyway to
| exfiltrate data (using methods obvious to the skilled software
| people like they hired, some undetectable by any security
| product or even human monitoring they imagined), and all the
| good rule-following people would quit in incredulous
| frustration.
|
| Unfortunately, it might not have been even the CEO's call, but
| a crazy investor.
| jt2190 wrote:
| I'm not sure we should leap from:
|
| > I have seen several attempts to move dev environments to a
| remote host. They invariably suck.
|
| To "therefore they will _always_ suck and have no benefits and
| nobody should ever use them ever". Apologies for the hyperbole
| but I'm making a point that comments like these tend to shut
| down interesting explorations of the state of the art of remote
| computing and what the pros/cons are.
|
| Edit: In a world where users demand that companies implement
| excellent security then we must allow those same companies to
| limit physical access to their machines as much as possible.
| horsawlarway wrote:
| But they don't suck because of lack of effort - they suck
| because there are real physical constraints.
|
| Ex - even on a _VERY_ good connection, RTT on the network is
| going to exceed your frame latency for a computer sitting in
| front of you (before we even get into the latency of the
| actual frame rendering of that remote computer). There's
| just not a solution for "make the light go faster".
|
| Then we get into the issues the author actually laid out
| quite compellingly - Shared resources are unpredictable. Is
| my code running slowly right now because I just introduced an
| issue, or is it because I'm sharing an env and my neighbor
| just ate 99% of the CPU/IO, or my network provider has picked
| a different route and my latency just went up 500ms?
|
| And that's before we even touch the "My machine is
| down/unreachable, I don't know why and I have no visibility
| into resolving the issue, when was my last commit again?"
| style problems...
|
| > Edit: In a world where users demand that companies
| implement excellent security then we must allow those same
| companies to limit physical access to their machines as much
| as possible.
|
| And this... is just bogus. We're not talking about machines
| running production data. We're talking about a developer
| environment. Sure - limit access to prod machines all you
| like, while you're at it, don't give me any production user
| data either - I sure as hell don't want it for local dev.
| What I do want is a fast system that I control so that I can
| actually tweak it as needed to develop and debug the system -
| it is almost impossible to give a developer "the least access
| needed" to do development locally because if you know what
| that access was you wouldn't be developing still.
| sangnoir wrote:
| > But they don't suck because of lack of effort - they suck
| because there are real physical constraints.
|
| They do suck due to lack of effort or investment. FANG
| companies have remote dev experiences that do not suck
| because they invest obscene amounts into dev tooling.
|
| There are physical constraints on the flip side: especially
| for gigantic codebases or datasets that don't fit on dev
| laptops or need lower latencies to other services in
| the DC.
|
| Added bonus: smaller attack surface area for adversaries
| who want to gain access to your code.
| hintymad wrote:
| > Personally - just let the developer own the machine they use
| for development.
|
| It'll work if the company can offer something similar to EC2.
| Unfortunately most companies are not capable of doing so
| if they are not on the cloud.
| nixdev wrote:
| > Personally - just let the developer own the machine they use
| for development.
|
| Overall I agree with you that this is how it should be, but as
| DevOps working with so many development teams, I can tell you
| that too many developers know a language or two but beyond that
| barely know how to use a computer. Most developers (yes, even
| most of the ones in Silicon Valley or the larger Bay Area) with
| MacBooks will smile and nod when you tell them that Docker
| Desktop runs a virtual machine to run a copy of Linux to run
| OCI images, and then not too much later reveal themselves to
| have been clueless.
|
| Commenters on this site are generally expected to be in a
| different category. Just wanted to share that, as a seasoned
| DevOps pro, I can tell you it's pretty rough out there.
| pmarreck wrote:
| In my last role as a director of engineering at a startup, I
| found that a project `flake.nix` file (coupled with simply
| asking people to use
| https://determinate.systems/posts/determinate-nix-installer/ to
| install Nix) led to the fastest "new-hire-to-able-to-
| contribute" time of anything I've seen.
|
| Unfortunately, after a few hires (hand-picked by me), this is
| what happened:
|
| 1) People didn't want to learn Nix, neither did they want to
| ask me how to make something work with Nix, neither did they
| tell me they didn't want to learn Nix. In essence, I told them
| to set the project up with it, which they'd do (and which would
| be successful, at least initially), but _forgot that I also had
| to sell them on it._ In one case, a developer spent all weekend
| (of HIS time) uninstalling Nix and making things work using the
| "usual crap" (as I would call it), all because of an issue I
| could have fixed in probably 5 minutes if he had just reached
| out to me (which he did not, to my chagrin). The first time I
| heard them comment their true feelings on it was when I pushed
| back regarding this because I would have gladly helped... I've
| mentioned this on various Slacks to get feedback and people
| have basically said "you either insist on it and say it's the
| only supported developer-environment-defining framework, or you
| will lose control over it" /shrug
|
| 2) Developers really like to have control over their own
| machines (but I failed to assume they'd also want this control
| over the project dependencies, since, after all, I was the one
| who decided to control mine with the flake.nix in the first
| place!)
|
| 3) At a startup, execution is everything and time is possibly
| too short (especially if you have kids) to learn new things
| that aren't simple, even if better... that unfortunately may
| include Nix.
|
| 4) Nix would also be perfect for deployments... except that
| there is no (to my knowledge) general-purpose, broadly-accepted
| way to deploy via Nix, except to convert it to a Docker image
| and deploy that, which (almost) defeats most of the purpose of
| Nix.
|
| I still believe in Nix but actually trying to use it to
| "perfectly control" a team's project dependencies (which I will
| insist _it does do, pretty much, better than anything else_ )
| has been a mixed bag. And I will still insist that for every 5
| minutes spent wrestling with Nix trying to get it to do what
| you need it to do, you are saving _at least_ an order of
| magnitude more time spent debugging non-deterministic
| dependency issues that (as it turns out) were only
| "accidentally" working in the first place.
| oblio wrote:
| I think if you take about 80% of your comment and replace
| "Nix" with "Haskell/Lisp" and a few other techs, you'd
| basically have the same thing. Especially point #1.
| rfoo wrote:
| In a worse world, worse is better.
| kalaksi wrote:
| From my perspective, installing Nix seems pretty invasive. I
| can understand if someone doesn't want to mess with their
| system "unnecessarily" especially if the tool and it's
| workings are foreign. And I can't really remember the last
| time I had issues with non-deterministic dependencies either.
| Dependency versions are locked. Maybe I'm missing something?
| bamboozled wrote:
| Try Devbox, you can basically ignore nix entirely and reap
| all the benefits of it.
| brunoborges wrote:
| > Personally - just let the developer own the machine they use
| for development.
|
| I wonder if Microsoft's approach for Dev Box is the right one.
| to11mtm wrote:
| _laughs in "Here's a VDI with 2vCPUs and 32GB of RAM but the
| cluster is overloaded, also you get to budget which IDEs you
| have installed because you have only a couple hundred GB of
| storage for everything including what we install on the base
| image that you will never use"_
| javier_e06 wrote:
| The article is an excellent cautionary tale. Debugging an app in
| a container is one thing. Debugging an app running inside a
| Kubernetes node is a rabbit hole that demands more hours and
| expertise.
| pphysch wrote:
| The problem with "development environments", like other
| interactive workloads, is that there is a human at the other end
| that desires a good interactive experience with every keypress.
| It's a radically different problem space than what k8s was
| designed for.
|
| From a resource provider perspective, the only way to squeeze a
| margin out of that space would be to reverse engineer 100% of
| human developer behavior so that you can ~perfectly predict
| "slack" in the system that could be reallocated to other users.
| Otherwise it's just a worse DX, like TFA gives examples of. Not a
| business I'm envious to be in... Just give everyone a dedicated
| VM or desktop, and make sure there's a batch system for big
| workloads.
| rekoros wrote:
| We've been using Nix flakes and direnv (https://direnv.net/) for
| developer environments and NixOS with
| https://github.com/serokell/deploy-rs for prod/deploys - takes
| serious digging and time to set up, but excellent experience with
| it so far.
| andreweggleston wrote:
| I've been using Nix for the past year and it really feels like
| the holy grail for stable development environments. Like you
| said--it takes serious time to set up, but it seems like that's
| an unavoidable reality of easily sharable dev envs.
| aliasxneo wrote:
| Serious time to set up _and_ maintain as the project changes.
| At least, that was my experience. I really _want_ to have Nix-
| powered development environments, but I do _not_ want to spend
| the rest of my career maintaining them because developers
| refuse to "seriously dig" to understand how it works and why it
| decided to randomly break when they added a new dependency.
|
| I think this approach works best in small teams where everyone
| agrees to drink the Nix juice. Otherwise, it's caused nothing
| but strife in my company.
| rekoros wrote:
| This may be the one area where some form of autocracy has
| merit :-)
| geoctl wrote:
| I've worked on something similar to Gitpod in a slightly
| different context, as part of a much bigger personal project
| related to secure remote access that I've spent a few years
| building and hope to open source a few months from now. While I
| agree with many of the points in the article, I just don't
| understand how using micro VMs by itself replaces K8s, unless
| they actually start building their own K8s that orchestrates
| their micro VMs (as opposed to containers in the case of k8s),
| ending up with basically the same thing, when k8s itself can be
| used to orchestrate the outer containers that run the micro VMs
| used to run the dev containers. Yes, k8s has many challenges
| when it comes to nesting containers, cgroups, creating rootless
| containers inside the outer k8s containers, and other stuff
| such as multi-region scaling. But the biggest challenge I've
| faced so far isn't related to networkPolicies or cgroups; it is
| by far related to storage, both when it comes to (lazily)
| pulling big OCI images, which are extremely unready to be used
| for dev containers whose sizes are typically in the GBs or tens
| of GBs, and when it comes to storage virtualization over the
| underlying k8s node storage. There are serious attempts to
| accelerate image pulling (e.g. Nydus), but such solutions would
| still probably be needed whether you use micro VMs or
| rootless/userns containers in order to load and run your dev
| containers.
| cheptsov wrote:
| I can completely relate to anyone abandoning K8s. I'm working
| with dstack, an open-source alternative to K8s for AI infra [1].
| We talk to many people who are frustrated with K8s, especially
| for GPU and AI workloads.
|
| [1] https://github.com/dstackai/dstack
| Muhtasham wrote:
| I really like dstack, keep up the great work
| rohitghumare wrote:
| You just simplified Kubernetes Management System
| myestery wrote:
| Leaving this comment here so I'll always come back to read this
| as someone who was considering kubernetes for a platform like
| gitpod
| teach wrote:
| Remember that you can favorite posts.
| alecfong wrote:
| Our first implementation of brev.dev was built on top of
| kubernetes. We were also building a remote dev environment tool
| at the time. Treating dev environments like cattle seemed to be
| the wrong assumption. Turning kubernetes into a pet manager was a
| huge endeavor with a long tail of issues. We rewrote our platform
| against VMs and were immediately able to provide a better
| experience. Lots of tradeoffs but makes sense for dev envs.
| junkaccount wrote:
| The real reason for this shift is that Kubernetes moved to
| containerd, which they cannot handle. Docker was much easier.
| It is not correct to blame differential workloads.
|
| Also, there is a long tail of issues to be fixed if you do it
| with Kubernetes.
|
| Kubernetes does not just give you scaling; it gives you many
| things: running on any architecture, being close to your
| deployment, etc.
| moondev wrote:
| https://github.com/Mirantis/cri-dockerd
| junkaccount wrote:
| Most of the kubernetes providers (GKE, EKS) do not support
| this new shim. Even on bare metal it is possibly hard to run.
| tacone wrote:
| On a side note: has anybody experience with MicroK8s? I'd love to
| learn stories about it. I'm interested in both dev and production
| experiences.
| concerndc1tizen wrote:
| Sounds more to me like they need a new CTO.
|
| And that they're desperate to tell customers that they've fixed
| their problems.
|
| Kubernetes is absolutely the wrong tool for this use case, and I
| argue that this should be obvious to someone in a CTO-level
| position, or their immediate advisors.
|
| Kubernetes excels as a microservices platform, running reasonably
| trustworthy workloads. The key features of Kubernetes are rollout
| (highly available upgrades), elasticity (horizontal scaleout),
| bin packing (resource limits), CSI (dynamically mounted block
| storage), and so on. All this relates to a highly dynamic
| environment.
|
| This is not at all what Gitpod needs. They need high performance
| disks, ballooning memory, live migrations, and isolated
| workloads.
|
| Kubernetes does not provide you sufficient security boundaries
| for untrusted workloads. You need virtualization for that, and
| ideally physically separate machines.
|
| Another major mistake they made was trying to build this on
| public cloud infrastructure. Of course the performance will be
| ridiculous.
|
| However, one major reason for using Kubernetes is sharing the
| GPU. That is, to my knowledge, not possible with virtualization.
| But again, do you want to risk sharing your data, on a shared
| GPU?
| dilyevsky wrote:
| I agree on the cloud thing. Don't agree that "high performance
| disks, ballooning memory, live migrations, and isolated
| workloads" preclude from using k8s - you can still run it as
| base layer. You get some central configuration storage, machine
| management and some other niceties for free and you can push
| your VM-specific features into your application pod. In fact,
| that's how Google Cloud is designed (except they use Borg not
| k8s but same idea).
| ed_mercer wrote:
| Why would you say that performance is bad on public cloud
| infrastructure?
| hintymad wrote:
| I was wondering if there's a productivity angle too. Take Ceph vs
| Rook for example. If a Ceph cluster needs all the resources on
| its machines and the cluster manages its resources too, then
| moving to Rook does not give any additional features. All 50K
| additional lines of code in Rook are there to set up CSIs and
| statefulsets and whatnot just to get Ceph working on Kubernetes.
| vbezhenar wrote:
| I read this article and I still don't understand what's wrong
| with Kubernetes for this task. Everything you would do with
| virtual machines could be done with Kubernetes with very similar
| results.
|
| I guess the team just wants to rewrite everything; it happens.
| The manager should prevent that.
| dwroberts wrote:
| > Autoscaler plugins: In June 2022, we switched to using cluster-
| autoscaler plugins when they were introduced.
|
| Does anyone have any links for cluster-autoscaler plugins?
| Searching draws a blank, even in the cluster-autoscaler repo
| itself. Did this concept get ditched/removed?
| rahen wrote:
| Kubernetes works great for stateless workloads.
|
| For anything stateful, monolithic, or that doesn't require
| autoscaling, I find LXC more appropriate:
|
| - it can be clusterized (LXD/Incus), like K8S but unlike Compose
|
| - it exposes some tooling to the data plane, especially a load
| balancer, like K8S
|
| - it offers system instances with a complete distribution and an
| init system, like a VM but unlike a Docker container
|
| - it can orchestrate both VMs (including Windows VMs) and LXC
| containers at the same time in the same cluster
|
| - LXC containers have the same performance as Docker containers
| unlike a VM
|
| - it uses a declarative syntax
|
| - it can be used as a foundation layer for anything stateful or
| stateless, including the Kubernetes cluster
|
| LXD/Incus sits somewhere between Docker Swarm and a vCenter
| cluster, which makes it one of the most versatile platforms.
| Nomad is also a nice contender; it cannot orchestrate LXC
| containers but can autoscale a variety of workloads, including
| Java apps and qemu VMs.
| deepsun wrote:
| > SSD RAID 0
|
| > A simpler version of this setup is to use a single SSD attached
| to the node. This approach provides lower IOPS and bandwidth, and
| still binds the data to individual nodes.
|
| Are you sure SSD is that slow? NVMe devices are so fast that I
| hardly believe there's any need for RAID 0.
| abofh wrote:
| I feel like anyone who was building a CI solution to sell to
| others and chose kubernetes didn't really understand the problem.
|
| You're running hot pods for crypto miners and against people who
| really want to see the rest of the code that box has ever seen.
| You should be isolating with something purpose built like
| firecracker, and do your own dispatch & shred for security.
| geoctl wrote:
| Firecracker is more comparable to container runtimes than to
| orchestrators such as K8s. You still need an orchestrator to
| schedule, manage and garbage-collect all your uVMs on top of
| your infrastructure exactly like you would do with containers
| via k8s. In other words, you will probably have to either use
| k8s or build your own k8s to run "supervisor"
| containers/processes that launch uVMs which in turn launch the
| customer dev containers.
| abofh wrote:
| For sure, but that's the point - containers aren't really
| good for an adversarial CI solution. You can run that shit in
| house on kubernetes on a VM in a simulated VR if you want.
| But if you have adversarial builds, you have a) builds that
| may well need close to root, and b) customers who may well
| want to break your shit. Containers are not the right
| solution for that, VMs get you mostly there, and the right
| answer is burning bare metal instances with fire after every
| change-of-tenant - but nobody does that (anymore), because
| VM's are close enough and it's faster to zero out a virtual
| disk than a real one.
|
| So if you started with kubernetes and fought through the whole
| process of discovering why it's not a great solution to the
| problem, I have to assume you didn't understand the problem. I
| :heart: kubernetes, its complexity pays my bills - but it's
| barely a good CI solution when you trust everyone involved, and
| it's definitely not a good one when you're trying to be
| general-purpose for everyone with a makefile.
| eYrKEC2 wrote:
| Have folks seen success with https://earthly.dev/ as a tool in
| their dev cycle?
| riiii wrote:
| > development environments
|
| Kubernetes has never ever struck me as a good idea for a
| development environment. I'm surprised it took the author this
| long to figure out.
|
| K8s can be a lifesaver for production, staging, testing, ...
| depending on your requirements and infrastructure.
___________________________________________________________________
(page generated 2024-11-04 23:00 UTC)