[HN Gopher] De-cloud and de-k8s - bringing our apps back home
___________________________________________________________________
De-cloud and de-k8s - bringing our apps back home
Author : mike1o1
Score : 113 points
Date : 2023-03-22 16:12 UTC (6 hours ago)
(HTM) web link (dev.37signals.com)
(TXT) w3m dump (dev.37signals.com)
| rad_gruchalski wrote:
| Good for them. Now they have a one-off to manage themselves. It's
| pretty easy to de-cloud using something like k3s, and there's so
| much value in Kubernetes to leverage. But they have Chef and
| they're a Ruby shop, so I guess they'll be fine.
|
| TBH, Kubernetes has some really rough edges. Helm charts aren't
| that great, and Kustomize gets real messy real fast.
| acedTrex wrote:
| This seems like an application/stack that didn't have a valid
| need for k8s in the first place. Don't just use K8s because it's
| what people say you should do. Evaluate the pros and the VERY
| real cons and make an informed decision.
| birdyrooster wrote:
| "Need" Eh, I do it because it's awesome for a single box or
| thousands. Single sign on, mTLS everywhere, cert-manager, BGP
| or L2 VIPs to any pod, etc and I can expand horizontally as
| needed. It's the best for an at home lab. I pity the people who
| only use Proxmox.
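|
| For the L2 VIP part, a minimal sketch of what that can look
| like, assuming MetalLB is the implementation (pool name and
| address range are hypothetical):
|
|     apiVersion: metallb.io/v1beta1
|     kind: IPAddressPool
|     metadata:
|       name: homelab-pool          # hypothetical name
|       namespace: metallb-system
|     spec:
|       addresses:
|         - 192.168.1.240-192.168.1.250   # example range
|     ---
|     apiVersion: metallb.io/v1beta1
|     kind: L2Advertisement
|     metadata:
|       name: homelab-l2
|       namespace: metallb-system
|     spec:
|       ipAddressPools:
|         - homelab-pool
|
| Any Service of type LoadBalancer then gets a VIP from that pool,
| announced over L2.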
| bdcravens wrote:
| That's why we've had good results with ECS. Feels like 80% of
| the result for 20% of the effort, and our use cases haven't
| needed that missing 20%.
| sgarland wrote:
| With EC2 launch types, probably. Setting up ECS for Fargate
| with proper IaC templates/modules isn't much easier than EKS,
| IMO.
| brigadier132 wrote:
| On the Google Cloud side, using Cloud Build with Cloud Run and
| automatic CI/CD is very straightforward. I set up automated
| builds and deploys for staging in two hours. For production, I
| set it up to track tagged branches matching a regex.
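|
| A minimal sketch of the build config for that kind of setup
| (service name and region are hypothetical; the tag regex itself
| is configured on the Cloud Build trigger, not in this file):
|
|     # cloudbuild.yaml -- build, push, then deploy to Cloud Run
|     steps:
|       - name: gcr.io/cloud-builders/docker
|         args: ['build', '-t',
|                'gcr.io/$PROJECT_ID/myapp:$TAG_NAME', '.']
|       - name: gcr.io/cloud-builders/docker
|         args: ['push', 'gcr.io/$PROJECT_ID/myapp:$TAG_NAME']
|       - name: gcr.io/google.com/cloudsdktool/cloud-sdk
|         entrypoint: gcloud
|         args: ['run', 'deploy', 'myapp',
|                '--image', 'gcr.io/$PROJECT_ID/myapp:$TAG_NAME',
|                '--region', 'us-central1']
|
| $TAG_NAME is populated on tag-triggered builds, so the same file
| serves staging (branch trigger) and production (tag trigger).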
| Aeolun wrote:
| Mostly because CF and CDK have spawned from the deepest
| pits of hell. It's ok when using terraform, and downright
| pleasant when using Pulumi.
| bdcravens wrote:
| We use Fargate, and what we launch is tightly coupled to
| our application (background jobs spin down and spin up
| tasks via the SDK) so for now, we aren't doing anything
| with IaC, other than CI deployment.
| dustedcodes wrote:
| If I were Netflix I would de-cloud, but for a small team like
| 37signals, de-clouding is just insanity. I think DHH is either
| very stupid or extremely naive in his cost calculations, or
| probably a mix of both. Hey and Basecamp customers will see many
| issues in the next few years, and hackers will feast off their
| on-premise infrastructure.
| mike1o1 wrote:
| To me the really surprising thing is that they still use
| Capistrano for deploying Basecamp!
| camsjams wrote:
| Maybe because Capistrano is written in Ruby and the language
| matches their internal products? That was my only guess.
| mbreese wrote:
| I was guessing that they kept using Capistrano because it
| still worked. No need to change something that's working...
|
| (Somewhat of an ironic comment when talking about an article
| about ditching K8s...)
| Melingo wrote:
| I'm so lost on so many of the choices this company made.
|
| You de-cloud and now use some mini tool like mrsk?
|
| I'm running k8s on Azure (small), GKE (very big), RKE2 on a
| setup with 5 nodes, and k3s.
|
| I'm totally lost as to why they would de-k8s after already
| investing so much time. They should be able to work with k8s
| really well at this point.
|
| Sorry to say, but to me it feels like the company has a much
| bigger issue than cloud vs. non-cloud: no one with proper
| experience and strategy.
| Aeolun wrote:
| I feel like this is the reason so many horrible kubernetes
| stacks exist.
| drewda wrote:
| If one of the co-founders/owners is writing the devops tooling
| from scratch... well, that's a decision.
|
| Not saying it's necessarily a bad decision. But it's
| potentially driven more by personal interests than a
| dispassionate and strategic plan.
| chologrande wrote:
| It does seem like they just moved all of their infra
| components, and got rid of autoscaling.
|
| Load balancing, logging, and other associated components are
| all still there. Almost nothing changed in the actual
| architecture, just how it was hosted.
|
| I have a hard time seeing why this was beneficial.
| slackfan wrote:
| Cost, mostly.
| jrockway wrote:
| Those k8s license fees will get ya.
| rad_gruchalski wrote:
| You reckon they can't afford to run some VMs for the k8s
| control plane?
| camsjams wrote:
| Is this cheaper though?
|
| For a medium-to-large app, K8s should offset a lot of the
| operational difficulties. Also you don't have to use K8s.
|
| Cloud is turn-on/turn-off, whereas on-premises requires paying
| the investment up front.
|
| Here are all of the hidden costs of on-prem that folks forget
| about when thinking about cloud being "expensive":
|
| - hardware
|
| - maintenance
|
| - electricity
|
| - air conditioning
|
| - security
|
| - on-call and incident response
|
| Here are all of the hidden time-consumers of on-prem that folks
| forget about when thinking about cloud being "difficult":
|
| - OS patching and maintenance
|
| - network maintenance
|
| - driver patching
|
| - library updating and maintenance
|
| - BACKUPS
|
| - redundancy
|
| - disaster recovery
|
| - availability
| andrewstuart wrote:
| This is the fiction that CTOs believe - "it's simply not
| practical to run your own computers, you need cloud".
| sn0wf1re wrote:
| Every one of your examples in the second list applies to both
| on-prem and cloud. Also, cloud has on-call too, just not for the
| hardware issues (you'll still likely get paged for reduced
| availability of your software).
| vlunkr wrote:
| "Just not for the hardware issues" is a huge deal though.
| That's an entire skillset you can eliminate from your
| requirements if you're only in the cloud. Depending on the
| scale of your team this might be a massive amount of savings.
| ilyt wrote:
| Right. The skillset to pull the right drive from the server
| and put in a replacement.
|
| That says you know nothing at all about actually running
| hardware, as the bigger problem is by far "the DC might be a
| 1-5 hour drive away" or "we have no spare parts at hand", not
| "fiddling with the server is super hard".
| josho wrote:
| The flip side is that there is an entirely new skillset required
| to successfully leverage the cloud.
|
| I suspect those cloud skills are also in higher demand and
| therefore more expensive than hiring people to handle hardware
| issues.
|
| Personally, I appreciate the contrarian view because I
| think many businesses have been naive in their decision to
| move some of their workloads into the cloud. I'd like to
| see a broader industry study that shows what benefits are
| actually realized in the cloud.
| jrockway wrote:
| At my last job, I would have happily gone into the office
| at 3am to swap a hard drive if it meant I didn't have to
| pay my AWS bill anymore. Computers are cheap. Backups are
| annoying, but you have to do them in the cloud too.
| (Deleting your Cloud SQL instance accidentally deletes all
| the automatic backups; so you have to roll your own if you
| care at all. Things like that; cloud providers remove some
| annoyance, and then add their own. If you operate software
| in production, you have to tolerate annoyance!)
|
| Self-managed Kubernetes is no picnic, but nothing
| operational is ever a picnic. If it's not debugging a weird
| networking issue with tcpdump while sitting on the
| datacenter floor, it's begging your account rep for an
| update on your ticket twice a day for 3 weeks. Pick your
| poison.
| ilyt wrote:
| We have 7 racks, 3 people, and the actual hardware work is a
| minuscule part of that. A few hundred VMs, everything from "just
| software running on a server" to k8s stacks (the biggest one is
| 30 nodes), 2 Ceph clusters (ours and clients'), and a bunch of
| other shit.
|
| The stuff you mentioned is, amortized, around 20% of the work
| (automation ftw). The rest of it is stuff we would do in the
| cloud anyway, and the cloud is in general harder to debug too
| (we have a few smaller projects managed in the cloud for
| customers).
|
| We did the calculation to move to the cloud a few times now; it
| was never even close to profitable, and we wouldn't save on
| manpower anyway, as 24/7 on-call is still required.
|
| So I call bullshit on that.
|
| If you are a startup, by all means go cloud.
|
| If you are small, go ahead; it's not worth it.
|
| If you have spiky load, cloud or hybrid will most likely be
| cheaper.
|
| But if you have constant load (by that I mean the difference
| between peak and lowest traffic is "only" like 50-60%) and need
| a bunch of servers to run it (say 3+ racks), it might actually
| be cheaper on-site.
|
| Or a bunch of dedicated servers. Then you don't need to bother
| managing hardware, and in case of a boom you can even scale up
| relatively quickly.
| Melingo wrote:
| And don't get me wrong: whatever works for the company, but k8s
| experience alone is already super helpful.
|
| A lightweight k8s stack out of the box + Argo CD + cert-manager
| is like infra on steroids.
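|
| As a rough illustration of what that stack buys you, a sketch of
| an Argo CD Application (repo URL, paths and names are all
| hypothetical) that keeps the cluster synced to a Git repo, with
| cert-manager and the rest installed the same way:
|
|     apiVersion: argoproj.io/v1alpha1
|     kind: Application
|     metadata:
|       name: myapp              # hypothetical
|       namespace: argocd
|     spec:
|       project: default
|       source:
|         repoURL: https://github.com/example/myapp-config
|         targetRevision: main
|         path: deploy
|       destination:
|         server: https://kubernetes.default.svc
|         namespace: myapp
|       syncPolicy:
|         automated:
|           prune: true
|           selfHeal: true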
| rektide wrote:
| The whole Kubernetes section of this writeup is two sentences.
| They went with a vendor-provided kube & it was expensive &
| didn't go great.
|
| It just sounds like it was poorly executed, mostly? There are
| enough blogs & YouTube videos of folks setting up HA k8s on a
| couple of RPis, & even the 2GB model works fine if you accept
| not-quite-half the RAM as overhead on apiserver/etcd nodes.
|
| It's not like 37signals has hundreds of teams & thousands of
| services to juggle, so it's not like they need a beefy control
| plane. I don't know what went wrong & there's no real info to
| guess by, but 37s seems like a semi-ideal, easy lock for k8s
| on-prem.
| ocdtrekkie wrote:
| De-clouding is going to be a huge trend as companies are
| pressured to save costs, and they realize on-prem is still a
| fraction of the cost of comparable cloud services.
|
| This whole cloud shift has been one of the most mind-blowing
| shared delusions in the industry, and I'm glad I've mostly
| avoided working with it outright.
| dopylitty wrote:
| The thing that gets me about it is the very real physical cost
| of all this cloud waste.
|
| The big cloud providers have clear-cut thousands of acres in
| Ohio, Northern VA, and elsewhere to build their huge windowless
| concrete bunkers in support of this delusion of unlimited
| scale.
|
| Hopefully, as the monetary costs become clear, their growth will
| be reversed and these bunkers can be torn down.
| ocdtrekkie wrote:
| For what it's worth, large providers will always need
| datacenters. But perhaps the datacenters run by public cloud
| providers today will someday be sold off at a discount to larger
| businesses running their own infrastructure. Most of the
| infrastructure itself will age out in five or ten years and
| would have been replaced either way.
|
| Heck, datacenters in Virginia are likely to end up being sold
| directly to the federal government.
| ilyt wrote:
| They ain't going to be unused, lmao. If the migration happens,
| they will just stop building new ones or have to compete harder
| on pricing.
| adamsb6 wrote:
| The big cloud providers are likely packing machines more
| densely and powering them more efficiently than alternatives
| like colos.
| icedchai wrote:
| On-prem has its own issues. Many small applications need little
| more than a VPS and a sane backup/recovery strategy.
| rr808 wrote:
| Our firm started the big cloud initiative last year. We have
| our own datacenters already, but all the cool startups used
| cloud. Our managers figure it'll make us cool too.
| erulabs wrote:
| Only one sentence about why they chose to abandon K8s:
|
| > It all sounded like a win-win situation, but [on-prem
| kubernetes] turned out to be a very expensive and operationally
| complicated idea, so we had to get back to the drawing board
| pretty soon.
|
| It was very expensive and operationally complicated to self-host
| k8s, so they decided to _build their own orchestration tooling_?
| The fact that this bit isn't even remotely fleshed out sort of
| undercuts their main argument.
| benatkin wrote:
| Well, to be fair, Kubernetes doesn't always pluralize the names
| of collections, since you can run "kubectl get
| deployment/myapp". You don't want to do the equivalent of
| "select * from user", do you? That doesn't make any sense!!! And
| don't translate that to "get all the records from the user
| table"! That's "get all the records from the _users_ table".
| (Rails defaults to plural, Django to singular for table names.
| Not sure about the equivalent for Kubernetes, but in the CLI,
| surprisingly, you can use either.)
| imiric wrote:
| Sometimes there's value in building bespoke solutions. If you
| don't need many of the features of the off-the-shelf solution,
| and find the complexity overwhelming and the knowledge and
| operational costs too high, then building a purpose-built
| solution to fit your use case exactly can be very beneficial.
|
| You do need lots of expertise and relatively simple
| applications to replace something like k8s, but 37signals seems
| up for the task, and judging by the article, they picked their
| least critical apps to start with. It sounds like a success
| story so far. Kudos to them for releasing mrsk, it definitely
| looks interesting.
|
| As a side note, I've become disgruntled at k8s becoming the de
| facto standard for deploying services at scale. We need
| different approaches to container orchestration that do things
| differently (perhaps even rethinking containers!) and focus on
| simplicity and usability instead of just hyper-scalability,
| which many projects don't need.
|
| I was a fan of Docker Swarm for a long time, and still use it
| at home, but I wouldn't dare recommend it professionally
| anymore. Especially with the current way Docker Inc. is
| managed.
| mike1o1 wrote:
| To be fair, the article says they built the bulk of the tool and
| did the first migration in a six-week cycle. mrsk looks fairly
| straightforward and feels like Capistrano but for containers.
| The first commit of mrsk was only on January 7th of this year.
|
| _In less than a six-week cycle, we built those operational
| foundations, shaped mrsk to its functional form and had
| Tadalist running in production on our own hardware._
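|
| To give a feel for the "Capistrano but for containers" shape, a
| sketch along the lines of the config/deploy.yml in the mrsk
| README (service name, image, IPs and registry values are all
| hypothetical; check the repo for the real schema):
|
|     service: myapp                # hypothetical
|     image: myorg/myapp
|     servers:
|       - 192.168.0.1
|       - 192.168.0.2
|     registry:
|       username: registry-user
|       password: my-registry-password   # placeholder
|
| mrsk then builds the image, pushes it, and cycles containers on
| each listed host over SSH.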
| bastawhiz wrote:
| They spent a month and a half building tooling _capable of
| handling their smallest application_, representing an
| extremely tiny fraction of their cloud usage.
| bastawhiz wrote:
| I'm not going to put this down, because it sounds like they're
| quite happy with the results. But they haven't written about a
| few things that I find to be important details:
|
| First, one of the promises of a standardized platform (be it k8s
| or something else) is that you don't reinvent the wheel for each
| application. You have one way of doing logging, one way of doing
| builds/deployments, etc. Now, they have two ways of doing
| everything (one for their k8s stuff that remains in the cloud,
| one for what they have migrated). And the stuff in the cloud is
| the mature, been-using-it-for-years stuff, and the new stuff
| seemingly hasn't been battle-tested beyond a couple small
| services.
|
| Now that's fine, and migrating a small service and hanging the
| Mission Accomplished banner is a win. But it's not a win that
| says "we're ready to move our big, money-making services off of
| k8s". My suspicion is that handling the most intensive services
| means replacing all of the moving parts of k8s with lots of
| k8s-shaped things, and things which are probably less-easily
| glued together than k8s things are.
|
| Another thing that strikes me is that if you look at their cloud
| spend [0], three of their four top services are _managed_
| services. You simply will not take RDS and swap it out 1:1 for
| Percona MySQL, it is not the same for clusters of substance. You
| will not simply throw Elasticsearch at some linux boxes and get
| the same result as managed OpenSearch. You will not simply
| install redis/memcached on some servers and get elasticache. The
| managed services have substantial margin, but unless you have
| Elasticsearch experts, memcached/redis experts, and DBAs on-hand
| to make the thing do the stuff, you're also going to likely end
| up spending more than you expect to run those things on hardware
| you control. I don't think about SSDs or NVMe or how I'll
| provision new servers for a sudden traffic spike when I set up an
| Aurora cluster, but you can't not think about it when you're
| running it yourself.
|
| Said another way, I'm curious as to how they will reduce costs
| AND still have equally performant/maintainable/reliable services
| while replacing some unit of infrastructure N with N+M (where M
| is the currently-managed bits). And also while not being able to
| just magically make more computers (or computers of a different
| shape) appear in their datacenter at the click of a button.
|
| I'm also curious how they'll handle scaling. Is scaling your k8s
| clusters up and down in the cloud really more expensive than
| keeping enough machines to handle unexpected load on standby? I
| guess their load must be pretty consistent.
|
| [0] https://dev.37signals.com/our-cloud-spend-in-2022/
| ethicalsmacker wrote:
| If only Ruby had real concurrency and the memory didn't bloat
| like crazy.... you wouldn't need 90% of the hardware.
| sys_64738 wrote:
| Larry Ellison was right.
| not_enoch_wise wrote:
| It's almost as if...
|
| ...kubernetes isn't the solution to every compute need...
| birdyrooster wrote:
| Except it was created to model virtually every solution to
| every compute need. It's not about the compute itself; it's
| about the taxonomy, composability, and verifiability of
| specifications, which makes Kubernetes an excellent substrate
| for nearly any computing model, from the most static to the
| most dynamic. You find Kubernetes everywhere because of how
| flexible it is in meeting different domains. It's the next
| major revolution in systems computing since Unix.
| ranger207 wrote:
| I (roughly) believe this as well[0], but more flexibility
| generally means more complexity. Right now, if you don't need
| the flexibility that k8s offers, it's probably better to use
| a solution with less flexibility and therefore less
| complexity. Maybe in a decade if k8s has eaten the world
| there'll be simple k8s-based solutions to most problems, but
| right now that's not always the case
|
| [0] I think that in the same way that operating systems
| abstract physical hardware, memory management, process
| management, etc, k8s abstracts storage, network, compute
| resources, etc
| aliasxneo wrote:
| Always two extremes to any debate. I've personally enjoyed my
| journey with it. I've even been in an anti-k8s company
| running bare metal on the Hashi stack (won't be running back
| to that anytime soon). I think the two categories I've seen
| work best are either something like ECS or serverless, or
| Kubernetes.
| klardotsh wrote:
| Tell that to the myriad folks making their money off of
| peddling it. You'd swear it were the only tool available, based
| on the hype circles (and how many hiring managers strictly look
| for experience with it).
| ilyt wrote:
| I gotta say, from a dev perspective it is a very convenient
| solution. But I wouldn't recommend it to anyone that runs
| anything less complex than "a few services and a database".
| The _tens of minutes_ you save writing deploy scripts will be
| replaced by hours of figuring out how to do it the k8s way.
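|
| For a sense of scale, the k8s way of "run two copies of this
| container behind a stable address" is roughly the following
| (names, image and ports are hypothetical), before you even get
| to Ingress, TLS or config:
|
|     apiVersion: apps/v1
|     kind: Deployment
|     metadata:
|       name: myapp
|     spec:
|       replicas: 2
|       selector:
|         matchLabels:
|           app: myapp
|       template:
|         metadata:
|           labels:
|             app: myapp
|         spec:
|           containers:
|             - name: myapp
|               image: registry.example.com/myapp:1.2.3
|               ports:
|                 - containerPort: 3000
|     ---
|     apiVersion: v1
|     kind: Service
|     metadata:
|       name: myapp
|     spec:
|       selector:
|         app: myapp
|       ports:
|         - port: 80
|           targetPort: 3000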
|
| From an ops perspective, let's say I've run it from scratch (as
| in "writing systemd units to run the k8s daemons and setting up
| a CA to feed them", because back then there wasn't much reliable
| automation around deploying it), and the complexity tax is
| insane. Yeah, you can install some automation to do that, but if
| it ever breaks (and I've seen some break), good fucking luck; a
| non-veteran will have a better chance reinstalling it from
| scratch.
| illiarian wrote:
| Cloud Native Landscape....
| https://landscape.cncf.io/images/landscape.pdf
|
| It's more than just peddlers at this point. There are
| peddlers peddling to other peddlers, several layers deep.
| lloydatkinson wrote:
| I've been following their move to on-premise with interest, and
| this was a great read. I'm curious how they are wiring up GitHub
| Actions with their on-premise deployment. How are they doing
| this?
|
| The best I can think of for my own project is to run one of the
| self-hosted GitHub Actions runners on the same machine, which
| could then run an action that triggers running the latest
| Docker image.
|
| Without something like that, you miss the nice instant push
| model the cloud gives you and have to use a pull model of
| polling some service regularly for newer versions.
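|
| A minimal sketch of that self-hosted-runner idea (workflow name,
| branch and the deploy command are hypothetical):
|
|     # .github/workflows/deploy.yml
|     name: deploy
|     on:
|       push:
|         branches: [main]
|     jobs:
|       deploy:
|         # a self-hosted runner registered on the target machine,
|         # so the push event reaches it without any polling
|         runs-on: self-hosted
|         steps:
|           - uses: actions/checkout@v3
|           - name: Pull and restart the latest image
|             run: docker compose pull && docker compose up -d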
| asjkaehauisa wrote:
| They could just use Nomad and call it a day.
| electric_mayhem wrote:
| We used to call this a "sudden outbreak of common sense"
| mfer wrote:
| I wonder how they are going to handle fault tolerance when
| machines go offline.
|
| To avoid being paged in the middle of the night, I grew to
| really like automation that keeps things online.
| turtlebits wrote:
| Their app is running on at least 2 machines, so the load
| balancer takes care of it.
| guilhas wrote:
| How much easier is mrsk vs k3s?
| turtlebits wrote:
| Looks like an apples vs. oranges comparison. They seem to have a
| low number of distinct services, so there isn't a real need for
| k3s/k8s (i.e. orchestration); on the other hand, they do need
| config management.
| bdcravens wrote:
| I'm not sure if anyone other than 37Signals is using it at
| scale yet, so you may get a better idea by looking at the docs
| yourself.
|
| https://github.com/mrsked/mrsk
| ilyt wrote:
| Oh, a YAML-based DSL to deploy stuff, how original!
|
| Now we only need a template-based generator for those YAMLs and
| we'll have all the worst practices of orchestration right here,
| just like k8s + Helm.
___________________________________________________________________
(page generated 2023-03-22 23:00 UTC)