[HN Gopher] Kamal Proxy - A minimal HTTP proxy for zero-downtime...
       ___________________________________________________________________
        
       Kamal Proxy - A minimal HTTP proxy for zero-downtime deployments
        
       Author : norbert1990
       Score  : 191 points
       Date   : 2024-09-21 07:55 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | oDot wrote:
       | DHH mentioned they built it to move from the cloud to bare metal.
       | He glorifies the simplicity but I can't help thinking they are a
       | special use case of predictable, non-huge load.
       | 
       | Uber, for example, moved to the cloud. I feel like in the span
       | between them there are far more companies for which Kamal is not
       | enough.
       | 
        | I hope I'm wrong, though. It'll be nice for many companies to
        | have the choice of exiting the cloud.
        
         | pinkgolem wrote:
          | I mean, most B2B companies have a pretty predictable load when
          | providing services to employees...
          | 
          | I can get weeks of advance notice before we have a load
          | increase from new users.
        
         | appendix-rock wrote:
         | You can't talk about typical cases and then bring up Uber.
        
         | olieidel wrote:
         | > I feel like in the span between them there are far more
         | companies for which Kamal is not enough.
         | 
         | I feel like this is a bias in the HN bubble: In the real world,
         | 99% of companies with any sort of web servers (cloud or
         | otherwise) are running very boring, constant, non-Uber
         | workloads.
        
           | ksec wrote:
            | Not just HN but the whole internet overall, because all the
            | news, articles, and tech achievements are pumped out by Uber
            | and other big tech companies.
            | 
            | I am pretty sure Uber belongs to the 1% of internet
            | companies in terms of scale. 37signals isn't exactly small
            | either: they spent $3M a year on infrastructure in 2019,
            | likely a lot higher now.
            | 
            | The whole tech cycle needs to stop having a top-down approach
            | where everyone is doing what Big Tech is using. Instead we
            | should try to push the simplest tool from the low end all the
            | way to the 95% mark.
        
             | nchmy wrote:
             | They spend considerably less on infra now - this was the
             | entire point of moving off cloud. DHH has written and
             | spoken lots about it, providing real numbers. They bought
             | their own servers and the savings paid for it all in like 6
              | months. Now it's just money in the bank until they replace
              | the hardware in 5 years.
             | 
             | Cloud is a scam for the vast majority of companies.
        
         | martinald wrote:
         | I don't think that's the real point. The real point is that
         | 'big 3' cloud providers are so overpriced that you could run
         | hugely over provisioned infra 24/7 for your load (to cope with
         | any spikes) and still save a fortune.
         | 
          | The other thing is that cloud hardware is generally very, very
          | slow, and many engineers don't seem to appreciate how bad it
          | is: slow single-thread performance because of using the most
          | parallel CPUs possible (which are the cheapest per watt for the
          | hyperscalers), very poor IO speeds, etc.
         | 
          | So often a lot of this devops/infra work is solved by just
          | using much faster hardware. If you have a fairly IO-heavy
          | workload then switching from slow storage to PCIe 4.0 NVMe
          | drives doing 7 GB/s is going to solve so many problems. If your
          | app can't do much work in parallel then CPUs with much faster
          | single-threaded performance can bring huge gains.
        
           | jsheard wrote:
           | It's sad that what should have been a huge efficiency win,
           | amortizing hardware costs across many customers, ended up
           | often being more expensive than just buying big servers and
           | letting them idle most of the time. Not to say the efficiency
           | isn't there, but the cloud providers are pocketing the
           | savings.
        
             | toomuchtodo wrote:
             | If you want a compute co-op, build a co-op (think VCs
             | building their own GPU compute clusters for portfolio
             | companies). Public cloud was always about using marketing
             | and the illusion of need for dev velocity (which is real,
             | hypergrowth startups and such, just not nearly as prevalent
             | as the zeitgeist would have you believe) to justify the eye
             | watering profit margin.
             | 
             | Most businesses have fairly predictable interactive
             | workload patterns, and their batch jobs are not high
             | priority and can be managed as such (with the usual
             | scheduling and bin packing orchestration). Wikipedia is one
             | of the top 10 visited sites on the internet, and they run
             | in their own datacenter, for example. The FedNow instant
             | payment system the Federal Reserve recently went live with
             | still runs on a mainframe. Bank of America was saving $2B a
             | year running their own internal cloud (although I have
             | heard they are making an attempt to try to move to a public
             | cloud).
             | 
             | My hot take is public cloud was an artifact of ZIRP and
             | cheap money, where speed and scale were paramount, cost
             | being an afterthought (Russ Hanneman pre-revenue bit here,
             | "get big fast and sell"; great fit for cloud). With that
             | macro over, and profitability over growth being the go
             | forward MO, the equation might change. Too early to tell
             | imho. Public cloud margins are compute customer
             | opportunities.
        
               | miki123211 wrote:
               | Wikipedia is often brought up in these discussions, but
               | it's a really bad example.
               | 
               | To a vast majority of Wikipedia users who are not logged
               | in, all it needs to do is show (potentially pre-rendered)
               | article pages with no dynamic, per-user content. Those
                | pages are easy to cache or even offload to a CDN. For all
               | the users care, it could be a giant key-value store,
               | mapping article slugs to HTML pages.
               | 
               | This simplicity allows them to keep costs down, and the
               | low costs mean that they don't have to be a business and
               | care about time-on-page, personalized article
               | recommendations or advertising.
               | 
               | Other kinds of apps (like social media or messaging) have
               | very different usage patterns and can't use this kind of
               | structure.
        
               | toomuchtodo wrote:
               | > Other kinds of apps (like social media or messaging)
               | have very different usage patterns and can't use this
               | kind of structure.
               | 
               | Reddit can't turn a profit, Signal is in financial peril.
               | Meta runs their own data centers. WhatsApp could handle
               | ~3M open TCP connections per server, running the
               | operation with under 300 servers [1] and serving ~200M
               | users. StackOverflow was running their Q&A platform off
               | of 9 on prem servers as of 2022 [2]. Can you make a
               | profitable business out of the expensive complex machine?
               | That is rare, based on the evidence. If you're not a
               | business, you're better off on Hetzner (or some other
               | dedicated server provider) boxes with backups. If you're
               | down you're down, you'll be back up shortly. Downtime is
               | cheaper than five 9s or whatever.
               | 
               | I'm not saying "cloud bad," I'm saying cloud where it
               | makes sense. And those use cases are the exception, not
               | the rule. If you're not scaling to an event where you can
               | dump these cloud costs on someone else (acquisition
               | event), or pay for them yourself (either donations,
               | profitability, or wealthy benefactor), then it's
               | pointless. It's techno performance art or fancy make
               | work, depending on your perspective.
               | 
               | [1] https://news.ycombinator.com/item?id=33710911
               | 
               | [2] https://www.datacenterdynamics.com/en/news/stack-
               | overflow-st...
        
           | miki123211 wrote:
           | You can always buy some servers to handle your base load, and
           | then get extra cloud instances when needed.
           | 
           | If you're running an ecommerce store for example, you could
           | buy some extra capacity from AWS for Christmas and Black
           | Friday, and rely on your own servers exclusively for the rest
           | of the year.
        
           | sgarland wrote:
           | > The other thing is that cloud hardware is generally very
           | very slow and many engineers don't seem to appreciate how bad
           | it is.
           | 
            | This. Mostly disk latency, for me. People who have only ever
            | known DBaaS have no idea how absurdly fast a database can be
            | when you don't have compute and disk split by network hops,
            | and your disks are NVMe.
           | 
           | Of course, it doesn't matter, because the 10x latency hit is
           | overshadowed by the miasma of everything else in a modern
           | stack. My favorite is introducing a caching layer because you
           | can't write performant SQL, and your DB would struggle to
           | deliver it anyway.
        
           | igortg wrote:
            | I'm using a managed Postgres instance from a well-known
            | provider and holy shit, I couldn't believe how slow it is.
            | For small datasets I didn't notice, but when one of the
            | tables reached 100K rows, queries started to take 5-10
            | seconds (the same query takes 0.5-0.6 seconds on my standard
            | i5 Dell laptop).
            | 
            | I wasn't expecting blazing speed on the lowest tier, but 10x
            | slower is bonkers.
        
         | toberoni wrote:
         | I feel Uber is the outlier here. For every unicorn company
         | there are 1000s of companies that don't need to scale to
         | millions of users.
         | 
         | And due to the insane markup of many cloud services it can make
         | sense to just use beefier servers 24/7 to deal with the peaks.
          | From my experience, crazy traffic outliers that need
          | sophisticated auto-scaling rarely happen outside of VC-fueled
          | growth trajectories.
        
       | 000ooo000 wrote:
       | Strange choice of language for the actions:
       | 
       | >To route traffic through the proxy to a web application, you
       | *deploy* instances of the application to the proxy. *Deploying*
       | an instance makes it available to the proxy, and replaces the
       | instance it was using before (if any).
       | 
       | >e.g. `kamal-proxy deploy service1 --target web-1:3000`
       | 
       | 'Deploy' is a fairly overloaded term already. Fun conversations
       | ahead. Is the app deployed? Yes? No I mean is it deployed to the
       | proxy? Hmm our Kamal proxy script is gonna need some changes and
       | a redeployment so that it deploys the deployed apps to the proxy
       | correctly.
       | 
        | Unsure why they couldn't have picked something like 'bind', or
        | 'intercept', or even just 'proxy'... why 'deploy'?
        
         | irundebian wrote:
         | Deployed = running and registered to the proxy.
        
           | vorticalbox wrote:
           | Then wouldn't "register" be a better term?
        
             | rad_gruchalski wrote:
             | It's registered but is it deployed?
        
               | IncRnd wrote:
               | Who'd a thunk it?
        
         | 8organicbits wrote:
         | If your ingress traffic comes from a proxy, what would deploy
         | mean other than that traffic from the proxy is now flowing to
         | the new app instance?
        
         | nahimn wrote:
         | "Yo dawg, i heard you like deployments, so we deployed a
         | deployment in your deployment so your deployment can deploy"
         | -Xzibit
        
           | chambored wrote:
           | "Pimp my deployment!"
        
       | shafyy wrote:
       | Also exciting that Kamal 2 (currently RC
       | https://github.com/basecamp/kamal/releases) will support auto-SSL
       | and make it easy to run multiple apps on one server with Kamal.
        
         | francislavoie wrote:
         | They're using the autocert package which is the bare minimum.
         | It's brittle, doesn't allow for horizontal scaling of your
         | proxy instances because you're subject to Let's Encrypt rate
         | limits and simultaneous cert limits. (Disclaimer: I help
         | maintain Caddy) Caddy/Certmagic solves this by writing the data
         | to a shared storage so only a single cert will be issued and
          | reused/coordinated across all instances through the storage.
          | Autocert also doesn't have issuer fallback, doesn't do rate
          | limit avoidance, doesn't respect ARI, etc.
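          | 
          | For reference, typical bare autocert usage looks roughly like
          | this (a sketch, not kamal-proxy's actual setup; the paths are
          | illustrative). The point is that the cache is just a local
          | directory unless you supply something smarter, so every proxy
          | instance talks to Let's Encrypt on its own:
          | 
          |     package main
          |     
          |     import (
          |         "net/http"
          |     
          |         "golang.org/x/crypto/acme/autocert"
          |     )
          |     
          |     func main() {
          |         m := &autocert.Manager{
          |             Prompt:     autocert.AcceptTOS,
          |             HostPolicy: autocert.HostWhitelist("example.com"),
          |             // Per-instance directory cache: no coordination
          |             // with other proxy instances, so each one issues
          |             // (and counts against rate limits) separately.
          |             Cache: autocert.DirCache("/var/lib/proxy/certs"),
          |         }
          |         srv := &http.Server{
          |             Addr:      ":443",
          |             TLSConfig: m.TLSConfig(),
          |         }
          |         srv.ListenAndServeTLS("", "")
          |     }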
         | 
         | Holding requests until an upstream is available is also
         | something Caddy does well, just configure the reverse_proxy
         | with try_duration and try_interval, it will keep trying to
         | choose a healthy upstream (determined via active health checks
         | done in a separate goroutine) for that request until it times
         | out.
         | 
          | Their proxy header handling doesn't consider trusted IPs, so if
          | enabled, someone could spoof their IP by setting X-Forwarded-
          | For. At least it's off by default, but they don't warn about
          | this.
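          | 
          | To illustrate, a minimal sketch (not kamal-proxy's code) of the
          | trusted-proxy check a reverse proxy needs before honoring
          | X-Forwarded-For; the CIDR range is a made-up example:
          | 
          |     package main
          |     
          |     import (
          |         "fmt"
          |         "net"
          |         "net/http"
          |         "strings"
          |     )
          |     
          |     // clientIP trusts X-Forwarded-For only when the direct
          |     // peer is inside a configured trusted-proxy CIDR.
          |     func clientIP(r *http.Request, trusted []*net.IPNet) string {
          |         host, _, err := net.SplitHostPort(r.RemoteAddr)
          |         if err != nil {
          |             return r.RemoteAddr
          |         }
          |         peer := net.ParseIP(host)
          |         for _, cidr := range trusted {
          |             if cidr.Contains(peer) {
          |                 xff := r.Header.Get("X-Forwarded-For")
          |                 if xff != "" {
          |                     hops := strings.Split(xff, ",")
          |                     // last hop was appended by our own proxy
          |                     return strings.TrimSpace(hops[len(hops)-1])
          |                 }
          |             }
          |         }
          |         return host // untrusted peer: ignore the header
          |     }
          |     
          |     func main() {
          |         _, lan, _ := net.ParseCIDR("10.0.0.0/8")
          |         req, _ := http.NewRequest("GET", "/", nil)
          |         req.RemoteAddr = "203.0.113.7:55000"
          |         req.Header.Set("X-Forwarded-For", "1.2.3.4")
          |         // Prints 203.0.113.7: the spoofed header is ignored
          |         // because the peer is not a trusted proxy.
          |         fmt.Println(clientIP(req, []*net.IPNet{lan}))
          |     }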
         | 
         | This looks pretty undercooked. I get that it's simple and
         | that's the point, but I would never use this for anything in
         | its current state. There's just so many better options out
         | there.
        
           | markusw wrote:
           | I just want to say I use Caddy for this exact thing and it
           | works beautifully! :D Thank you all for your work!
        
           | shafyy wrote:
           | Interesting, thanks for the detailed explanation! I'm not
           | very experienced with devops, so this is very helpful!
        
       | kh_hk wrote:
       | I don't understand how to use this, maybe I am missing something.
       | 
       | Following the example, it starts 4 replicas of a 'web' service.
       | You can create a service by running a deploy to one of the
        | replicas, let's say example-web-1. What do the other 3 replicas
        | do?
       | 
       | Now, let's say I update 'web'. Let's assume I want to do a zero-
       | downtime deployment. That means I should be able to run a build
       | command on the 'web' service, start this service somehow (maybe
       | by adding an extra replica), and then run a deploy against the
       | new target?
       | 
        | If I run a `docker compose up --build --force-recreate web`, this
        | will bring down the old replica, rendering the whole exercise
        | moot.
       | 
       | Instructions unclear, can anyone chime in and help me understand?
        
         | thelastparadise wrote:
         | Why would I not just do k8s rollout restart deployment?
         | 
         | Or just switch my DNS or router between two backends?
        
           | jgalt212 wrote:
           | You still need some warm-up routine to run for the newly
           | online server before the hand-off occurs. I'm not a k8s
           | expert, but the above described events can be easily handled
           | by a bash or fab script.
        
             | ahoka wrote:
             | What events do you mean? If the app needs a warm up, then
             | it can use its readiness probe to ask for some delay until
              | it gets requests routed to it.
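              | 
              | As a sketch of that, a readiness endpoint can simply report
              | 503 until the warm-up work (cache priming, process
              | spawning, etc.) has finished; the orchestrator or proxy
              | won't route traffic to the instance until it returns 200.
              | The names and the warm-up step here are illustrative:
              | 
              |     package main
              |     
              |     import (
              |         "net/http"
              |         "sync/atomic"
              |     )
              |     
              |     var ready atomic.Bool
              |     
              |     func warmCaches() {
              |         // hypothetical: issue the cache-filling GETs,
              |         // pre-spawn workers, etc.
              |     }
              |     
              |     func main() {
              |         http.HandleFunc("/readyz",
              |             func(w http.ResponseWriter, r *http.Request) {
              |                 if !ready.Load() {
              |                     http.Error(w, "warming up",
              |                         http.StatusServiceUnavailable)
              |                     return
              |                 }
              |                 w.WriteHeader(http.StatusOK)
              |             })
              |         go func() {
              |             warmCaches()
              |             ready.Store(true)
              |         }()
              |         http.ListenAndServe(":3000", nil)
              |     }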
        
               | jgalt212 wrote:
               | GET requests to pages that fill caches or those that make
               | apache start up more than n processes.
        
             | thelastparadise wrote:
             | This is a health/readiness probe in k8s. It's already
             | solved quite solidly.
        
           | joeatwork wrote:
           | I think this is part of a lighter weight Kubernetes
           | alternative.
        
             | ianpurton wrote:
                | Lighter than the existing lightweight Kubernetes
                | alternatives, i.e. k3s :)
        
               | diggan wrote:
               | Or, hear me out: Kubernetes alternatives that don't
               | involve any parts of Kubernetes at all :)
        
           | ozgune wrote:
           | I think the parent project, Kamal, positions itself as a
           | simpler alternative to K8s when deploying web apps. They have
           | a question on this on their website: https://kamal-deploy.org
           | 
           | "Why not just run Capistrano, Kubernetes or Docker Swarm?
           | 
           | ...
           | 
           | Docker Swarm is much simpler than Kubernetes, but it's still
           | built on the same declarative model that uses state
           | reconciliation. Kamal is intentionally designed around
           | imperative commands, like Capistrano.
           | 
           | Ultimately, there are a myriad of ways to deploy web apps,
           | but this is the toolkit we've used at 37signals to bring HEY
           | and all our other formerly cloud-hosted applications home to
           | our own hardware."
        
         | sisk wrote:
         | For the first part of your question about the other replicas,
         | docker will load balance between all of the replicas either
         | with a VIP or by returning multiple IPs in the DNS request[0].
          | I didn't check if this proxy balances across multiple records
          | returned in a DNS request but, at least in the case of VIP-
          | based load balancing, it should work like you would expect.
         | 
         | For the second part about updating the service, I'm a little
         | less clear. I guess the expectation would be to bring up a
         | differently-named service within the same network, and then
         | `kamal-proxy deploy` it? So maybe the expectation is for
         | service names to include a version number? Keeping the old
         | version hot makes sense if you want to quickly be able to route
         | back to it.
         | 
         | [0]: https://docs.docker.com/reference/compose-
         | file/deploy/#endpo...
        
       | blue_pants wrote:
       | Can someone briefly explain how ZDD works in general?
       | 
       | I guess both versions of the app must be running simultaneously,
       | with new traffic being routed to the new version of the app.
       | 
       | But what about DB migrations? Assuming the app uses a single
       | database, and the new version of the app introduces changes to
       | the DB schema, the new app version would modify the schema during
       | startup via a migration script. However, the previous version of
       | the app still expects the old schema. How is that handled?
        
         | andrejguran wrote:
         | Migrations have to be backwards compatible so the DB schema can
         | serve both versions of the app. It's an extra price to pay for
         | having ZDD or rolling deployments and something to keep in
         | mind. But it's generally done by all the larger companies
        
         | efortis wrote:
         | Yes, both versions must be running at some point.
         | 
         | The load balancer starts accepting connections on Server2 and
         | stops accepting new connections on Server1. Then, Server1
         | disconnects when all of its connections are closed.
         | 
         | It could be different Servers or multiple Workers on one
         | server.
         | 
         | During that window, as the other comments said, migrations have
         | to be backwards compatible.
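          | 
          | On the application side, the "stop accepting new connections,
          | then disconnect when the existing ones are done" step looks
          | roughly like this sketch in Go (the signal handling and
          | timeout are illustrative choices, not what any particular
          | server does):
          | 
          |     package main
          |     
          |     import (
          |         "context"
          |         "net/http"
          |         "os"
          |         "os/signal"
          |         "syscall"
          |         "time"
          |     )
          |     
          |     func main() {
          |         srv := &http.Server{Addr: ":3000"}
          |         go srv.ListenAndServe()
          |     
          |         // Wait for the deploy tooling to ask us to stop,
          |         // after the load balancer has moved new traffic
          |         // to the other server/worker.
          |         stop := make(chan os.Signal, 1)
          |         signal.Notify(stop, syscall.SIGTERM)
          |         <-stop
          |     
          |         // Stop accepting new connections and wait for
          |         // in-flight requests to finish, with a deadline.
          |         ctx, cancel := context.WithTimeout(
          |             context.Background(), 30*time.Second)
          |         defer cancel()
          |         srv.Shutdown(ctx)
          |     }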
        
         | diggan wrote:
          | First step is to decouple migrations from deploys: you want
          | manual control over when the migrations run, contrary to many
          | frameworks' default of running migrations when you deploy the
         | code.
         | 
         | Secondly, each code version has to work with the current schema
         | and the schema after a future migration, making all code
         | effectively backwards compatible.
         | 
         | Your deploys end up being something like:
         | 
         | - Deploy new code that works with current and future schema
         | 
         | - Verify everything still works
         | 
         | - Run migrations
         | 
         | - Verify everything still works
         | 
         | - Clean up the acquired technical debt (the code that worked
         | with the schema that no longer exists) at some point, or run
         | out of runway and it won't be an issue
        
           | wejick wrote:
            | This is a very good explanation: no judgment, simply
            | educational. Appreciated.
            | 
            | Though I'm still surprised that some people run DB
            | alterations on application startup. I've never seen that in
            | real life.
        
             | diggan wrote:
             | > Though I'm still surprised that some people run DB
             | alteration on application start up
             | 
              | I think I've seen it more commonly in the Golang ecosystem,
              | for some reason. I'm also not sure how common it is
              | nowadays, but I've seen lots of deployments (contained in
              | Ansible scripts, Makefiles, Bash scripts or whatever) where
              | the migration+deploy is run directly in sequence
              | automatically for each deploy, rather than as discrete
              | steps.
             | 
             | Edit: Maybe it's more of an educational problem than
             | something else, where learning resources either don't
             | specify _when_ to actually run migrations or straight up
             | recommend people to run migrations on application startup
             | (one example: https://articles.wesionary.team/integrating-
             | migration-tool-i...)
        
             | miki123211 wrote:
             | It makes things somewhat easier if your app is smallish and
             | your workflow is something like e.g. Github Actions
             | automatically deploying all commits on main to Fly or
             | Render.
        
             | whartung wrote:
             | We do this. It has worked very well for us.
             | 
             | There's a couple of fundamental rules to follow. First,
             | don't put something that will have insane impact into the
             | application deploy changes. 99% of the DB changes are very
             | cheap, and very minor. If the deploy is going to be very
             | expensive, then just don't do it, we'll do it out of band.
             | This has not been a problem in practice with our 20ish
             | person team.
             | 
              | Second, it's kind of like double-entry accounting. Once
              | you've committed the change, you cannot go back and "fix
              | it". If you did something really wrong (i.e. see above),
              | then sure, but if not, you commit a correcting entry
              | instead, because you don't know who has recently downloaded
              | your commit and run it against their database.
             | 
             | The changes are a list of incremental steps that the system
             | applies in order, if they had not been applied before. So,
             | they are treated as, essentially, append only.
             | 
              | And it has worked really well for us, keeping the diverse
              | developers who deploy against local databases in sync with
              | little drama.
             | 
             | I've incorporated the same concept in my GUI programs that
             | stand up their own DB. It's a very simple system.
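              | 
              | A minimal sketch of that "apply the steps in order, if not
              | applied before" idea, assuming Postgres and a
              | schema_migrations bookkeeping table (the names, driver and
              | steps are illustrative, not the actual system described
              | above):
              | 
              |     package main
              |     
              |     import (
              |         "database/sql"
              |         "log"
              |     
              |         _ "github.com/lib/pq" // assumed driver
              |     )
              |     
              |     // Append-only: never edit old steps, only add
              |     // new (possibly correcting) ones at the end.
              |     var steps = []string{
              |         `CREATE TABLE users (
              |             id serial PRIMARY KEY, name text)`,
              |         `ALTER TABLE users ADD COLUMN email text`,
              |     }
              |     
              |     func migrate(db *sql.DB) error {
              |         _, err := db.Exec(`CREATE TABLE IF NOT EXISTS
              |             schema_migrations (version int PRIMARY KEY)`)
              |         if err != nil {
              |             return err
              |         }
              |         var current int
              |         db.QueryRow(`SELECT COALESCE(MAX(version), 0)
              |             FROM schema_migrations`).Scan(&current)
              |         // Apply only the steps not applied before.
              |         for i := current; i < len(steps); i++ {
              |             if _, err := db.Exec(steps[i]); err != nil {
              |                 return err
              |             }
              |             _, err := db.Exec(`INSERT INTO
              |                 schema_migrations VALUES ($1)`, i+1)
              |             if err != nil {
              |                 return err
              |             }
              |         }
              |         return nil
              |     }
              |     
              |     func main() {
              |         db, err := sql.Open("postgres",
              |             "postgres://localhost/app?sslmode=disable")
              |         if err != nil {
              |             log.Fatal(err)
              |         }
              |         if err := migrate(db); err != nil {
              |             log.Fatal(err)
              |         }
              |     }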
        
             | e_y_ wrote:
             | At my company, DB migrations on startup was a flag that was
             | enabled for local development and disabled for production
             | deploys. Some teams had it enabled for staging/pre-
             | production deploys, and a few teams had it turned on for
             | production deploys (although those teams only had
             | infrequent, minor changes like adding a new column).
             | 
             | Personally I found the idea of having multiple instances
             | running the same schema update job at the same time (even
             | if locks would keep it from running in practice) to be
             | concerning so I always had it disabled for deploys.
        
           | jacobsimon wrote:
           | This is the way
        
           | svvvy wrote:
           | I thought it was correct to run the DB migrations for the new
           | code first, then deploy the new code. While making sure that
           | the DB schema is backwards compatible with both versions of
           | the code that will be running during the deployment.
           | 
           | So maybe there's something I'm missing about running DB
           | migrations after the new code has been deployed - could you
           | explain?
        
             | ffsm8 wrote:
             | I'm not the person you've asked, but I've worked in devops
             | before.
             | 
             | It kinda doesn't matter which you do first. And if you
             | squint a little, it's effectively the same thing, because
             | the migration will likely only become available via a
             | deployment too
             | 
              | So yeah, the only thing that's important is that the DB
              | migration can't cause an incompatibility with any currently
              | deployed version of the code - and if it would, you'll have
              | to split the change so it doesn't. It'll force another
              | deploy for the change you want to do, but it's what you're
              | forced to do if maintenance windows aren't an option, which
              | is kind of a given for most B2C products.
        
           | shipp02 wrote:
            | So if you add any constraints/data, you can't rely on them
            | being there until version n+2, or you need to have 2 paths:
            | 1 for the old data, 1 for the new?
        
             | simonw wrote:
             | Effectively yes. Zero downtime deployments with database
             | migrations are fiddly.
        
           | globular-toast wrote:
           | There's a little bit more to it. Firstly you can deploy the
           | migration first as long as it's forwards compatible (ie. old
           | code can read from it). That migration needs to be zero
           | downtime; it can't, for example, rewrite whole tables or
           | otherwise lock them, or requests will time out. Doing a whole
           | new schema is one way to do it, but not always necessary. In
           | any case you probably then need a backfill job to fill up the
           | new schema with data before possibly removing the old one.
           | 
           | There's a good post about it here:
           | https://rtpg.co/2021/06/07/changes-checklist.html
        
         | stephenr wrote:
          | Others have described the _how_ part if you do need truly zero-
          | downtime deployments, but I think it's worth pointing out that
         | for most organisations, and most migrations, the amount of
         | downtime due to a db migration is virtually indistinguishable
         | from zero, particularly if you have a regional audience, and
         | can aim for "quiet" hours to perform deployments.
        
           | diggan wrote:
           | > the amount of downtime due to a db migration is virtually
           | indistinguishable from zero
           | 
           | Besides, once you've run a service for a while that has
           | acquired enough data for migrations to take a while, you
           | realize that there are in fact two different types of
           | migrations. "Schema migrations" which are generally fast and
           | "Data migrations" that depending on the amount of data can
           | take seconds or days. Or you can do the "data migrations"
           | when needed (on the fly) instead of processing all the data.
           | Can get gnarly quickly though.
           | 
           | Splitting those also allows you to reduce maintenance
           | downtime if you don't have zero-downtime deployments already.
        
             | stephenr wrote:
             | Very much so, we handle these very differently for $client.
             | 
             | Schema migrations are versioned in git with the app, with
             | up/down (or forward/reverse) migration scripts and are
             | applied automatically during deployment of the associated
             | code change to a given environment.
             | 
             | SQL Data migrations are stored in git so we have a record
             | but are never applied automatically, always manually.
             | 
             | The other thing we've used along these lines, is having one
             | or more low priority job(s) added to a queue, to apply some
             | kind of change to records. These are essentially still data
             | migrations, but they're written as part of the application
             | code base (as a Job) rather than in SQL.
        
             | sgarland wrote:
             | Schema migrations can be quite lengthy, mostly if you made
             | a mistake earlier. Some things that come to mind are
             | changing a column's type, or extending VARCHAR length (with
             | caveats; under certain circumstances it's instant).
        
               | lukevp wrote:
               | Not OP, but I would consider this a data migration as
               | well. Anything that requires an operation on every row in
               | a table would qualify. Really changing the column type is
               | just a built in form of a data migration.
        
             | globular-toast wrote:
              | Lengthy migrations don't matter. What matters is whether
             | they hold long locks or not. Data migrations might take a
             | while but they won't lock anything. Schema migrations, on
             | the other hand, can easily do so, like if you add a new
             | column with a default value. The whole table must be
             | rewritten and it's locked for the entire time.
        
           | jakjak123 wrote:
            | Most are not affected by DB migrations, in the sense that
            | migrations are run before the service starts the web server
            | during boot. The database might block traffic for other
            | already-running connections though, in which case you have a
            | problem with your database design.
        
         | gsanderson wrote:
         | I haven't tried it but it looks like Xata has come up with a
         | neat solution to DB migrations (at least for postgres). There
         | can be two versions of the app running.
         | 
         | https://xata.io/blog/multi-version-schema-migrations
        
         | efxhoy wrote:
          | Strong Migrations helps with writing migrations that are safe
          | for ZDD deploys. We use it in our Rails app; it catches quite a
          | few potential footguns.
          | https://github.com/ankane/strong_migrations
        
       | risyachka wrote:
       | Did they mention anywhere why they decided to write their own
       | proxy instead of using Traefik or something else battle tested?
        
         | yla92 wrote:
         | They were actually using Traefik until this "v2.0.0" (pre-
         | release right now) version.
         | 
          | There is some context about why they switched and decided to
          | roll their own in the PR.
         | 
         | https://github.com/basecamp/kamal/pull/940
        
           | stackskipton wrote:
            | As an SRE, that PR scares me. There is no long explanation of
            | why they are throwing out third-party, extremely battle-
            | tested HTTP proxy software for their own homegrown one,
            | except "Traefik didn't do what we wanted 100%".
           | 
            | Man, I've been there, where you wish third-party software had
            | some feature, but writing your own is the WORST thing you can
            | do for a company 9/10 times. My current company is dealing
            | with massive tech debt because of all this homegrown
            | software.
        
       | mt42or wrote:
       | NIH. Nothing else to add.
        
         | elktown wrote:
         | Meanwhile in the glorious land of "never invented here":
         | https://news.ycombinator.com/item?id=38526780
        
       | ahdfyasdf wrote:
       | Is there a way to configure timeouts?
       | 
       | https://github.com/basecamp/kamal-proxy/blob/main/internal/s...
       | 
       | https://github.com/basecamp/kamal-proxy/blob/main/internal/s...
       | 
       | https://blog.cloudflare.com/exposing-go-on-the-internet/
        
       | ianpurton wrote:
       | DHH in the past has said "This setup helped us dodge the
       | complexity of Kubernetes"
       | 
        | But this looks somewhat like a re-invention of what Kubernetes
        | provides.
       | 
       | Kubernetes has come a long way in terms of ease of deployment on
       | bare metal.
        
         | wejick wrote:
          | Zero-downtime deployment was around long before Kube. This does
          | look about as simple as it has ever been, not like Kube for
          | sure.
        
       | moondev wrote:
       | Does this handle a host reboot?
        
         | francislavoie wrote:
         | In theory it should, because they do health checking to track
         | status of the upstreams. The upstream server being down would
         | be a failed TCP connection which would fail the health check.
         | 
         | Obviously, rebooting the machine the proxy is running on is
         | trickier though. I don't feel confident they've done enough to
         | properly support having multiple proxy instances running side
         | by side (no shared storage mechanism for TLS certs at least),
          | which would allow upgrading one at a time: use a
          | router/firewall/DNS in front of them to route to both normally,
          | switch to one at a time while doing maintenance to reboot them,
          | and go back to both during normal operations.
        
       | rohvitun wrote:
       | Aya
        
       | 0xblinq wrote:
       | 3 years from now they'll have invented their own AWS. NIH
       | syndrome in full swing.
        
         | bdcravens wrote:
         | It's a matter of cost, not NIH syndrome. In Basecamp's case,
         | saving $2.4M a year isn't something to ignore.
         | 
         | https://basecamp.com/cloud-exit
         | 
         | Of course, it's fair to say that rebuilding the components that
         | the industry uses for hosting on bare metal is NIH syndrome.
        
           | mannyv wrote:
            | A new proxy is a proxy filled with issues. It's nice that
            | it's Go, but in production I'd go with nginx or something
            | else and replay traffic to Kamal. There are enough weird
           | behaviors out there (and bad actors) that I'd be worried
           | about exploits etc.
        
       | viraptor wrote:
        | It's an interesting choice to make this a whole app, when
        | zero-downtime deployments can be achieved trivially with other
        | servers these days. For example, any app + web proxy which
        | supports Unix sockets can do zero downtime by moving the socket
        | file. It's atomic, and you can send the warm-up requests with
        | curl. Building a whole system with registration feels like
        | overkill.
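        | 
        | A sketch of that socket swap from the app side, assuming the
        | proxy forwards to a fixed socket path and opens a new connection
        | per request (the paths and warm-up step are illustrative):
        | 
        |     package main
        |     
        |     import (
        |         "net"
        |         "net/http"
        |         "os"
        |     )
        |     
        |     func warmUp(socket string) {
        |         // hypothetical: send a few requests over the
        |         // temporary socket before swapping it in
        |     }
        |     
        |     func main() {
        |         const live = "/run/app/web.sock" // proxy points here
        |         tmp := live + ".new"
        |     
        |         os.Remove(tmp)
        |         ln, err := net.Listen("unix", tmp)
        |         if err != nil {
        |             panic(err)
        |         }
        |         go http.Serve(ln, nil)
        |     
        |         warmUp(tmp) // the "curl" step
        |     
        |         // rename(2) is atomic: new requests reach the new
        |         // listener; the old process keeps serving whatever
        |         // connections it already accepted, then exits.
        |         if err := os.Rename(tmp, live); err != nil {
        |             panic(err)
        |         }
        |         select {}
        |     }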
        
         | dewey wrote:
         | That's just a small part of Kamal (https://kamal-deploy.org),
          | the deployment tool they built and used to move from the
         | cloud to their own hardware, saving millions
         | (https://basecamp.com/cloud-exit).
        
       | ksajadi wrote:
       | This primarily exists to take care of a fundamental issue in
       | Docker Swarm (Kamal's orchestrator of choice) where replacing
       | containers of a service disrupts traffic. We had the same problem
       | (when building JAMStack servers at Cloud 66) and used Caddy
       | instead of writing our own proxy and also looked at Traefik which
       | would have been just as suitable.
       | 
       | I don't know why Kamal chose Swarm over k8s or k3s (simplicity
       | perhaps?) but then, complexity needs a home, you can push it
       | around but cannot hide it, hence a home grown proxy.
       | 
       | I have not tried Kamal proxy to know, but I am highly skeptical
       | of something like this, because I am pretty sure I will be
       | chasing it for support for anything from WebSockets to SSE, to
       | HTTP/3 to various types of compression and encryption.
        
         | jauntywundrkind wrote:
         | Kamal feels built around the premise that "Kubernetes is too
         | complicated" (after Basecamp got burned by some hired help),
         | and from that justification it goes out and recreates a sizable
         | chunk of the things Kubernetes does.
         | 
          | Your list of things a reverse proxy might do is a good example
          | to me of how I expect this to go: what starts out as an
          | ambition to be simple inevitably has to grow more and more of
          | the complexity it sought to avoid.
         | 
         | Part of me strongly thinks we need competition & need other
         | things trying to create broad ideally extensible ways or
         | running systems. But a huge part of me sees Kamal & thinks,
         | man, this is a lot of work being done only to have to keep
         | walking backwards into the complexity they were trying to
          | avoid. Usually second-system syndrome is the first system being
          | simple and the second being overly complicated, and on the tin
          | this case is the inverse, but man, the competency of Kube & its
          | flexibility/adaptability as a framework for desired state
          | management really shows through for me.
        
           | ksajadi wrote:
            | I agree with you and, at the risk of self-promotion, that's
            | why we built Cloud 66, which takes care of the Day-1 (build
            | and deploy) as well as the Day-2 (scale and maintenance) part
            | of infrastructure. As we all can see, there is a lot more to
            | this than just wrapping code in a Dockerfile and pushing it
            | out to a Swarm cluster.
        
         | hipadev23 wrote:
         | I feel like you're conflating the orchestration with proxying.
         | There's no reason they couldn't be using caddy or traefik or
         | envoy for the proxy (just like k8s ends up using them as an
         | ingress controller), while still using docker.
        
           | ksajadi wrote:
           | Docker is the container engine. Swarm is the orchestration,
           | the same as Kubernetes. The concept of "Service" in k8s takes
           | care of a lot of the proxying, while still using Docker (not
           | anymore tho). In Swarm, services exist but only take care of
           | container lifecycle and not traffic. While networking is left
           | to the containers, Swarm services always get in the way,
           | causing issues that will require a proxy.
           | 
           | In k8s for example, you can use Docker and won't need a proxy
           | for ZDD (while you might want one for Ingress and other uses)
        
       | simonw wrote:
       | Does this implement the "traffic pausing" pattern?
       | 
       | That's where you have a proxy which effectively pauses traffic
       | for a few seconds - incoming requests appear to take a couple of
       | seconds longer than usual, but are still completed after that
       | short delay.
       | 
       | During those couple of seconds you can run a blocking
       | infrastructure change - could be a small database migration, or
       | could be something a little more complex as long as you can get
       | it finished in less than about 5 seconds.
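        | 
        | As a sketch of the pattern itself (not of kamal-proxy's
        | implementation, which I haven't checked), the proxy only needs a
        | gate that holds requests while paused and releases them on
        | resume or after a deadline; everything here is illustrative:
        | 
        |     package main
        |     
        |     import (
        |         "net/http"
        |         "sync"
        |         "time"
        |     )
        |     
        |     // pauseGate holds incoming requests while paused and
        |     // releases them all when Resume is called, or fails
        |     // them once the per-request wait deadline passes.
        |     type pauseGate struct {
        |         mu     sync.Mutex
        |         resume chan struct{}
        |     }
        |     
        |     func (g *pauseGate) Pause() {
        |         g.mu.Lock()
        |         defer g.mu.Unlock()
        |         if g.resume == nil {
        |             g.resume = make(chan struct{})
        |         }
        |     }
        |     
        |     func (g *pauseGate) Resume() {
        |         g.mu.Lock()
        |         defer g.mu.Unlock()
        |         if g.resume != nil {
        |             close(g.resume)
        |             g.resume = nil
        |         }
        |     }
        |     
        |     func (g *pauseGate) Wrap(next http.Handler) http.Handler {
        |         fn := func(w http.ResponseWriter, r *http.Request) {
        |             g.mu.Lock()
        |             ch := g.resume
        |             g.mu.Unlock()
        |             if ch != nil {
        |                 select {
        |                 case <-ch: // resumed: let it through
        |                 case <-time.After(5 * time.Second):
        |                     http.Error(w, "timeout",
        |                         http.StatusGatewayTimeout)
        |                     return
        |                 }
        |             }
        |             next.ServeHTTP(w, r)
        |         }
        |         return http.HandlerFunc(fn)
        |     }
        |     
        |     func main() {
        |         gate := &pauseGate{}
        |         backend := http.HandlerFunc(
        |             func(w http.ResponseWriter, r *http.Request) {
        |                 w.Write([]byte("hello"))
        |             })
        |         // In a real proxy, Pause/Resume would be driven by an
        |         // admin command around the blocking migration, and
        |         // "backend" would be a reverse-proxy handler.
        |         http.ListenAndServe(":8080", gate.Wrap(backend))
        |     }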
        
         | ignoramous wrote:
         | tbh, sounds like "living dangerously" pattern to me.
        
           | francislavoie wrote:
           | Not really, works quite well as long as your proxy/server
           | have enough memory to hold the requests for a little while.
           | As long as you're not serving near your max load all the
           | time, it's a breeze.
        
         | tinco wrote:
         | Have you seen that done in production? It sounds really
         | dangerous, I've worked for an app server company for years and
         | this is the first I've heard of this pattern. I'd wave it away
         | if I didn't notice in your bio that you co-created Django so
         | you've probably seen your fair share of deployments.
        
           | written-beyond wrote:
           | Just asking, isn't this what every serverless platform uses
           | while it spins up an instance? Like it's why cold starts are
           | a topic at all, or else the first few requests would just
           | fail until the instance spun up to handle the request.
        
           | simonw wrote:
           | I first heard about it from Braintree.
           | https://simonwillison.net/2011/Jun/30/braintree/
        
         | francislavoie wrote:
          | Caddy does this! (As you know, I think; I feel like I remember
          | we discussed this some time ago.)
        
       ___________________________________________________________________
       (page generated 2024-09-21 23:00 UTC)