[HN Gopher] Plane: Per-user backends for web apps
___________________________________________________________________
Plane: Per-user backends for web apps
Author : paulgb
Score : 206 points
Date   : 2022-10-12 15:52 UTC (1 day ago)
(HTM) web link (driftingin.space)
(TXT) w3m dump (driftingin.space)
| l72 wrote:
| I've always thought it'd be neat to use something like this in
| addition to the broadway[1] backend for GTK+ to stream X11 GUI
| apps directly and independently for each user.
|
| [1] https://docs.gtk.org/gtk4/broadway.html
| andrewguenther wrote:
| Super excited to see an open source implementation of this!
|
| I built a similar service for AWS back around 2015 to do web-
| based pixel streaming applications. That service's legacy still
| lives on today in many descendants, but I was always bummed that
| no team was willing to invest in making it generic. Everyone who
| needed it either forked it or re-implemented it around the
| original design.
|
| Warms my heart to see something like it on the outside. It's a
| super powerful concept. Great work!
| hayst4ck wrote:
| From an operational perspective this seems like a nightmare. Is
| it fair to characterize this as SPOF (single point of failure) as
| a service?
|
| Does the service expose time series metrics?
|
| How would I detect and remedy a hot shard?
|
| Are resource caps well defined beforehand / are all instances
| expected to have similar resource consumption?
|
| What would administratively draining an instance look like?
| wizwit999 wrote:
| Very awesome, this should unlock some pretty cool use cases.
| thelastbender12 wrote:
| This looks super cool! Typically, products/projects claiming
| "browser is the new OS" either,
|
| * run a full blown browser in the cloud and stream back pixels
|
| * emulate a native application in the browser (like github.dev)
|
| Both are okay, but per-user backends feel like a nicer primitive
| to build applications with: apps that run locally in the browser
| but can access cloud compute/storage on demand.
| [deleted]
| paulgb wrote:
| Hey HN!
|
| Plane came from our desire to build tools that have the low
| friction of running in the browser, but that use more memory
| and compute than the browser will allocate. The basic idea is to
| run a remote background process, connect to it over WebSocket,
| and stream data.
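|
| As a rough sketch of that idea (not Plane's actual code; the
| crate choices here are just illustrative), a per-user backend in
| this spirit could be a tiny Rust process that accepts a single
| WebSocket connection and streams messages:
|
|     // Assumed Cargo deps: tokio (features = ["full"]),
|     // tokio-tungstenite, futures-util. Illustrative only.
|     use futures_util::{SinkExt, StreamExt};
|     use tokio::net::TcpListener;
|     use tokio_tungstenite::accept_async;
|
|     #[tokio::main]
|     async fn main() -> Result<(), Box<dyn std::error::Error>> {
|         // An orchestrator like Plane would start this process
|         // and route one user's WebSocket traffic to it.
|         let listener = TcpListener::bind("0.0.0.0:8080").await?;
|         let (stream, _addr) = listener.accept().await?;
|         let mut ws = accept_async(stream).await?;
|
|         // Stream data: echo every message back to the client.
|         while let Some(msg) = ws.next().await {
|             let msg = msg?;
|             if msg.is_text() || msg.is_binary() {
|                 ws.send(msg).await?;
|             }
|         }
|         Ok(())
|     }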
|
| This ends up being a surprisingly useful primitive to have, and
| it's been used to:
|
| - Pixel-stream X11 applications to the browser over WebRTC[1]
|
| - Run IDEs and notebooks
|
| - Power Figma-style realtime collaboration backends (including
| https://rayon.design).
|
| Here's a direct link to our repo:
| https://github.com/drifting-in-space/plane and docs:
| https://plane.dev/
|
| [1] https://twitter.com/drifting_corp/status/1552773567649091584
| ushakov wrote:
| Hey Paul, congrats on your HN launch!
| elanning wrote:
| Wow this could be incredibly useful for a lot of applications.
| I'm excited to see the new wave of tools this could spawn.
| _jezell_ wrote:
| The implementation makes some weird choices, like rebuilding a
| bunch of services (DNS, certs) and taking a weird dependency on
| SQLite.
| Wish people would stop reimplementing Kubernetes and just build
| on top of it.
|
| I think "per-user" is probably the wrong killer feature for
| something like this. Much more potential in shared distributed
| processes that support multiple users (chat, CRDT/coauthoring).
| Appears that the underlying layer can probably do that.
|
| In any case, super cool idea, and I hope something like this
| lands in the serverless platforms from all the major cloud
| providers. It's always been mind-blowing to me that Google Cloud
| Functions supports websockets without allowing you to route
| multiple incoming connections from different users to the same
| process. That simple change would unlock so many useful
| scenarios.
| paulgb wrote:
| Thanks for taking the time to look through the architecture.
| There are definitely some choices that would have seemed weird
| to me when we set out to build this, but that we did not make
| lightly.
|
| We actually initially built this on Kubernetes, twice. The MVP
| was Kubernetes + nginx where we created pods through the API
| and used the built-in DNS resolver. The post-MVP attempt fully
| embraced k8s, with our own CRD and operator pattern. It still
| exists in another branch of the repo[1].
|
| Our decision to move off came because we realized we cared
| about a different set of things than Kubernetes did. For
| example, cold start time generally doesn't matter that much to
| a stateless server architecture (k8s' typical use), but is
| vital for us because a user is actively waiting on each cold
| start. Moving away from k8s let us own the scheduling process,
| which helped us reduce cold start times significantly. There
| are other things we gain from it, some of which I've talked
| about in this comment tree[2]. I will say, it seemed like a
| crazy decision when I proposed it, but I have no regrets about
| it.
|
| The point of sqlite was to allow the "drone" version to be
| updated in place without killing running backends. It also
| allows (but does not require) the components of the drone to
| run as separate containers. I originally wanted to use LMDB,
| but landed on sqlite. It's a pretty lightweight dependency, it
| provides another point of introspection for a running system
| (the sqlite cli), and it's not something people otherwise have
| to interact with. I wrote up my thought process for it at the
| time in this design doc[3].
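|
| To make the role it plays concrete (hypothetical schema and
| names, not Plane's real one), the drone's on-disk state could
| look roughly like this with the rusqlite crate: a restarted
| drone process can rediscover the backends it already launched.
|
|     // Hypothetical sketch using the rusqlite crate; the table
|     // and columns are invented, not Plane's actual schema.
|     use rusqlite::{params, Connection};
|
|     fn main() -> rusqlite::Result<()> {
|         // State lives on disk, so an in-place upgrade of the
|         // drone doesn't lose track of running backends.
|         let conn = Connection::open("drone-state.db")?;
|         conn.execute(
|             "CREATE TABLE IF NOT EXISTS backends (
|                  name         TEXT PRIMARY KEY,
|                  container_id TEXT NOT NULL,
|                  port         INTEGER NOT NULL
|              )",
|             [],
|         )?;
|         conn.execute(
|             "INSERT OR REPLACE INTO backends
|              (name, container_id, port) VALUES (?1, ?2, ?3)",
|             params!["backend-1234", "container-abcd", 8080],
|         )?;
|
|         // A new drone process (or the sqlite CLI) can read it.
|         let port: i64 = conn.query_row(
|             "SELECT port FROM backends WHERE name = ?1",
|             params!["backend-1234"],
|             |row| row.get(0),
|         )?;
|         println!("backend-1234 listens on {port}");
|         Ok(())
|     }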
|
| You're right about shared backends among multiple users being
| supported by Plane. I use per-user to convey that we treat
| container creation as so cheap and ephemeral you could give one
| to every user, but users can certainly share one and we've done
| that for exactly the data sync use case you describe.
|
| [1] https://github.com/drifting-in-space/plane/tree/original-
| kub...
|
| [2] https://news.ycombinator.com/item?id=32305234
|
| [3] https://docs.google.com/document/d/1CSoF5Fgge_t1vY0rKQX--
| dWu...
| POPOSYS wrote:
| Hi Paul, thanks for your explanation - you should add that to
| the documentation, e.g. in a chapter "Why not K8S?".
|
| Also, you should give some advice about how to deploy when the
| default for deploying apps in an organization is K8S, which
| might not be too exotic nowadays. Will Plane need its own
| cluster? Does it run on top of K8S? How does it relate to K8S
| in general in a deployment scenario?
|
| THANKS!
| paulgb wrote:
| Good idea on both counts. Documentation will be one of my
| priorities over the coming months and it's great to have
| feedback on what's missing.
|
| Re. the "Why not k8s" question, you might enjoy this post
| from a couple months back; although it only touches on
| Plane briefly, it shows the framework we used to make the
| decision: https://driftingin.space/posts/complexity-kubernetes
| _jezell_ wrote:
| https://developer.ibm.com/articles/reducing-cold-start-
| times...
|
| Knative has solved most of those pod start time problems
| since it's dealing with a similar scenario, unless 0.008s
| startup time isn't good enough for you.
| schainks wrote:
| It's funny how SQLite gets so much flak, but every time I've
| used it in production, it just _worked_.
| rmetzler wrote:
| I don't think I have ever read something negative about
| SQLite.
|
| I also don't read the GP comment as being negative toward
| SQLite. It sounds more like the author was surprised about
| the architecture, since a naive view would think Kubernetes
| would be good enough.
| vcryan wrote:
| This seems similar to an Elixir/Phoenix use case where you have a
| GenServer per user. At first glance, it seems like that approach
| would be functionally equivalent.
| paulgb wrote:
| Yes, the BEAM/OTP/{Erlang/Elixir} stack is unique in that it
| provides similar primitives as part of the runtime.
|
| My impression of that approach is that it's good for IO-bound
| work and stateful business logic, but less so for the
| CPU/memory-bound applications that we're targeting. I'd love to
| know if there are counterexamples to that though. It's
| admittedly been over a decade since I touched Erlang, so I'm not
| up to date and am only peripherally familiar with Elixir and
| Phoenix.
| dugmartin wrote:
| Yes, for CPU-bound processing on the BEAM you'll want to use a
| NIF (native implemented function), but that leaves you open to
| taking down the entire VM with bad NIF code (segfaults,
| infinite loops, etc.). A purportedly safer means of creating
| NIFs is to use Rustler (https://github.com/rusterlium/rustler)
| which lets you easily write NIFs in Rust instead of C. I
| haven't used it but I've heard good things.
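|
| A minimal sketch of what that looks like (module and function
| names made up; based on Rustler's documented macros, which I
| haven't verified against the latest release):
|
|     // Hypothetical NIF crate built with Rustler. The "DirtyCpu"
|     // schedule keeps a long-running native call from starving
|     // the BEAM's regular schedulers.
|     #[rustler::nif(schedule = "DirtyCpu")]
|     fn fib(n: u64) -> u64 {
|         // Deliberately naive CPU-bound work, for illustration.
|         match n {
|             0 | 1 => n,
|             _ => fib(n - 1) + fib(n - 2),
|         }
|     }
|
|     // Registers the NIF under a made-up Elixir module name
|     // (newer Rustler versions can also discover NIFs
|     // automatically).
|     rustler::init!("Elixir.MyApp.Native", [fib]);
|
| On the Elixir side, the corresponding module pulls the library
| in with "use Rustler" and defines a stub that raises until the
| NIF is loaded.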
| nulld3v wrote:
| Feels like we are kinda back to using inetd
| dcmccallum wrote:
| Say more!
| nulld3v wrote:
| inetd spawned a process for each incoming request/connection.
| This seems kinda similar except instead of a process it's a
| whole container.
| paulgb wrote:
| It is similar! Years ago I first tried websocketd, which
| was explicitly inspired by initd, and wished I could build
| applications on top of something like it. Plane is sort of
| a natural evolution of that.
|
| Of course, one of the big differences with initd is that it
| runs on a cluster of machines instead of locally, which
| turns out to be most of the difficulty.
|
| http://websocketd.com/
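|
| To make the model concrete, here's a rough sketch of a
| websocketd-style backend (assuming its documented behavior of
| spawning one process per connection and bridging the WebSocket
| to that process's stdin/stdout):
|
|     // Run under websocketd, e.g.:
|     //   websocketd --port=8080 ./this-binary
|     // Each WebSocket connection gets its own process; lines
|     // the client sends arrive on stdin, and lines written to
|     // stdout go back to that client.
|     use std::io::{self, BufRead, Write};
|
|     fn main() -> io::Result<()> {
|         let stdin = io::stdin();
|         let mut stdout = io::stdout();
|         for line in stdin.lock().lines() {
|             let line = line?;
|             writeln!(stdout, "echo: {line}")?;
|             stdout.flush()?;
|         }
|         Ok(())
|     }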
| endorphine wrote:
| Correction (you probably know this but for the rest):
| it's "inetd" not "initd".
| paulgb wrote:
| Oops, thanks. Too late to edit now.
| [deleted]
| kylecordes wrote:
| A related and also very useful usage pattern: "backend instance
| per customer".
|
| Because...
|
| There are many ways to implement multi-tenant SaaS; but a highly
| underrated approach is to write a single tenant app, then use
| infrastructure to run an instance of it per (currently logged in)
| customer. Plus a persistent database per customer of course.
|
| This has the tremendous advantage that you can add another column
| to your pricing page easily: the "bring a truckload of money and
| we will set you up to run this behind your firewall" tier. There
| are still a lot of orgs out there, large ones with considerable
| financial capacity, who really want this.
| iLoveOncall wrote:
| I think you are putting together two concepts that are actually
| different: "backend instance per customer" and "deploy the
| service in the customer's infrastructure".
|
| I don't have much to say about the 2nd one, but my team does
| the first one for our (internal) customers, so we deploy and
| manage in our own accounts a whole service stack for each of
| our customers. It's a nightmare.
|
| We are going to do some rearchitecture work next year to move
| away from it, because it drains so much of our time in
| operational load.
|
| One example of a major pain point that we have is encountering
| random failures in 3rd party services, such as AWS. For a more
| classic service that you deploy 2 or 3 times a week, if
| CloudFormation deployments fail once every thousand deployments
| for random errors, you'll have one failure per year. Well, we
| deploy thousands of instances of our service 2-3 times a week.
| So we have failures every single time we deploy.
|
| Oncall is a nightmare (I don't love oncall anymore, I created
| my login in my previous team) because we just fix a tsunami of
| similar tickets that are each from a different instance of our
| service, for a single customer, but often with a different root
| cause.
|
| We probably have half of our headcount dedicated to initiatives
| that wouldn't need to exist if we had a more classic 1 service
| = 1 instance approach.
|
| Just don't do it.
| travisjungroth wrote:
| I can see how this is _horrible_ for internal tools. The
| incentives are all messed up.
| kylecordes wrote:
| The connection between the concepts is: if your multi-tenant
| strategy is to deploy (hopefully only while in active use) an
| instance per customer, then the additional effort to provide
| on-site installations is relatively low, compared to other
| multi-tenant strategies.
|
| Whether on-site installs are a good thing to offer depends, of
| course, on how valuable that is to your customers versus the
| cost/effort to support it.
|
| As you point out there are major trade-offs! If your product
| architecture requires substantial multi-service complexity
| per tenant, that points away from the instance per customer
| strategy.
| [deleted]
| bluejekyll wrote:
| This model works, but requires that the minimum cost of the
| stack for supporting a single tenant is low enough to be
| covered by the revenue from even the smallest tenant.
|
| An often overlooked aspect of this is that, even without a
| freemium option, that revenue can be 0. Consider all the demos
| for potential customers, all the examples set up for testing,
| etc. These will all cost money, and if they can't be shared,
| then those costs may make it unworkable on a per-instance
| basis.
| travisjungroth wrote:
| They could still be shared. Each customer is an org, not a
| single user. This model would only work at business pricing
| levels anyway.
|
| However many environments you have (Dev, Staging, QA,
| Integration, whatever) stays the same as normal. Sales gets an
| instance, but I've seen that anyway and it's just one more.
| Can have a public Demo. As long as you don't scale O(n) with
| prospects, it's not a meaningful cost. Even if you did that,
| I'd bet it's a tiny fraction of your cost of sales.
| bluejekyll wrote:
| Right, but that's the point. Even orgs can be costly when
| you consider how you plan on sharing infrastructure. My
| point is these can add up as you have databases, k8s
| clusters, load balancers, CDN endpoints, etc., so if your
| strategy doesn't include driving these costs down on a per-
| instance basis, with idle usage and whatnot, it will become
| a cost problem quickly.
| kylecordes wrote:
| Yes certainly, this approach to multi-tenancy is not well-
| suited for apps that will support a large number of active
| free users.
|
| The idea is to only allocate a running instance while there
| is at least one active user of that instance. So for
| occasional-use apps the ongoing cost per idle customer would
| be just the storage cost of the database, very close to 0.
| Obviously for something like a chat app, email app, anything
| else that people tend to leave open all day, a different
| approach is better.
| travisjungroth wrote:
| > The idea is to only allocate a running instance while
| there is at least one active user of that instance.
|
| That first user is going to have a bit of a wait while you
| turn on their server. Maybe you keep some empties warmed up
| that just need a reconfig and restart so it's fast.
|
| Personally, I'd only do any type of multi-tenancy if the
| cost of some micro instance behind an auto scaler was
| negligible relative to the value of the contract anyway.
| mattbee wrote:
| I wrote a system at work that does this for VS Code
| instances - on a commodity server, with not much
| optimisation effort, it goes from a click to the UI
| starting to appear in about 10s (mostly thanks to
| Firecracker and Alpine). There's a loading screen that's
| instant though, and it probes for when the VM is ready.
|
| I think that would work fine for a lot of other apps, at
| least where you're looking to start a lengthier session.
| quickthrower2 wrote:
| Sweet. Where I work we do a process-per-user stateful model, and
| it takes away a heap of issues compared to the more traditional
| approach of sharing everything in the web server and letting the
| RAM blow up.
|
| If the user does nothing the process still ticks along. It may be
| doing background work or doing nothing. It can keep state and
| periodically save. It is like a desktop app experience on the
| web, if you like.
|
| If each user or org is doing their own thing and the service is
| not too "social" - i.e. not requiring cross-interactions (so
| more like, say, a CRM than LinkedIn) - I think it is an
| interesting model.
|
| Not to mention slow feature rollouts!
___________________________________________________________________
(page generated 2022-10-13 23:01 UTC)