[HN Gopher] Plane: Per-user backends for web apps
       ___________________________________________________________________
        
       Plane: Per-user backends for web apps
        
       Author : paulgb
       Score  : 206 points
       Date   : 2022-10-12 15:52 UTC (1 days ago)
        
 (HTM) web link (driftingin.space)
 (TXT) w3m dump (driftingin.space)
        
       | l72 wrote:
       | I've always thought it'd be neat to use something like this in
       | addition to the broadway[1] backend for GTK+ to stream X11 GUI
       | apps directly and independently for each user.
       | 
       | [1] https://docs.gtk.org/gtk4/broadway.html
        
       | andrewguenther wrote:
       | Super excited to see an open source implementation of this!
       | 
       | I built a similar service for AWS back around 2015 to do web-
       | based pixel streaming applications. That service's legacy still
       | lives on today in many descendants, but I was always bummed that
       | no team was willing to invest in making it generic. Everyone who
       | needed it either forked it or re-implemented around the original
       | design.
       | 
       | Warms my heart to see something like it on the outside. It's a
       | super powerful concept. Great work!
        
       | hayst4ck wrote:
       | From an operational perspective this seems like a nightmare. Is
       | it fair to characterize this as SPOF (single point of failure) as
       | a service?
       | 
       | Does the service expose time series metrics?
       | 
       | How would I detect and remedy a hot shard?
       | 
       | Are resource caps well defined beforehand, and are all
       | instances expected to have similar resource consumption?
       | 
       | What would administratively draining an instance look like?
        
       | wizwit999 wrote:
       | Very awesome, this should unlock some pretty cool use cases.
        
       | thelastbender12 wrote:
       | This looks super cool! Typically, products/projects claiming
       | "browser is the new OS" either:
       | 
       | * run a full blown browser in the cloud and stream back pixels
       | 
       | * emulate a native application in the browser (like, github.dev)
       | 
       | Both are okay, but per-user backends feel like a nicer primitive
       | to build applications with. Apps that run locally in the browser
       | but can access cloud compute/storage on demand.
        
       | [deleted]
        
       | paulgb wrote:
       | Hey HN!
       | 
       | Plane came from our desire to build tools that have the low
       | friction of running in the browser, but that use more memory
       | and compute than the browser will allocate. The basic idea is to
       | run a remote background process, connect to it over WebSocket,
       | and stream data.
       | 
       | This ends up being a surprisingly useful primitive to have, and
       | it's been used to:
       | 
       | - Pixel-stream X11 applications to the browser over WebRTC[1]
       | 
       | - Run IDEs and notebooks
       | 
       | - Power Figma-style realtime collaboration backends (including
       | https://rayon.design).
       | 
       | Here's a direct link to our repo: https://github.com/drifting-in-
       | space/plane and docs: https://plane.dev/
       | 
       | [1] https://twitter.com/drifting_corp/status/1552773567649091584
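       | 
       | To make the shape of this concrete, here is a minimal sketch
       | (not Plane's actual API; crate choice and port are assumptions)
       | of the kind of session backend a container might run: a tiny
       | Rust WebSocket server, built on tokio-tungstenite, that accepts
       | connections and streams frames back:
       | 
       |   use futures_util::{SinkExt, StreamExt};
       |   use tokio::net::TcpListener;
       | 
       |   // Hypothetical per-user backend: accept WebSocket
       |   // connections and stream data back to the client that
       |   // spawned this backend.
       |   #[tokio::main]
       |   async fn main() {
       |       let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
       |       while let Ok((socket, _addr)) = listener.accept().await {
       |           tokio::spawn(async move {
       |               // Upgrade the TCP connection to a WebSocket.
       |               let mut ws = tokio_tungstenite::accept_async(socket)
       |                   .await
       |                   .expect("websocket handshake");
       |               // Echo every text/binary frame; a real backend
       |               // would stream computation results instead.
       |               while let Some(Ok(msg)) = ws.next().await {
       |                   if msg.is_text() || msg.is_binary() {
       |                       if ws.send(msg).await.is_err() {
       |                           break;
       |                       }
       |                   }
       |               }
       |           });
       |       }
       |   }
       | 
       | The browser side is then just a WebSocket connection to whatever
       | URL the orchestration layer hands back for the spawned backend.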
        
         | ushakov wrote:
         | Hey Paul, congrats on your HN launch!
        
       | elanning wrote:
       | Wow this could be incredibly useful for a lot of applications.
       | I'm excited to see the new wave of tools this could spawn.
        
       | _jezell_ wrote:
       | The implementation makes some weird choices, like rebuilding a
       | bunch of services (DNS, certs) and taking a weird dependency on
       | SQLite. Wish people would stop reimplementing Kubernetes and
       | just build on top of it.
       | 
       | I think "per-user" is probably the wrong killer feature for
       | something like this. Much more potential in shared distributed
       | processes that support multiple users (chat, CRDT/coauthoring).
       | Appears that the underlying layer can probably do that.
       | 
       | In any case, super cool idea, and I hope something like this
       | lands in the serverless platforms from all the major cloud
       | providers. It's always been mind blowing to me that Google Cloud
       | Functions supports websockets without allowing you to route
       | multiple incoming connections from different users to the same
       | process. That simple change would unlock so many useful
       | scenarios.
        
         | paulgb wrote:
         | Thanks for taking the time to look through the architecture.
         | There are definitely some choices that would have seemed weird
         | to me when we set out to build this, but that we did not make
         | lightly.
         | 
         | We actually initially built this on Kubernetes, twice. The MVP
         | was Kubernetes + nginx where we created pods through the API
         | and used the built-in DNS resolver. The post-MVP attempt fully
         | embraced k8s, with our own CRD and operator pattern. It still
         | exists in another branch of the repo[1].
         | 
         | Our decision to move off came because we realized we cared
         | about a different set of things than Kubernetes did. For
         | example, cold start time generally doesn't matter that much to
         | a stateless server architecture (k8s' typical use), but is
         | vital for us because a user is actively waiting on each cold
         | start. Moving away from k8s let us own the scheduling process,
         | which helped us reduce cold start times significantly. There
         | are other things we gain from it, some of which I've talked
         | about in this comment tree[2]. I will say, it seemed like a
         | crazy decision when I proposed it, but I have no regrets about
         | it.
         | 
         | The point of sqlite was to allow the "drone" version to be
         | updated in place without killing running backends. It also
         | allows (but does not require) the components of the drone to
         | run as separate containers. I originally wanted to use LMDB,
         | but landed on sqlite. It's a pretty lightweight dependency, it
         | provides another point of introspection for a running system
         | (the sqlite cli), and it's not something people otherwise have
         | to interact with. I wrote up my thought process for it at the
         | time in this design doc[3].
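         | 
         | As a rough illustration of that sqlite point (not Plane's
         | actual schema; crate and table names here are made up), the
         | idea is that the drone records each running backend in a
         | small database, so a replacement drone process can re-attach
         | to them after an in-place upgrade instead of killing them:
         | 
         |   use rusqlite::{params, Connection};
         | 
         |   fn main() -> rusqlite::Result<()> {
         |       let db = Connection::open("drone-state.db")?;
         |       db.execute(
         |           "CREATE TABLE IF NOT EXISTS backends (
         |                name      TEXT PRIMARY KEY,
         |                container TEXT NOT NULL,
         |                port      INTEGER NOT NULL
         |            )",
         |           [],
         |       )?;
         | 
         |       // Record a backend when it is spawned.
         |       db.execute(
         |           "INSERT OR REPLACE INTO backends (name, container, port)
         |            VALUES (?1, ?2, ?3)",
         |           params!["backend-abc123", "some-container-id", 9090],
         |       )?;
         | 
         |       // On startup, a new drone process reads the table and
         |       // resumes proxying/monitoring the listed containers.
         |       let mut stmt =
         |           db.prepare("SELECT name, container, port FROM backends")?;
         |       let rows = stmt.query_map([], |row| {
         |           Ok((
         |               row.get::<_, String>(0)?,
         |               row.get::<_, String>(1)?,
         |               row.get::<_, i64>(2)?,
         |           ))
         |       })?;
         |       for row in rows {
         |           let (name, container, port) = row?;
         |           println!("re-attach to {name} ({container}) on {port}");
         |       }
         |       Ok(())
         |   }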
         | 
         | You're right about shared backends among multiple users being
         | supported by Plane. I use per-user to convey that we treat
         | container creation as so cheap and ephemeral you could give one
         | to every user, but users can certainly share one and we've done
         | that for exactly the data sync use case you describe.
         | 
         | [1] https://github.com/drifting-in-space/plane/tree/original-
         | kub...
         | 
         | [2] https://news.ycombinator.com/item?id=32305234
         | 
         | [3] https://docs.google.com/document/d/1CSoF5Fgge_t1vY0rKQX--
         | dWu...
        
           | POPOSYS wrote:
           | Hi Paul, thanks for your explanation - you should add that to
           | the documentation, e.g. in a chapter "Why not K8S?".
           | 
           | Also, you should give some advice about how to deploy when
           | the default for deploying apps in an organization is K8S,
           | which might not be too exotic nowadays. Will Plane need its
           | own cluster? Does it run on top of K8S? How does it relate
           | to K8S in general in a deployment scenario?
           | 
           | THANKS!
        
             | paulgb wrote:
             | Good idea on both counts. Documentation will be one of my
             | priorities over the coming months and it's great to have
             | feedback on what's missing.
             | 
             | Re. the "Why not k8s" question, you might enjoy this post
             | from a couple months back; although it only touches on
             | Plane briefly, it shows the framework we used to make the
             | decision. https://driftingin.space/posts/complexity-
             | kubernetes
        
           | _jezell_ wrote:
           | https://developer.ibm.com/articles/reducing-cold-start-
           | times...
           | 
           | Knative has solved most of those pod start time problems
           | since it's dealing with a similar scenario, unless 0.008s
           | startup time isn't good enough for you.
        
           | schainks wrote:
           | It's funny how SQLite gets so much flak, but every time I've
           | used it in production, it just _worked_.
        
             | rmetzler wrote:
             | I don't think I have ever read something negative about
             | SQLite.
             | 
             | I also don't read the GP comment as being negative toward
             | SQLite. It sounds more like the author was surprised about
             | the architecture, since a naive view would think Kubernetes
             | would be good enough.
        
       | vcryan wrote:
       | This seems similar to an Elixir/Phoenix use case where you have a
       | GenServer per user. At first glance, it seems like that approach
       | would be functionally equivalent.
        
         | paulgb wrote:
         | Yes, the BEAM/OTP/{Erlang/Elixir} stack is unique in that it
         | provides similar primitives as part of the runtime.
         | 
         | My impression of that approach is that it's good for IO-bound
         | work and stateful business logic, but less so for the
         | CPU/memory-bound applications that we're targeting. I'd love to
         | know if there are counterexamples to that, though. It's
         | admittedly been over a decade since I touched Erlang, so I'm
         | not up to date, and I'm only peripherally familiar with Elixir
         | and Phoenix.
        
           | dugmartin wrote:
           | Yes, for CPU-bound work on the BEAM you'll want to use a
           | NIF (native implemented function), but that leaves you open
           | to taking down the entire VM with bad NIF code (segfaults,
           | infinite loops, etc.). A purportedly safer way to create
           | NIFs is Rustler (https://github.com/rusterlium/rustler),
           | which lets you easily write NIFs in Rust instead of C. I
           | haven't used it but I've heard good things.
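           | 
           | A minimal NIF looks roughly like this (module and function
           | names are made up, and the exact attribute syntax is from
           | memory, so treat it as a sketch):
           | 
           |   // lib.rs of a Rust crate built as a NIF library.
           |   // Marking the function as a dirty-CPU NIF keeps a long
           |   // computation from starving the BEAM schedulers.
           |   #[rustler::nif(schedule = "DirtyCpu")]
           |   fn fib(n: u64) -> u64 {
           |       match n {
           |           0 | 1 => n,
           |           _ => fib(n - 1) + fib(n - 2),
           |       }
           |   }
           | 
           |   // Register the NIF under a (hypothetical) Elixir module.
           |   rustler::init!("Elixir.Native.Math", [fib]);
           | 
           | On the Elixir side, a module uses the Rustler mixin to load
           | the crate and declares a stub that errors until the NIF is
           | loaded.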
        
       | nulld3v wrote:
       | Feels like we are kinda back to using inetd
        
         | dcmccallum wrote:
         | Say more!
        
           | nulld3v wrote:
           | inetd spawned a process for each incoming request/connection.
           | This seems kinda similar except instead of a process it's a
           | whole container.
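           | 
           | Roughly the inetd model, as a sketch (port and handler
           | binary are placeholders, Rust just for illustration): accept
           | a connection, spawn one process per connection, and wire the
           | socket to the child's stdin/stdout.
           | 
           |   use std::io;
           |   use std::net::TcpListener;
           |   use std::process::{Command, Stdio};
           |   use std::thread;
           | 
           |   // inetd-style server: one handler process per incoming
           |   // connection, with the socket bridged to the child's
           |   // stdin/stdout.
           |   fn main() -> io::Result<()> {
           |       let listener = TcpListener::bind("127.0.0.1:7000")?;
           |       for stream in listener.incoming() {
           |           let stream = stream?;
           |           thread::spawn(move || {
           |               let mut child = Command::new("./handler")
           |                   .stdin(Stdio::piped())
           |                   .stdout(Stdio::piped())
           |                   .spawn()
           |                   .expect("spawn handler");
           |               let mut child_in = child.stdin.take().unwrap();
           |               let mut child_out = child.stdout.take().unwrap();
           |               let mut from_client =
           |                   stream.try_clone().expect("clone socket");
           |               let mut to_client = stream;
           |               // socket -> child stdin
           |               let pump = thread::spawn(move || {
           |                   io::copy(&mut from_client, &mut child_in).ok();
           |               });
           |               // child stdout -> socket
           |               io::copy(&mut child_out, &mut to_client).ok();
           |               pump.join().ok();
           |               child.wait().ok();
           |           });
           |       }
           |       Ok(())
           |   }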
        
             | paulgb wrote:
             | It is similar! Years ago I first tried websocketd, which
             | was explicitly inspired by initd, and wished I could build
             | applications on top of something like it. Plane is sort of
             | a natural evolution of that.
             | 
             | Of course, one of the big differences with initd is that it
             | runs on a cluster of machines instead of locally, which
             | turns out to be most of the difficulty.
             | 
             | http://websocketd.com/
        
               | endorphine wrote:
               | Correction (you probably know this but for the rest):
               | it's "inetd" not "initd".
        
               | paulgb wrote:
               | Oops, thanks. Too late to edit now.
        
       | [deleted]
        
       | kylecordes wrote:
       | A related and also very useful usage pattern: "backend instance
       | per customer".
       | 
       | Because...
       | 
       | There are many ways to implement multi-tenant SaaS; but a highly
       | underrated approach is to write a single tenant app, then use
       | infrastructure to run an instance of it per (currently logged in)
       | customer. Plus a persistent database per customer of course.
       | 
       | This has the tremendous advantage that you can add another column
       | to your pricing page easily: the "bring a truckload of money and
       | we will set you up to run this behind your firewall" tier. There
       | are still a lot of orgs out there, large ones with considerable
       | financial capacity, who really want this.
        
         | iLoveOncall wrote:
         | I think you are conflating two concepts that are actually
         | different: "backend instance per customer" and "deploy the
         | service in the customer's infrastructure".
         | 
         | I don't have much to say about the 2nd one, but my team does
         | the first one for our (internal) customers, so we deploy and
         | manage in our own accounts a whole service stack for each of
         | our customers. It's a nightmare.
         | 
         | We are going to do some rearchitecture work next year to move
         | away from it, because it drains so much of our time in
         | operational load.
         | 
         | One example of a major pain point that we have is encountering
         | random failures in 3rd party services, such as AWS. For a more
         | classic service that you deploy 2 or 3 times a week, if
         | CloudFormation deployments fail once every thousand deployments
         | for random errors, you'll have one failure per year. Well, we
         | deploy thousands of instances of our service 2-3 times a week.
         | So we have failures every single time we deploy.
         | 
         | Oncall is a nightmare (I don't love oncall anymore; I created
         | my login in my previous team) because we just fix a tsunami of
         | similar tickets, each from a different instance of our
         | service, each for a single customer, and often with a
         | different root cause.
         | 
         | We probably have half of our headcount dedicated to initiatives
         | that wouldn't need to exist if we had a more classic 1 service
         | = 1 instance approach.
         | 
         | Just don't do it.
        
           | travisjungroth wrote:
           | I can see how this is _horrible_ for internal tools. The
           | incentives are all messed up.
        
           | kylecordes wrote:
           | The connection between the concepts is: if your multi-tenant
           | strategy is to deploy (hopefully only while in active use) an
           | instance per customer, then the additional effort to provide
           | on-site installations is relatively low, compared to other
           | multi-tenant strategies.
           | 
           | Whether on site installs are a good thing to offer, of course
           | depends on how valuable that is to your customers versus the
           | cost/effort to support it.
           | 
           | As you point out there are major trade-offs! If your product
           | architecture requires substantial multi-service complexity
           | per tenant, that points away from the instance per customer
           | strategy.
        
         | [deleted]
        
         | bluejekyll wrote:
         | This model works, but it requires that the minimum cost of
         | the stack for supporting a single tenant is low enough
         | relative to the smallest tenant and the revenue stream from
         | them.
         | 
         | An often-overlooked aspect of this is that, even without a
         | freemium option, that revenue can be 0. Consider all the
         | demos for potential customers, all the examples set up for
         | testing, etc. These will all cost money, and if they can't be
         | shared, then those costs may make it unworkable on a
         | per-instance basis.
        
           | travisjungroth wrote:
           | They could still be shared. Each customer is an org, not a
           | single user. This model would only work at business pricing
           | levels anyway.
           | 
           | However many environments you have (Dev, Staging, QA,
           | Integration, whatever), that's the same as normal. Sales
           | gets an instance, but I've seen that anyway and it's just
           | one more. You can have a public demo. As long as you don't
           | scale O(n) with prospects, it's not a meaningful cost. Even
           | if you did, I'd bet it's a tiny fraction of your cost of
           | sales.
        
             | bluejekyll wrote:
             | Right, but that's the point. Even orgs can be costly when
             | you consider how you plan on sharing infrastructure. My
             | point is these can add up as you have databases, k8s
             | clusters, load balancers, CDN endpoints, etc. so if your
             | strategy doesn't include driving these costs down on a per
             | instance basis, with idle usage and whatnot, it will become
             | a cost problem quickly.
        
           | kylecordes wrote:
           | Yes certainly, this approach to multi-tenancy is not well-
           | suited for apps that will support a large number of active
           | free users.
           | 
           | The idea is to only allocate a running instance while there
           | is at least one active user of that instance. So for
           | occasional-use apps the ongoing cost per idle customer would
           | be just the storage cost of the database, very close to 0.
           | Obviously for something like a chat app, email app, anything
           | else that people tend to leave open all day, a different
           | approach is better.
        
             | travisjungroth wrote:
             | > The idea is to only allocate a running instance while
             | there is at least one active user of that instance.
             | 
             | That first user is going to have a bit of a wait while you
             | turn on their server. Maybe you keep some empties warmed up
             | that just need a reconfig and restart so it's fast.
             | 
             | Personally, I'd only do any type of multi-tenant if the
             | cost of some micro instance behind an auto scaler was
             | negligible to the value of the contract anyway.
        
               | mattbee wrote:
               | I wrote a system at work that does this for VS Code
               | instances - on a commodity server, and without much
               | optimisation effort, it goes from a click to the UI
               | starting to appear in about 10s (mostly thanks to
               | Firecracker and Alpine). There's a loading screen
               | that's instant, though, and it probes for when the VM
               | is ready.
               | 
               | I think that would work fine for a lot of other apps, at
               | least where you're looking to start a lengthier session.
        
       | quickthrower2 wrote:
       | Sweet. Where I work we use a stateful process-per-user model,
       | and it takes away a heap of issues compared to the more
       | traditional approach of sharing everything in the web server
       | and letting the RAM blow up.
       | 
       | If the user does nothing the process still ticks along. It may be
       | doing background work or doing nothing. It can keep state and
       | periodically save. It is like a desktop app experience on the
       | web, if you like.
       | 
       | If each user or org is doing their own thing and the service
       | is not too "social", requiring cross-interactions (so more
       | like, say, a CRM than a LinkedIn), I think it is an interesting
       | model.
       | 
       | Not to mention slow feature roll-outs!
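       | 
       | As a rough illustration of the pattern (not our actual code;
       | names, types, and the save interval are made up), each user
       | gets a long-lived task that holds state in memory, reacts to
       | messages, and periodically persists:
       | 
       |   use std::time::Duration;
       |   use tokio::sync::mpsc;
       |   use tokio::time;
       | 
       |   // One long-lived task per user: holds state in memory,
       |   // handles messages as they arrive, and saves periodically
       |   // even while the user is idle.
       |   async fn user_session(mut inbox: mpsc::Receiver<String>) {
       |       let mut state: Vec<String> = Vec::new();
       |       let mut save_tick = time::interval(Duration::from_secs(30));
       |       loop {
       |           tokio::select! {
       |               maybe_msg = inbox.recv() => match maybe_msg {
       |                   Some(msg) => state.push(msg), // mutate state
       |                   None => break, // user is gone; shut down
       |               },
       |               _ = save_tick.tick() => {
       |                   // placeholder for writing to durable storage
       |                   println!("saving {} items", state.len());
       |               }
       |           }
       |       }
       |   }
       | 
       |   #[tokio::main]
       |   async fn main() {
       |       let (tx, rx) = mpsc::channel(32);
       |       let session = tokio::spawn(user_session(rx));
       |       tx.send("hello from the browser".to_string()).await.ok();
       |       drop(tx); // closing the channel ends the session
       |       session.await.ok();
       |   }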
        
       ___________________________________________________________________
       (page generated 2022-10-13 23:01 UTC)