[HN Gopher] Launch HN: Drifting in Space (YC W22) - A server process for every user
___________________________________________________________________
Launch HN: Drifting in Space (YC W22) - A server process for every
user
Hi HN, we're Paul and Taylor, and we're launching Drifting in Space
(https://driftingin.space). We build server software for
performance-intensive browser-based applications. We make it easy
to give every user of your app a dedicated server-side process,
which starts when they open your application and stops when they
close the tab.

Many high-end web apps give every user a dedicated connection to a
server-side process. That is how they get the low latency that you
need for ambitious products like full-fledged video editing tools
and IDEs. This is hard for smaller teams to recreate, because it
takes a significant ongoing engineering investment. That's where we
come in--we make this architecture available to everyone, so you
can focus on your app instead of its infrastructure. You can think
of it like Heroku, except that each of your users gets their own
server instance.

I realized that something like this was needed while working on
data-intensive tools at a hedge fund. I noticed that almost all new
application software, whether it was built in-house or third-party
SaaS, was delivered as a browser application rather than native.
Although browsers are more powerful than ever, I knew from
experience that industrial-scale data-heavy apps posed problems,
because neither the browser nor a traditional stateless server
architecture could provide the compute resources needed for
low-latency interaction with large datasets. I began talking about
this with my friend Taylor, who had encountered similar limitations
while working on data analysis and visualization tools at Datadog
and Uber. We decided to team up and build a company around solving
it.

We have two products, an open source package and a managed
platform. Spawner, the open source part, provides an API for web
apps to spawn a session-lived process. It manages the process's
lifecycle, exposing it over HTTPS, tracking inbound connections,
and shutting it down when it becomes idle (i.e. when the user
closes their tab). It's open source (MIT) and available at
https://github.com/drifting-in-space/spawner.

Jamsocket is our managed platform, which uses Spawner internally.
It provides the same API, but frees you from having to deal with
any cluster or network configuration to ship code. From an app
developer's point of view, using it is similar to using platforms
like Netlify or Render. You stay in the web stack and never have to
touch Kubernetes.

Here's an example. Imagine you make an application for
investigating fraud in a large transaction database. Users want to
interactively filter, aggregate, and visualize gigabytes of
transactions as a graph. Instead of sending all of the data down to
the browser and doing the work there, you would put your code in a
container and upload it to our platform. Then, whenever a fraud
analyst opens your application, you hit an API we provide to spin
up a dedicated backend for that analyst. Your browser code then
opens a WebSocket connection directly to that backend, which it
uses to stream data as the analyst applies filters or zooms/pans
the visualization.
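
To make that concrete, here's a rough sketch of the browser side;
the endpoint, response shape, and names here are illustrative
rather than our exact API:

    // Hypothetical client-side flow -- the /spawn endpoint and
    // response shape stand in for the real API.
    async function openBackend(token: string): Promise<WebSocket> {
      // 1. Ask the platform to spin up a session-lived backend.
      const res = await fetch("https://api.example.com/spawn", {
        method: "POST",
        headers: { Authorization: `Bearer ${token}` },
      });
      const { hostname } = await res.json();

      // 2. Connect straight to the dedicated backend over wss://
      //    and stream queries/results as the analyst works.
      return new WebSocket(`wss://${hostname}/stream`);
    }
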
because we give each user a dedicated process. That said, there are
a few other services that do run long-lived processes for each
user. Architecturally, we're most similar to Agones. Agones is
targeted at games where the client can speak UDP to an arbitrary
IP; we target applications that want to connect directly from
browsers to a hostname over HTTPS. In the Erlang world, the OTP
stack provides similar functionality, but you have to embrace
Erlang/Elixir to get the benefits of it; we are entirely language-
agnostic. Cloudflare Durable Objects support a form of long-lived
processes, but are focused on use cases around program state
synchronization rather than arbitrary high-compute/memory use
cases. We have a usage-based billing model, similar to Heroku. We
charge you for the compute you use and take a cut. Usage billing
scales to zero, so it's approachable for weekend experiments. We
have not solidified a price plan yet, but we're aiming to provide
an instance capable of running VS Code (as an example) for about 10
cents an hour, fractionally metered. High-memory and high-CPU
backends will cost more, and heavy users will get volume discounts.
Our target customers are desktop-like SaaS apps and internal data
tools. As mentioned, our core API is open source and available at
https://github.com/drifting-in-space/spawner. The managed platform
is in beta and we're currently onboarding users from a waitlist, to
make sure that we have the server capacity to scale. If you're
interested, you're welcome to sign up for it here:
https://driftingin.space. Have you built a similar infrastructure
for your application? We're interested in hearing the approaches
people have already taken to this problem and what the pain points
are.
Author : paulgb
Score : 53 points
Date : 2022-02-28 18:10 UTC (4 hours ago)
(HTM) web link (driftingin.space)
(TXT) w3m dump (driftingin.space)
| justsomeuser wrote:
| Looks like a cool project, but I am not sure I understand the
| need for one process per user.
|
| Some questions:
|
| - Why do you need one process per user? For low latency, would
| you just need to make sure you have idle CPU to serve their
| request, even if that CPU time is multiplexed onto an event loop
| (one event loop serves many users)?
|
| - Wouldn't this "event loop" actually be more efficient that one
| user/process, as there would be less context switching cost from
| the OS?
|
| - Can I just keep a map of (connection, thread_id) on my server,
| and spawn one thread per user on my own server?
|
| - Could I just load up my server with many cores, and give each
| user a SQLite database which runs each query in its own thread?
|
| - This way a multi-GB database would not be loaded into RAM; the
| query would filter it down to a result set.
| paulgb wrote:
| Good questions!
|
| > Why do you need one process per user? / Wouldn't this "event
| loop" actually be more efficient that one user/process, as
| there would be less context switching cost from the OS?
|
| We're particularly interested in apps that are often CPU-bound,
| so a traditional event-loop would be blocked for long periods
| of time. A typical solution is to put the work into a thread,
| so there would still be a context switch, albeit a smaller one.
|
| The process-per-user approach makes the most sense when a
| significant amount of the data used by each user does not
| overlap with other users. VS Code (in client/server mode) is a
| good example of this -- the overhead of siloing each process is
| relatively low compared to the benefits it gives. We think more
| data-heavy apps will make the same trade-offs.
|
| > Can I just keep a map of (connection, thread_id) on my
| server, and spawn one thread per user on my own server?
|
| If you don't have to scale beyond one server, this approach
| works fine, but it makes scaling horizontally complicated
| because you suddenly can't just use a plain old load balancer.
| It's not just about routing requests to the right server;
| you also have to decide which server to run each user's
| threads on, ideally based on the load of each one. We started
| going down this path, realized we'd end up reinventing
| Kubernetes, and decided to embrace it instead.
|
| > Could I just load up my server with many cores, and give each
| user a SQLite database which runs each query in its own thread?
| This way a multi-GB database would not be loaded into RAM; the
| query would filter it down to a result set.
|
| If, for a particular use case, it's economical to keep the data
| ready in a database that supports the query pattern users will
| make, that use case is probably not a good fit for a
| session-lived backend.
| In database terms, where our architecture makes sense is when
| you need to create an index on a dataset (or subset of a
| dataset) during the runtime of an application. For example, if
| you have thousands of large parquet files in blob storage and
| you want a user to be able to load one and run Falcon-type[1]
| analysis on it.
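|
| As a minimal sketch of that pattern (the data loading is
| stubbed out and the shapes are made up; nothing here is
| Spawner-specific):
|
|     import { WebSocketServer } from "ws";
|
|     // Stub standing in for pulling one parquet file from blob
|     // storage and decoding it into memory at session start.
|     function loadRows(): { amount: number; country: string }[] {
|       return [];
|     }
|     const rows = loadRows();
|
|     const wss = new WebSocketServer({ port: 8080 });
|     wss.on("connection", (ws) => {
|       ws.on("message", (msg) => {
|         // Repeated filters hit the in-memory copy rather than
|         // a shared database or global index.
|         const { minAmount } = JSON.parse(msg.toString());
|         const hits = rows.filter((r) => r.amount >= minAmount);
|         ws.send(JSON.stringify({ count: hits.length }));
|       });
|     });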
|
| [1] https://github.com/vega/falcon
| wizwit999 wrote:
| This looks cool, but it is making my "solution looking for a
| problem" bell ring a bit :) Have people you talked to needed
| this? Your example seems somewhat contrived tbh.
|
| Good luck!
| paulgb wrote:
| Always a valid concern :)
|
| I've experienced the need first-hand as well as talked to
| people who experienced it. The most prominent group of users
| are development tools, because that world has already embraced
| this architecture -- software like VS Code and Jupyter already
| takes the same approach; we just generalized it. One way of
| looking at it is that our bet is that applications other than
| dev tools will embrace this architecture too.
|
| The example is only partly contrived; I began my career doing
| fraud analysis on ad market data and would run jobs overnight
| that computed an embedding layout, and I wished for a way to
| recompute the embeddings on the fly as I filtered the data.
| wizwit999 wrote:
| Ah, the analogy to VSCode and Jupyter actually helps me
| understand it.
| kamikazeturtles wrote:
| What are your thoughts on using Drifting in Space as a code
| executor/dev environment in the browser?
| paulgb wrote:
| That's definitely a use case we're interested in. For example,
| here's a demo of spinning up a VS Code instance just by hitting
| an API endpoint: https://www.youtube.com/watch?v=ON-mHFxd04U
| kamikazeturtles wrote:
| Super interesting!
|
| Have you guys tested Drifting in Space with executing users'
| code and opening ports? (like replit)
| paulgb wrote:
| Currently we only expose one port per host, and it needs to
| speak HTTP. I do have a use-case in mind that requires
| exposing arbitrary TCP/UDP ports, as long as they're
| specified at "spawn time" -- which might not quite match the
| functionality replit has, if it allows you to map ports
| dynamically while a service is running.
|
| So I guess the answer is "probably not in the near future,
| but maybe eventually" :)
| KloudTrader wrote:
| What are you using for the server resource provisioning for your
| hosted service? Firecracker on KVM? Current services like AWS
| Fargate/Lightsail containers/Google Cloud Run are not competitive
| pricing-wise for dynamic container spawning at scale unless you
| provision ahead of time. For this sort of service, your managed
| solution needs to be competitive with e.g. raw compute providers
| like DigitalOcean and Hetzner.
| paulgb wrote:
| We're running on GKE right now, which allows us to iterate
| quickly, and we'll focus on the unit economics as we scale. As
| part of our research we've talked to dozens of teams who have
| already implemented this architecture, and most of them ended
| up using EKS or GKE (a few did use Firecracker or raw VMs), so
| they're already subject to those prices and it isn't a problem
| for them. We know that the unit economics may never make sense
| for hosting free tools and services, but we're focused on high-
| value SaaS and internal tools. For our target users, our value
| proposition is that we replace engineering/devops effort, not
| just the raw compute we provide.
| smashah wrote:
| Interesting. Will it be possible to control sweeper via API also?
|
| I'm a solo open-source maintainer and have a popular project that
| people want to orchestrate many instances of. Each instance
| (a.k.a session) is stateful and individually configurable. I'm
| excited to test out spawner. Any company that makes it super
| simple for open-source maintainers to make money by providing a
| managed service will be a huge success -- from my initial
| thoughts, this looks to fit the bill.
| paulgb wrote:
| Is the use-case you have in mind for a Sweeper API being able
| to shut down a pod based on an external event? We don't have a
| nice HTTP API for that yet (you could go through the Kubernetes
| API), but only because I haven't gotten around to implementing
| it. Would that serve the use case you have in mind?
|
| If I can help with anything as you look into it, do let me
| know!
| mwcampbell wrote:
| I don't know about the GP, but I would actually like to be
| able to keep the pod alive while it's doing some processing,
| in case the user wants to run a long process, go away, then
| come back later when it's done. Yes, I know there are other
| tools for orchestrating pure batch jobs, but I imagine some
| applications are a mix of interactivity and long-running
| computations.
| paulgb wrote:
| That makes sense. We currently don't support that directly,
| although we have a "grace period", which is how long it
| waits for a service to be idle before shutting it down. You
| could set it to a very high number and then have the service
| manage its own termination when it becomes idle. But that's
| a bit of a hack; first-class support for that use case is
| something I'll think about.
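|
| Sketched out, that workaround might look something like this
| (assuming the backend can observe its own traffic; the numbers
| are arbitrary):
|
|     // Self-managed shutdown: the platform's grace period is
|     // set very high, and the process exits once it has been
|     // idle for its own, shorter limit.
|     const IDLE_LIMIT_MS = 30 * 60 * 1000;
|     let lastActivity = Date.now();
|
|     // Call this from every request / WebSocket message.
|     export function touch(): void {
|       lastActivity = Date.now();
|     }
|
|     setInterval(() => {
|       if (Date.now() - lastActivity > IDLE_LIMIT_MS) {
|         process.exit(0);
|       }
|     }, 60_000);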
| mwcampbell wrote:
| Here's one way you could implement first-class support
| for this use case. It's a bit of a hack, but it's simple.
| IIUC, the proxy is a sidecar, meaning it runs in the same
| network namespace as the main container. So the proxy
| could listen on a particular port on localhost, and as
| long as a connection is open to that port, the sweeper
| wouldn't touch that pod. Then the main container would
| just need to open a TCP connection for the period of time
| that it wants to make sure it stays running.
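|
| A rough sketch of the container side, assuming such a port
| existed (the port number here is made up):
|
|     import net from "node:net";
|
|     // Hold a TCP connection to the hypothetical proxy port
|     // for as long as the pod should stay alive.
|     function holdAlive(port: number): () => void {
|       const sock = net.connect({ host: "127.0.0.1", port });
|       return () => sock.end();
|     }
|
|     async function longComputation(): Promise<void> {
|       /* ... hours of work ... */
|     }
|
|     const release = holdAlive(9999);
|     longComputation().finally(release);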
| smashah wrote:
| If I understand correctly, Sweeper clears up sessions that
| haven't received a request in a certain amount of time.
| Essentially the use case would be to leave the session
| running until I ask Sweeper to clear it via an API request.
|
| Just to illustrate where I'm coming from, what I have so far
| mimics the pm2 cli as an API with built-in reverse-proxy,
| with create (similar to init), reload, restart, start, stop
| and delete.
| boxed wrote:
| How does this compare to Phoenix liveview? As I understood it
| that also does something like this?
| paulgb wrote:
| LiveView is pretty neat. The last time I used Erlang was before
| Phoenix and Elixir came on the scene, so I can't speak from
| personal experience, but my understanding is that LiveView is a
| good, easy way to add state synchronization to an app, but using
| it for anything high-CPU/memory becomes limiting. If you've
| tried it, I'm curious to know if that matches your experience,
| because I confess it's not something I've tried directly.
| cultofmetatron wrote:
| if you're using phoenix for anything high cpu, you can easily
| call out to python or write a native function using rust.
| That said, there's also the nx library that lets you do a lot
| of complex numerical processing within elixir (it calls out
| to Google's linear algebra libraries under the hood).
| ConnorLeet wrote:
| Do you have any resources to point me towards that elaborate on
| the benefits of a process-per-tenant/user for performance?
|
| I work on a data-intensive app that fits the use-case you
| describe but I'm confused about the benefits for performance.
| (can certainly see how the code would end up nice/simpler) Is
| this mostly applicable to certain stacks?
| paulgb wrote:
| > Do you have any resources to point me towards that elaborate
| on the benefits of a process-per-tenant/user for performance?
|
| Not yet, but we're working on some demos of things that are
| easier with session-lived backends. One way to think about it
| is that it's good for repeated queries against the same subset
| of data -- if you have a dataset of petabytes and your typical
| use case has users (through filters or queries) repeatedly
| accessing a sample of ~gigabytes of that data throughout a user
| session, you could use a session-lived backend to materialize
| that subset of data in-memory and quickly serve queries off of
| it without hitting the global index.
|
| Another case where it comes up is when you need to do some
| stateful computation after loading the data, for example, if
| you need to generate a graph or embedding layout of some data
| and refine the layout when users select/deselect data.
| cultofmetatron wrote:
| Beam's ability to spawn lightweight processes is a life saver. A
| lot of people are praising liveview for being able to write spas
| without javascript but the real killer feature is the ability to
| track a session for a user from the backend. Love how you guys
| are making that a first class consideration for folks using less
| powerful platforms.
|
| Did you guys build this on top of beam? my startup had a similar
| need for opening a process per user and we ended up using a
| combination of horde + genserver to accomplish something similar.
| In our case, we spawn a process that maintains a websocket
| connection to an external service, maintain some state in there
| and relay updates to the user over a channel. There is one per
| client.
| paulgb wrote:
| We're not using BEAM directly, but I find it pretty neat and
| spent some time reading up on it when getting started with
| this. I'm pretty excited by what https://lunatic.solutions are
| doing as well, as an approach to bringing the ideas behind BEAM
| to
| WebAssembly. Ultimately, I explored WebAssembly for a while and
| realized that there was more of a market if we could run
| containers instead of just WebAssembly modules. (The result of
| my work in that direction lives on as Stateroom:
| https://github.com/drifting-in-space/stateroom)
| coder543 wrote:
| I'll admit, I clicked on this "Launch HN" mostly because the name
| sounded cool, but after reading the description... the name
| doesn't seem particularly relevant to the business in any way,
| which can be fine; it's just interesting to note.
|
| I am a little confused about the product purpose and the
| definition of who your competition is in the market. I think new
| SaaS hosting providers are interesting, so please don't take any
| of this the wrong way, just hoping to give you some space to
| expand on your ideas more.
|
| > Here's an example. Imagine you make an application for
| investigating fraud in a large transaction database. Users want
| to interactively filter, aggregate, and visualize gigabytes of
| transactions as a graph. Instead of sending all of the data down
| to the browser and doing the work there, you would put your code
| in a container and upload it to our platform. Then, whenever a
| fraud analyst opens your application, you hit an API we provide
| to spin up a dedicated backend for that analyst. Your browser
| code then opens a WebSocket connection directly to that backend,
| which it uses to stream data as the analyst applies filters or
| zooms/pans the visualization.
|
| You say "put your code in a container", but... wouldn't you
| basically have to put all your gigabytes of data into a
| container? The bottleneck to the types of analytic applications
| you're describing seems unlikely to be the custom backend code,
| and far more likely to be whatever database is powering the
| application, which means that each interactive instance really
| needs to spin up a complete copy of the dataset to gain any
| performance benefit for these on-demand analytic workloads.
|
| I've worked with a number of high-scale applications, and scaling
| the backend API server has never been even remotely the main
| challenge... plus, having dedicated instances of the web server
| process wouldn't make anything faster than just having an
| appropriate number of instances, it would just make it more
| expensive. It's almost always a question of scaling the database
| -- not the API layer. For offline analytic workloads like you
| describe, you could potentially spin up fresh copies of the
| database for each user, and that would make things better, but
| the challenge of scaling (online) OLAP and OLTP comes from the
| shared-everything nature of the database itself. If you're
| intending to provide unique database instances to each user, then
| all the data needs to either be packaged up with the application,
| or stored somewhere that the application can retrieve it on
| startup and load the database, which could be a time-consuming
| process that creates painfully long cold starts.
|
| > Many high-end web apps give every user a dedicated connection
| to a server-side process. That is how they get the low latency
| that you need for ambitious products like full-fledged video
| editing tools and IDEs.
|
| > We have not solidified a price plan yet, but we're aiming to
| provide an instance capable of running VS Code (as an example)
| for about 10 cents an hour, fractionally metered.
|
| Since you bring up the examples of running GUI desktop
| applications, I'm wondering if your competition isn't actually
| AWS WorkSpaces. Someone could build an image for a WorkSpace that
| includes everything the analyst needs, and then AWS will manage
| the lifecycle of that instance as the analyst connects and
| disconnects, billing entirely based on usage. That image could
| even include vast quantities of data pre-populated into a
| database, along with a web server that offers local dedicated
| processes to serve requests from the browser in the WorkSpace, if
| the company prefers to develop their application's GUI using the
| web as a platform.
|
| Obviously the challenge with WorkSpace is if you want to offer it
| to parties outside your company, but AWS does address this use
| case to some extent:
| https://aws.amazon.com/blogs/security/how-to-secure-your-ama...
|
| A company could definitely address the nuances and automation of
| offering WorkSpace to third parties, but such a business would
| likely be extremely vulnerable to AWS just improving WorkSpace to
| include those features out of the box.
| paulgb wrote:
| > You say "put your code in a container", but... wouldn't you
| basically have to put all your gigabytes of data into a
| container? The bottleneck to the types of analytic applications
| you're describing seems unlikely to be the custom backend code,
| and far more likely to be whatever database is powering the
| application, which means that each interactive instance really
| needs to spin up a complete copy of the dataset to gain any
| performance benefit for these on-demand analytic workloads.
|
| You're right that it does depend a lot on the needs of a
| specific application. If a bunch of users are accessing the
| same dataset, and can constantly access the subset of data they
| need with low latency through a global index, and there isn't
| much need to do computation interactively at runtime, then a
| standard architecture is probably a better fit.
|
| Where this approach is useful is if every user needs access to
| a different subset of the data (e.g. if the underlying dataset
| is petabytes, and each user needs to interactively explore a
| _different_ gigabytes-big subset of it). Or if there is a lot
| of derived compute on top of it, for example, a graph
| visualization that needs to be updated when the user changes
| the subset of data in focus.
|
| > I'm wondering if your competition isn't actually AWS
| WorkSpaces
|
| The general approach of "run and render elsewhere and stream
| the pixels back" is definitely our competition in the sense
| that it's something companies currently do. What we provide is
| a way of moving the client/server boundary to wherever makes
| sense for your app: if it makes sense to render server-side and
| stream pixels, you can do that (although we don't _yet_ support
| UDP, which would be useful in this case); if it makes sense to
| do data aggregation server-side but render through WebGL,
| that's also an option.
| qbasic_forever wrote:
| Very cool! Reminds me a bit of Jupyter and the whole code
| notebook world too. Spawner almost seems like a more general
| purpose JupyterHub, which IMHO is a good thing (jhub is
| frighteningly complex to configure and set up these days).
| paulgb wrote:
| > Spawner almost seems like a more general purpose JupyterHub
|
| That's actually a very good way of putting it to people who
| understand the reference!
|
| One of the things I've been playing with is actually using
| Spawner to spin up Jupyter Lab notebooks with their new(ish)
| collaboration feature. Jupyter and VS Code both work very
| nicely with Spawner's architecture out-of-the-box, since they
| can be put into a container and accessed entirely through an
| HTTPS connection.
| qbasic_forever wrote:
| Yeah a 'spawn a VS code server instance on these files'
| microservice could be super handy for lots of things. There
| are fantastic technical doc tools like mkdocs, mdbook, etc.
| but none of them have an editing interface. You could add an
| 'edit' button to their generated HTML that opens a spawned VS
| code server instance on the files, and now you've got a
| little wiki / knowledge base that a small team can work from.
| crabmusket wrote:
| I love seeing more options appear on the horizon for doing
| stateful serverless work. This article[1] provides a little more
| motivation for the use cases:
|
| > For quite a long time (and especially in the webdev world),
| there exists a perception that to achieve scalability, all our
| request handlers need to be as stateless as possible. In the
| world of the all-popular Docker containers, it means that all the
| app containers need not only to be immutable, but also should be
| ephemeral ... keeping our request handlers stateless, does NOT
| really solve the scalability problem; instead it merely pushes it
| to the database.
|
| Though the problems and solutions pointed out in that article
| don't mean you have to go straight to process-per-X. One solution
| might be, as mentioned in passing in the OP's launch blog, to
| keep state in a cache like Redis. If the data fits this approach,
| it would ease load on the database while allowing each request
| handler to remain stateless.
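|
| A minimal cache-aside sketch of that approach (queryDatabase
| is a hypothetical helper; the TTL is arbitrary):
|
|     import { createClient } from "redis";
|
|     const redis = createClient();
|     await redis.connect();
|
|     // Hypothetical stand-in for the real database query.
|     async function queryDatabase(id: string): Promise<object> {
|       return { id };
|     }
|
|     // Handlers stay stateless; hot reads come from Redis
|     // instead of hammering the database.
|     async function getProfile(id: string): Promise<object> {
|       const cached = await redis.get(`profile:${id}`);
|       if (cached) return JSON.parse(cached);
|       const fresh = await queryDatabase(id);
|       await redis.set(`profile:${id}`, JSON.stringify(fresh), {
|         EX: 60,
|       });
|       return fresh;
|     }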
|
| Durable Objects seem less focused on heavy computation, but I
| think they're really interesting as points of synchronisation for
| e.g. collaborative editing. Having all requests go into a _single
| thread_ seems important.
|
| [1]: http://ithare.com/scaling-stateful-objects/
| mwcampbell wrote:
| Does the managed service actually require that each user get
| their own container? For some applications, particularly
| collaborative ones, it would make much more sense to have a
| container for each top-level thing that the users are
| collaborating on, e.g. one per document. I think Sandstorm [1]
| got this right with its concept of grains, and I've long wanted a
| tool that brought that model, a stateful container per high-level
| object, running arbitrary code (unlike Cloudflare Durable
| Objects), to the world of hosted SaaS. Speaking of Cloudflare,
| I'm looking forward to seeing what their edge containers can do,
| when that feature is eventually made public.
|
| [1]: https://sandstorm.io/
| paulgb wrote:
| > Does the managed service actually require that each user get
| their own container? For some applications, particularly
| collaborative ones, it would make much more sense to have a
| container for each top-level thing that the users are
| collaborating on, e.g. one per document.
|
| Exactly right. We do not actually require that every user gets
| their own container; that's a decision that's entirely up to
| your app. Our API spins up an instance and returns its
| hostname, and then you can connect to it from as many clients
| as you like.
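|
| In code, that choice is just which key you spawn under -- a
| hypothetical sketch (the endpoint and "name" field are
| illustrative):
|
|     // Collaborators on the same document share one backend.
|     async function backendFor(docId: string): Promise<string> {
|       const res = await fetch("https://api.example.com/spawn", {
|         method: "POST",
|         body: JSON.stringify({ name: `doc-${docId}` }),
|       });
|       const { hostname } = await res.json();
|       return hostname;
|     }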
___________________________________________________________________
(page generated 2022-02-28 23:00 UTC)