[HN Gopher] Fast CI with MicroVMs
       ___________________________________________________________________
        
       Fast CI with MicroVMs
        
       Author : alexellisuk
       Score  : 109 points
       Date   : 2022-11-18 16:09 UTC (6 hours ago)
        
 (HTM) web link (blog.alexellis.io)
 (TXT) w3m dump (blog.alexellis.io)
        
       | ignoramous wrote:
       | Sounds similar to webapp.io (LayerCI), which has been discussed
       | quite a few times here:
       | https://news.ycombinator.com/item?id=31062301
       | 
       | > _Friction starts when the 7GB of RAM and 2 cores allocated
       | causes issues for us_
       | 
       | Well, I just create a 20GB _swap_. There's ample disk space but
       | _swap_ is slow for sure.
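       | 
       | (A minimal sketch of that swap setup; the size and path are just
       | my example:)
       | 
       |     sudo fallocate -l 20G /swapfile   # reserve 20GB on disk
       |     sudo chmod 600 /swapfile
       |     sudo mkswap /swapfile             # format it as swap
       |     sudo swapon /swapfile             # enable it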
       | 
       | > _MicroVM_
       | 
       | Coincidentally, QEMU now sports a firecracker-inspired microvm:
       | https://github.com/qemu/qemu/blob/a082fab9d25/docs/system/i3... /
       | https://mergeboard.com/blog/2-qemu-microvm-docker/
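       | 
       | A rough invocation looks something like this (kernel and rootfs
       | paths are illustrative):
       | 
       |     qemu-system-x86_64 -M microvm \
       |       -enable-kvm -cpu host -m 512m -smp 2 \
       |       -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
       |       -nodefaults -no-user-config -nographic -serial stdio \
       |       -drive id=root,file=rootfs.ext4,format=raw,if=none \
       |       -device virtio-blk-device,drive=root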
        
         | synergy20 wrote:
         | did not know qemu has its own firecracker machine now, thanks
         | for the info! going to test how fast it boots.
        
         | alexellisuk wrote:
         | Hi, I'd not heard of webapp.io before so thanks for mentioning
         | it. Actuated is not a preview-branch product; that's an
         | interesting area, but not the problem we're trying to solve.
         | 
         | actuated is not trying to be a CI system or a replacement for
         | one like webapp.
         | 
         | It's a direct integration with GitHub Actions, and as we get
         | interest from pilot customers for GitLab etc., we'll consider
         | adding support for those platforms too.
         | 
         | Unopinionated, without lock-in. We want to create the hosted
         | experience, with safety and speed built in.
        
         | colinchartier wrote:
         | Hey, yeah this looks somewhat similar to what we're building at
         | https://webapp.io (nee LayerCI, YC S20)
         | 
         | We migrated to a fork of firecracker, but we're a fully hosted
         | product that doesn't directly interact with GHA at all (similar
         | to how CircleCI works), so there's some positioning difference
         | between us and OP at the very least.
         | 
         | Always happy to see innovation in the space :)
        
       | fideloper wrote:
       | This project looks really neat!
       | 
       | Firecracker is very cool; I wish/hope the tooling around it
       | matures enough to be super easy. I'd love to see the technical
       | details on how this is run. It looks like it's closed source?
       | 
       | The need for baremetal for Firecracker is a bit of a shame, but
       | it's still wicked cool. (You can run it on a DO droplet but
       | nested virtualization feels a bit icky?)
       | 
       | I run a CI app myself, and have looked at Firecracker. Right now
       | I'm working on moving some compute to Fly.io and its Machines
       | API, which is well suited for on-demand compute.
        
         | alexellisuk wrote:
         | Hey thanks for the interest, this is probably the best resource
         | I have on Firecracker, hope you enjoy it:
         | 
         | https://www.youtube.com/watch?v=CYCsa5e2vqg
         | 
         | For info on actuated, check out the FAQ or the docs:
         | https://docs.actuated.dev
         | 
         | We're running a pilot and looking for customers who want to
         | make CI faster for public or self-hosted runners, want to avoid
         | the side-effects and security compromises of DIND / sharing a
         | Docker socket, or need to build on ARM64 for speed.
         | 
         | Feel free to reach out
        
       | brightball wrote:
       | I'm curious as to why k8s isn't a good fit for this. I'm not a
       | k8s advocate for production code, but at the CI level it seems
       | ideal.
        
         | alexellisuk wrote:
         | Great questions, we answer those here in the FAQ:
         | https://docs.actuated.dev/faq/
        
       | lxe wrote:
       | Firecracker is nice but still very limited in what it can do.
       | 
       | My gripe with all CI systems is that as an industry standard
       | we've universally sacrificed performance for hermeticity and re-
       | entrancy, even when it doesn't really give us a practical
       | advantage. Downloading and re-running containers and vms,
       | endlessly checking out code, installing deps over and over is
       | just a waste of time, even with caching, COW, and other
       | optimizations.
        
         | jxf wrote:
         | > My gripe with all CI systems is that as an industry standard
         | we've universally sacrificed performance for hermeticity and
         | re-entrancy, even when it doesn't really give us a practical
         | advantage.
         | 
         | The perceived practical advantage is the incremental confidence
         | that the thing you built won't blow up in production.
         | 
         | > even with caching, COW, and other optimizations
         | 
         | Many CI systems do employ caching. For example, Circle.
        
         | throwaway894345 wrote:
         | Honestly, I've never missed the shared mutable environment
         | approach one bit. It might have been marginally faster, but I'd
         | trade a whole bunch of performance for consistency (and the
         | optimizations mean there's not much of a performance
         | difference). Moreover, most of the time spent in CI is _not_
         | container/VM overhead, but rather crappy Docker images, slow
         | toolchains, slow tests, etc.
        
         | alexellisuk wrote:
         | When you say it's limited in what it can do, what are you
         | comparing it to? And what do you wish it could do?
         | 
         | Fly has a lot of ideas here, and we've also been able to
         | optimize how things work in terms of downloads; as for boot-
         | up speed, it's less than 1-2s before a runner is connected.
        
         | IshKebab wrote:
         | Hermeticity is precisely what allows you to avoid endlessly
         | downloading and building the same dependencies. Without
         | hermeticity you can't rely on caching.
         | 
         | I feel like 90% of the computer industry is ignoring the
         | lessons of Bazel and is probably going to wake up in 10 years
         | and go "ooooooh, that's how we should have been doing it".
        
           | jiayo wrote:
           | Can you elaborate on some of the lessons of Bazel? I've only
           | just heard of it recently, and while I'm intrigued, my
           | impression is this is similar to Facebook writing their own
           | source control: different problems at massive scale. Can a
           | SMB (~50 engineers) benefit from adopting Bazel?
        
             | hobofan wrote:
             | > Can a SMB (~50 engineers) benefit from adopting Bazel?
             | 
             | We are ~8 engineers, and yes, definitely. However there
             | should be good buy-in across the team (as it can be quite
              | invasive), and depending on your choice of
              | languages/tooling, the difficulty of adoption may vary
              | greatly.
             | 
              | I was the one who introduced Bazel to the company, and
              | across my ~80 weeks there I spent maybe ~4 weeks on
             | setting up and maintaining Bazel.
             | 
              | I don't know your current setup or the challenges you
             | have with your CI system. However, compared to the generic
             | type of build system I've seen at companies of that size, I
              | would estimate that with 50 engineers, having a single
              | build-systems/developer-tooling engineer focused on
              | setting up and maintaining Bazel should easily have a
              | positive ROI
             | (through increased development velocity and less time
             | wasted on hunting CI bugs alone).
        
           | no_wizard wrote:
           | for the frontend space, NX gives you Bazel-like caching
           | features. It just doesn't cache dependencies, but _in my
           | experience_ with GitHub Actions, running `pnpm install` or
           | `yarn install` is not the slowest operation; it's running the
           | tools afterwards.
        
           | throwaway894345 wrote:
           | I think everyone agrees that the Bazel/Nix approach is
           | correct, the problem is that Bazel/Nix/etc are insanely hard
           | to use. For example, I spent a good chunk of last weekend
           | trying to get Bazel to build a multiarch Go image, and I
           | couldn't figure it out. Someone needs to figure out how to
           | polish Bazel/Nix so they're viable for organizations that
           | can't invest in a team to operate and provide guidance on
           | Bazel/Nix/etc.
        
         | lijogdfljk wrote:
         | I'm a bit surprised I don't see NixOS-like tooling in container
         | orchestration for this reason.
        
           | throwaway894345 wrote:
           | There isn't any NixOS-like tooling that isn't incredibly
           | burdensome. I think Nix and NixOS have the right vision, but
           | there's way too much friction for most orgs to adopt them.
           | Containers are imperfect, but they're way easier to work
           | with.
        
             | lijogdfljk wrote:
             | Oh yea, I use it - I get it lol. But as someone who uses
             | NixOS, for all its flaws the community is also quite
             | passionate and pushes out quite a bit of features, ideas,
             | etc. There are little experiments in all aspects of the
             | ecosystem.
             | 
             | I'm just kinda surprised some Docker-esque thing hasn't
             | stuck. Something that works with Docker, but brings it all
             | the advantages of NixOS.
             | 
             | CI pipelines are just so rough and repetitive in plain
             | Docker, which is what we use.
        
               | throwaway894345 wrote:
               | > for all its flaws the community is also quite
               | passionate and pushes out quite a bit of features, ideas,
               | etc.
               | 
               | This hasn't been my experience. There have been
               | significant issues with Nix since its inception and very
               | little progress has been made. Here are a few off the top
               | of my head:
               | 
               | * The nix expression language is dynamically typed and
               | there are virtually no imports that would point you in
               | the right direction, so it's incredibly difficult to
               | figure out what kind of data a package requires (you
               | typically have to find the callsite and recurse backwards
               | to figure out what kind of data is provided or follow the
               | data down the callstack [recurse forwards] just to
               | discern the 'type' of data).
               | 
               | * The nix expression language is really hard to learn.
               | It's really unfamiliar to most developers, which is a big
               | deal because everyone in an organization that uses Nix
               | has to interface with the expression language (it's not
               | neatly encapsulated such that some small core team can
               | worry about it). This is an enormous cost with no
               | tangible upside.
               | 
               | * Package defs in nixpkgs are _horribly_ documented.
               | 
               | * Nixpkgs is terribly organized (I think there is
               | _finally_ some energy around reorganizing, but I haven't
               | discerned any meaningful progress yet).
               | 
               | I can fully believe that the community is responsive to
               | improvements in individual packages, but there seems to
               | be very little energy/enthusiasm around big systemic
               | improvements.
               | 
               | > I'm just kinda surprised some Docker-esque thing hasn't
               | stuck. Something that works with Docker, but brings it
               | all the advantages of NixOS.
               | 
               | Using something like Nix to build Docker images is
               | conceptually great. Nix is great at building artifacts
               | efficiently and Docker is a great runtime. The problem is
               | that there's no low-friction Nix-like experience to date.
        
               | ar_lan wrote:
               | It sounds like your issues with Nix stem from its steep
               | adoption curve, rather than any technical concern. This
               | _is_ a concern for a team that needs to manage it - I
               | agree.
               | 
               | I'm quite diehard in terms of personal Nix/NixOS use, but
               | I hesitate to recommend to colleagues as a solution
               | because the learning curve would likely reduce
               | productivity for quite some time.
               | 
               | That said - I do think that deterministic, declarative
               | package/dependency management is the proper future,
               | especially when it comes to runtime environments.
        
               | throwaway894345 wrote:
               | > It sounds like your issues with Nix stem from its steep
               | adoption curve, rather than any technical concern
               | 
               | Not only is it difficult to learn (although that's a huge
               | problem), but it's also difficult to _use_. For instance,
               | even once you've "learned Nix", inferring data types is
               | an ongoing problem because there is no static type
               | system. These obstacles are prohibitive for most
               | organizations (because of the high-touch nature of build
               | tooling).
               | 
               | > This _is_ a concern for a team that needs to manage it
               | 
               | The problem is that there isn't "one team that needs to
               | manage it"; every team needs to touch the build
               | definitions or else you're bottlenecking your development
               | on one central team of Nix experts, which is also an
               | unacceptable tradeoff. If build tools weren't inherently
               | high-touch, then the learning curve would be a much
               | smaller problem.
        
               | ar_lan wrote:
               | Sorry, I wasn't clear - I wasn't implying there should be
               | a central team to manage it. One of the beauties of Nix
               | is providing declarative dev environments in
               | repositories, which means to fully embrace it each
               | individual team _should_ own it for themselves.
               | 
               | At best a central team would be useful for managing an
               | artifactory/cache + maybe company-wide nixpkgs, but in
               | general singular teams need to decide for themselves if
               | Nix is helpful + then manage it themselves.
        
               | throwaway894345 wrote:
               | Agreed. It's just that when every team has to own their
               | stuff, usability issues become a bigger problem and
               | afaict the Nix team is not making much progress on
               | usability (to the extent that it seems like they don't
               | care about the organizational use case--as is their
               | prerogative).
        
         | [deleted]
        
       | no_wizard wrote:
       | Firecracker is pretty great, good to see it can be used in a CI
       | environment like this, definitely peaking my interest.
       | 
       | I know it's the backbone of what runs fly.io[0] as well
       | 
       | [0]: https://fly.io/docs/reference/architecture/#microvms
        
         | emmelaich wrote:
         | *piquing
        
         | waz0wski wrote:
         | Firecracker has been running AWS Lambda & Fargate for a few
         | years now
         | 
         | https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...
         | 
         | There's also a similar microVM project with a bit more
         | container-focused support, called Kata
         | 
         | https://katacontainers.io/
        
           | digianarchist wrote:
           | Not a hypervisor expert by any means, but what's stopping
           | projects from backporting the super fast startup time of
           | Firecracker into regular VM hypervisors?
           | 
           | I'm assuming that Firecracker is somewhat constrained in some
           | way.
        
       | imachine1980_ wrote:
       | Really cool! What is the license? Is there any way I can
       | contribute code/tests/documentation to this project?
        
       | bkq wrote:
       | Good article. Firecracker is something that has definitely piqued
       | my interest when it comes to quickly spinning up a throwaway
       | environment to use for either development or CI. I run a CI
       | platform [1], which currently uses QEMU for the build
       | environments (Docker is also supported but currently disabled on
       | the hosted offering). Startup times are OK, but having a boot
       | time of 1-2s is definitely highly appealing. I will have to
       | investigate Firecracker further to see if I could incorporate
       | this into what I'm doing.
       | 
       | Julia Evans has also written about Firecracker in the past too
       | [2][3].
       | 
       | [1] - https://about.djinn-ci.com
       | 
       | [2] - https://jvns.ca/blog/2021/01/23/firecracker--start-a-vm-
       | in-l...
       | 
       | [3] - https://news.ycombinator.com/item?id=25883253
        
         | alexellisuk wrote:
         | Thanks for commenting, and your product looks cool btw.
         | 
         | Yeah, a lot of people have talked about Firecracker in the
         | past; that's why I focus on the pain and the problem being solved.
         | The tech is cool, but it's not the only thing that matters.
         | 
         | People need to know that there are better alternatives to
         | sharing a docker socket or using DIND with K8s runners.
        
       | throwawaaarrgh wrote:
       | > I spoke to the GitHub Actions engineering team, who told me
       | that using an ephemeral VM and an immutable OS image would solve
       | the concerns.
       | 
       | that doesn't solve them all. the main problem is secrets. if a
       | job has access to an api token that can be used to modify your
       | code or access a cloud service, a PR can abuse that to modify
       | things it shouldn't. a second problem is even if you don't have
       | secrets exposed, a PR can run a crypto miner, wasting your money.
       | finally, a self-hosted runner is a step into your private network
       | and can be used for attacks, which firecracker can help mitigate
       | but never eliminate.
       | 
       | the best solution to these problems is 1) don't allow repos to
       | trigger your CI unless the user is trusted or the change has been
       | reviewed, 2) always use least privilege and zero-trust for all
       | access (yes even for dev services), 3) add basic constraints by
       | default on all jobs running to prevent misuse, and then finally
       | 4) provide strong isolation in addition to ephemeral
       | environments.
        
         | alexellisuk wrote:
         | You still have those same problems with hosted runners, don't
         | you?
         | 
         | We're trying to re-create the hosted experience, but with
         | faster, self-hosted infrastructure, without needing to account
         | for metered billing.
        
       | a-dub wrote:
       | this is cool. throwing firecracker at CI is something i've been
       | thinking about since i first read about firecracker.
       | 
       | i was thinking more along the lines of, can you checkpoint a
       | bunch of common initialization and startup and then massively
       | parallelize?
        
         | alexellisuk wrote:
         | You can checkpoint and restore, but only once for security
         | reasons, so it doesn't help much.
         | 
         | https://github.com/firecracker-microvm/firecracker/blob/main...
         | 
         | The VMs launch super quick: in < 1s they are actually running
         | a job.
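         | 
         | For anyone curious, a snapshot is taken over the API socket,
         | roughly like this (socket and file paths are illustrative):
         | 
         |     # pause the running microVM first
         |     curl --unix-socket /tmp/firecracker.sock \
         |       -X PATCH 'http://localhost/vm' \
         |       -d '{"state": "Paused"}'
         |     # then capture memory + device state
         |     curl --unix-socket /tmp/firecracker.sock \
         |       -X PUT 'http://localhost/snapshot/create' \
         |       -d '{"snapshot_type": "Full",
         |            "snapshot_path": "./snapshot_file",
         |            "mem_file_path": "./mem_file"}'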
        
       | rad_gruchalski wrote:
       | Congratulations on the launch.
       | 
       | The interesting part of this is that the client supplies the most
       | difficult resource to get for this setup. As in, a machine on
       | which Firecracker can run.
        
         | alexellisuk wrote:
         | Users provide a number of hosts and run a simple agent. We
         | maintain the OS image, kernel configuration and control-plane
         | service, with support for ARM64 too.
        
           | rad_gruchalski wrote:
           | Great stuff, undeniably. There's not much going on in the
           | open source space around multi-host schedulers for
           | Firecracker. So that's a mountain of work.
           | 
           | With regards to the host, I made the remark because of
           | Firecracker's requirements regarding virtualisation. Running
           | Firecracker is a no-brainer when an org maintains a fleet of
           | its own hardware.
        
       | kernelbugs wrote:
       | Would have loved to see more of the technical details involved in
       | spinning up Firecracker VMs on demand for Github Actions.
        
         | alexellisuk wrote:
         | Hey thanks for the feedback. We may do some more around this.
         | What kinds of things do you want to know?
         | 
         | To get hands-on, you can run my Firecracker lab that I shared
         | in the blog post; adding a runner can then be done with "arkade
         | system install actions-runner"
         | 
         | We also explain how it works here:
         | https://docs.actuated.dev/faq/
        
           | thehabbos007 wrote:
           | Not the poster you were replying to, but I've looked at your
            | firecracker init lab (cool stuff!) and am just wondering how
           | that fits together with a control plane. Would be cool to see
           | how the orchestration happens in terms of messaging between
           | host/guest and how I/O is provisioned on the host
           | dynamically.
        
       | ridiculous_fish wrote:
       | The article does not say what a MicroVM is. From what I can
       | gather, it's using KVM to virtualize specifically a Linux kernel.
       | In this way, Firecracker is somewhat intermediate between Docker
       | (which shares the host kernel) and Vagrant (which is not limited
       | to running Linux). Is that accurate?
       | 
       | Is it possible to use a MicroVM to virtualize a non-Linux OS?
        
         | alexellisuk wrote:
         | Thanks for the feedback.
         | 
         | That video covers this in great detail. Click on the video
         | under 1) and have a watch; it should answer all your questions.
         | 
         | I didn't want to repeat the content there.
        
           | ridiculous_fish wrote:
           | Will do, thanks!
        
       | avita1 wrote:
       | Something I've increasingly wondered is if the model of CI where
       | a totally pristine container (or VM) gets spun up on each change
       | for each test set imposes a floor on how fast CI can run.
       | 
       | Each job will always have to run a clone, always pay the cost of
       | either bootstrapping a toolchain or downloading a giant container
       | with the toolchain, and always have to download a big remote
       | cache.
       | 
       | If I had infinite time, I'd build a CI system that maintained
       | some state (gasp!) about the build and routed each job to a
       | test runner that had most of its local build cache downloaded,
       | source code cloned, and toolchain bootstrapped.
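       | 
       | Roughly this, assuming a persistent workspace on the runner (the
       | paths and build command are just placeholders):
       | 
       |     REPO=/var/lib/ci/workspaces/myrepo
       |     [ -d "$REPO" ] ||
       |       git clone git@github.com:org/myrepo.git "$REPO"
       |     cd "$REPO"
       |     # incremental fetch instead of a full clone
       |     git fetch origin && git checkout --force "$COMMIT_SHA"
       |     # build cache and toolchain are already warm on disk
       |     make -j"$(nproc)"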
        
         | lytedev wrote:
         | I have zero experience with bazel, but I believe it offers the
         | possibility of mechanisms similar to this? Or a mechanism that
         | makes this "somewhat safe"?
        
           | hobofan wrote:
           | Yes it does, but one should be warned that adopting Bazel
           | isn't the lightest decision to make. But yeah, the CI
           | experience is one of its best attributes.
           | 
           | We are using Bazel with Github self-hosted runners, and have
           | consistently low build times with a growing codebase and test
           | suite, as Bazel will only re-build and re-test what has
           | changed.
           | 
           | The CI experience compared to e.g. doing naive caching of
           | some directories with Github managed runners is amazing, and
           | it's probably the most reliable build/test setup I've had.
           | The most common failure we have of the build system itself
           | (still rare, at ~once a week) is network issues
           | with one of the package managers, rather than quirks
           | introduced by one of the engineers (and there would be a
           | straightforward path towards preventing those failures, we
           | just haven't bothered to set that up yet).
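           | 
           | Concretely, each CI run boils down to something like this
           | (the cache endpoint is illustrative):
           | 
           |     # only targets affected by the change are rebuilt and
           |     # retested; the rest are remote-cache hits
           |     bazel test //... \
           |       --remote_cache=grpcs://cache.internal.example:443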
        
         | capableweb wrote:
         | You'd love a service like that, until you have some weird stuff
         | working in CI but not locally (or vice-versa). That's why
         | things are built from scratch every time: to prevent any such
         | issues from happening.
         | 
         | Npm was (still is?) famously bad at installing dependencies,
         | where sometimes the fix is to remove node_modules and simply
         | reinstall. Back when npm was more brittle (yes, possible) it
         | was nearly impossible to maintain caches of node_modules
         | directories, as they ended up different from what you'd get if
         | you reinstalled with no existing node_modules directory.
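         | 
         | Hence the classic ritual:
         | 
         |     rm -rf node_modules
         |     npm ci   # clean install straight from the lockfile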
        
           | ar_lan wrote:
           | I think Nix could be leveraged to resolve this. If the
           | dependencies aren't perfectly matched, it downloads the
           | _different_ dependencies but can reuse any instances already
           | downloaded locally.
           | 
           | So infra concerns are identical. Remove any state your
           | application itself uses (clean slate, like a local DB), but
           | your VM can functionally be persistent (perhaps you shut it
           | off when not in use to reduce spend)?
        
           | RealityVoid wrote:
           | You wouldn't catch it, it's true.
           | 
           | But it depends on whether you're willing to trade accuracy
           | for speed. I suggest the correct reaction to this is... "How
           | much speed?"
           | 
           | I presume the answer to be "a lot".
        
             | rad_gruchalski wrote:
             | My immediate reaction is "correctness each and every time".
        
               | saurik wrote:
               | I mean, given that my full build takes hours but my
               | incremental build takes seconds--and given that my build
               | system itself tends to only mess up the incremental build
               | a few times a year (and mostly in ways I can predict),
               | I'd totally be OK with "correctness once a day" or
               | "correctness on demand" in exchange for having the CI
               | feel like something that I can use constantly. It isn't
               | like I am locally developing or testing with "correctness
               | each and every time", no matter how cool that sounds: I'd
               | get nothing done!
        
               | rad_gruchalski wrote:
               | Do you really need to build the whole thing to test?
        
               | deathanatos wrote:
               | In my experience, yes.
               | 
               | A small change in a dependency, essentially, bubbles or
               | chains to all dependent steps. I.e., a change in the
               | fizzbuzz source means we must inherently run the
               | fizzbuzz tests.
               | This cascades into your integration tests -- we must run
               | the integration tests that include fizzbuzz ... but those
               | now need all the other components involved; so, that sort
               | of bubbles or chains to all reverse dependencies (i.e.,
               | we need to build the bazqux service, since it is in the
               | integration test with fizzbuzz...) and now I'm building a
               | large portion of my dependency graph.
               | 
               | And in practice, to keep the logic in CI reasonably
               | simple ... the answer is "build it all".
               | 
               | (If I had better content-aware builds, I could cache
               | them: I could say, ah, bazqux's source hashes to $X, and
               | we already have a build for that hash, excellent. In
               | practice, this is really hard. It would work if all of
               | bazqux were limited to some subtree, but inevitably one
               | file decides
               | to include some source from outside the spiritual root of
               | bazqux, and now bazqux's hash is "the entire tree", which
               | by definition we've never built.)
               | 
               | (There's bazel, but it has its own issues.)
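               | 
               | (The scheme I have in mind, if bazqux really were
               | confined to one subtree; "artifact-cache" is a
               | stand-in for whatever store you use:)
               | 
               |     # hash just the subtree that defines bazqux
               |     HASH=$(git rev-parse HEAD:services/bazqux)
               |     # rebuild only on a cache miss
               |     if ! artifact-cache get "bazqux-$HASH"; then
               |       ./build-bazqux.sh
               |       artifact-cache put "bazqux-$HASH"
               |     fi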
        
               | d4mi3n wrote:
               | This really depends a lot on context and there's no right
               | or wrong answer here.
               | 
               | If you're working on something safety-critical you'll
               | want correctness _every time_. For most things short of
               | that, it's a trade-off between risk, time, and money,
               | each of which can be fungible depending on context.
        
         | Shish2k wrote:
         | > Each job will always have to run a clone
         | 
         | You can create a base filesystem image with the code and tools
         | checked out, then create a VM which uses that in a copy-on-
         | write way.
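         | 
         | e.g. with qcow2 (file names are illustrative):
         | 
         |     # thin overlay on a base image with the repo and
         |     # toolchain baked in
         |     qemu-img create -f qcow2 \
         |       -b base-checkout.qcow2 -F qcow2 job-overlay.qcow2
         |     # boot each job's VM from the overlay; the base
         |     # image stays pristine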
        
         | mig_ wrote:
        
         | maccard wrote:
         | I work in games, our repository is ~100GB (20m download) and a
         | clean compile is 2 hours on a 16 core machine with 32GB ram
         | (c6i.4xlarge for any Aws friends). Actually building a runnable
         | version of the game takes two clean compiles (one editor and
         | one client) plus an asset processing task that takes about
         | another 2 hours clean.
         | 
         | Our toolchain install takes about 30 minutes (although that
         | includes making a snapshot of the EBS volume to turn into an
         | AMI).
         | 
         | That's ~7 hours for a clean build.
         | 
         | We have a somewhat better system than this - our base ami
         | contains the entire toolchain, and we do an initial clone on
         | the ami to get the bulk of the download done too. We store all
         | the intermediates on a separate drive and we just mount it,
         | build incrementally and unmount again. Sometimes we end up with
         | duplicated work but overall it works pretty well. Our full
         | builds are down from 7 hours (in theory) to about 30 minutes,
         | including artifact deployments.
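         | 
         | The per-build loop is roughly (paths and build step are
         | illustrative):
         | 
         |     # attach the persistent intermediates volume
         |     mount /dev/xvdf /mnt/intermediates
         |     git pull && ./build.sh --incremental
         |     # detach so the next runner can pick it up
         |     umount /mnt/intermediates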
        
         | jacobwg wrote:
         | Agreed, this is more or less the inspiration behind Depot
         | (https://depot.dev). Today it builds Docker images with this
         | philosophy, but we'll be expanding to other more general inputs
         | as well. Builds get routed to runner instances pre-configured
         | to build as fast as possible, with local SSD cache and pre-
         | installed toolchains, but without needing to set up any of that
         | orchestration yourself.
        
         | colinchartier wrote:
         | This was the idea behind https://webapp.io (YC S20):
         | 
         | - Run a linear series of steps
         | 
         | - Watch which files are read (at the OS level) during each
         | step, and snapshot the entire RAM/disk state of the MicroVM
         | 
         | - When you next push, just skip ahead to the latest snapshot
         | 
         | In practice this makes a generalized version of "cache keys"
         | where you can snapshot the VM as it builds, and then restore
         | the most appropriate snapshot for any given change.
        
         | mattbillenstein wrote:
         | I'm using buildkite - which lets me run the workers myself.
         | These are long-lived Ubuntu systems set up with the same code
         | we use on dev and production, running all the same software
         | dependencies. Tests are fast and it works pretty nicely.
        
           | raffraffraff wrote:
           | I'm not using it right now, but at a previous company we used
           | Gitlab CI on the free tier with self-hosted runners. Kicked
           | ass.
        
             | alexellisuk wrote:
             | Self-hosted runners are brilliant, but have a poor security
             | model for running containers or building them within a job.
             | Whilst we're focusing on GitHub Actions at the moment, the
             | same problems exist for GitLab CI, Drone, Bitbucket and
             | Azure DevOps. We explain why in the FAQ (link in the post).
        
               | goodoldneon wrote:
               | > poor security model for running containers or building
               | them within a job
               | 
               | You mean Docker-in-Docker? If so, we used Kaniko to build
               | images without Docker-in-Docker
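               | 
               | i.e. the build step becomes roughly:
               | 
               |     # kaniko builds and pushes the image without a
               |     # Docker daemon or privileged mode
               |     /kaniko/executor \
               |       --context=dir:///workspace \
               |       --dockerfile=Dockerfile \
               |       --destination=registry.example.com/app:1.0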
        
               | alexellisuk wrote:
               | There is a misconception that Kaniko means non-root, but
               | in order to build a container it has to work with
               | layers, which requires root.
               | 
               | Using Kaniko also doesn't solve for:
               | 
               | How do you run containers within that build in order to
               | test them? How do you run KinD/K3s within that build to
               | validate the containers e2e?
        
               | goodoldneon wrote:
               | The benefit of Kaniko (relative to Docker-in-Docker) is
               | that you don't need to run in privileged mode.
               | 
               | We test our containers in our Dev environment after
               | deploying
        
               | alexellisuk wrote:
               | That is a benefit over DIND and socket sharing; however,
               | it doesn't allow for running containers or K8s itself
               | within a job. Any tooling that depends on running
               | "docker" (the CLI) will also break or need adapting.
               | 
               | This also comes to mind: "root in the container is root
               | on the host" - https://suraj.io/post/root-in-container-
               | root-on-host/
        
       | Sytten wrote:
       | Wondering if it would be possible to run macOS. The hosted
       | runners of GitHub Actions for macOS are really, really horrible;
       | our builds easily take 2x to 3x more time than on hosted Windows
       | and Linux machines.
        
       | f0e4c2f7 wrote:
       | This seems pretty interesting to me. I haven't messed with
       | firecracker yet but it seems like a possible alternative to
       | docker in the future.
        
         | alexellisuk wrote:
         | It is, but is also a very low-level tool, and there is very
         | little support around it. We've been building this platform
         | since the summer and there are many nuances and edge cases to
         | cater for.
         | 
         | But if you just want to try out Firecracker, I've got a free
         | lab listed in the blog post.
         | 
         | I hear Podman desktop is also getting some traction, if you
         | have particular issues with Docker Desktop.
        
       | deltaci wrote:
       | congratulations on the launch. it looks pretty much like a self-
       | hosted version of https://buildjet.com/for-github-actions
        
         | alexellisuk wrote:
         | Thanks for commenting.
         | 
         | It seems like buildjet is competing directly with GitHub on
         | price (GitHub has bigger runners available now, pay per
         | minute), and GitHub will always win because its parent company
         | Microsoft owns Azure, so
         | I'm not sure what their USP is and worry they will get
         | commoditised and then lose their market share.
         | 
         | Actuated is hybrid, not self-hosted. We run actuated as a
         | managed service and scheduler; you provide your own compute
         | and run our agent, and then it's a very hands-off experience.
         | This
         | comes with support from our team, and extensive documentation.
         | 
         | Agents can even be cheap-ish VMs using nested virtualisation;
         | you can learn a bit more here: https://docs.actuated.dev/add-
         | agent/
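         | 
         | A quick way to check whether a VM can act as an agent, i.e.
         | that nested virtualisation exposes /dev/kvm:
         | 
         |     ls -l /dev/kvm || echo "no KVM; nested virt not enabled"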
        
       ___________________________________________________________________
       (page generated 2022-11-18 23:01 UTC)