https://blog.mattsbit.co.uk/2024/02/22/gitlab-ephemeral-environments-for-pull-requests/

Another blog from another guy called Matt
Occasional blog posts from a random systems engineer
home about posts
  

Gitlab Ephemeral Environments for Pull Requests

22 Feb, 2024 * Read in about 13 min * (2627 Words)

Background

I maintain a handful of open sources projects - most of which are of
no interest to anyone.

There are one or two, however, that have a small handful of users -
and also a small number of contributors.

Because of this, I spent a very inconsistent amount of time on each
of these, occasionally fixing bugs and occasionally spending an hour
or two each for a week to get a feature done.

Since some of these projects do have users - these people have become
a combination of things for me:

  * Stakeholders
  * Bug reporters
  * Code reviewers
  * Code contributors

Problem

I may often implement a change and wonder what others think - it's
hard to ask someone I've nevber met, speak to via a one line comment
every 2-6 months to ask them to spend valuable time on a code review
to have input of a change.

So I wanted to have per-PR deployment environments, so I can create a
PR and they can simply view the branch on a running instance.

My requirements for this project were:

  * Integration with self-hosted Gitlab
  * Each PR would automatically create an environment without
    intervention
  * Each PR would automatically (and reliably) destroy itself on PR
    close/merge
  * Run on nomad
  * Be isolated from any other infrastructure (much of the
    infrastructure already is)
  * Avoid using Gitlab docker registry (since I back this up and ~1GB
    /PR would be incredibly expensive!)
  * Not require any changes in:
      + External DNS (since this is bind and a manual change)
      + External load balancers (again a manual change in HAProxy)
  * Provide a tidy solution - it should use Terraform to deploy - but
    the Terraform shouldn't clutter the real repository.
  * Any authentication should be created as dynamically as possible,
    avoiding lots of credentials hard coded in pipeline secrets.

Setting up nomad

Since I've started a long journey to migrating from a variety of
other tools, such as kubernetes (running via rancher), docker
swarm+portainer etc. to nomad, I have a Terraform module for setting
up nomad.

I start by creating a new virtual machine:

module "zante" {
  source  = "terraform-registry.internal.domain/cosh-servers-zante__dockstudios/libvirt-virtual-machine/libvirt"
  version = ">= 1.1.0, < 2.0.0"

  name                        = "zante"
  ip_address                  = "10.2.1.13"
  ip_gateway                  = "10.2.1.1"
  ip_prefix_length            = 24
  nameservers                 = local.dns_servers
  memory                      = 3 * 1024
  disk_size                   = 20 * 1024
  base_image_path             = local.base_image_path
  lvm_volume_group            = local.volume_groups["ssd-raid-storage"]
  hypervisor_hostname         = local.hypervisor_hostname
  hypervisor_username         = local.hypervisor_username
  docker_ssh_key              = local.ssh_key
  ssh_key                     = local.ssh_key
  domain_name                 = local.domain_name
  http_proxy                  = local.http_proxy
  https_proxy                 = local.http_proxy
  no_proxy                    = local.http_no_proxy
  NO_PROXY                    = local.http_no_proxy

  # Connect to isolated network
  network_bridge              = local.network_bridges["gitlab-pr-isolated-network"]

  # Install docker, which will be used for configuring nomad
  install_docker              = true
  create_host_records         = false

  # Create directories for docker data (though this probably won't be needed)
  # and nomad data directories
  create_directories = [
    "/docker-data",
    "/nomad",
    "/nomad/config",
    "/nomad/config/server-certs",
    "/nomad/data"
  ]
}

Once the machine is created, nomad and traefik can be setup and
configured on it:

module "server" {
  source  = "terraform-registry.internal.domain/gitlab-env-nomad__dockstudios/nomad/nomad//modules/server"
  version = ">= 1.1.0, < 2.0.0"

  hostname                  = "zante"
  domain_name               = "internal.domain"
  docker_username           = "docker-connect"
  nomad_version             = "1.6.3"
  consul_version            = "1.17.0"
  http_proxy                = var.http_proxy
  aws_endpoint              = var.aws_endpoint
  aws_profile               = var.aws_profile
  container_data_directory  = "/docker-data"
  primary_network_interface = "ens3"
}

module "traefik" {
  source  = "terraform-registry.internal.domain/gitlab-env-nomad__dockstudios/nomad/nomad//modules/traefik"
  version = ">= 1.0.0, < 2.0.0"

  cpu          = 64
  memory       = 128

  # A wildcard SSL cert for *.gitlab-pr.internal.domain will be created
  # and a CNAME of *.gitlab-pr.internal.domain will point to the nomad server.
  # Traefik will be configured with a default service rule of Host(`{{ .Name }}.gitlab-pr.dockstudios.co.uk`)
  base_service_domain = "gitlab-pr.dockstudios.co.uk"
  nomad_fqdn          = "zante.internal.domain"
  domain_name         = "internal.domain"
  service_names       = ["*.gitlab-pr"]
}

Application deployments

Now that we have a nomad instance running, we can look at how the
application will be deployed.

Either:

  * We can have generic terraform to deploy any application and pass
    through a lot of variables in the Gitlab CI pipeline
      + This will lead to lots of sensitive information being stored
        in CI and lots of pass through variables in Gitlab CI.
      + Changes would require: Common terraform updating, the Gitlab
        CI config in the OSS repo AND adding any values manually into
        the Gitlab
  * Dedicated terraform for the project:
      + Although this can use a common module, this will add overhead
        when adding this deployment pipeline to new projects.
  * Store the terraform in the repo
      + This will likely cause confusion to anyone viewing the repo,
        since the Terraform will be quite specific to the CI
        pipeline.

I chose to go with a dedicate repo per application.

For the basis of this test, I'll bne creating this for Terrareg.

The basic Terraform for deploying a nomad service will look like:

resource "nomad_job" "terrareg" {
  jobspec = <<EOHCL
job "terrareg" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 1

    network {
      mode = "bridge"

      port "http" {
        to = 5000
      }
    }

    service {
      # We will need to update this
      name     = "terraform-registry"
      port     = "http"
      provider = "nomad"

      # Use tag to indicate to traefik to expose the service
      tags     = ["traefik-service"]

      check {
        type     = "tcp"
        interval = "10s"
        timeout  = "1s"
      }
    }

    task "terrareg-web" {
      driver = "docker"

      config {
        # We will need to update this
        image = "some"
        ports = ["http"]

        # We won't mount volumes, as this is an ephemeral environment
      }

      env {
        ALLOW_MODULE_HOSTING = "true"
        ALLOW_UNIDENTIFIED_DOWNLOADS = "true"
        AUTO_CREATE_MODULE_PROVIDER = "true"
        AUTO_CREATE_NAMESPACE = "true"
        PUBLIC_URL = "https://terrareg.example.com"
        ENABLE_ACCESS_CONTROLS = "true"
        MIGRATE_DATABASE = "True"

        # This does not need to be secret, as it's a public environment
        ADMIN_AUTHENTICATION_TOKEN = "GitlabPRTest"

        # We will use the default SQLite database
        # DATABASE_URL = ""
      }

      resources {
        cpu    = 32
        memory = 128
        memory_max = 256
      }
    }
    restart {
      attempts = 10
      delay    = "30s"
    }
  }
}
EOHCL
}

To adapt this for automatically creating ephemeral environments,
we'll now need to consider how to handle:

  * Terraform state
  * Isolating deployments
  * Docker images

Isolating deployments

We will provide a variable to Terraform, which can be used to
isolate: the domain, the nomad job and the Docker image.

Gitlab provides us with the variable: $CI_COMMIT_REF_NAME, which can
be passed in when running Terraform.

We will also use the short commit hash to build a unique docker tag.

# Backend required for Gitlab state
terraform {
  backend "http" {
  }
}

variable pull_request {
  description = "Name of pull request branch"
  type        = string
}

variable docker_image {
  description = "Docker image"
  type        = string
}

resource "nomad_job" "terrareg" {
  jobspec = <<EOHCL
job "terrareg-${var.pull_request}" {
  datacenters = ["dc1"]
...
    service {
      # We will need to update this
      name     = "terrareg-${var.pull_request}"
      port     = "http"
...
        image = "${var.docker_image}"
...
        PUBLIC_URL = "https://terrareg-${var.pull_request}.gitlab-ci.dockstudios.co.uk"

The unique job will allow the terraform to deploy indepdent jobs that
do not conflict in nomad.

Docker image

To easily create docker images that will be used by nomad, without
having to worry about a docker registry (and cleaning up the registry
after use), we can configure the nomad server as a Gitlab runner.

The runner can run on nomad:

resource "nomad_job" "gitlab-agent" {
  jobspec = <<EOHCL
job "gitlab-agent" {
  datacenters = ["dc1"]
  type        = "service"

  group "gitlab-agent" {
    count = 1

    volume "docker-sock-ro" {
      type = "host"
      read_only = true
      source = "docker-sock-ro"
    }

    ephemeral_disk {
      size = 105
    }

    task "agent" {
      driver = "docker"

      config {
        image = "gitlab/gitlab-runner:latest"
        volumes = [
          "/var/run/docker.sock:/var/run/docker.sock",
        ]
        entrypoint = ["/usr/bin/dumb-init", "local/start.sh"]
      }

      env {
        RUNNER_TAG_LIST          = "nomad"
        REGISTER_LOCKED          = "true"
        CI_SERVER_URL            = "${data.vault_kv_secret_v2.gitlab.data["url"]}"
        REGISTER_NON_INTERACTIVE = "true"
        REGISTER_LEAVE_RUNNER    = "false"
        # Obtain registration token from vault secret
        REGISTRATION_TOKEN       = "${data.vault_kv_secret_v2.gitlab.data["registration_token"]}"

        RUNNER_EXECUTOR   = "docker"
        DOCKER_TLS_VERIFY = "false"
        DOCKER_IMAGE      = "ubuntu:latest"
        DOCKER_PRIVILEGED = "false"

        DOCKER_DISABLE_ENTRYPOINT_OVERWRITE = "false"

        DOCKER_OOM_KILL_DISABLE = "false"
        DOCKER_VOLUMES          = "/var/run/docker.sock:/var/run/docker.sock"
        DOCKER_PULL_POLICY      = "if-not-present"
        DOCKER_SHM_SIZE         = "0"

        http_proxy  = "${local.http_proxy}"
        https_proxy = "${local.http_proxy}"
        HTTP_PROXY  = "${local.http_proxy}"
        HTTPS_PROXY = "${local.http_proxy}"
        NO_PROXY    = "${local.no_proxy}"
        no_proxy    = "${local.no_proxy}"

        RUNNER_PRE_GET_SOURCES_SCRIPT = "git config --global http.proxy $HTTP_PROXY; git config --global https.proxy $HTTPS_PROXY"
        CACHE_MAXIMUM_UPLOADED_ARCHIVE_SIZE = "0"
      }

      # Custom entrypoint script to register and run agent
      template {
        data = <<EOF
#!/bin/bash

set -e
set -x

# Register runner
/entrypoint register

# Start runner
/entrypoint run --user=gitlab-runner --working-directory=/home/gitlab-runner
EOF
        destination = "local/start.sh"
        perms       = "555"
        change_mode = "noop"
      }

      resources {
        cpu    = 32
        memory = 256
      }
    }
  }
}
EOHCL
}


Some things to note about this:

A custom entrypoint needed to be created to allow the container to
register and then run as the agent.

The runner is configured with docker pull policy "if-not-present".
This allows us to build images locally and re-use them, without
Gitlab checking a registry to verify that it's up-to-date.

Vault was used for storing the registration token and Gitlab URL.
Nomad, in this instance, is not integrated with vault - this
completely isolates the PR environment from pre-existing vault
clusters.

Gitlab pipelines

Gitlab has its own state Terraform management, which we can use for
storing state between jobs that will deploy and destroy the nomad
job.

There are components available for Gitlab for OpenTofu (https://
gitlab.com/components/opentofu) (the Terraform version is being
deprecated due to the license change). However, since the instance of
Gitlab that I'm using does not support components, I will need to
hand-roll the pipelines and will re-use their deployment script.

In the deployment Terraform, I added the gitlab-terraform script (
https://gitlab.com/gitlab-org/terraform-images/-/blob/master/src/bin/
gitlab-terraform.sh), though they do also have a gitlab-opentofu
alternative (https://gitlab.com/components/opentofu/-/blob/main/src/
gitlab-tofu.sh).

Adding the build, deployment and teardown for PRs, the following was
added to the Gitlab CI yaml of the open source project:

.pr_deployments:
  variables:
    # Configure base domain that the
    # environments will be using
    APP_DOMAIN: gitlab-pr.dockstudios.co.uk

    # State variable, which will isolate the
    # state based on the branch name
    TF_STATE_NAME: $CI_COMMIT_REF_SLUG

    # Populate the terraform pull_request
    # variable that will be passed to the
    # deployment terraform
    TF_VAR_pull_request: $CI_COMMIT_REF_SLUG
    # Variable for docker tag
    TF_VAR_docker_image: "terrareg:v${CI_COMMIT_SHORT_SHA}"

    # Some custom proxies for nomad
    http_proxy: http://some-proxy-for-nomad
    # etc


stages:
  # Interleaved with pre-existing stages
  - build
  - deploy

# Perform docker build of the application
# on the nomad host during 'build' stage
build-pr-image:
  stage: build
  extends: .nomad_proxy
  # Use tags to limit to the nomad runner
  tags: [nomad]

  rules:
    # Using the pipeline source of 'merge_request_event'
    # instead of 'push' means that the PR environment
    # will only be created when a PR is created/updated,
    # instead of every branch.
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'

  # Use docker without docker-in-docker, as we're
  # passing through the docker socket
  image: docker:latest

  script:
    # Build and tag using the pull_request variable (branch name)
    - docker build -f Dockerfile -t terrareg:$TF_VAR_pull_request .

# Deploy the PR to nomad using Terraform
deploy_review:
  stage: deploy
  extends: .nomad_proxy

  tags: [nomad]
  needs: [build-pr-image]
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_COMMIT_REF_NAME != $CI_DEFAULT_BRANCH

  # Use the Hashicorp docker image,
  # but override entrypoint, as it defaults
  # to the Terraform binary
  image:
    name: hashicorp/terraform:1.5
    entrypoint: ["/bin/sh", "-c"]
  dependencies: []

  # Configure Gitlab environment for the branch
  environment:
    name: review/$CI_COMMIT_REF_NAME
    # The URL provided to the user in Gitlab
    # to view the environment
    url: https://$CI_PROJECT_NAME-$CI_COMMIT_REF_SLUG.$APP_DOMAIN
    auto_stop_in: 1 week

    # Reference the job that is used to stop the environment
    on_stop: stop_review
  # Pass additional variables to authenticate to nomad.
  variables:
    NOMAD_ADDR: ${NOMAD_ADDR}
    NOMAD_TOKEN: ${NOMAD_TOKEN}

  script:
    # Clone the Terraform repo for project and cd into directory
    - git clone https://gitlab.dockstudios.co.uk/pub/terra/terrareg-nomad-pipeline
    - cd terrareg-nomad-pipeline
    # Install idn2-utils, as the Gitlab script
    # uses this to determine environment variable names
    # for authentication
    - apk add idn2-utils

    # Perform Terraform plan and apply.
    - ./gitlab-terraform plan
    - ./gitlab-terraform apply

# On stop review, this can be run manually,
# or will be executed when the environment is stopped by Gitlab.
stop_review:
  stage: deploy
  extends: .pr_deployment
  tags: [nomad]
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_COMMIT_REF_NAME != $CI_DEFAULT_BRANCH
      when: manual
  image:
    name: hashicorp/terraform:1.5
    entrypoint: ["/bin/sh", "-c"]
  variables:
    # Use Git strategy none, as the branch
    # will no longer exist and if the job
    # attempts to check it out, it would fail.
    GIT_STRATEGY: none
    NOMAD_ADDR: ${NOMAD_ADDR}
    NOMAD_TOKEN: ${NOMAD_TOKEN}
  environment:
    name: review/$CI_COMMIT_REF_NAME
    action: stop
  script:
    - git clone https://gitlab.dockstudios.co.uk/pub/terra/terrareg-nomad-pipeline
    - cd terrareg-nomad-pipeline
    - apk add idn2-utils

    # Perform Terraform destroy
    - ./gitlab-terraform destroy


Once this is done, the Gitlab variables need populating for:

  * NOMAD_ADDR
  * NOMAD_TOKEN

As this pipeline executes, it runs:

  * Build
  * Tests
  * PR deployment (whilst tests are running)

Due to the tests taking a long time and to allow proposed changes to
be demonstrated without tests being updated, this situation was
ideal.

After an initial deployment, we can see the pipeline with the new
jobs: [nomad-pr-p]

And the successful deployment of the branch in Nomad: [nomad-pr-j]

SSL certificates

Whilst this is a little off-topic for the task at hand, to get the
SSL certificate assigned was an interesting task:

  * Historically, I purchased SSL certificates, as everyone did,
    including wildcard certs
  * Now, I use letsencrypt for all certificates.

I hadn't attempted to obtain a wildcard SSL certificate from
letsencrypt.

Since I use bind for public DNS, I found an interesting article that
outlined a good setup: https://blog.svedr.in/posts/
letsencrypt-dns-verification-using-a-local-bind-instance/

The setup was quite interesting:

  * Create a new zone that covered the ACME validation subdoain
  * Allow dynamic updates using a secret
  * Use a certbot plugin that can publish updates to bind whilst
    requesting the SSL certificate

I initially attempted to create a zone for the whole gitlab-pr
subdomain. However, during the verification process, I received
errors that a SOA record for the _acme-challenge subdomain could not
be found - forcing another new zone to be created for this.

Importing test data

With the new environments being automatically created, I wanted to
automatically load in some test data into the environment, giving a
better out-of-the-box experience.

I implemented a small batch task, alongside the main application
task:

resource "nomad_job" "terrareg_data_import" {
  jobspec = <<EOHCL
job "terrareg-${var.pull_request}-data-import" {
  type = "batch"

  group "import-data" {
    count = 1

    task "import-data-task" {

      template {
        data        = <<EOH
{{ range nomadService "terrareg-${var.pull_request}" }}
base_url="http://{{ .Address }}:{{ .Port }}"

# Wait for API to be ready
for itx in {1..10}
do
  curl $base_url && break
  sleep 5
done

function make_request() {
  endpoint=$1
  data=$2

  curl $base_url$endpoint \
    -XPOST \
    -H 'Content-Type: application/json' \
    -H 'X-Terrareg-ApiKey: GitlabPRTest' \
    -d "$data"
}

# Create namespace
make_request "/v1/terrareg/namespaces" '{"name": "demo"}'

# Create module
make_request "/v1/terrareg/modules/demo/rds/aws/create" '{"git_provider_id": 1, "git_tag_format": "v{version}", "git_path": "/"}'

# Import version
make_request "/v1/terrareg/modules/demo/rds/aws/import" '{"version": "6.4.0"}'

{{ end }}
EOH
        destination = "local/setup.sh"
        perms       = 755
      }
      driver = "docker"

      config {
        image = "quay.io/curl/curl:latest"
        entrypoint = ["/bin/sh"]
        command = "/local/setup.sh"
      }

      env {
        # Force job re-creation
        FORCE = "${uuid()}"
      }

      resources {
        cpu    = 32
        memory = 32
        memory_max = 128
      }
    }
  }
}
EOHCL

  purge_on_destroy = true
  detach           = false
  rerun_if_dead    = true

  depends_on = [nomad_job.terrareg]
}

Since the configuration is all performed via API calls, I used the
curl docker image, replacing the entrypoint with a shell script,
which is dynamically generated using a template. The template allows
the injection of the IP/port of the main container.

The uuid() environment variable ensures that the job is re-created on
each run. By default, the batch job is only run once after creation,
which suits this application perfectly.

Summary

For me, the combinations of these technologies has been really
useful:

Nomad is very easy to setup and can be easily used in an isolated
enivronment. Deploying to nomad is also incredibly easy, making a
change like this easy to implement.

Gitlab's integration with review environments is nice and attaches
itself into pull-requests:

[nomad-pr-p]

The functionality for dynamic environment generation and automated
teardown is incredibly useful - not something you'd find on many
deployment tools.

[nomad-pr-e]

Overall, this change took around half a day to get working end-to-end
and will provide a great benefit for this project - and should also
be easy to adopt with other projects.

To see the complete example, see:

  * The changes to the open source application: https://
    gitlab.dockstudios.co.uk/pub/terrareg/-/merge_requests/385
  * The Terraform code for the release: https://
    gitlab.dockstudios.co.uk/pub/terra/terrareg-nomad-pipeline

Navigation

Previous
back to top
code with
(c) 2018 Matt John.