[HN Gopher] Real-world stories of how we've compromised CI/CD pi...
___________________________________________________________________
Real-world stories of how we've compromised CI/CD pipelines
Author : usrme
Score : 226 points
Date : 2022-01-17 11:08 UTC (11 hours ago)
(HTM) web link (research.nccgroup.com)
(TXT) w3m dump (research.nccgroup.com)
| tialaramex wrote:
| A recurring theme is that they obtain _secret_ credentials from a
| service which needs to verify credentials, and then turn around
| and use those to impersonate the entity providing those
| credentials. For example, getting Jenkins to run some Groovy
| reveals the credentials Jenkins uses to verify who is accessing
| it, and then you can just use those credentials yourself.
|
| To fix this - almost anywhere - stop using shared secrets. Every
| time you visit a (HTTPS) web site, you are provided with the
| credentials to verify its identity. But, you don't gain the
| ability to impersonate the site because they're not _secret_
| credentials, they're _public_. You can and should use this in a
| few places in typical CI/CD type infrastructure today, and we
| should be encouraging other services to enable it too ASAP.
|
| In a few places they mention MFA. Again, most MFA involves
| secrets, for example TOTP Relying Parties need to know what code
| you should be typing in, so, they need the seed from which to
| generate that code, and attackers can steal that seed. WebAuthn
| doesn't involve secrets, so, attackers who steal WebAuthn
| credentials don't achieve anything. Unfortunately chances are you
| enabled one or more vulnerable credential types "just in case"...
| cerved wrote:
| > a hardcoded git command with a credential was revealed
|
| _cries in security_
| i_like_waiting wrote:
| reminds me of tons of docker tutorials, where all of them put
| a default password in plaintext in the docker-compose file
| rietta wrote:
| I put devonly: as part of every placeholder secret in docker-
| compose.yml or similar config that is committed to Git. The
| goal is a developer who has just cloned the repo should be
| able to run the setup script and have the whole system
| running with random seed data without futzing with copying
| secrets from coworkers.
| nickjj wrote:
| > I put devonly: as part of every placeholder secret in
| docker-compose.yml or similar config that is committed to
| Git. The goal is a developer who has just cloned the repo
| should be able to run the setup script and have the whole
| system running with random seed data without futzing with
| copying secrets from coworkers.
|
| This problem is solvable without hard coding env variables
| into your docker-compose.yml file.
|
| You can commit an .env.example file to version control
| which has non-secret defaults set so that all a developer
| has to do is run `cp .env.example .env` before `docker-
| compose up --build` and they're good to go.
|
| There's examples of this in all of my Docker example apps
| for Flask, Rails, Django, Phoenix, Node and Play at: https:
| //github.com/nickjj?tab=repositories&q=docker-*-exampl...
|
| It's nice because it also means the same docker-compose.yml
| file can be used in dev vs prod. The only thing that
| changes are a few environment variables.
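|
| A minimal sketch of that layout (file contents and variable
| names here are illustrative, not taken from the linked repos):
|
|     # .env.example is committed and holds non-secret dev
|     # defaults, e.g. POSTGRES_PASSWORD=devpassword;
|     # .env itself is git-ignored
|     cp .env.example .env
|     # compose substitutes ${POSTGRES_PASSWORD} etc. from .env
|     docker-compose up --build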
| staticassertion wrote:
| With buildkit Docker now has support for secrets natively
| with `--secret`. This mounts a file that will only be
| exposed during build.
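|
| A small sketch of that flow (the secret id and paths are just
| examples):
|
|     # in the Dockerfile, the secret is mounted only for this
|     # RUN step and never baked into a layer:
|     #   RUN --mount=type=secret,id=npmrc \
|     #       cp /run/secrets/npmrc ~/.npmrc && npm ci && rm ~/.npmrc
|     DOCKER_BUILDKIT=1 docker build \
|       --secret id=npmrc,src="$HOME/.npmrc" -t app .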
| inetknght wrote:
| > I put devonly: as part of every placeholder secret in
| docker-compose.yml or similar config that is committed to
| Git.
|
| I use `insecure` instead. I think it makes it clear that the
| password, and the file, aren't secure by default and should be
| treated as such.
| Lucasoato wrote:
| Is it just my impression or does security in Jenkins seem much
| more challenging and more time-consuming than in GitLab? This
| post gives many examples where GitLab was attacked, and of course
| bad practices like privileged containers can lead to the
| compromise of a server independently of the technology used, but
| from my experience with Jenkins, I've seen passwords used in
| plaintext so many times, even in big companies.
| 2ion wrote:
| Jenkins is security game over if you overlook a small crucial
| configuration option or if you install any plugin (and it's
| unusable without some plugins), as plugin development is a
| free-for-all and dependencies between plugins are many. We
| basically decided that one instance of Jenkins plus slaves was
| unfixable and unconfigurable to use securely across multiple
| teams with developers of differing trust levels (external
| contributors vs normal in-house devs) and started fresh with a
| different CI design.
|
| Jenkins is a batteries excluded pattern in one of its worst
| possible incarnations.
|
| Jenkins is basically a CI framework for trusted users only.
| Untrusted workloads must not have access to anything Jenkins.
| ramoz wrote:
| I don't really like either. Both have traditionally been bad and
| tied to on-prem legacy workloads, building for SVN apps or teams
| new to git. It's usually a mess.
| nathanlied wrote:
| As someone adjacently interested in the field: care to
| elaborate on what systems you do like? It's always
| interesting to get new perspectives.
| staticassertion wrote:
| We've been happy with buildkite and hashicorp vault. One
| nice feature we've leveraged in our CI is that vault lets
| us revoke tokens after use, so we have very short lived
| tokens and they're made that much shorter by having the
| jobs clean up after themselves.
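|
| The cleanup step is roughly this (the kv path is made up, and
| the job is assumed to already have VAULT_ADDR/VAULT_TOKEN set):
|
|     # fetch what the job needs...
|     vault kv get -field=password secret/ci/deploy-key
|     # ...then revoke the job's own token so the credential dies
|     # with the job instead of waiting out its TTL
|     vault token revoke -self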
| kiallmacinnes wrote:
| > but from my experience with Jenkins, I've seen using
| passwords in plaintext so many times, even in big companies
|
| I reckon this has to do with how the CI tools are configured.
|
| Everyone knows you shouldn't commit a secret to Git, so tools
| like GitLab CI which require all their config be in git
| naturally will see less of this specific issue.
| formerly_proven wrote:
| Jenkins was also affected by numerous Java serialization
| vulnerabilities. It also used to be that any worker could
| escalate to the main Jenkins server pretty much by design, not
| sure what the current situation is.
| [deleted]
| 0xbadcafebee wrote:
| Jenkins is a tire fire, and security is just one of their tires
| on fire. Every aspect of it encourages bad practices.
| dlor wrote:
| This is a great resource. I'd love to see more reports like it
| published. CI/CD pipelines often run with highly elevated
| permissions (access to source code, artifact repositories, and
| production environments), but they are traditionally neglected.
| Kalium wrote:
| I wouldn't say they are traditionally neglected, precisely.
| CI/CD systems are often treated as a place where devs hold
| infinite power with developer convenience prioritized above all
| else. Developers, who are generally not security experts, often
| expect to wholly own their build and deployment processes.
|
| I've seen few things get engineer pushback quite like trying to
| tell engineers that they need to rework how they build and
| deploy because someone outside their team said so. It's just
| dev, not production, so why should they be so paranoid about
| it? Sheesh, stop screwing up their perfectly good workflows...
| kevin_nisbet wrote:
| I suspect this is also an under-considered area even in
| organizations that pay lots of attention to security, so it
| would be good for it to get more mindshare. After we discovered
| some of our own CI/CD related vulnerabilities[1], it felt like
| most approaches we looked at had similar problems, and it took a
| lot of research to find the rare solution that we could be
| confident in.
|
| [1] - https://goteleport.com/blog/hack-via-pull-request/
| 0xbadcafebee wrote:
| Been there, done that, bought the t-shirt....
|
| We had this "deploy" Jenkins box set up with limited access
| for devs, because it had assume-role privs to an IAM role to
| manage AWS infra with Terraform. The devs run their tests on
| a different Jenkins box, and when they pass, they upload
| artifacts to a repo and trigger this "deploy" Jenkins box to
| promote the new build to prod. The devs can do their own CI,
| but CD is on a box they don't have access to, hence less
| chance for accidental credential leakage. Me being Mr.
| Devops-play-nice-with-the-devs, I let them issue PRs against
| the CD box's repo. Commits to PRs get run on the deploy
| Jenkins in a stage environment to validate the changes.
|
| This one dev wanted to change something in AWS. But for
| whatever reason, they didn't ask me (maybe because they knew
| I'd say no, or at least ask them about it?). So instead the
| dev opens a PR against the CD jobs, proposing some syntax
| change. _Then_ the dev modifies a script which was being
| included as part of the CD jobs, and makes the script
| download some binaries and make AWS API calls (I found out
| via CloudTrail). Once they've made the calls, they rewrite
| Git history to remove the AWS API commits and force-push to
| the PR branch, erasing evidence that the code was ever
| pushed. Then they close the PR with "need to refactor".
|
| In the morning I'm looking through my e-mail, and see all
| these GitHub commits with code that looks like it's doing
| something in AWS... and I go look at the PR, and the code in
| my e-mails isn't anywhere in any of the commits. He actually
| tried to cover it up. And I would never have known about any
| of this if I hadn't enabled 'watching' on all commits to the
| repo.
|
| Who'd have thought e-mail would be the best append-only
| security log?
| maestrae wrote:
| I'm very curious as to what happened next. How did the
| conversation go with the dev and did they get to keep their
| job?
| 0xbadcafebee wrote:
| I didn't tell his boss. I did tell my boss, in an e-mail
| with evidence. We both had a little chat with the dev
| where we made it clear that if this happened under
| slightly different circumstances (if he was trying to
| access data/systems he wasn't supposed to, if it was one
| of the HIPAA accounts, etc) he'd not only be shitcanned,
| he'd be facing serious legal consequences. We were
| satisfied by his reaction and didn't push it further.
|
| I was actually fired early in my career as a contractor
| when an over-zealous security big-wig decided to go over
| my boss's boss's head. I had punched a hole in the
| firewall to look at Reddit, and because I also had a lot
| of access, this meant I wasn't trustworthy and had to go.
| People (like me) make stupid mistakes; we should give
| them a second chance.
| hinkley wrote:
| Our deploy scripts make a call to a separate box that
| actually does the deployment, ostensibly to avoid this sort
| of problem and have some more control over simultaneous
| deployments. But it is very hard to explain to anyone how
| to diagnose a deployment failure on such a system, and once
| in a while the log piping gets gummed up and you don't get
| any status reports until the job completes.
|
| I've maybe managed to explain this process to one other
| extant employee, so pretty much everybody bugs me or one of
| the operations people any time there's an issue. That could
| be a liability in an outage situation, but I don't have a
| concrete suggestion how to avoid this sort of thing.
| mdoms wrote:
| > The credentials gave the NCC Group consultant access as a
| limited user to the Jenkins Master web login UI which was only
| accessible internally and not from the Internet. After a couple
| of clicks and looking around in the cluster they were able to
| switch to an administrator account.
|
| These kinds of statements are giving major "draw the rest of the
| owl" vibes.
|
| https://i.kym-cdn.com/photos/images/newsfeed/000/572/078/d6d...
| staticassertion wrote:
| Thank you for writing this up.
|
| Some thoughts:
|
| 1. Hardcoded credentials are a plague. You should consider
| tagging all of your secrets so that they're easier to scan for.
| Github automatically scans for secrets, which is great.
|
| 2. Jenkins is particularly bad for security. I've seen it owned a
| million and one times.
|
| 3. Containers are overused as a security boundary and footguns
| like `--privileged` completely eliminate any boundary.
|
| 4. Environment variables are a dangerous place to store secrets -
| they're global to the process and therefore easy to leak. I've
| thought about this a lot lately, especially after log4j. I think
| one pattern that may help is clearing the variables after you've
| loaded them into memory.
|
| Another pattern I've considered is encrypting the variables. A
| lot of the time what you have is something like this:
|
| Secret Store -> Control Plane Agent -> Container -> Process
|
| Where secrets flow from left to right. The control plane agent
| and container have full access to the credentials and they're
| "plaintext" in the Process's environment.
|
| In theory you should be able to pin the secrets to that process
| with a key. During your CD phase you would embed a private key
| into the process's binary (or a file on the container) and then
| tell your Secret Manager to use the associated public key to
| transmit the secrets. The process could decrypt those secrets
| with its private key but they're E2E encrypted across any hops
| between the Secret Store and Process and they can't be leaked
| without explicitly decrypting them first.
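|
| A rough way to picture that pinning idea with stock tooling
| (file names are made up and a real secret manager would handle
| the key exchange, but the shape is the same):
|
|     # CD time: generate a keypair and bake the private half
|     # into the image
|     openssl genpkey -algorithm RSA -out /etc/myapp/secrets.key
|     openssl pkey -in /etc/myapp/secrets.key -pubout \
|       -out secrets.pub
|
|     # secret store side: encrypt the credential to that key
|     printf '%s' "$DB_PASSWORD" |
|       openssl pkeyutl -encrypt -pubin -inkey secrets.pub \
|         -out db_password.enc
|
|     # process side: only the holder of the baked-in private key
|     # recovers the plaintext, so intermediate hops never see it
|     openssl pkeyutl -decrypt -inkey /etc/myapp/secrets.key \
|       -in db_password.enc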
| teddyh wrote:
| > _Environment variables are a dangerous place to store secrets
| - they're global to the process and therefore easy to leak._
|
| The two _real_ problems with environment variables are:
|
| 1. Environment variables are traditionally readable by _any
| other process in the system_. There are settings you can do on
| modern kernels to turn this off, but how do you know that you
| will always run on such a system?
|
| 2. Environment variables are _inherited_ by all subprocesses by
| default, unless you either unset them after you fork() (but
| before you exec()), or if you take special care to use execve()
| (or similar) function to provide your own custom-made
| environment for the new process.
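|
| The shell equivalent of that second point is to hand the child
| an explicit environment instead of letting it inherit yours
| (the script name is just a placeholder):
|
|     # the child sees only what is listed here, not the caller's
|     # DB_PASSWORD, AWS_* variables, etc.
|     env -i PATH=/usr/bin HOME=/tmp ./build-step.sh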
| staticassertion wrote:
| > 1. Environment variables are traditionally readable by any
| other process in the system. There are settings you can do on
| modern kernels to turn this off, but how do you know that you
| will always run on such a system?
|
| I think that this would require being the same user as the
| process you're trying to read. Access to /proc/<pid>/environ
| should require that iirc. You can very easily go further by
| restricting procfs using hidepid.
|
| And ptrace restrictions are pretty commonplace now I think?
| So the attacker has to be a parent process or root.
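|
| For reference, those two knobs look roughly like this on a
| typical Linux box:
|
|     # hide other users' /proc/<pid>/ entries (and thus environ)
|     mount -o remount,hidepid=2 /proc
|     # Yama ptrace scope: 1 = only ancestors (or CAP_SYS_PTRACE)
|     # may ptrace a process
|     sysctl kernel.yama.ptrace_scope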
|
| > 2. Environment variables are inherited by all subprocesses
| by default, unless you either unset them after you fork()
| (but before you exec()), or if you take special care to use
| execve() (or similar) function to provide your own custom-
| made environment for the new process.
|
| Yeah, this goes to my "easy to leak" point.
|
| Either way though you're talking about "attacker has remote
| code execution", which is definitely worth considering, but I
| don't think it matters with regards to env vs anything else.
|
| Files suffer from (1), except generally worse. File handles
| suffer from (2) afaik.
|
| Embedding the private key into the binary doesn't help too
| much if the attacker is executing with the ability to ptrace
| you, but it does make leaking much harder ie: you can't trick
| a process into dumping cleartext credentials from the env
| just by crashing it.
| teddyh wrote:
| > _I think that this would require being the same user as
| the process you're trying to read._
|
| IIRC, this was not always the case. But fair enough, this
| might not be a relevant issue for any modern system.
| otterley wrote:
| A foreign process's environment variables are only readable
| if the current UID is root or is the same as the foreign
| process's UID. As user joe I can't see user andrea's process's
| envvars.
| teddyh wrote:
| All right, fair enough. But I'm not sure this was always
| the case on traditional Unixes, though.
| momenti wrote:
| I'm considering manually storing a secret under some
| inaccessible directory on the host, e.g. `/root/passwords.txt`,
| then expose this via Docker secrets[1] to the container.
| Finally, in the entrypoint script, I'd set e.g. the user
| passwords of some SQL server, which is then run as a non-
| privileged user. Would that be reasonably safe?
|
| [1] https://docs.docker.com/engine/swarm/secrets/
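|
| For what it's worth, the swarm-mode flow looks roughly like
| this (names are illustrative; POSTGRES_PASSWORD_FILE is the
| official postgres image's convention, other images differ):
|
|     docker secret create db_password /root/passwords.txt
|     docker service create --name db \
|       --secret db_password \
|       -e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
|       postgres:14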
| colek42 wrote:
| When you start doing security this way you end up chasing your
| tail. There are so many ways to mess it up.
|
| There is a really good article that explains a different way of
| securing these systems through sets of attestations.
|
| https://grepory.substack.com/p/der-softwareherkunft-software...
| notreallyserio wrote:
| I think your agent idea is good. I'd want to add in a way for
| the agent to detect when a key is used twice (to catch other
| processes using the key) or when the code you wrote didn't get
| the key directly (to catch proxies), and then a way to kill or
| suspend the process for review. Would be pretty sweet.
| lox wrote:
| We've been using Sysbox (https://github.com/nestybox/sysbox) for
| our Buildkite based CI/CD setup, allows docker-in-docker without
| privileged containers. Paired with careful IAM/STS design we've
| ended up with isolated job containers with their own IAM roles
| limited to least-privilege.
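|
| Concretely, the job container is just started with the sysbox
| runtime instead of `--privileged` (image name is a
| placeholder):
|
|     docker run --runtime=sysbox-runc --rm -it my-dind-ci-image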
| lima wrote:
| Never heard of Sysbox before. At a first glance, the comparison
| table in their GitHub repo and on their website[1] has a number
| of inaccuracies which make me question the quality of their
| engineering:
|
| -- They claim that their solution has the same isolation level
| ("4 stars") as gVisor, unlike "standard containers", which
| are "2 stars" only (with Firecracker and Kubevirt being "5
| stars"). This is very wrong - as far as I can tell, they use
| regular Linux namespaces with some light eBPF-based filesystem
| emulation, while the vast majority of syscalls are still handled
| by the host kernel. Sorry, but this is still "2 stars" and far
| away from the isolation guarantees provided by gVisor (fully
| emulating the kernel in userspace, which is at the same level
| or even better than Firecracker) and nowhere close to a VM.
|
| -- Somehow, regular VMs (Kubevirt) get a "speed" rating of only
| "2 stars" - worse than gVisor ("3 stars") and Firecracker ("4
| stars"), even though they both rely on virtually the same
| virtualization technology. If anything, gVisor is the slowest
| but most efficient solution while QEMU maintains some
| performance advantage over Firecracker[2]. These are basically
| random scores, and it's not a good first impression: if you do a
| detailed comparison like that, at least do a proper evaluation
| before giving your own product the best score!
|
| -- They claim that "standard containers" cannot run a full OS.
| This isn't true - while it's typically a bad idea, this works
| just fine with rootless podman and, more recently, rootless
| docker. Allowing this is the whole point of user namespaces,
| after all! Maybe their custom procfs does a better job of
| pretending to be a VM - but it's simply false that you can't do
| these things without it. You can certainly run a full OS inside
| Kata/Firecracker, too, I've actually done that.
|
| Nitpicking over rating scales aside, the claim that their
| solution offers large security improvements over any other
| solution with user namespaces isn't true and the whole thing
| seems very marketing-driven. The isolation offered by user
| namespaces is still very weak and not comparable to gVisor or
| Firecracker (both in production use by Google/AWS for untrusted
| workloads!). False marketing is a big red flag, especially for
| something as critical as a container runtime.
|
| Anyone who wants unprivileged system containers might want to
| look into rootless docker or podman rather than this.
|
| [1]: https://www.nestybox.com
|
| [2]: https://www.usenix.org/system/files/nsdi20-paper-
| agache.pdf
| pritambaral wrote:
| > They claim that "standard containers" cannot run a full OS.
| ... this works just fine with rootless podman and, more
| recently, rootless docker.
|
| > Anyone who wants unprivileged system containers might want
| to look into rootless docker or podman rather than this.
|
| Perhaps I'm missing something, but I have been running full
| OS userlands using "standard containers" in production for
| years, via LXD[1].
|
| [1]: https://linuxcontainers.org/
| lima wrote:
| LXD uses privileged containers, though - this exposes a lot
| more attack surface, since uid 0 inside the container
| equals uid 0 outside.
| ctalledo wrote:
| Thanks for the feedback; I am one of the developers of
| Sysbox. Some answers to the above comments:
|
| - Regarding the container isolation, Sysbox uses a
| combination of Linux user-namespace + partial procfs & sysfs
| emulation + intercepting some sensitive syscalls in the
| container (using seccomp-bpf). It's fair to say that gVisor
| performs better isolation on syscalls, but it's also fair to
| say that by adding Linux user-ns and procfs & sysfs
| emulation, Sysbox isolates the container in ways that gVisor
| does not. This is why we felt it was fair to put Sysbox at a
| similar isolation rating as gVisor, although if you view it
| from purely a syscall isolation perspective it's fair to say
| that gVisor offers better isolation. Also, note that Sysbox
| is not meant to isolate workloads in multi-tenant
| environments (for that we think VM-based approaches are
| better). But in single-tenant environments, Sysbox does void
| the need for privileged containers in many scenarios because
| it allows well isolated containers/pods to run system
| workloads such as Docker and even K8s (which is why it's
| often used in CI infra).
|
| - Regarding the speed rating, we gave Firecracker a higher
| speed rating than KubeVirt because while they both use
| hardware virtualization, the former runs microVMs that are
| highly optimized and have much less overhead than the full VMs
| that typically run on KubeVirt. While QEMU may be faster than
| Firecracker in some metrics in a one-instance comparison,
| when you start running dozens of instances per host, the
| overhead of the full VM (particularly memory overhead) hurts
| its performance (which is the reason Firecracker was
| designed).
|
| - Regarding gVisor performance, we didn't do a full
| performance comparison vs. KubeVirt, so we may stand
| corrected if gVisor is in fact slower than KubeVirt when
| running multiple instances on the same host (would appreciate
| any more info you may have on such a comparison, we could not
| find one).
|
| - Regarding the claim that standard containers cannot run a
| full OS, what the table in the GH repo is indicating is that
| Sysbox allows you to create unprivileged containers (or pods)
| that can run system software such as Docker, Kubernetes, k3s,
| etc. with good isolation and seamlessly (no privileged
| container, no changes in the software inside the container,
| and no tricky container entrypoints). To the best of our
| knowledge, it's not possible to run say Kubernetes inside a
| regular container unless it's a privileged container with a
| custom entrypoint. Or inside a Firecracker VM. If you know
| otherwise, please let us know.
|
| - Regarding "The claim that their solution offers large
| security improvements over any other solution with user
| namespaces isn't true". Where do you see that claim? The
| table explicitly states that there are solutions that provide
| stronger isolation.
|
| - Regarding "The isolation offered by user namespaces is
| still very weak and not comparable to gVisor or Firecracker".
| User namespaces by itself mitigates several recent CVEs for
| containers, so it's a valuable feature. It may not offer VM-
| level isolation, but that's not what we are claiming.
| Furthermore, Sysbox uses the user-ns as a baseline, but adds
| syscall interception and procfs & sysfs emulation to further
| harden the isolation.
|
| - "False marketing is a big red flag, especially for
| something as critical as a container runtime." That's not
| what we are doing.
|
| - Rootless Docker/Podman are great, but they work at a
| different level than Sysbox. Sysbox is an enhanced "runc",
| and while Sysbox itself runs as true root on the host (i.e.,
| Sysbox is not rootless), the containers or pods it creates
| are well isolated and void the need for privileged containers
| in many scenarios. This is why several companies use it in
| production too.
| lox wrote:
| I don't spend a lot of time on those comparison-style charts
| if I'm honest, but that is good (and valid) feedback for
| them. I also hadn't heard of it, I discovered sysbox via
| jpettazo's updated post at
| https://jpetazzo.github.io/2015/09/03/do-not-use-docker-
| in-d..., he's an advisor of nestybox the company that
| develops sysbox.
|
| For the CI/CD usecase on AWS, sysbox presented the right
| balance of trade-offs between something like Firecracker
| (which would require bare metal hosts on AWS) and the docker
| containers that already existed. We specifically needed to run
| privileged containers so that we could run docker-in-docker
| for CI workloads, so rootless docker or podman wouldn't have
| helped. Sysbox lets us do that with a significant improvement
| in security over just running privileged docker containers as
| most CI environments end up doing.
|
| Just switching their docker-in-docker CI job containers to
| sysbox would have mitigated 4 of the compromises from the
| article with nearly zero other configuration changes.
| lima wrote:
| > We specifically need to run privileged containers so that
| we could run docker-in-docker for CI workloads, so rootless
| docker or podman wouldn't have helped.
|
| rootless docker works inside an unprivileged container
| (that's how our CI works).
| xmodem wrote:
| Could you elaborate a bit more on how you get the containers
| into their own IAM roles?
| nijave wrote:
| Not sure if this applies to the parent, but here's one way to
| do it with Buildkite.
|
| Queues map pipelines to agents. Agents can be assigned IAM
| roles. If you want a certain build to run as an IAM role, you
| give it a queue where the agents have that role. For AWS,
| Buildkite has a CloudFormation stack that sets up auto
| scaling groups and some other resources for your agents to
| run.
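|
| Concretely, the agents on the role-bearing instances get
| started with a queue tag, and pipelines needing that role
| target the same queue (names are examples):
|
|     # on the instance whose profile carries the deploy IAM role
|     buildkite-agent start --tags "queue=deploy"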
| xmodem wrote:
| Most CI systems will have some way of assigning builds to
| groups of agents. But it would in some cases be useful to
| grant different privileges to different containers running
| on the same agent, which is what I understood OP to have.
| orf wrote:
| AWS has IAM service accounts for containers. Comes for free
| with EKS, not sure how you'd do it without EKS.
|
| Basically it adds a signed web identity file into the
| container which can be used to assume roles.
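|
| Under the hood that token file gets exchanged for temporary
| credentials roughly like so (the SDKs normally do this for
| you; AWS_ROLE_ARN and the token file are injected by EKS):
|
|     aws sts assume-role-with-web-identity \
|       --role-arn "$AWS_ROLE_ARN" \
|       --role-session-name ci-job \
|       --web-identity-token "file://$AWS_WEB_IDENTITY_TOKEN_FILE"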
| captn3m0 wrote:
| Ref: https://docs.aws.amazon.com/eks/latest/userguide/iam-
| roles-f...
| otterley wrote:
| Amazon ECS also offers task roles which do the same thing:
| https://docs.aws.amazon.com/AmazonECS/latest/developerguide
| /...
| selecsosi wrote:
| For ECS: https://docs.aws.amazon.com/AmazonECS/latest/develop
| erguide/...
| lox wrote:
| Yup, we have a sidecar process/container that runs for each
| job and assumes an AWS IAM Role for that specific pipeline
| (with constraints like whether it's an approved PR as well).
| The credentials are provided to the job container via a
| volume mount. This allows us to have shared agents with very
| granular roles per-pipeline and job.
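|
| Roughly, the sidecar does something like this (role name and
| paths are illustrative):
|
|     # assume the per-pipeline role...
|     aws sts assume-role \
|       --role-arn "arn:aws:iam::123456789012:role/ci-$PIPELINE" \
|       --role-session-name "job-$BUILDKITE_JOB_ID" \
|       > /secrets/aws-creds.json
|     # ...and the job container picks the temporary credentials
|     # up from the shared /secrets volume mount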
| mvdwoord wrote:
| The company I currently do contract work for decided it would be
| best to have one large team in Azure DevOps and subdivide all
| teams into repositories etc. with prefixes and homegrown
| "Governer" scripts, which are enforced in all pipelines.
|
| A global find on terms like "key", "password" etc. was great
| fun. It really showed that most people, our team included,
| struggled with getting the pipeline to work at all, let alone
| doing it in a secure manner.
|
| This is a 50k+ employee financial institution. I am honestly
| surprised these kinds of attacks are not much more widespread.
| rawgabbit wrote:
| You would think by now we would have better credential methods. I
| still see usernames and passwords for system credentials. I see
| tokens created by three-legged auth flows. I don't get how that
| is an improvement. The problem is that most deployed code doesn't
| have just one credential but a dozen. Multiply that by several
| environments and you get security fatigue and apathy.
| MauranKilom wrote:
| Interesting to learn that credentials in environment variables
| are frowned upon. I mean, makes sense if your threat model
| includes people pushing malicious code to CI, but aren't you more
| or less done for at that point anyway? If "legitimate" code can
| do a certain thing, then malicious code can too. I guess
| you'll want to limit the blast radius, but drawing these
| boundaries seems like a nightmare for everyone...
| imachine1980_ wrote:
| says everybody to all sec teams
| xmodem wrote:
| > makes sense if your threat model includes people pushing
| malicious code to CI, but aren't you more or less done for at
| that point anyway? If "legitimate" code can do a certain thing,
| then malicious code can do too.
|
| The answer is very much, 'it depends'. For one thing,
| developers can run whatever code in CI before it's been
| reviewed. I could just nab the env vars and post them wherever.
| If there are no sensitive env vars for me to nab and you have
| enforced code review, then I need a co-conspirator, and my
| change is probably going to leave a lot more of a paper trail.
|
| Another risk is accidental disclosure - I have on at least two
| occasions accidentally logged sensitive environment variables
| in our CI environment. Now your threat model is not just a
| malicious developer pushing code - it's a developer making a
| mistake, plus anyone with read access to the CI system.
|
| I don't know about your org, but at my job, the set of people
| who have read access to CI is a lot larger than the set who can
| push code, which is again a lot larger than the set of people
| who can merge code without a reviewer signing off.
|
| > but drawing these boundaries seems like a nightmare for
| everyone...
|
| As someone currently struggling with how to draw them, yup.
| nickjj wrote:
| > For one thing, developers can run whatever code in CI
| before it's been reviewed.
|
| Yeah I don't think this gets talked about enough.
|
| If you're talking about private repos in an organization then
| CI often runs on any pull request. That means a developer is
| able to make CI run in an unreviewed PR. Of course for it to
| make its way into a protected branch (main, etc.) it'll
| likely need a code review, but nothing is stopping the
| developer who opened the unreviewed PR from modifying the CI
| yaml file in a commit to make that PR's pipeline do something
| different.
|
| Requiring a team lead or someone to allow every individual
| PR's pipeline to run (what GitHub does by default in public
| repos) would add too much friction and not all major git
| hosts support the idea of locking down the pipelines file by
| decoupling it from the code repo.
|
| Edit: Depending on which CI provider you use, this situation
| is mostly preventable -- "mostly" in the sense that you can
| control how much damage can be done. Check out this comment
| later in this thread:
| https://news.ycombinator.com/item?id=29967077
| cshokie wrote:
| It is also possible to segment the build pipelines into
| separate CI and PR builds, where only CI builds have access
| to any secrets. The PR pipeline just builds and tests, and
| all of the other tasks only happen in CI once a change has
| been merged into the main branch. That mitigates the
| "random person creating a malicious PR" problem because it
| has to be accepted and merged before it can do anything
| bad.
| reflectiv wrote:
| Each environment should have its own keys for the services
| it is talking to so _ideally_ this would restrict the scope
| of damage.
| nickjj wrote:
| > Each environment should have its own keys for the
| services it is talking to so ideally this would restrict
| the scope of damage.
|
| If a developer changes the CI pipeline file to make their
| PR's code run in `deployment: "production"` instead of
| `deployment: "test"` doesn't that bypass this?
|
| Edit:
|
| I'll leave my original question here because I think it's
| an important one but I answered this myself. It depends
| on which CI provider you're using but some of them do let
| you restrict specific deployments so they run only on
| specific branches or only by specific folks (such as repo
| admins).
|
| In the above case if the production deployment was only
| allowed to run on the main branch and the only way code
| makes its way into the main branch is after at least 1
| person reviewed + merged it (or whatever policy your
| company wants) then a rogue developer can't edit the
| pipeline in an unreviewed PR to make something run in
| production.
|
| Also, with deployment-specific environment variables, a
| rogue developer is not able to edit a pipeline
| file to try and run commands that may affect production,
| such as doing a `terraform apply` or pushing an
| unreviewed Docker image to your production registry.
| acdha wrote:
| > If a developer changes the CI pipeline file to make
| their PR's code run in `deployment: "production"` instead
| of `deployment: "test"` doesn't that bypass this?
|
| As a concrete example, GitLab has the concept of
| protected branches and code owners, both of which allow
| you to restrict access to the corresponding environments'
| credentials to a smaller group of people who have
| permission to touch the sensitive branches. That allows
| you to say things like "anyone can run in development but
| only our release engineers can merge to
| staging/production" or "changes to the CI configuration
| must be approved by the DevOps team", respectively.
|
| That does, of course, not prevent someone from running a
| Bitcoin miner in whatever environment you use to run
| untrusted merge requests but that's better than access to
| your production data.
| Nextgrid wrote:
| CI should only have environment variables needed for
| testing. For building/deploying to production, it just
| has to _push_ the code/package/container image, not
| _run_ it, meaning it has no need for production-level
| credentials.
|
| CI should never ever have access to anything related to
| production; not just for security but also to prevent
| potentially bad code being run in tests from trashing
| production data.
| formerly_proven wrote:
| Yeah but I mean... that's why CI and CD are separate
| things. CI _should not_ need any privileges. CI builds
| _should_ be hermetic (no network access, no persistence,
| ideally fully reproducible from what's going in). CI
| _should not_ talk to servers on your network, let alone
| have credentials, especially for production systems.
| tremon wrote:
| Our CI systems have read access to the data definition
| store (which is a sql database right now), because we
| don't store interop data definitions in code. So fully
| hermetic no, because our code (repository) is not fully
| self-contained. The definition store has its own audits
| and change tracking, but it's separate from the interface
| code.
| derefr wrote:
| That seems fine, in the same way that e.g. an XML library
| fetching DTDs from a known public URL is fine.
|
| However, it'd probably be better if you could have the CI
| _framework_ collect and inject that information into the
| build using some hard-coded deterministic logic, rather
| than giving the build itself (developer-driven Arbitrary
| Code Execution) access to that capability.
|
| Same idea as e.g. injecting Kubernetes Secrets into Pods
| as env-vars at the controller level, rather than giving
| the Pod itself the permission to query Secrets out of the
| controller through its API.
| NAHWheatCracker wrote:
| CI for me has to access repositories. Eg. downloading
| libraries from PyPI, Maven, NPM, etc...
|
| For private repositories, that means access to
| credentials. Probably read-only credentials, but it
| requires network access.
|
| Would you be suggesting that everyone should commit all
| dependencies?
| formerly_proven wrote:
| That's a fair point. I think for many projects which only
| consume dependencies the answer can be "yes, just commit
| all your dependencies" in many instances. Pulling from
| internal repos shouldn't be critical though as long as
| the read-only tokens you're using can only access what
| the developers can read anyway; pulling "unintended" code
| in a CI _should_ never be able to escalate because we're
| running anything a dev pushed.
| NAHWheatCracker wrote:
| I've heard of doing this, although not worked on such a
| project. Committing node_modules seems wild to me and I
| foresee issues with committing all the Python wheels
| necessary for different platforms on some projects.
|
| I'm a proponent of lock-file approaches, which gain 99%
| of the benefits with far less pain. It requires network
| access, though.
| emteycz wrote:
| Yarn v2 is much better for committing dependencies than
| committing node_modules.
|
| However, it's problematic. Only use it if you're certain
| it will solve a specific problem you have.
|
| Consider using Docker to build images that include a
| snapshot of node_modules.
| derefr wrote:
| You could "just"
|
| 1. stand up a private package mirror that you control,
| that uses a whitelist for what packages it is willing to
| mirror;
|
| 2. configure your project's dependency-fetching logic to
| fetch from said mirror;
|
| 3. configure CI to only allow outbound network access to
| your package mirror's IP.
|
| The disadvantage -- but also the _point_ -- of this is
| that it is then a release manager's responsibility, not
| a developer's responsibility, to give the final say-so
| for adding a dependency to the project (because only the
| release manager has the permissions to add packages to
| the mirror.)
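|
| A sketch of steps 2 and 3 (mirror URL and IP are
| placeholders):
|
|     # point dependency fetching at the whitelisting mirror
|     pip install --index-url https://pypi.mirror.internal/simple \
|       -r requirements.txt
|     # and only allow CI egress to that mirror
|     iptables -A OUTPUT -d 10.0.5.20 -p tcp --dport 443 -j ACCEPT
|     iptables -A OUTPUT -p tcp --dport 443 -j DROP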
| NAHWheatCracker wrote:
| That's an interesting approach. I'm not keen on the
| bureaucratic aspect, which leads to more friction than
| it's worth in my experience.
|
| I guess that's beside the point if your goal is only to
| reduce risk of compromised CI/CD.
| mulmen wrote:
| You can compromise and give the developers the ability to
| add dependencies as well. In a separate "dependency
| mirror" package. But those have to be code reviewed like
| anything else. So you have a paper trail but adding a
| dependency is lower friction. And you still can't
| accidentally (or maliciously!) pull in an evil dependency
| without at least two people involved.
| nightpool wrote:
| Sure, but that's a completely different scenario than the
| one the OP/TFA is talking about. We're talking about
| disclosing privileged information or access tokens that a
| single developer on their own shouldn't be able to handle
| without code review, like being able to use your CI
| system to access production accounts. Software libraries
| don't fall under this category, since the developer would
| already need to have accessed them to write the code in
| the first place!
| NAHWheatCracker wrote:
| I was focused on the "no network access" aspect of
| formerly_proven's comment. "no network access" would make
| CI nearly pointless from my perspective.
|
| I agree that there are tokens and variables that are
| dangerous to expose via CI, but throwing the baby out
| with the bathwater confused me.
| pc86 wrote:
| Based on what?
|
| Every CI system I've _ever_ seen has pulled dependencies
| in from the network.
| inetknght wrote:
|     git clone --recursive --branch "$commitid" "$repourl" "$repodir"
|     img="$(docker build --network=none -f "$dockerfile" "$repodir")"
|     docker run --rm -ti --network=none "$img"
|
| Sure, CI pulls in from the network... but execution
| occurs without network.
| staticassertion wrote:
| Can you explain wtf is happening here?
| inetknght wrote:
| I'd assumed some familiarity with common CI systems (and
| assumed the commands would be tailored to your use case).
| Let me walk through it in some more depth.
|
| First:
|
| > git clone --recursive --branch "$commitid" "$repourl"
| "$repodir"
|
| The `git clone` will take your git repository URL as the
| $repourl variable. It will also take your commit id
| (commit hash or tagged version which a pull request
| points to) as the $commitid variable (`--branch
| $commitid`). It will also take a $repodir variable which
| points to the directory that will contain the contents of
| the cloned git repository and already checked out at the
| commit id specified. It will do so recursively
| (`--recursive`): if there are submodules then they will
| also automatically be cloned.
|
| This of course assumes that you're cloning a public
| repository and/or that any credentials required have
| already been set up (see `man git-config`).
|
| Then:
|
| > img="$(docker build --network=none -f "$dockerfile"
| "$repodir")"
|
| Okay so this is sort of broken: you'd need a few more
| parameters to `docker build` to get it to work "right".
| But as-is, `docker build` usually has network access so
| `--network=none` will specify that the build process will
| _not_ have access to the network. I hope your build
| system doesn't automatically download dependencies
| because that will fail (and also suggests that the build
| system may be susceptible to attack). You specify a
| dockerfile to build using `-f "$dockerfile"`. Finally,
| you specify the build context using "$repodir" -- and
| that assumes that your whole git repository should be
| available to the dockerfile.
|
| However, `docker build` will write a lot more than _just_
| the image name to standard output and so this is where
| some customization would need to occur. Suffice to say
| that you can use `--quiet` if that's all you want; I do
| prefer to see the output because it normally contains
| intermediate image names useful for debugging the
| dockerfile.
|
| Finally:
|
| > docker run --rm -ti --network=none "$img"
|
| Finally, it runs the built image in a new container with
| an auto-generated name. `-ti` here is wrong: it will
| attach a standard input/output terminal and so if it
| drops you into an interactive program (such as bash) then
| it could hang the CI process. But you can remove that. It
| also assumes that your dockerfile correctly specifies
| ENTRYPOINT and/or CMD. When the container has exited then
| the container will automatically be removed (--rm) --
| usually they linger around and pollute your docker host.
| Finally, the --network=none also ensures that your
| container does not have network access so your unit tests
| should also be capable of running without the network or
| else they will fail. You could use `--volume` to specify
| a volume with data files if you need them. You might also
| want to look at `--user` if you don't want your container
| to have root privileges...
|
| And of course if you want _integration tests_ with other
| containers then you should create a dedicated docker
| network and specify its alias with `--network`: see `man
| docker-network-create`; you can use `docker network
| create --internal` to create a network which shouldn't
| let containers out.
|
| Does that answer your question?
| staticassertion wrote:
| Yes, thank you for the detailed explanation.
| structural wrote:
| After the CI system (with network access) pulls down the
| code, including submodules, this code is then placed into
| a container with no network access to perform the actual
| build.
| pharmakom wrote:
| Is such a clean separation possible? I've seen some crazy
| things...
| snovv_crash wrote:
| Now try this with eg. OpenCV, or ONNX, or tflite, or a
| million other packages that try to download additional
| information at compile time.
| heavenlyblue wrote:
| The solution to that is simple: do not do any tests with
| production secrets :)
| marcosdumay wrote:
| Wait. Isn't any CI step done before review expected to be
| configured for a test environment?
|
| I'm failing to understand how that procedure even works. How
| do you run the tests?
| detaro wrote:
| The article is more specific than that: They shouldn't be
| shared with code run by people/jobs who shouldn't have access
| to it. I.e. don't have secrets used for deploys in the
| environment that runs automatically on every PR if deploys are
| gated behind review by a more limited list of users.
| mulmen wrote:
| > I mean, makes sense if your threat model includes people
| pushing malicious code to CI, but aren't you more or less done
| for at that point anyway?
|
| Maybe. Back in the old days if you had the commit bit your
| badge didn't get you into the server room. I get the impression
| a lot of shops are effectively giving their devs root but in
| the cloud this time, which isn't necessary.
| movedx wrote:
| This is exactly why I both love and hate CI/CD.
|
| Ultimately most CI/CD setups are basically systems administrators
| with privileged access to everything, network connected and
| running 24/7. It's pretty dangerous stuff.
|
| I don't have an answer though, except maybe to keep the CI and CD
| in separate, isolated instances that require manual intervention
| to bridge the gap on a case by case basis. That doesn't scale
| very well though.
| contingencies wrote:
| _A distributed system is one where the failure of a machine
| you've never heard of stops you from getting any work done._ -
| Leslie Lamport
|
| ... via https://github.com/globalcitizen/taoup
| hinkley wrote:
| I think in general we put too much logic into our CI/CD
| configurations.
|
| There is an argument to be made for a minimalist CI/CD
| implementation that can handle task scheduling and
| dependencies, understands how to fetch and tag version control,
| count version numbers and not much else. Even extracting test
| result summaries, while handy, maybe should be handled another
| way.
|
| For many of us, if CI is down you can't deploy anything to
| production, not even roll back to a previous build. Everything
| but the credentials should be under version control, and the
| right people should be able to fire off a one-liner from a
| runbook that has two to four _sanity checked_ arguments in
| order to trigger a deployment.
| thomasmarcelis wrote:
| >>In our final scenario, the NCC Group consultant got booked on a
| scenario-based assessment:
|
| >>"Pretend you have compromised a developer's laptop."
|
| Most companies will fail right here. Especially outside of the
| tech world, security hygiene on developers' laptops is very bad
| from what I have seen.
| jiggawatts wrote:
| A weakness of modern secret management is that it isn't.
|
| A secret value ought to be very carefully guarded _even from the
| host machine itself_.
|
| .NET for example has SecureString, which is a good start -- it
| can't be accidentally printed or serialised insecurely. If it is
| serialised, then it is automatically encrypted by the host OS
| data protection API.
|
| Windows even has TPM-hosted certificates! They're essentially a
| smart card plugged into the motherboard.
|
| A running app can _use_ a TPM credential to sign requests but it
| can't read or copy it.
|
| These advancements are just completely ignored in the UNIX world,
| where everything is blindly copied into easily accessible
| locations in plain text...
| pxx wrote:
| Except afaict SecureString doesn't reliably do that and
| shouldn't be used. https://github.com/dotnet/platform-
| compat/blob/master/docs/D...
| jiggawatts wrote:
| "It's not perfectly secure so use a totally insecure
| alternative instead" seems like terrible advice.
| pxx wrote:
| No it's "don't use this thing which doesn't say what it
| does on the tin and is therefore a foot gun." Something
| that is obviously insecure will be treated with more
| caution / put on the correct side of the authorization
| boundary than something that claims to be.
| INTPenis wrote:
| Many of these points are about running pipelines in privileged
| containers, something I actually took extra time to resolve for
| my team. That's when I discovered kaniko first, and shortly
| after, podman/buildah.
|
| After that podman and buildah have gotten a lot of great reviews
| from people so I think they're awesome.
|
| For an old time Unix sysadmin it just doesn't make sense to run
| something as root unless you absolutely have to.
|
| Which also makes the client's excuse in the article so strange:
| they had to run the container privileged to run static code
| analysis. wtf. Doesn't that just mean they run a tool against a
| binary artefact from a previous job? I fail to see how that
| requires privileges.
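|
| For the build-an-image-in-CI case specifically, the rootless
| route can be as simple as (registry name is a placeholder):
|
|     podman build -t registry.example.com/app:ci .
|     podman push registry.example.com/app:ci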
___________________________________________________________________
(page generated 2022-01-17 23:01 UTC)